Benchmarking Ollama on Various Older Machines

For professional application developers and AI engineers, it is important to explore how various Large Language Models (LLMs) run on a typical developer machine. Ollama is a tool that lets you run a range of open-source LLMs on your laptop without any Internet or cloud connectivity. With Ollama, you can turn virtually any machine, whether it runs macOS, Windows, or Linux, into a model-inference machine. However, each machine's compute, memory, and GPU capabilities determine how it behaves while running models.
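
As a quick illustration (assuming Ollama is already installed from ollama.com), turning a machine into a local inference endpoint takes only a couple of commands:

ollama pull llama3        # download the model weights to the local machine
ollama run llama3 "Explain local inference in one sentence."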

This blog post provides a benchmarking analysis of Ollama, specifically focusing on its performance across various older machines. By understanding how Ollama performs under different hardware conditions, businesses and developers can make informed decisions about integrating this tool into their workflows, even with older or less powerful hardware.

Benchmarking Methodology

For all tests, we sent a typical question to the model through the chat interface on the command line. The LLM was Meta's Llama 3, specifically the 8-billion-parameter model quantized to Q4_0. We passed the --verbose flag to see the inference statistics:

echo "why is the sky blue?" | ollama run llama3 --verbose

Test Environment and Setup

To ensure a comprehensive evaluation, we conducted our benchmarks on a variety of older hardware configurations, including machines with different CPUs, memory capacities, and GPU capabilities. Here are the details of the test environment:

  1. Lenovo Laptop (2010s era):

    • CPU: Intel Core i5 2.5GHz

    • RAM: 4GB

    • GPU: None

    • Operating System: Ubuntu Linux

  2. Mac Mini (2010s era):

    • CPU: Intel Core i5-4278U 2.60GHz

    • RAM: 8GB

    • GPU: None

    • Operating System: Ubuntu Linux

  3. Mac M1 Max (2020):

    • CPU: Apple M1 Max

    • RAM: 32GB

    • GPU: Integrated

    • Operating System: macOS
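
For reproducibility, the CPU, RAM, and GPU details above can be captured with standard tools. A rough sketch for the Linux machines is shown below; on macOS, system_profiler SPHardwareDataType reports the equivalent information:

# Record the hardware configuration alongside the benchmark results
lscpu | grep "Model name"      # CPU model and clock speed
free -h | grep "Mem:"          # total RAM
lspci | grep -i "vga\|3d"      # discrete or integrated GPU, if any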

Results

Chart: Benchmarking Llama 3 8B on legacy hardware using Ollama

Based on the results, it is evident that Ollama's performance varies significantly across hardware configurations. The machine with a GPU and ample RAM showed much higher processing speeds. However, even the less powerful, CPU-only machines managed to run the Llama 3 model, demonstrating Ollama's versatility and efficiency.
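
The headline number in Ollama's verbose statistics is the generation throughput, which is simply the number of generated tokens divided by the generation time (the "eval count" and "eval duration" fields). The figures below are illustrative, not taken from our runs:

eval rate (tokens/s) = eval count / eval duration
e.g. 250 tokens / 50 s = 5 tokens/s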

Additional Resources

For more information on Ollama and detailed documentation, visit Ollama's official website. To explore use cases that leverage on-premises, behind-the-firewall local Large Language Models, please get in touch with our engineering team.
