Benchmarking Ollama on Various Older Machines
For application developers and AI engineers, it is worth exploring how various Large Language Models (LLMs) run on a typical developer machine. Ollama is a tool that lets you run open-source LLMs on your own laptop, with no Internet or cloud connectivity required. With Ollama, you can turn almost any machine, whether it runs macOS, Windows, or Linux, into a model-inference machine. However, machines behave very differently while running models, depending on their compute, memory, and GPU capabilities.
This blog post provides a benchmarking analysis of Ollama, specifically focusing on its performance across various older machines. By understanding how Ollama performs under different hardware conditions, businesses and developers can make informed decisions about integrating this tool into their workflows, even with older or less powerful hardware.
Benchmarking Methodology
For all tests, we ran a typical question through Ollama's command-line chat interface. We used Meta's Llama 3 as the LLM, specifically the 8-billion-parameter model quantized to Q4_0. We turned on the --verbose flag to see the inference statistics:
echo "why is the sky blue?" | ollama run llama3 --verbose
Test Environment and Setup
To ensure a comprehensive evaluation, we conducted our benchmarks on a variety of older hardware configurations, including machines with different CPUs, memory capacities, and GPU capabilities. Here are the details of the test environment:
Lenovo Laptop (2010s era):
CPU: Intel Core i5 2.5GHz
RAM: 4GB
GPU: None
Operating System: Ubuntu Linux
Mac Mini (2010s era):
CPU: Intel Core i5-4278U 2.60GHz
RAM: 8GB
GPU: None
Operating System: Ubuntu Linux
Mac M1 Max (2020s era):
CPU: Apple M1 Max
RAM: 32GB
GPU: Integrated
Operating System: macOS
Results
The results show that Ollama's performance varies significantly across hardware configurations. Machines with a GPU and ample RAM delivered noticeably faster inference. However, even on the less powerful machines, Ollama managed to run the Llama 3 model, demonstrating its versatility and efficiency.
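To make comparisons like this more robust on your own hardware, it helps to average the generation rate over several prompts rather than relying on a single run. The sketch below is not part of the original methodology; it assumes the same local Ollama setup as above, repeats the request for a small set of hypothetical prompts, and averages the reported eval rate.

import requests  # assumption: the requests package is installed

# Hypothetical prompt set; any short questions of similar length would do.
prompts = [
    "why is the sky blue?",
    "explain how rainbows form",
    "what causes ocean tides?",
]

rates = []
for prompt in prompts:
    stats = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3", "prompt": prompt, "stream": False},
        timeout=600,
    ).json()
    # Convert nanoseconds to seconds before computing tokens per second.
    rates.append(stats["eval_count"] / (stats["eval_duration"] / 1e9))

print(f"mean eval rate over {len(rates)} runs: {sum(rates) / len(rates):.2f} tokens/s")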
Additional Resources
For more information on Ollama and detailed documentation, visit Ollama's official website. To explore use cases that leverage on-premises, behind-the-firewall, local Large Language Models, please get in touch with our engineering team here.