In today’s landscape of privacy, speed, and cost-conscious development, running large language models (LLMs) locally is becoming essential for both enthusiasts and professionals. Whether you’re working on Mac, Windows, or Linux, this straightforward guide will show you how to set up Ollama—an intuitive platform that allows you to deploy and run powerful LLMs offline and for free. Discover the most popular models, their capabilities, and how to seamlessly integrate them with AI tools via Ollama’s API. Let’s unlock the potential of your own local AI hub!
What Is Ollama and Why Run LLMs Locally?
What Is Ollama?
Ollama is a user-friendly platform designed to simplify local deployment of large language models. It offers a clean graphical interface alongside a model marketplace, making it effortless to find, download, and manage models on your machine. Built for both casual hobbyists and serious developers, Ollama streamlines the process, removing the need for complex setup or cloud dependency. But most importantly, it is completely free to use.
Why Should You Run LLMs Locally?
- Privacy & Data Security: Keep your sensitive data on your device; no third-party servers involved.
- Offline Accessibility: Work anywhere—even in remote locations or low-bandwidth environments.
- Cost Efficiency: No subscription fees—just the cost of running your computer.
Popular Ollama Models and Hardware Considerations
Commonly Used Models Available in Ollama
| Model Name | Parameters | Download Size | Context Size |
| --- | --- | --- | --- |
| llama3.1:8b | 8 billion | 4.9 GB | 128k |
| deepseek-r1:8b | 8 billion | 4.9 GB | 128k |
| mistral:7b | 7 billion | 4.1 GB | 32k |
| gemma3:12b | 12 billion | 8.1 GB | 128k |
Note: the parameter sizes shown above represent a reasonably sized model for running locally. Each model is also published in other sizes, so browse its page in Ollama’s model library to see which variant might suit your use case and hardware better.
Hardware Requirements
Running these models requires varying amounts of processing power, RAM, and storage. Generally, the fewer the parameters, the lower the resource requirements, but smaller models tend to hallucinate more and get confused more easily. Ollama will inform you if the model you’ve chosen will not run on your machine. The models are very RAM-heavy, so the more RAM you have, the better. If Ollama detects that you have GPU resources available, it will use them automatically, as a GPU is the most efficient way to run an LLM.
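Once Ollama is installed and running (installation is covered in the next section), a quick way to see what your machine is already holding is to list the downloaded models and their on-disk sizes through the local API’s /api/tags endpoint; download size roughly tracks how much RAM a model needs once loaded. A minimal Python sketch:
import requests

# List models already downloaded locally, with their on-disk sizes in GB.
tags = requests.get("http://localhost:11434/api/tags").json()

for model in tags.get("models", []):
    size_gb = model["size"] / 1024**3
    print(f"{model['name']}: {size_gb:.1f} GB")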
Installing Ollama on Your System
On Mac
- Visit Ollama’s website and download the macOS installer.
- Open the downloaded file and follow the installation prompts.
- Alternatively, for CLI enthusiasts, install via Homebrew:
brew install ollama
On Windows
- Download the Windows installer from Ollama’s site.
- Run the installer and follow the setup wizard.
- For advanced users, consider installing Windows Subsystem for Linux (WSL) and running Ollama within Linux environments.
On Linux
Run the command shown on the Ollama website:
curl -fsSL https://ollama.com/install.sh | sh
Verify installation:
ollama --help
Getting a Model Up and Running
Post-Installation: Browsing & Downloading Models
- Launch your terminal.
- Search Ollama’s model listing to find the model you want to download.
- Download the model by specifying its name in a pull command, for example:
ollama pull llama3.1:8b
Running a Model
Start interacting with the LLM via the terminal:
ollama run llama3.1:8b
This opens an interactive prompt:
> What is the capital of France?
Paris
Responses are displayed inline, making it easy to try different models before committing to one for a development project.
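If you would rather compare a few candidate models programmatically, you can send the same prompt to each of them through the local API described in the next section. This is a minimal sketch, and it assumes the listed models have already been pulled:
import requests

# Shortlist of models to compare; each must already be pulled locally.
candidates = ["llama3.1:8b", "mistral:7b"]
prompt = "Summarise the plot of Hamlet in two sentences."

for model in candidates:
    reply = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,
    )
    # The full generated text is returned in the "response" field.
    print(f"--- {model} ---")
    print(reply.json()["response"])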
Using Ollama’s API for Seamless AI Integration
Overview of the API
Ollama exposes a local API to facilitate automation and integration with tools like AutoGen, CrewAI, and LangGraph. This turns your machine into a powerful AI development hub.
Ollama should already be running on your machine; if so, visiting http://localhost:11434 in your browser will show the message “Ollama is running”. If it is not running, you can start it from a terminal with the following command:
ollama serve
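If you prefer to check from code rather than the browser, a request to the root endpoint works just as well, since it returns the same “Ollama is running” message. A minimal sketch using Python’s requests library:
import requests

try:
    # The root endpoint returns the plain-text message "Ollama is running" when the server is up.
    status = requests.get("http://localhost:11434", timeout=2)
    print(status.text)
except requests.exceptions.ConnectionError:
    print("Ollama is not running - start it with `ollama serve`")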
Basic API Usage
You can send requests to the API using a simple curl command:
curl http://localhost:11434/api/generate -d '{
"model": "llama3.1:8b",
"prompt": "Tell me a joke"
}'
The above streams output tokens one at a time. If you want to get the whole response at once, you can turn off streaming in your request:
curl http://localhost:11434/api/generate -d '{
"model": "llama3.1:8b",
"prompt": "Tell me a joke",
"stream": false
}'
The API response can then be parsed and embedded in your own applications or included in a response shown to a user.
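If you keep streaming enabled, the API returns one JSON object per line, each carrying a fragment of the output in its "response" field, with "done": true on the final object. Here is a minimal Python sketch of consuming that stream, assuming the requests library and a model that has already been pulled:
import json
import requests

# Stream the answer as it is generated (streaming is the default for /api/generate).
with requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.1:8b", "prompt": "Tell me a joke"},
    stream=True,
) as reply:
    for line in reply.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        # Each chunk holds a fragment of the answer; the last one has "done": true.
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):
            print()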
Automate & Extend
Integrate the API into scripts or frameworks. An example usage in Python:
import requests

# Request a complete (non-streamed) response from the local Ollama API.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.1:8b", "prompt": "Write a short story.", "stream": False},
)

# The generated text is returned in the "response" field of the JSON body.
print(response.json()["response"])
This opens automation possibilities, enabling complex workflows and AI-powered applications locally.
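As one example of a more conversational workflow, Ollama also exposes a /api/chat endpoint that takes a running list of messages, which is the shape most agent frameworks expect. The following is a minimal sketch under the same assumptions as above (llama3.1:8b already pulled and Ollama running locally):
import requests

# Keep the conversation history in the message format /api/chat expects.
messages = [{"role": "user", "content": "Give me three ideas for a weekend project."}]

reply = requests.post(
    "http://localhost:11434/api/chat",
    json={"model": "llama3.1:8b", "messages": messages, "stream": False},
)
assistant_message = reply.json()["message"]
print(assistant_message["content"])

# Append the assistant's reply plus a follow-up question to continue the conversation.
messages.append(assistant_message)
messages.append({"role": "user", "content": "Expand on the first idea."})
follow_up = requests.post(
    "http://localhost:11434/api/chat",
    json={"model": "llama3.1:8b", "messages": messages, "stream": False},
)
print(follow_up.json()["message"]["content"])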
Wrapping Up
In this guide, you’ve learned how to set up Ollama on Mac, Windows, or Linux to run powerful LLMs locally. You now understand the model options, resource requirements, and how to operate and automate them via Ollama’s API. Running models locally offers unmatched privacy, offline accessibility, and cost savings—ideal for experimentation and deployment.
And the best part? It’s completely free! Take control of your AI environment to build flexible applications in a safe local environment.