Setting Up a Local LLM with Ollama on macOS

7/14/2025

Running large language models locally has become increasingly popular for developers and AI enthusiasts who want privacy, control, and offline access to powerful AI capabilities. Ollama makes this process remarkably straightforward on macOS, providing a simple command-line interface to download, run, and manage various open-source language models.

What is Ollama?

Ollama is an open-source tool that simplifies running large language models on your local machine. It handles model downloading and memory management, and provides a consistent API for interacting with different models such as Llama 2, Code Llama, Mistral, and many others. Think of it as Docker for language models: it packages everything you need to run LLMs locally.

Hardware Requirements

Before diving into the setup, let's understand what hardware you'll need for optimal performance.

Minimum Requirements

  • Mac: Any Mac with Apple Silicon (M1, M2, M3, or M4) or Intel Mac with sufficient RAM
  • RAM: 8GB minimum (16GB recommended for better performance)
  • Storage: At least 10GB free space for models (some models require 20GB+)
  • macOS: macOS 11.0 (Big Sur) or later

Recommended Specifications

For the best experience running local LLMs, consider these specifications:

  • Apple Silicon Mac: M2 or M3 with at least 16GB unified memory
  • RAM: 32GB or more for running larger models smoothly
  • Storage: 50GB+ free SSD space for multiple models
  • Network: Fast internet connection for initial model downloads

Model Size Considerations

Different models have varying memory requirements:

  • 7B models (like Llama 2 7B): ~4-8GB RAM
  • 13B models: ~8-16GB RAM
  • 34B models: ~20-40GB RAM
  • 70B models: ~40-80GB RAM

Apple Silicon Macs perform exceptionally well for local LLM inference: the unified memory architecture lets the CPU and GPU share the full pool of RAM, so they can run models that would not fit in the VRAM of many consumer graphics cards.
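
If you're not sure what your Mac has, you can check the chip, memory, and free disk space from Terminal before choosing a model. These are standard macOS commands; on Intel Macs the hardware report shows "Processor Name" instead of "Chip":

# Show chip and installed memory
system_profiler SPHardwareDataType | grep -E "Chip|Processor Name|Memory"

# Show free disk space on the startup volume
df -h /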

Installing Ollama

Method 1: Direct Download (Recommended)

  1. Visit the official Ollama website at ollama.ai
  2. Click the "Download" button for macOS
  3. Once downloaded, open the .dmg file
  4. Drag Ollama to your Applications folder
  5. Launch Ollama from Applications

Method 2: Using Homebrew

If you prefer using Homebrew, you can install Ollama with:

brew install ollama

Method 3: Using curl

For a quick installation via the terminal:

curl -fsSL https://ollama.ai/install.sh | sh

Note that this install script is aimed primarily at Linux; on macOS, the direct download or Homebrew methods above are the more reliable routes.

Initial Setup and Configuration

Once installed, Ollama runs as a background service. You can verify the installation by opening Terminal and running:

ollama --version

The Ollama service should start automatically, but you can manually start it with:

ollama serve

By default, Ollama stores models in ~/.ollama/models and runs on http://localhost:11434.
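
A quick way to confirm the server is up is to hit the API directly; the /api/tags endpoint returns a JSON list of the models you have installed (empty until you pull one):

curl http://localhost:11434/api/tags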

Running Your First Model

Downloading and Running Llama 2

Let's start with the popular Llama 2 7B model:

ollama pull llama2

This command downloads the model (approximately 3.8GB). Once complete, you can start chatting:

ollama run llama2

You'll see a prompt where you can start typing your questions or requests. Type /bye to exit the chat.
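
You can also pass a prompt directly on the command line for a one-off answer; ollama run prints the response and exits instead of opening the interactive chat:

ollama run llama2 "Write a haiku about the ocean"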

Exploring Available Models

Ollama supports numerous models. Here are some popular options:

# Code-focused models
ollama pull codellama
ollama pull deepseek-coder

# General purpose models
ollama pull mistral
ollama pull llama2:13b
ollama pull phi3

# Specialized models
ollama pull llava  # Vision-language model
ollama pull orca-mini

You can see all available models at ollama.ai/library.

Model Variants

Many models come in different sizes. You can specify the variant:

ollama pull llama2:7b      # 7 billion parameters
ollama pull llama2:13b     # 13 billion parameters
ollama pull llama2:70b     # 70 billion parameters

Managing Models

Listing Installed Models

ollama list

Removing Models

To free up space, you can remove models you no longer need:

ollama rm llama2:13b

Updating Models

Models are occasionally updated. To get the latest version:

ollama pull llama2  # Re-downloads if newer version available
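
If you keep several models installed, a small shell loop can refresh them all at once. This is a sketch that assumes the default ollama list output, with the model name (including its tag) in the first column under a header row:

# Re-pull every installed model to pick up any updates
ollama list | tail -n +2 | awk '{print $1}' | xargs -n 1 ollama pull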

Using the API

Ollama provides a REST API that you can use in your applications. Here's a simple example using curl:

curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

Python Example

import requests

def chat_with_ollama(prompt, model="llama2"):
    """Send a single prompt to the local Ollama API and return the response text."""
    url = "http://localhost:11434/api/generate"
    data = {
        "model": model,
        "prompt": prompt,
        "stream": False  # return the full response as one JSON object
    }

    response = requests.post(url, json=data)
    response.raise_for_status()  # surface HTTP errors instead of failing on a missing key
    return response.json()["response"]

# Example usage
answer = chat_with_ollama("Explain quantum computing in simple terms")
print(answer)
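
For multi-turn conversations, Ollama also exposes a chat endpoint at /api/chat that takes a list of messages (with roles like system, user, and assistant) instead of a single prompt. With "stream": false, the reply comes back in the message.content field:

curl http://localhost:11434/api/chat -d '{
  "model": "llama2",
  "messages": [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Why is the sky blue?"}
  ],
  "stream": false
}'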

Performance Optimization

Memory Management

Ollama automatically manages memory, but you can optimize performance:

  1. Close unnecessary applications before running large models
  2. Use appropriate model sizes for your hardware
  3. Watch memory usage in Activity Monitor (or with the Ollama commands shown below)
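
Ollama itself can report what is currently loaded. ollama ps lists the models held in memory, and the OLLAMA_KEEP_ALIVE setting controls how long a model stays loaded after a request (the default is around five minutes, though this may vary between versions):

# Show which models are currently loaded into memory
ollama ps

# Keep models loaded for 30 minutes after each request
OLLAMA_KEEP_ALIVE=30m ollama serve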

Model Selection

Choose models based on your needs and hardware:

  • For coding: CodeLlama, DeepSeek-Coder
  • For general chat: Llama 2, Mistral, Phi-3
  • For vision tasks: LLaVA
  • For speed: Smaller 7B models
  • For quality: Larger 13B+ models (if hardware allows)

Troubleshooting Common Issues

Ollama Service Won't Start

If Ollama isn't responding:

# Check if service is running
ps aux | grep ollama

# Restart the service
ollama serve

Out of Memory Errors

If you encounter memory issues:

  1. Try a smaller model variant, or a more heavily quantized tag of the same model (see the example after this list)
  2. Close other applications
  3. Restart your Mac to free up memory
  4. Consider upgrading your hardware
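
Many models in the library are also published in several quantization levels, and lower-bit variants need noticeably less RAM. The tag below is only an example and may not exist for every model, so check the model's page at ollama.ai/library for the tags it actually offers:

# Example: a lower-bit quantization of Llama 2 7B chat (check the library for exact tags)
ollama pull llama2:7b-chat-q3_K_S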

Slow Performance

To improve performance:

  1. Ensure you're using Apple Silicon if possible
  2. Close unnecessary background applications
  3. Use SSD storage for model files
  4. Try smaller, more efficient models

Model Download Fails

If downloads are interrupted:

# Remove incomplete download and retry
ollama rm model_name
ollama pull model_name

Advanced Usage

Custom Model Files

You can create custom model configurations using Modelfiles:

# Create a custom model with specific parameters
echo 'FROM llama2
PARAMETER temperature 0.8
PARAMETER num_ctx 4096
SYSTEM "You are a helpful coding assistant."' > Modelfile

ollama create my-coding-assistant -f Modelfile
ollama run my-coding-assistant

Integration with Development Tools

Ollama integrates well with various development tools:

  • VS Code: Use extensions like "Ollama" for code completion
  • Cursor: Configure to use local Ollama models
  • Shell scripts: Automate tasks using the API (see the sketch below)
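
As a sketch of the shell-script route, here is a small function that sends a prompt to the generate endpoint and prints just the reply. It assumes jq is installed (for example via brew install jq) and that the prompt contains no double quotes:

# ask: send a one-off prompt to the local Ollama API and print the response text
ask() {
  curl -s http://localhost:11434/api/generate \
    -d "{\"model\": \"llama2\", \"prompt\": \"$1\", \"stream\": false}" \
    | jq -r '.response'
}

ask "Write a one-line commit message for a typo fix"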

Privacy and Security Benefits

Running LLMs locally with Ollama offers several advantages:

  • Complete privacy: Your data never leaves your machine
  • No internet dependency: Work offline once models are downloaded
  • No API costs: No per-token charges or subscription fees
  • Full control: Customize models and parameters as needed
  • Compliance: Easier to meet data governance requirements

Conclusion

Ollama makes running powerful language models locally on macOS incredibly accessible. Whether you're a developer looking to integrate AI into your applications, a researcher experimenting with different models, or simply someone who values privacy and control over their AI interactions, Ollama provides an excellent solution.

The combination of Apple Silicon's unified memory architecture and Ollama's efficient model management creates a powerful platform for local AI development. Start with smaller models like Llama 2 7B to get familiar with the system, then experiment with larger models as your needs and hardware allow.

With the rapid pace of open-source AI development, new and improved models are constantly being released. Ollama's simple interface makes it easy to stay current with the latest developments while maintaining complete control over your AI infrastructure.

Remember to monitor your system resources, choose appropriate models for your hardware, and enjoy the freedom and privacy that comes with running AI models locally on your Mac.
