Run Claude Code
100% Free & Local
No API bill. No usage cap. No cloud. Your code never leaves your machine.
The Problem With Cloud-Dependent AI
Claude Code is one of the most capable AI coding assistants available — but the default setup comes with a recurring API cost and a hard dependency on Anthropic’s servers. Every autocomplete, every refactor, every “explain this function” call racks up tokens.
What if you could flip that model entirely? Run the AI brain locally. Keep your proprietary code off external servers. Pay nothing. That’s exactly what this guide shows you.
What We’re Building
A local pipeline where Claude Code’s interface talks to a locally-running Ollama server instead of Anthropic’s cloud. You get the familiar Claude Code UX powered by a local model you control.
Local vs Cloud: The Real Difference
| Feature | Cloud (Default) | Local (This Guide) |
|---|---|---|
| Cost | Pay-per-token | $0 |
| Privacy | Code sent to servers | Never leaves machine |
| Usage Limits | Rate limited | Unlimited |
| Internet Required | Always | Never |
| Model Quality | Claude 3.5+ | Depends on your GPU |
Figure 1 — Claude Code running locally with qwen3-coder model
Ollama is a lightweight, open-source application that lets your computer run large language models locally — no Docker, no Python environment complexity, no manual CUDA configuration. It abstracts all of that away into a single install.
Think of Ollama as the “server” that Claude Code will talk to. Instead of sending requests to Anthropic’s API, Claude Code will send them to http://localhost:11434 — a private server running entirely on your machine.
Download & Install
Head to ollama.com and download the installer for your platform. The installer handles everything including adding Ollama to your system PATH automatically.
```shell
# Linux alternative
curl -fsSL https://ollama.com/install.sh | sh
```
Figure 2 — Download Ollama from ollama.com for macOS or Windows
Verify the Installation
Once installed, Ollama runs as a background service. Verify it’s alive by navigating to http://localhost:11434 — you should see a plain-text confirmation that Ollama is running.
```shell
ollama --version
# → ollama version 0.x.x

curl http://localhost:11434
# → Ollama is running
```
Pro Tip
Ollama starts automatically on login and sits quietly in your system tray (macOS) or taskbar (Windows). You never need to manually start it again.
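On Linux there is no tray icon; the install script instead registers a background service. Assuming the standard systemd-based install, you can check and manage it like this:

```shell
# Check whether the Ollama service is running
systemctl status ollama

# Start it if it isn't (or run the server in the foreground instead)
sudo systemctl start ollama
# ollama serve
```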
For coding tasks you want a model purpose-built for code generation and instruction-following. The right choice depends entirely on your hardware.
qwen3-coder:30b
Excellent reasoning, large context. Needs 16GB+ VRAM or M2 Pro+.
qwen2.5-coder:7b
Great balance of speed vs quality. Runs well on 8GB RAM/VRAM.
gemma:2b
Fastest option, works purely on CPU. Always responsive.
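Which tier applies depends on how much GPU memory you have. A quick way to check on a machine with an NVIDIA card (on Apple silicon, your unified system memory plays the role of VRAM):

```shell
# Report total GPU memory (NVIDIA cards only)
nvidia-smi --query-gpu=memory.total --format=csv,noheader
```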
Tool/Function Call Support Required
Claude Code needs a model that supports tool calls. Always verify this before committing to a model. The qwen2.5-coder and qwen3-coder families both handle this reliably.
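One way to verify this, in recent Ollama versions, is to inspect the model's metadata: `ollama show` prints a capabilities section, and tool-capable models list `tools` there.

```shell
# Inspect model metadata; tool-capable models list "tools"
# under Capabilities (recent Ollama versions)
ollama show qwen2.5-coder:7b
```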
Download Your Model
```shell
# Recommended for most users
ollama pull qwen2.5-coder:7b

# High-end machines (16GB+ VRAM)
ollama pull qwen3-coder:30b

# CPU-only machines
ollama pull gemma:2b
```
Figure 3 — Running ollama pull to download a coding model
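Before wiring anything into Claude Code, it's worth smoke-testing the model on its own. This assumes you pulled `qwen2.5-coder:7b`; substitute whichever model you downloaded:

```shell
# Confirm the model is available locally
ollama list

# Chat with it directly to confirm inference works
ollama run qwen2.5-coder:7b "Write a one-line hello world in Python"
```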
This is the most important step. By default, Claude Code is hard-wired to talk to Anthropic’s API. We override that by setting environment variables that tell it to use your local Ollama server instead.
Ollama exposes an OpenAI-compatible API endpoint, so Claude Code can communicate with it seamlessly — you just need to set the correct base URL.
Set the Environment Variable
```shell
# Redirect Claude Code to your local Ollama instance
export ANTHROPIC_BASE_URL=http://localhost:11434/v1
export ANTHROPIC_API_KEY=ollama

# To make permanent, add both lines to ~/.zshrc or ~/.bashrc
```
Why “ollama” as the API key?
Claude Code requires an API key field to be non-empty, but a local Ollama server has no real authentication. Setting it to any string (like “ollama”) satisfies the requirement without sending credentials anywhere.
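You can confirm the endpoint behind the new base URL is answering before launching Claude Code. A quick sketch, assuming Ollama is on its default port (Ollama ignores the bearer token, so the placeholder key is harmless):

```shell
# List models via the OpenAI-compatible endpoint Claude Code will use
curl http://localhost:11434/v1/models \
  -H "Authorization: Bearer ollama"
```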
Figure 4 — Setting the base URL to redirect Claude Code to your local Ollama server
Persist the Configuration
To avoid re-exporting these variables every session, add them permanently to your shell config file:
```shell
# For zsh (macOS default)
echo 'export ANTHROPIC_BASE_URL=http://localhost:11434/v1' >> ~/.zshrc
echo 'export ANTHROPIC_API_KEY=ollama' >> ~/.zshrc
source ~/.zshrc

# For bash (Linux / Windows WSL)
# Replace ~/.zshrc with ~/.bashrc above
```
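After sourcing the file (or opening a new terminal), a quick check confirms the variables stuck:

```shell
# Both should print the values you set above
echo $ANTHROPIC_BASE_URL
echo $ANTHROPIC_API_KEY
```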
Everything is in place. Navigate to any project directory and start Claude Code, passing the local model name with the --model flag.
```shell
# Navigate to your project
cd ~/your-project

# Start Claude Code with your chosen local model
claude --model qwen2.5-coder:7b

# Or the 30B model for more powerful machines
claude --model qwen3-coder:30b
```
Figure 5 — Claude Code initialised and running with a local model
Try Your First Prompt
Once Claude Code starts, try these to verify everything is wired up correctly:
```shell
# Basic smoke test
"Make a hello world website"

# Real-world test
"Create a REST API endpoint for user authentication in Node.js"

# Refactor test
"Refactor this function to handle edge cases and add JSDoc"
```
You’ll see Claude Code read your files, propose edits, execute terminal commands — all the usual behaviour. The difference: every inference call hits localhost, not the cloud.
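You can watch this happen from a second terminal. While Claude Code is working, Ollama reports which model is loaded and whether it is running on CPU or GPU:

```shell
# Show loaded models, memory use, and CPU/GPU placement
ollama ps
```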
Figure 6 — Claude Code autonomously editing files and running commands, entirely offline
What Claude Code Can Do Locally
Browse your file tree, read source files, write and edit code, run terminal commands, install packages, and iterate on your requests — all without touching the internet or incurring a single API charge.
You’re Now Running
AI at Zero Cost
No subscription. No usage anxiety. No proprietary code leaving your machine. A genuine alternative to cloud-dependent AI tooling.
ollama.com · github.com/anthropics/claude-code


