
# 🧠 Claude Code Local

Run a 122 billion parameter AI on your MacBook. No cloud. No fees. No data leaves your machine.
## 🤔 What Is This?

Your MacBook has a powerful GPU built right into the chip. This project uses that GPU to run a massive AI model — the same kind that powers ChatGPT and Claude — entirely on your computer.
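
Concretely, "uses that GPU" means Apple's MLX framework. Here is a minimal sketch of local generation with the `mlx-lm` package (`pip install mlx-lm`); the model ID is a placeholder for illustration, not necessarily the weights this project uses:

```python
# Minimal local generation with Apple's MLX (illustrative sketch, not
# this project's code). Any MLX-format model from Hugging Face's
# mlx-community org loads the same way; this ID is a placeholder.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen2.5-72B-Instruct-4bit")

response = generate(
    model,
    tokenizer,
    prompt="Write a Python function that reverses a string.",
    max_tokens=256,
)
print(response)
```

The weights sit in the chip's unified memory, so the GPU reads them directly; the first call is slower while they load.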

- 🚫 No internet needed
- 💰 No monthly subscription
- 🔒 No one sees your code or data
- ✅ Full Claude Code experience — write code, edit files, manage projects, control your browser

```
     📱 You (Mac or Phone)
      │
 🤖 Claude Code          ← the AI coding tool you know
      │
 ⚡ MLX Native Server    ← our server (200 lines of Python)
      │
 🧠 Qwen 3.5 122B        ← 122 billion parameter brain
      │
 🖥️ Apple Silicon GPU    ← your M-series chip does all the work
```

## 📱 Control From Your Phone

You don't have to be at your Mac to use this. We built a remote control pipeline:

```
📱 Your iPhone                    💻 Your Mac
     │                                │
     │── iMessage ──────────────────>│
     │                                │── Claude Code
     │                                │── MLX Server
     │                                │── Qwen 3.5 122B
     │                                │── (does the work)
```

> 💡 **Pro tip:** Anthropic's Dispatch doesn't read your CLAUDE.md. Mention it in your message or it'll miss your custom setup. Our iMessage system doesn't have this problem.
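
One way to build a bridge like this (a sketch of the general approach, not this repo's exact pipeline): poll the Messages database macOS keeps at `~/Library/Messages/chat.db` for new incoming texts and feed each one to the Claude Code CLI in non-interactive mode. The script needs Full Disk Access to read `chat.db`; sending the answer back over iMessage (via `osascript`) is omitted for brevity:

```python
# Sketch of an iMessage-to-Claude-Code bridge: poll chat.db for new
# incoming texts, run each through the Claude Code CLI.
import sqlite3
import subprocess
import time
from pathlib import Path

DB = Path.home() / "Library/Messages/chat.db"

def query(sql, args=()):
    # Open read-only so we never touch Messages' own database state.
    conn = sqlite3.connect(f"file:{DB}?mode=ro", uri=True)
    try:
        return conn.execute(sql, args).fetchall()
    finally:
        conn.close()

# Start from the newest existing message so history isn't replayed.
last_rowid = query("SELECT COALESCE(MAX(ROWID), 0) FROM message")[0][0]

while True:
    rows = query(
        "SELECT ROWID, text FROM message "
        "WHERE ROWID > ? AND is_from_me = 0 AND text IS NOT NULL "
        "ORDER BY ROWID",
        (last_rowid,),
    )
    for rowid, text in rows:
        last_rowid = rowid
        # `claude -p` runs a single prompt and prints the result to stdout.
        result = subprocess.run(["claude", "-p", text],
                                capture_output=True, text=True)
        print(result.stdout)
    time.sleep(5)
```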

## 📊 Benchmarks

We built and tested three different approaches. Each one got faster.

### ⚡ Speed Comparison

```
                      Tokens per Second

🐌 Ollama (Gen 1)     ██████████████████████████████ 30 tok/s
🏃 llama.cpp (Gen 2)  █████████████████████████████████████████ 41 tok/s
🚀 MLX Native (Gen 3) █████████████████████████████████████████████████████████████████ 65 tok/s
```
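
Numbers like these are straightforward to reproduce: time a fixed generation and divide output tokens by wall-clock seconds. A minimal sketch (placeholder model ID; a careful benchmark should discard a warm-up run and average over several prompts):

```python
# Rough tok/s measurement: one timed generation, tokens / seconds.
import time

from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen2.5-72B-Instruct-4bit")  # placeholder

start = time.perf_counter()
text = generate(model, tokenizer,
                prompt="Explain the Fibonacci sequence.", max_tokens=256)
elapsed = time.perf_counter() - start

n_tokens = len(tokenizer.encode(text))
print(f"{n_tokens / elapsed:.1f} tok/s")
```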


### ā±ļø Real-World Claude Code Task

How long it takes to ask Claude Code to write a function:

```
😓 Ollama + Proxy        ████████████████████████████████████████████ 133 seconds
😐 llama.cpp + Proxy     ████████████████████████████████████████████ 133 seconds
🔥 MLX Native (no proxy) ██████ 17.6 seconds

                         7.5x faster ⚡
```
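
The end-to-end number is just as easy to check yourself; a hedged sketch, assuming the `claude` CLI is installed and already pointed at the local server:

```python
# Time one non-interactive Claude Code run end to end.
import subprocess
import time

start = time.perf_counter()
subprocess.run(
    ["claude", "-p", "Write a Python function that checks whether a string is a palindrome."],
    check=True,
)
print(f"{time.perf_counter() - start:.1f} seconds")
```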

### 📋 Side-by-Side

| | 🐌 Ollama | 🏃 llama.cpp + TurboQuant | 🚀 **MLX Native (ours)** |
|---|:---:|:---:|:---:|
| **Speed** | 30 tok/s | 41 tok/s | **65 tok/s** |
| **Claude Code task** | 133s | 133s | **17.6s** |
| **Needs a proxy?** | ❌ Yes | ❌ Yes | ✅ **No** |
| **Lines of code** | N/A | N/A (C++ fork) | **~200 Python** |
| **Apple native?** | ❌ Generic | ❌ Ported | ✅ **MLX** |

### ā˜ļø vs Cloud APIs

| | 🖥️ **Our Local Setup** | ☁️ Claude Sonnet | ☁️ Claude Opus |
|---|:---:|:---:|:---:|
| Speed | 65 tok/s | ~80 tok/s |