
# 🧠 Claude Code Local

Run a 122 billion parameter AI on your MacBook. No cloud. No fees. No data leaves your machine.
## 🤔 What Is This?

Your MacBook has a powerful GPU built right into the chip. This project uses that GPU to run a massive AI model — the same kind that powers ChatGPT and Claude — entirely on your computer.
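
Concretely, "uses that GPU" means Apple's MLX framework. Here is a minimal sketch of local generation with the `mlx-lm` package (`pip install mlx-lm`); the model ID is a placeholder for illustration, not necessarily the weights this project uses:

```python
# Minimal local generation with Apple's MLX (illustrative sketch, not
# this project's code). Any MLX-format model from Hugging Face's
# mlx-community org loads the same way; this ID is a placeholder.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen2.5-72B-Instruct-4bit")

response = generate(
    model,
    tokenizer,
    prompt="Write a Python function that reverses a string.",
    max_tokens=256,
)
print(response)
```

The weights sit in the chip's unified memory, so the GPU reads them directly; the first call is slower while they load.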

- 🚫 No internet needed
- 💰 No monthly subscription
- 🔒 No one sees your code or data
- ✅ Full Claude Code experience — write code, edit files, manage projects, control your browser

```
     📱 You (Mac or Phone)
      │
 🤖 Claude Code          ← the AI coding tool you know
      │
 ⚡ MLX Native Server    ← our server (200 lines of Python)
      │
 🧠 Qwen 3.5 122B        ← 122 billion parameter brain
      │
 🖥️ Apple Silicon GPU    ← your M-series chip does all the work
```

## 📱 Control From Your Phone

You don't have to be at your Mac to use this. We built a remote control pipeline:

```
📱 Your iPhone                    💻 Your Mac
     │                                │
     │── iMessage ──────────────────>│
     │                                │── Claude Code
     │                                │── MLX Server
     │                                │── Qwen 3.5 122B
     │                                │── (does the work)
```

> 💡 **Pro tip:** Anthropic's Dispatch doesn't read your CLAUDE.md. Mention it in your message or it'll miss your custom setup. Our iMessage system doesn't have this problem.
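
One way to build a bridge like this (a sketch of the general approach, not this repo's exact pipeline): poll the Messages database macOS keeps at `~/Library/Messages/chat.db` for new incoming texts and feed each one to the Claude Code CLI in non-interactive mode. The script needs Full Disk Access to read `chat.db`; sending the answer back over iMessage (via `osascript`) is omitted for brevity:

```python
# Sketch of an iMessage-to-Claude-Code bridge: poll chat.db for new
# incoming texts, run each through the Claude Code CLI.
import sqlite3
import subprocess
import time
from pathlib import Path

DB = Path.home() / "Library/Messages/chat.db"

def query(sql, args=()):
    # Open read-only so we never touch Messages' own database state.
    conn = sqlite3.connect(f"file:{DB}?mode=ro", uri=True)
    try:
        return conn.execute(sql, args).fetchall()
    finally:
        conn.close()

# Start from the newest existing message so history isn't replayed.
last_rowid = query("SELECT COALESCE(MAX(ROWID), 0) FROM message")[0][0]

while True:
    rows = query(
        "SELECT ROWID, text FROM message "
        "WHERE ROWID > ? AND is_from_me = 0 AND text IS NOT NULL "
        "ORDER BY ROWID",
        (last_rowid,),
    )
    for rowid, text in rows:
        last_rowid = rowid
        # `claude -p` runs a single prompt and prints the result to stdout.
        result = subprocess.run(["claude", "-p", text],
                                capture_output=True, text=True)
        print(result.stdout)
    time.sleep(5)
```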

## 📊 Benchmarks

We built and tested three different approaches. Each one got faster.

### ⚡ Speed Comparison

```
                      Tokens per Second

🐌 Ollama (Gen 1)     ██████████████████████████████ 30 tok/s
🏃 llama.cpp (Gen 2)  █████████████████████████████████████████ 41 tok/s
🚀 MLX Native (Gen 3) █████████████████████████████████████████████████████████████████ 65 tok/s
```
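
Numbers like these are straightforward to reproduce: time a fixed generation and divide output tokens by wall-clock seconds. A minimal sketch (placeholder model ID; a careful benchmark should discard a warm-up run and average over several prompts):

```python
# Rough tok/s measurement: one timed generation, tokens / seconds.
import time

from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen2.5-72B-Instruct-4bit")  # placeholder

start = time.perf_counter()
text = generate(model, tokenizer,
                prompt="Explain the Fibonacci sequence.", max_tokens=256)
elapsed = time.perf_counter() - start

n_tokens = len(tokenizer.encode(text))
print(f"{n_tokens / elapsed:.1f} tok/s")
```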


### ā±ļø Real-World Claude Code Task

How long it takes to ask Claude Code to write a function:

```
😓 Ollama + Proxy        ████████████████████████████████████████████ 133 seconds
😐 llama.cpp + Proxy     ████████████████████████████████████████████ 133 seconds
🔥 MLX Native (no proxy) ██████ 17.6 seconds

                         7.5x faster ⚡
```
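
The end-to-end number is just as easy to check yourself; a hedged sketch, assuming the `claude` CLI is installed and already pointed at the local server:

```python
# Time one non-interactive Claude Code run end to end.
import subprocess
import time

start = time.perf_counter()
subprocess.run(
    ["claude", "-p", "Write a Python function that checks whether a string is a palindrome."],
    check=True,
)
print(f"{time.perf_counter() - start:.1f} seconds")
```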

### 📋 Side-by-Side

| | 🐌 Ollama | 🏃 llama.cpp + TurboQuant | 🚀 **MLX Native (ours)** |
|---|:---:|:---:|:---:|
| **Speed** | 30 tok/s | 41 tok/s | **65 tok/s** |
| **Claude Code task** | 133s | 133s | **17.6s** |
| **Needs a proxy?** | ❌ Yes | ❌ Yes | ✅ **No** |
| **Lines of code** | N/A | N/A (C++ fork) | **~200 Python** |
| **Apple native?** | ❌ Generic | ❌ Ported | ✅ **MLX** |

### ā˜ļø vs Cloud APIs

| | 🖥️ **Our Local Setup** | ☁️ Claude Sonnet | ☁️ Claude Opus |
|---|:---:|:---:|:---:|
| Speed | 65 tok/s | ~80 tok/s |