# FreeLLMAPI

One OpenAI-compatible endpoint. Fourteen free LLM providers. ~1.3B tokens per month.
Aggregate the free tiers from Google, Groq, Cerebras, SambaNova, NVIDIA, Mistral, OpenRouter, GitHub Models, Hugging Face, Cohere, Cloudflare, Zhipu, Moonshot, and MiniMax behind a single /v1/chat/completions endpoint. Keys are stored encrypted. A router picks the best available model for each request, fails over to the next provider when one is rate-limited, and tracks per-key usage so you stay under every free-tier cap.
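The failover and cap-tracking described above can be sketched in a few lines. This is an illustrative model only, not the project's actual implementation; the provider names, caps, and class names are assumptions:

```python
from dataclasses import dataclass


class RateLimited(Exception):
    """Raised when a provider's free-tier cap would be exceeded."""


@dataclass
class Key:
    provider: str
    monthly_cap: int  # tokens/month allowed by this key's free tier
    used: int = 0

    def charge(self, tokens: int) -> None:
        """Record usage, refusing if it would blow past the cap."""
        if self.used + tokens > self.monthly_cap:
            raise RateLimited(self.provider)
        self.used += tokens


def route(keys: list[Key], tokens_needed: int) -> str:
    """Pick the first key that can absorb the request; fail over otherwise."""
    for key in keys:
        try:
            key.charge(tokens_needed)
            return key.provider
        except RateLimited:
            continue  # this tier is tapped out; try the next provider
    raise RuntimeError("every free tier is exhausted")


# toy caps: groq can take one ~90-token request, then traffic fails over
keys = [Key("groq", monthly_cap=100), Key("cerebras", monthly_cap=1000)]
```

The real router also weighs model quality and per-day request limits, but the shape is the same: try keys in priority order, skip any that would breach a cap.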
## Contents
- Why this exists
- Supported providers
- Features
- Not yet supported
- Quick start
- Using the API
- Screenshots
- How it works
- Limitations
- Contributing
- Terms of Service review
- Disclaimer
## Why this exists
Every serious AI lab now offers a free tier — a few million tokens a month, a few thousand requests a day. On its own each tier is a toy. Stacked together, they add up to roughly 1.3 billion tokens per month of working inference capacity, across dozens of models from small-and-fast to reasonably capable.
The problem is that stacking them by hand is painful: fourteen different SDKs, fourteen different rate limits, fourteen places a request can fail. FreeLLMAPI collapses that into one OpenAI-compatible endpoint. Point any OpenAI client library at your local server, and it routes transparently across whichever providers you've added keys for.
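Concretely, "one endpoint" means any HTTP client can talk to the proxy. A minimal standard-library sketch of the request shape (the localhost address and the `auto` model alias are assumptions, not documented config; the official OpenAI SDK works the same way once its `base_url` points here):

```python
import json
from urllib import request

BASE_URL = "http://localhost:8000/v1"  # assumed local address; use your server's


def chat_request(messages, model="auto", stream=False):
    """Build the POST /v1/chat/completions request the proxy expects."""
    body = json.dumps(
        {"model": model, "messages": messages, "stream": stream}
    ).encode()
    return request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = chat_request([{"role": "user", "content": "Hello"}])
# urllib.request.urlopen(req) would send it once the proxy is running
```

The response follows the standard OpenAI chat-completion schema, so existing parsing code needs no changes.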
## Supported providers
| Provider | Free-tier models / notes |
| --- | --- |
| Google | Gemini 2.5 Pro / Flash |
| Groq | Llama 4, Qwen, Kimi |
| Cerebras | Llama 3.3, Qwen |
| SambaNova | Llama 3.3 70B |
| NVIDIA | NIM catalog |
| Mistral | La Plateforme |
| OpenRouter | Free-tier models |
| GitHub Models | GPT-4o, Llama, Phi |
| Hugging Face | Inference Providers |
| Cohere | Command R+ (trial) |
| Cloudflare | Workers AI |
| Zhipu | GLM-4 series |
| Moonshot | Kimi |
| MiniMax | abab / hailuo |

Adding another? See Contributing.
## Features
- OpenAI-compatible — `POST /v1/chat/completions` and `GET /v1/models` work with the official OpenAI SDKs and any OpenAI-compatible client (LangChain, LlamaIndex, Continue, Hermes, etc.). Just change `base_url`.
- Streaming and non-streaming — Server-Sent Events for `s