# FreeLLMAPI

One OpenAI-compatible endpoint. Fourteen free LLM providers. ~1.3B tokens per month.
Aggregate the free tiers from Google, Groq, Cerebras, SambaNova, NVIDIA, Mistral, OpenRouter, GitHub Models, Hugging Face, Cohere, Cloudflare, Zhipu, Moonshot, and MiniMax behind a single /v1/chat/completions endpoint. Keys are stored encrypted. A router picks the best available model for each request, fails over to the next provider when one is rate-limited, and tracks per-key usage so you stay under every free-tier cap.
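The failover and cap-tracking described above can be sketched in a few lines. This is an illustrative model only, not the project's actual implementation; the provider names, caps, and class names are assumptions:

```python
from dataclasses import dataclass


class RateLimited(Exception):
    """Raised when a provider's free-tier cap would be exceeded."""


@dataclass
class Key:
    provider: str
    monthly_cap: int  # tokens/month allowed by this key's free tier
    used: int = 0

    def charge(self, tokens: int) -> None:
        """Record usage, refusing if it would blow past the cap."""
        if self.used + tokens > self.monthly_cap:
            raise RateLimited(self.provider)
        self.used += tokens


def route(keys: list[Key], tokens_needed: int) -> str:
    """Pick the first key that can absorb the request; fail over otherwise."""
    for key in keys:
        try:
            key.charge(tokens_needed)
            return key.provider
        except RateLimited:
            continue  # this tier is tapped out; try the next provider
    raise RuntimeError("every free tier is exhausted")


# toy caps: groq can take one ~90-token request, then traffic fails over
keys = [Key("groq", monthly_cap=100), Key("cerebras", monthly_cap=1000)]
```

The real router also weighs model quality and per-day request limits, but the shape is the same: try keys in priority order, skip any that would breach a cap.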
## Contents
- Why this exists
- Supported providers
- Features
- Not yet supported
- Quick start
- Using the API
- Screenshots
- How it works
- Limitations
- Contributing
- Terms of Service review
- Disclaimer
## Why this exists
Every serious AI lab now offers a free tier — a few million tokens a month, a few thousand requests a day. On its own each tier is a toy. Stacked together, they add up to roughly 1.3 billion tokens per month of working inference capacity, across dozens of models from small-and-fast to reasonably capable.
The problem is that stacking them by hand is painful: fourteen different SDKs, fourteen different rate limits, fourteen places a request can fail. FreeLLMAPI collapses that into one OpenAI-compatible endpoint. Point any OpenAI client library at your local server, and it routes transparently across whichever providers you've added keys for.
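Concretely, "one endpoint" means any HTTP client can talk to the proxy. A minimal standard-library sketch of the request shape (the localhost address and the `auto` model alias are assumptions, not documented config; the official OpenAI SDK works the same way once its `base_url` points here):

```python
import json
from urllib import request

BASE_URL = "http://localhost:8000/v1"  # assumed local address; use your server's


def chat_request(messages, model="auto", stream=False):
    """Build the POST /v1/chat/completions request the proxy expects."""
    body = json.dumps(
        {"model": model, "messages": messages, "stream": stream}
    ).encode()
    return request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = chat_request([{"role": "user", "content": "Hello"}])
# urllib.request.urlopen(req) would send it once the proxy is running
```

The response follows the standard OpenAI chat-completion schema, so existing parsing code needs no changes.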
## Supported providers
| Provider | Free-tier models / notes |
| --- | --- |
| Google | Gemini 2.5 Pro / Flash |
| Groq | Llama 4, Qwen, Kimi |
| Cerebras | Llama 3.3, Qwen |
| SambaNova | Llama 3.3 70B |
| NVIDIA | NIM catalog |
| Mistral | La Plateforme |
| OpenRouter | Free-tier models |
| GitHub Models | GPT-4o, Llama, Phi |
| Hugging Face | Inference Providers |
| Cohere | Command R+ (trial) |
| Cloudflare | Workers AI |
| Zhipu | GLM-4 series |
| Moonshot | Kimi |
| MiniMax | abab / hailuo |

Adding another? See Contributing.
## Features
- OpenAI-compatible — `POST /v1/chat/completions` and `GET /v1/models` work with the official OpenAI SDKs and any OpenAI-compatible client (LangChain, LlamaIndex, Continue, Hermes, etc.). Just change `base_url`.
- Streaming and non-streaming — Server-Sent Events for `s