Multi-LLM API Prompt Tester

Provider: ⓘ

Providers listed include capable free tiers. Paid providers may also be included.

API Key:

Enter API keys for all providers here. Keys are stored in memory only and cleared on page refresh. The active provider's key also appears in the main field above and stays in sync.

Keys saved as unencrypted JSON — do not share

Model: ⓘ

Temperature: ⓘ

Max Tokens: ⓘ

Prompt: ⓘ

🔄 Retry up to N times ⓘ

— on retriable errors (rate limit, timeout, server error), waits 2 s / 4 s / 8 s… before each retry (exponential backoff, max 5 retries)

⛓️ Enable Auto-Fallback ⓘ (Sort)

— on failure, automatically try the next model or provider in order

🔁 Loop chain up to N times before stopping

— after all models tried, waits 60 s then restarts from top of chain; runs up to the set maximum — API keys can be added or removed between loops

🔒 Privacy Mode

— clears prompt after sending; privacy headers sent where supported

Testing provider…

Tests send a minimal prompt to each model. A 1.5 s delay between models helps avoid rate-limit errors.

Requests remaining: —

Tokens remaining: —

Per-minute window · Live from API headers

API Response:

Awaiting input...

Provider Reference & Fallback Chain

Sort:

📋 Output Status Reference

Symbol	Meaning	Retriable?
✅	Success — response received	—
🚫	Auth error (401/403) — API key invalid or not authorised	Never — retrying won't fix an invalid key
🔄	Rate limited (429/503) — quota exceeded	Yes, if Retry is enabled
❌	Client error (other 4xx) — request invalid or model unavailable	Never — retrying would produce the same result
⚠️	Server error (5xx) or network/connection failure — provider-side or connection issue, may be transient	Yes, if Retry is enabled
⏱	Timeout — no response received within 60 s	Yes, if Retry is enabled
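
The rows above can be read as a lookup from response to status symbol. The Python sketch below is illustrative only and is not the tool's own code: the names `Outcome` and `classify` are hypothetical, and treating every 2xx status as success and rejecting statuses outside 2xx/4xx/5xx are assumptions not stated in the table.

```python
from typing import NamedTuple, Optional


class Outcome(NamedTuple):
    symbol: str      # status symbol from the table
    label: str       # short description
    retriable: bool  # True means "Yes, if Retry is enabled"


def classify(status: Optional[int], timed_out: bool = False) -> Outcome:
    """Map an HTTP status (None = network/connection failure) to a table row."""
    if timed_out:
        return Outcome("⏱", "timeout", True)
    if status is None:
        return Outcome("⚠️", "network failure", True)
    if 200 <= status < 300:  # assumption: any 2xx counts as success
        return Outcome("✅", "success", False)
    if status in (401, 403):
        return Outcome("🚫", "auth error", False)
    if status in (429, 503):  # checked before the generic 4xx/5xx rows
        return Outcome("🔄", "rate limited", True)
    if 400 <= status < 500:
        return Outcome("❌", "client error", False)
    if 500 <= status < 600:
        return Outcome("⚠️", "server error", True)
    # assumption: anything else (1xx, 3xx) is unexpected here
    raise ValueError(f"unexpected status: {status}")


if __name__ == "__main__":
    for code in (200, 401, 429, 404, 500, None):
        print(code, classify(code))
```

Checking 429 and 503 before the generic 4xx and 5xx ranges matters, because both codes would otherwise fall into the broader rows.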

Context modifier	Shown when
`(attempt X of Y)`	Retry is enabled and Y > 1
`— all N retries exhausted`	Retry enabled; attempts used up; single-provider mode — process stops
`— all N retries exhausted, trying next`	Retry enabled; attempts used up; Auto-Fallback mode — advancing to next model
`— trying next`	Retry disabled; Auto-Fallback mode — advancing to next model after one attempt
`, skipping provider`	Auth error in Auto-Fallback — entire provider is skipped, not just the model

Chain-level messages (⛓️ chain start, ❌ all models exhausted, 🔁 loop) describe the overall chain outcome and carry no provider/model prefix.

⛓️ Fallback Chain — enter API keys to activate

Auto-Fallback rules (active when toggle is on):
✅ Success → stop chain, display result
🔄 Rate limit (429/503) → retry (if enabled) with exponential backoff (2 s / 4 s / 8 s…), then advance
🚫 Auth error (401/403) → skip model and provider immediately (no retry)
❌ Client error (4xx) → skip model immediately (no retry — request is invalid or model unavailable)
⚠️ Server error (5xx) or network/connection failure → retry (if enabled) with exponential backoff, then advance
⏱ Timeout (60 s) → retry (if enabled) with exponential backoff, then advance
❌ All exhausted → show final error with list of what was tried
🔘 Tier filters → only models whose tier is checked are tried
📋 Model order → within each provider, models are tried top-to-bottom as listed in the dropdown; order reflects each provider's recommended usage priority (★ Best first)
🔁 Loop → after chain exhausts, waits 60 s then restarts from top; stops at configured maximum — API keys can be added or removed during the wait
🔧 Retry count & backoff → configured by the Retry setting above the Generate button; applies in both single-provider and Auto-Fallback mode
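
The rules above can be sketched as a small loop. This is a minimal Python illustration under stated assumptions, not the tool's implementation: `Candidate`, `run_chain`, and the `send` callback are hypothetical names; `send` is assumed to perform any configured retries itself and to return one of "ok", "auth", "client", or "retriable"; `max_passes` counts total passes through the chain, which may differ from how the tool counts loops; and auth-skipped providers are assumed to be reconsidered on each pass, since keys can change during the wait.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Candidate:
    provider: str
    model: str
    tier: str  # "free" or "paid"


def run_chain(
    candidates: List[Candidate],          # already in chain order
    keys: Set[str],                       # providers with an API key entered
    tiers: Set[str],                      # tiers checked in the filter
    send: Callable[[Candidate], str],     # hypothetical request function
    max_passes: int = 1,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Optional[Candidate], List[Tuple[str, str, str]]]:
    """Return (winning candidate or None, list of everything tried)."""
    tried: List[Tuple[str, str, str]] = []
    for pass_index in range(max_passes):
        if pass_index > 0:
            sleep(60)  # wait 60 s before restarting from the top
        skipped_providers: Set[str] = set()
        for cand in candidates:
            # keys is re-read on every pass, so edits made during the wait apply
            if cand.provider not in keys or cand.tier not in tiers:
                continue
            if cand.provider in skipped_providers:
                continue
            result = send(cand)
            tried.append((cand.provider, cand.model, result))
            if result == "ok":
                return cand, tried  # success: stop the chain
            if result == "auth":
                skipped_providers.add(cand.provider)  # skip whole provider
            # "client" and "retriable" (after retries) just advance
    return None, tried  # all exhausted: caller shows what was tried
```

Passing `sleep` as a parameter keeps the sketch testable without real 60-second waits.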

ⓘ Field & Term Definitions

Provider: The AI company or service offering the language model API. Each provider has different models, rate limits, pricing, and data policies.
Model: The specific AI model variant to use. Different models from the same provider vary in capability, speed, context length, and cost. Larger models (70B+) are generally more capable but slower and more expensive.
API Key: A secret authentication credential issued by each provider. Keys are stored in memory only and cleared when you close or refresh the page. Never share your API keys publicly.
Prompt: The text instruction you send to the language model. The model reads your prompt and generates a response based on it. A clear, specific prompt produces better results. Prompts count toward your token usage. When Privacy Mode is on, the prompt field is cleared after each request.
Privacy Mode: When enabled: (1) the prompt field is cleared immediately after each request is sent, (2) privacy-related headers are included in API calls where supported (e.g. Anthropic's direct-browser header). Note: this does not guarantee your data is not logged by the provider — each provider's data retention policy governs actual storage. Always review the provider's privacy policy before sending sensitive information.
Privacy: Whether the provider may use your prompts to train or improve their models. Free tiers sometimes require accepting less restrictive data policies. Avoid sending sensitive or personal information via free-tier APIs.
Temperature: Controls randomness in responses. 0 = deterministic and focused. 2 = highly varied and creative. Most use cases work well between 0.5 and 1.0. Use lower values for factual tasks, higher for creative writing.
Max Tokens: Maximum number of tokens the model may generate in its response. One token ≈ ¾ of a word, so 250 tokens ≈ 190 words. Increase for longer responses; some providers charge per token generated.
RPM: Requests Per Minute — how many API calls you can make per minute before being rate limited. Exceeding this returns a 429 error. For free tiers this is typically 10–30 RPM.
Daily Limit: Maximum requests per day on the free tier. Most limits reset at midnight UTC. Exceeding this returns a 429 error and requires waiting for the daily reset.
Context: The maximum combined size of your input prompt plus the model's response, measured in tokens. Larger context windows allow longer conversations and documents. 1,000 tokens ≈ 750 words.
Speed: Inference speed in tokens per second (TPS). Higher TPS = faster responses. Cerebras uses custom Wafer-Scale Engine (WSE) silicon to achieve speeds significantly faster than GPU-based providers.
Developer / Owner: The organization that created and trained the model (developer) and the entity that holds the legal rights to it (owner). For inference providers like Cerebras or NVIDIA NIM, the model developer and the infrastructure provider are different organizations.
Country of Origin: The country where the model was primarily developed. Relevant for data sovereignty and regulatory compliance (e.g., EU AI Act, GDPR). Current providers span: United States (most) and France (Mistral).
License: Terms under which the model can be used. Apache 2.0 is a permissive open-source license. Proprietary models restrict redistribution. Meta's Llama Community License permits commercial use, with additional restrictions for products that exceed 700 million monthly active users (MAU).
Free (Limited): A model or provider with ongoing free access subject to rate limits (requests per minute or per day). No payment is required to stay within the limits. When limits are reached you receive a 429 error and must wait for the quota to reset; you are not billed automatically unless the provider falls under Free (Billing Above Quota). Most free-tier API models fall into this category (Gemini, Cerebras, Mistral free models, GitHub Models, etc.).
Free (Billing Above Quota): A subset of Free (Limited) where exceeding the free quota does not block you but instead charges your billing account at standard rates. Gemini (Google AI Studio) is the primary example: 1,500 requests/day are free; requests beyond that are billed automatically. Always monitor usage on the provider dashboard to avoid unexpected charges.
Paid: A model that requires payment for every request — either pay-per-token/credit (e.g. OpenAI, Anthropic) or a subscription. There is no ongoing free quota. Some paid providers offer one-time sign-up credits (e.g. NVIDIA prototyping credits) — these are one-time bonuses, not a recurring free tier.
Payment Method: How a provider charges you when you go beyond any free tier. Credits — you purchase tokens or API credits which are deducted per request (e.g. OpenAI API, Anthropic API). Subscription — a flat monthly or annual fee (e.g. Perplexity Pro). Blocked — the provider simply refuses requests above quota with no billing option on the free plan (e.g. Cerebras).
Token: A token is the basic unit of text that language models process. One token is roughly ¾ of a word in English (or about 4 characters). "Hello world" is approximately 2 tokens. Longer words, non-English text, and code can use more tokens per word. Both your input prompt and the model's output count toward token limits. API pricing is almost always calculated in tokens (e.g. "$X per 1M tokens"). A GitHub Personal Access Token or an API Token is a different use of the word — it refers to a credential string used for authentication, not a unit of text.
Testing API Keys: The "Test Key" and "Test All Keys" functions send a real API request to each model asking it to reply with the single word "OK". These are live requests — they consume the same tokens, quotas, and spend as a regular prompt. Paid providers will deduct from your credits or billing account. Free (Limited) providers will use one request from your daily quota per model tested. A 1.5-second delay between models helps reduce the chance of hitting rate limits.
Safety / Content Filtering: Some providers apply server-side content filtering to every request. If a prompt is flagged, the API returns a 4xx error (typically 400 or 451) whether or not the prompt was actually harmful, so borderline prompts may be rejected. The aggressiveness varies by provider: Gemini and OpenAI apply moderate to aggressive filtering; others (e.g. Cerebras, Mistral open models) apply lighter filtering. The "Safety" field in each provider's details describes the filtering level. If you receive unexpected 4xx errors that are not auth or rate-limit related, the prompt content may have triggered the provider's filter.
Fallback Chain / Auto-Fallback: When Auto-Fallback is enabled, the tool automatically tries each model in each provider in turn until one succeeds or all are exhausted. The chain only includes providers that have an API key entered and whose tier (Free or Paid) is checked in the model filter. Model order within each provider: models are tried in the order listed in the dropdown, which reflects each provider's own recommended usage priority; the ★ Best (recommended) model is always first, followed by the remaining models in the order the provider specifies. Starting point: the chain always starts with the provider and model currently selected in the dropdowns; if that selection is not eligible (no key entered, or its tier is filtered out), it is silently skipped and the chain starts from the next eligible candidate. The chain runs forward-only from the starting point and does not wrap back to providers or models that appear earlier in the order. Retry behaviour: controlled by the separate Retry setting above the Generate button. If Retry is enabled, each model is retried up to the configured number of times before the chain advances to the next candidate; if Retry is disabled, each model gets exactly one attempt before advancing. See the Retry definition for full details on backoff, error types, and counting. Loop: when Loop is enabled and the entire chain is exhausted without success, the tool waits 60 seconds and then restarts the chain from the top. This repeats until the configured maximum (1–10 loops) is reached. The Loop checkbox is not automatically unchecked at that point; the chain simply stops and the page remains static until Generate is clicked again. API keys can be added or removed during the 60-second wait between loops, and the next pass picks up any changes. Provider and model order is set using the ▲/▼ buttons or drag-and-drop in the provider sections.
Retry: When enabled, failed requests are automatically retried before giving up or (in Auto-Fallback mode) advancing to the next model. Setting "Retry up to N times" means N retries after the first attempt, so total attempts = N + 1 (e.g. 3 retries = 4 total attempts). Maximum 5 retries (6 total attempts). Exponential backoff: each retry waits progressively longer, with 2 s before retry 1, 4 s before retry 2, 8 s before retry 3, 16 s before retry 4, and 30 s before retry 5 (the doubled value of 32 s is capped at 30 s). If the provider returns a Retry-After header indicating exactly how long to wait, that value is used instead of the computed backoff. Retriable errors (retried up to the configured maximum): rate limit (429), service unavailable (503), timeout (60 s), other server errors (5xx), and network/connection failures (e.g. connection refused, DNS failure), which are transient and worth retrying. Non-retriable errors (never retried): auth errors (401/403; retrying won't fix a wrong key) and client errors (other 4xx such as 400, 404, 422; the request itself is invalid and would produce the same result). When retries are exhausted: in single-provider mode the process stops and the error is shown; in Auto-Fallback mode the chain advances to the next candidate. Retry works independently of Auto-Fallback: Retry can be enabled with or without Auto-Fallback, and Auto-Fallback can be enabled with or without Retry.
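
The backoff schedule and attempt count in the Retry definition can be reproduced with a few lines of arithmetic. The Python sketch below is a hedged illustration, not the tool's code: `backoff_delay` and `total_attempts` are hypothetical names, and `retry_after` is assumed to be the Retry-After value already parsed into seconds.

```python
from typing import Optional

BASE_DELAY_S = 2   # wait before retry 1
MAX_DELAY_S = 30   # cap applied to the doubled delay
MAX_RETRIES = 5    # maximum configurable retries


def backoff_delay(retry_number: int, retry_after: Optional[float] = None) -> float:
    """Seconds to wait before the given retry (1-based)."""
    if not 1 <= retry_number <= MAX_RETRIES:
        raise ValueError(f"retry_number must be 1..{MAX_RETRIES}")
    if retry_after is not None:
        # a provider-supplied Retry-After value replaces the computed backoff
        return retry_after
    # 2, 4, 8, 16, 32 -> the last value is capped at 30
    return min(BASE_DELAY_S * 2 ** (retry_number - 1), MAX_DELAY_S)


def total_attempts(retries: int) -> int:
    """N retries after the first attempt means N + 1 attempts in total."""
    return retries + 1


if __name__ == "__main__":
    print([backoff_delay(n) for n in range(1, MAX_RETRIES + 1)])
    print(total_attempts(3))
```

With these constants the computed waits are 2, 4, 8, 16, and 30 seconds, and 3 retries give 4 total attempts, matching the figures in the definition.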

Looking for another provider or model? Request a Provider or Model ↗

Found a bug or have a suggestion? Submit Feedback ↗