Choosing a Model

The right model depends on your task, budget, and latency requirements. Always start from the Model Library or GET /v1/models — only live models are callable, and IDs change as the catalog grows.

Quick decision tree

Do you need Arabic as the primary language? → Filter the Model Library for live chat models. Multilingual models (Qwen-class, etc.) are a common starting point.

Do you need a context window larger than 8K tokens? → Sort the Model Library by context window and pick a live model that fits your document.

Are speed and cost your top priorities? → Sort by input price ascending and test the cheapest live chat model first.

Do you need the best quality for complex tasks? → Try a larger live model (70B class or similar). Benchmark on your own prompts before committing.
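
The decision tree above can be sketched as a filter over the catalog. This is only a minimal illustration, assuming the model list has already been fetched and normalized into dictionaries. The field names (`id`, `status`, `type`, `context_window`, `input_price_per_1m`) and the sample entries are hypothetical; map them to whatever GET /v1/models actually returns.

```python
from typing import Optional

# Hypothetical catalog entries; real ids, fields, and prices come from GET /v1/models.
CATALOG = [
    {"id": "small-chat", "status": "live", "type": "chat",
     "context_window": 8192, "input_price_per_1m": 0.10},
    {"id": "large-chat", "status": "live", "type": "chat",
     "context_window": 131072, "input_price_per_1m": 0.90},
    {"id": "retired-chat", "status": "deprecated", "type": "chat",
     "context_window": 32768, "input_price_per_1m": 0.05},
    {"id": "embedder", "status": "live", "type": "embed",
     "context_window": 8192, "input_price_per_1m": 0.02},
]


def pick_model(catalog: list[dict], min_context: int = 0) -> Optional[dict]:
    """Return the cheapest live chat model whose context window is at least min_context."""
    candidates = [
        m for m in catalog
        if m["status"] == "live"
        and m["type"] == "chat"
        and m["context_window"] >= min_context
    ]
    if not candidates:
        return None
    # Cheapest first: mirrors "sort by input price ascending".
    return min(candidates, key=lambda m: m["input_price_per_1m"])
```

Calling `pick_model(CATALOG)` covers the speed-and-cost branch, while `pick_model(CATALOG, min_context=32000)` covers the large-context branch. Quality still has to be judged on your own prompts; a filter like this only narrows the shortlist.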

By use case

Use case	Where to look	Why
Customer support chatbot	Cheapest live chat model	Fast loops; upgrade if quality falls short
Arabic customer support	Live multilingual chat models	Strong Arabic without a separate integration
Code generation	Larger live chat models	Better instruction-following on code
Document summarization	Small/mid chat models (docs that fit in 8K context)	Fast; fits most short docs
Long document analysis	Highest context window in catalog	Fit the full doc in one request
Creative writing	Larger chat models	More nuanced tone
Data extraction / JSON	Models with Tools badge (if using function calling)	Structured output
Translation	Multilingual chat models	Cross-language quality
High-throughput batch	Lowest input price per 1M tokens	Minimise cost at scale
Embeddings / RAG	Live embed models	Vectors for retrieval — separate endpoint from chat
Research / reasoning	Largest live chat model you can afford	Test on your hardest prompts

Cost optimization tips

Start cheap. Test with the lowest-priced live chat model; upgrade only when quality isn't enough.
Develop on a small model. Iterate fast, then switch for production if needed.
Set max_tokens. Don't let the model generate more than you need.
Reuse system prompts. Keep one short, stable system message; it is sent with every request, so avoid padding it with context that doesn't need repeating.
Batch when possible. Parallel requests use your RPM budget efficiently — see Rate Limits.
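
To make the effect of these tips concrete, the arithmetic behind a per-1M-token price can be sketched as below. The prices, request counts, and token counts are made-up numbers for illustration, and a separate output price is an assumption; check the Model Library for the real figures.

```python
def estimate_cost(input_tokens, output_tokens, input_price_per_1m, output_price_per_1m):
    """Estimate spend in currency units from token counts and per-1M-token prices."""
    return (
        input_tokens * input_price_per_1m
        + output_tokens * output_price_per_1m
    ) / 1_000_000


# Hypothetical workload: 10,000 requests with 500 prompt tokens each.
REQUESTS = 10_000
PROMPT_TOKENS = 500
INPUT_PRICE = 0.10   # assumed price per 1M input tokens
OUTPUT_PRICE = 0.40  # assumed price per 1M output tokens

# Worst case: every response runs all the way to the max_tokens cap.
capped = estimate_cost(REQUESTS * PROMPT_TOKENS, REQUESTS * 200, INPUT_PRICE, OUTPUT_PRICE)
uncapped = estimate_cost(REQUESTS * PROMPT_TOKENS, REQUESTS * 1000, INPUT_PRICE, OUTPUT_PRICE)

print(f"max_tokens=200:  {capped:.2f}")
print(f"max_tokens=1000: {uncapped:.2f}")
```

With these assumed numbers, the worst-case bill is 1.30 with a 200-token cap versus 4.50 with a 1000-token cap, which is why setting max_tokens is usually the first thing to try.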

Switching models

The chat API is identical across chat models, so switching means changing one string:

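# `client`, `same_messages`, and `same_temperature` are whatever you already use;
# only the model id changes between runs.
response = client.chat.completions.create(
    model="...",  # any live id from GET /v1/models
    messages=same_messages,
    temperature=same_temperature,
)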
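
Because only the model string changes, benchmarking on your own prompts can be a short loop. The helper below is a sketch rather than part of the API: `compare_models` is a hypothetical name, and reading the reply from `response.choices[0].message.content` assumes an OpenAI-compatible response shape.

```python
def compare_models(client, model_ids, messages, temperature=0.2):
    """Send the same messages to each model id and collect the reply text per model.

    `client` is assumed to expose chat.completions.create(...) as in the call above.
    """
    replies = {}
    for model_id in model_ids:
        response = client.chat.completions.create(
            model=model_id,          # the only argument that varies
            messages=messages,
            temperature=temperature,
        )
        # Assumed OpenAI-compatible response shape.
        replies[model_id] = response.choices[0].message.content
    return replies
```

Pass in the live ids you shortlisted and review the replies side by side before committing to the larger model.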

See Models Overview for the live comparison table.