Choosing a Model

The right model depends on your task, budget, and latency requirements. Always start from the Model Library or GET /v1/models — only live models are callable, and IDs change as the catalog grows.

Quick decision tree

Do you need Arabic as the primary language? → Filter the Model Library for live chat models. Multilingual models (Qwen-class, etc.) are a common starting point.

Do you need a context window larger than 8K tokens? → Sort the Model Library by context window and pick a live model that fits your document.

Is speed and cost your top priority? → Sort by input price ascending and test the cheapest live chat model first.

Do you need the best quality for complex tasks? → Try a larger live model (70B class or similar). Benchmark on your own prompts before committing.

By use case

Use caseWhere to lookWhy
Customer support chatbotCheapest live chat modelFast loops; upgrade if quality falls short
Arabic customer supportLive multilingual chat modelsStrong Arabic without a separate integration
Code generationLarger live chat modelsBetter instruction-following on code
Document summarizationSmall/mid chat models under 8K contextFast; fits most short docs
Long document analysisHighest context window in catalogFit the full doc in one request
Creative writingLarger chat modelsMore nuanced tone
Data extraction / JSONModels with Tools badge (if using function calling)Structured output
TranslationMultilingual chat modelsCross-language quality
High-throughput batchLowest input price per 1M tokensMinimise cost at scale
Embeddings / RAGLive embed modelsVectors for retrieval — separate endpoint from chat
Research / reasoningLargest live chat model you can affordTest on your hardest prompts

Cost optimization tips

  1. Start cheap. Test with the lowest-priced live chat model; upgrade only when quality isn't enough.
  2. Develop on a small model. Iterate fast, then switch for production if needed.
  3. Set max_tokens. Don't let the model generate more than you need.
  4. Reuse system prompts. Keep a stable system message so repeated context isn't re-sent unnecessarily.
  5. Batch when possible. Parallel requests use your RPM budget efficiently — see Rate Limits.

Switching models

The API is identical across models — change one string:

response = client.chat.completions.create(
    model="...",  # any live id from GET /v1/models
    messages=same_messages,
    temperature=same_temperature,
)

See Models Overview for the live comparison table.