Rate Limits

Rate limits protect the platform and ensure fair usage. SovereignEG enforces one request-rate limit per API key: RPM (requests per minute).

Spending is governed by your organization's EGP credit balance and the per-model sell rate, not by a separate tokens-per-minute cap. When your balance reaches zero on a priced model, the API returns 402 quota_exceeded (see Error Codes).

Limits by tier

Each plan tier has a requests-per-minute (RPM) ceiling:

Your effective limit is reported on every response via the X-RateLimit-Limit-Requests header (see Response headers).

Higher limits

The table above shows tier defaults. Your organization may have a custom ceiling that applies to every API key you own.

If you need a higher limit, contact us with your expected peak RPM and a one-line reason. Most increases are turned around the same business day.

Per-key and per-project narrowing

Owners can narrow limits further than the org default:

  • Per-project RPM override — set in Dashboard → Projects → Limits. Useful for isolating an internal experiment from production traffic.
  • Per-key RPM limit — set on the Create API key screen. Useful when you mint a key for a CI job or partner integration.

The API applies the strictest value across plan, org, project, and key. The resulting limit is reported in the rate-limit response headers (see Response headers).
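
Read literally, the "strictest value wins" rule amounts to taking the minimum of whichever limits are set. A minimal sketch, assuming an unset limit is represented as None; the function and parameter names are illustrative and not part of the SovereignEG API:

```python
from typing import Optional

def effective_rpm(plan: int,
                  org: Optional[int] = None,
                  project: Optional[int] = None,
                  key: Optional[int] = None) -> int:
    """Return the strictest (lowest) RPM among the limits that are set."""
    candidates = [plan, org, project, key]
    # Ignore levels with no limit configured (None is an assumption of this sketch).
    return min(v for v in candidates if v is not None)

# A key-level limit of 10 wins over a project limit of 30 and a plan limit of 60.
print(effective_rpm(plan=60, project=30, key=10))
```

The value the API actually applies is the one reported in X-RateLimit-Limit-Requests, so treat that header as authoritative rather than a local calculation.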

How it works

SovereignEG uses a sliding window for RPM:

  • Counts requests in a rolling 60-second window.
  • Limits are enforced per API key. If your org has multiple keys, each key gets its own bucket — the limits do not pool.
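
To make the rolling window concrete, here is a client-side sketch of a sliding-window counter. It illustrates the general technique only; the server's internal implementation is not documented here, and the class name, the `allow` method, and the key names are hypothetical:

```python
import time
from collections import deque
from typing import Optional

class SlidingWindowLimiter:
    """Client-side illustration of a rolling-window request counter."""

    def __init__(self, limit: int, window_s: float = 60.0):
        self.limit = limit
        self.window_s = window_s
        self._stamps = deque()  # timestamps of accepted requests, oldest first

    def allow(self, now: Optional[float] = None) -> bool:
        """Record a request at `now`; return True if it fits in the window."""
        if now is None:
            now = time.monotonic()
        # Drop timestamps that have aged out of the rolling window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if len(self._stamps) >= self.limit:
            return False  # window is full; this request would be rejected
        self._stamps.append(now)
        return True

# One counter per API key, mirroring the rule that limits do not pool.
limiters = {
    "key-a": SlidingWindowLimiter(limit=60),
    "key-b": SlidingWindowLimiter(limit=60),
}
```

Because the window rolls, capacity frees up gradually as individual requests age past 60 seconds, rather than all at once at a fixed minute boundary.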

Response headers

Every response includes rate-limit information:

X-RateLimit-Limit-Requests: 60
X-RateLimit-Remaining-Requests: 58
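
A small helper can pull these two values out of any header mapping. This is a sketch: the helper name `read_rate_limit` is hypothetical, and it assumes both headers carry plain integers as in the sample above.

```python
from typing import Mapping, Optional, Tuple

def read_rate_limit(headers: Mapping[str, str]) -> Tuple[Optional[int], Optional[int]]:
    """Return (limit, remaining) from response headers, or None where absent."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}

    def as_int(name: str) -> Optional[int]:
        value = lowered.get(name)
        try:
            return int(value) if value is not None else None
        except ValueError:
            return None  # header present but not an integer

    return (as_int("x-ratelimit-limit-requests"),
            as_int("x-ratelimit-remaining-requests"))

print(read_rate_limit({
    "X-RateLimit-Limit-Requests": "60",
    "X-RateLimit-Remaining-Requests": "58",
}))
```

With the openai Python SDK, response headers are available through `client.chat.completions.with_raw_response.create(...)`; the returned object exposes `.headers` and a `.parse()` method for the usual completion object.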

When you hit the limit

The API returns HTTP 429 with a Retry-After header and a JSON error body:

{
  "error": {
    "type": "rate_limit_error",
    "message": "Rate limit exceeded: 60 requests per minute. Please retry after the window resets.",
    "code": "rate_limit_exceeded"
  }
}

Handling rate limits

Python

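import time
from openai import RateLimitError

def call_with_retry(client, **kwargs):
    max_retries = 3
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except RateLimitError as e:
            if attempt == max_retries - 1:
                raise  # last attempt failed: surface the 429 instead of sleeping again
            try:
                # Retry-After is read as whole seconds; default to 60 if missing.
                retry_after = int(e.response.headers.get("Retry-After", 60))
            except ValueError:
                retry_after = 60  # header was not an integer
            print(f"Rate limited. Waiting {retry_after}s...")
            time.sleep(retry_after)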

Node.js

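async function callWithRetry(client, params, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await client.chat.completions.create(params);
    } catch (error) {
      // Rethrow anything that is not a 429, and the 429 from the final attempt,
      // so the caller never receives a silent undefined.
      if (error.status !== 429 || i === maxRetries - 1) throw error;
      // Retry-After is read as whole seconds; default to 60 if missing or malformed.
      const parsed = Number.parseInt(error.headers?.['retry-after'], 10);
      const retryAfter = Number.isNaN(parsed) ? 60 : parsed;
      await new Promise(r => setTimeout(r, retryAfter * 1000));
    }
  }
}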

Best practices

  1. Implement exponential backoff: after a 429, wait progressively longer between retries instead of retrying immediately.
  2. Monitor your headers: check X-RateLimit-Remaining-Requests before sending a burst of requests.
  3. Batch when possible: one request with a longer prompt counts once against your RPM limit, whereas many short requests each count separately.
  4. Use streaming: a streamed response counts as one request regardless of output length.
  5. Cache responses: if the same prompt is sent often, cache the result.
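
The retry examples earlier on this page wait for a fixed Retry-After value. As a sketch of the exponential backoff recommended in item 1, the helper below honors Retry-After when the server sends it and otherwise falls back to exponential backoff with random jitter. The function name, the 1-second base, and the 60-second cap are assumptions chosen for illustration, not documented SovereignEG values:

```python
import random
from typing import Optional

def backoff_delay(attempt: int,
                  retry_after: Optional[float] = None,
                  base: float = 1.0,
                  cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        # The server's hint takes priority over any local guess.
        return float(retry_after)
    # The ceiling doubles each attempt (1s, 2s, 4s, ...) up to `cap`.
    ceiling = min(cap, base * (2 ** attempt))
    # Pick a random delay up to the ceiling so that many clients
    # do not all retry at the same instant.
    return random.uniform(0, ceiling)
```

In a retry loop, call `time.sleep(backoff_delay(attempt, retry_after))` where the loop would otherwise sleep for a fixed interval.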