Rate Limits
Rate limits protect the platform and ensure fair usage. SovereignEG enforces a single request-rate limit per API key: RPM (requests per minute).
There is no separate tokens-per-minute cap. Spending is governed by your organization's EGP credit balance and each model's sell rate.
When your balance reaches zero, requests to a priced model return
402 quota_exceeded (see Error Codes).
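A 429 and a 402 call for different reactions. A 429 is transient, so waiting and retrying works. A 402 means the balance is exhausted, and retrying will not help until credit is added. The minimal sketch below branches on the status code. The helper name `classify_error` is hypothetical. It also assumes that the 402 body uses the same `{"error": {"code": ...}}` envelope as the 429 example later on this page, which this page does not confirm.

```python
from typing import Optional


def classify_error(status: int, body: Optional[dict] = None) -> str:
    """Map an HTTP error to a client action (hypothetical helper).

    Assumes errors use an {"error": {"code": ...}} envelope; the 402 code
    "quota_exceeded" comes from the docs, its placement in the body is assumed.
    """
    code = ((body or {}).get("error") or {}).get("code")
    if status == 429 or code == "rate_limit_exceeded":
        return "retry"   # transient: wait for the RPM window, then retry
    if status == 402 or code == "quota_exceeded":
        return "top_up"  # balance exhausted: retrying will not help
    return "fail"        # anything else is not a rate/quota condition
```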
Limits by tier
Each plan tier has a requests-per-minute (RPM) ceiling:
Your effective limit is reported on every response via the
X-RateLimit-Limit-Requests header (see Response headers).
Higher limits
The table above shows tier defaults. Your organization may have a custom ceiling that applies to every API key you own.
If you need a higher limit, contact us with your expected peak RPM and a one-line reason. Most increases are turned around the same business day.
Per-key and per-project narrowing
Owners can set stricter limits than the org default:
- Per-project RPM override — set in Dashboard → Projects → Limits. Useful for isolating an internal experiment from production traffic.
- Per-key RPM limit — set on the Create API key screen. Useful when you mint a key for a CI job or partner integration.
The API applies the strictest (lowest) value across plan, org, project, and key, and reports the result in the response headers described below.
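Read at face value, the rule above amounts to taking a minimum over every level that has a limit configured. The sketch below is an illustration under that reading, not a description of how the server computes it. The function name `effective_rpm` and the numbers in the usage line are hypothetical.

```python
from typing import Optional


def effective_rpm(*limits: Optional[int]) -> int:
    """Return the strictest (lowest) RPM among the levels that set one.

    Pass limits in any order (plan, org, project, key); None means
    "not configured at this level" and is ignored.
    """
    configured = [limit for limit in limits if limit is not None]
    if not configured:
        raise ValueError("at least one limit must be configured")
    return min(configured)


# Hypothetical values: plan 60, no org override, project 30, key 10.
print(effective_rpm(60, None, 30, 10))  # 10
```

Whatever the configuration, the value the server actually applies is the one reported in X-RateLimit-Limit-Requests.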
How it works
SovereignEG uses a sliding window for RPM:
- Counts requests in a rolling 60-second window.
- Limits are enforced per API key. If your org has multiple keys, each key gets its own bucket — the limits do not pool.
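If you want to stay under the limit rather than react to 429s, a client can mirror the rolling 60-second window locally. The sketch below is a sliding-window log. It keeps the timestamps of recent requests and admits a new one only while fewer than `limit` fall inside the window. The class and method names are hypothetical. The server's exact boundary handling is not documented here, so treat this as an approximation. Each API key would need its own instance, because limits do not pool.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Client-side sliding-window log: at most `limit` calls per `window` seconds."""

    def __init__(self, limit: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.sent: Deque[float] = deque()  # timestamps of requests still in the window

    def _evict(self, now: float) -> None:
        # Drop timestamps that have slid out of the rolling window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()

    def try_acquire(self) -> bool:
        """Record a request and return True if it fits in the window."""
        now = self.clock()
        self._evict(now)
        if len(self.sent) < self.limit:
            self.sent.append(now)
            return True
        return False

    def wait_time(self) -> float:
        """Seconds until the oldest request leaves the window (0 if there is room)."""
        now = self.clock()
        self._evict(now)
        if len(self.sent) < self.limit:
            return 0.0
        return self.window - (now - self.sent[0])
```

A typical loop is `while not limiter.try_acquire(): time.sleep(limiter.wait_time())`, followed by the API call.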
Response headers
Every response includes rate-limit information:
X-RateLimit-Limit-Requests: 60
X-RateLimit-Remaining-Requests: 58
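These headers let a client slow down before it is rejected. The sketch below reads them from any mapping of header names to values. The helper names and the `reserve` threshold are hypothetical. With the openai Python SDK, raw headers are available through `with_raw_response`: `raw = client.chat.completions.with_raw_response.create(...)` exposes `raw.headers`, and `raw.parse()` returns the usual completion object.

```python
from typing import Mapping, Optional


def remaining_requests(headers: Mapping[str, str]) -> Optional[int]:
    """Read X-RateLimit-Remaining-Requests; None if absent or malformed."""
    # HTTP header names are case-insensitive; normalize for plain dicts.
    lowered = {k.lower(): v for k, v in headers.items()}
    value = lowered.get("x-ratelimit-remaining-requests")
    try:
        return int(value) if value is not None else None
    except ValueError:
        return None


def should_pause(headers: Mapping[str, str], reserve: int = 1) -> bool:
    """True when fewer than `reserve` requests remain in the current window."""
    remaining = remaining_requests(headers)
    return remaining is not None and remaining < reserve
```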
When you hit the limit
You receive HTTP 429 with a Retry-After header:
{
"error": {
"type": "rate_limit_error",
"message": "Rate limit exceeded: 60 requests per minute. Please retry after the window resets.",
"code": "rate_limit_exceeded"
}
}
Handling rate limits
Python
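import time
from openai import RateLimitError

def call_with_retry(client, **kwargs):
    """Call chat.completions.create, retrying on HTTP 429 (RateLimitError)."""
    max_retries = 3
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except RateLimitError as e:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the original 429
            # Honor Retry-After (seconds) when present; otherwise back off exponentially.
            try:
                retry_after = float(e.response.headers.get("Retry-After"))
            except (TypeError, ValueError):
                retry_after = 2 ** attempt
            print(f"Rate limited. Waiting {retry_after}s...")
            time.sleep(retry_after)

Node.js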
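async function callWithRetry(client, params, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await client.chat.completions.create(params);
    } catch (error) {
      // Rethrow anything that is not a 429, or a 429 on the final attempt.
      if (error.status !== 429 || attempt === maxRetries - 1) throw error;
      // Honor Retry-After (seconds) when present; otherwise back off exponentially.
      const retryAfter = Number.parseFloat(error.headers?.['retry-after']);
      const waitSeconds = Number.isFinite(retryAfter) ? retryAfter : 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, waitSeconds * 1000));
    }
  }
}

Best practices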
- Implement exponential backoff: don't hammer the API after a 429, and honor Retry-After when it is present.
- Monitor your headers: check X-RateLimit-Remaining-Requests and slow down before it reaches zero.
- Batch when possible: one request with a longer prompt counts once against your RPM, while many short requests each count separately.
- Use streaming: a streamed response counts as one request regardless of output length.
- Cache responses: if the same prompt is sent often, cache the result.
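The backoff practice can be made concrete with a small delay calculator. This page asks only for exponential backoff. The "full jitter" variant below is an assumption, chosen because it is a common way to keep many clients from retrying in lockstep. The function name and the `base` and `cap` defaults are hypothetical.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, retry_after: Optional[float] = None,
                  base: float = 1.0, cap: float = 60.0,
                  rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    A server-supplied Retry-After wins; otherwise use exponential backoff
    with full jitter: a uniform draw in [0, min(cap, base * 2**attempt)].
    """
    if retry_after is not None and retry_after >= 0:
        return retry_after
    rng = rng or random.Random()
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)
```

Randomizing the whole interval spreads retries out more than adding a small jitter on top of a fixed exponential schedule. The trade-off is that an individual retry may occasionally come sooner than it would under a plain schedule.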