Rate Limits

Understand per-plan rate limits, response headers, and best practices for high-throughput usage.

Rate limits by plan

Each plan has a per-minute request limit applied to the /v1/send endpoint:

| Plan | Rate limit | Monthly quota | Overage |
|---|---|---|---|
| Starter | 100 requests/min | 500 voice notes | $0.04 per note |
| Growth | 500 requests/min | 2,000 voice notes | $0.03 per note |
| Scale | 2,000 requests/min | 10,000 voice notes | $0.015 per note |
| Enterprise | Custom | Custom | Custom |

Rate limits apply per API key. If you have multiple keys, each has its own independent limit.

The /v1/status and /v1/usage endpoints share a single limit of 1,000 requests per minute, regardless of plan.
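To make the plan table concrete, here is a minimal sketch that turns it into two numbers you can plan around: the average spacing between /v1/send requests that stays under a plan's limit, and the overage charge for a month. The names `PLANS`, `min_interval_seconds`, and `overage_cost` are hypothetical, and the sketch assumes overage is billed per note sent beyond the monthly quota.

```python
from decimal import Decimal

# Values copied from the plan table. Enterprise is omitted because its terms are custom.
PLANS = {
    "starter": {"per_minute": 100, "monthly_quota": 500, "overage": Decimal("0.04")},
    "growth": {"per_minute": 500, "monthly_quota": 2000, "overage": Decimal("0.03")},
    "scale": {"per_minute": 2000, "monthly_quota": 10000, "overage": Decimal("0.015")},
}


def min_interval_seconds(plan: str) -> float:
    """Average gap between requests that keeps you at or under the per-minute limit."""
    return 60.0 / PLANS[plan]["per_minute"]


def overage_cost(plan: str, notes_sent: int) -> Decimal:
    """Overage in dollars, assuming every note beyond the quota is billed at the plan rate."""
    extra_notes = max(0, notes_sent - PLANS[plan]["monthly_quota"])
    return extra_notes * PLANS[plan]["overage"]
```

On the Growth plan this gives a spacing of 0.12 seconds between requests, and 2,500 voice notes in a month would cost $15.00 in overage under the assumption above.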

Rate limit headers

Every response from the API includes rate limit information:

X-RateLimit-Limit: 500
X-RateLimit-Remaining: 342
X-RateLimit-Reset: 1710422460

| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests allowed per minute |
| X-RateLimit-Remaining | Requests remaining in the current 60-second window |
| X-RateLimit-Reset | Unix timestamp (seconds) when the window resets |
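As a rough illustration of how these three headers can be consumed, the sketch below parses them into a small structure and computes how long remains until the window resets. `RateLimitState`, `parse_rate_limit`, and `seconds_until_reset` are hypothetical names, not part of any Svara SDK. The sketch assumes the header names arrive with the exact casing shown above; a case-insensitive mapping such as the `headers` attribute of a `requests` response avoids that concern.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RateLimitState:
    limit: int       # X-RateLimit-Limit
    remaining: int   # X-RateLimit-Remaining
    reset_at: int    # X-RateLimit-Reset, Unix timestamp in seconds


def parse_rate_limit(headers: Mapping[str, str]) -> RateLimitState:
    """Read the three rate limit headers; raises KeyError if one is missing."""
    return RateLimitState(
        limit=int(headers["X-RateLimit-Limit"]),
        remaining=int(headers["X-RateLimit-Remaining"]),
        reset_at=int(headers["X-RateLimit-Reset"]),
    )


def seconds_until_reset(state: RateLimitState, now: Optional[float] = None) -> float:
    """Seconds until the current window resets, never negative."""
    if now is None:
        now = time.time()
    return max(0.0, state.reset_at - now)
```

Passing `now` explicitly keeps the function easy to test; in production code you would normally leave it unset.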

What happens when you exceed the limit

When X-RateLimit-Remaining reaches 0, subsequent requests receive a 429 Too Many Requests response:

{
  "error": {
    "code": "rate_limited",
    "message": "Rate limit exceeded. Retry after 8 seconds.",
    "details": {
      "limit": 500,
      "window": "1m",
      "retry_after": 8
    }
  }
}

The response includes a Retry-After header with the number of seconds to wait before your next request will succeed.
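One way to turn a 429 response into a wait time is sketched below. It prefers the Retry-After header and, as an assumption on my part, falls back to `error.details.retry_after` from the JSON body shown above if the header is missing or not a plain number of seconds. The function name `retry_wait_seconds` is hypothetical.

```python
import json
from typing import Mapping, Optional


def retry_wait_seconds(status_code: int, headers: Mapping[str, str], body: str) -> Optional[int]:
    """Return how many seconds to wait after a 429, or None if no wait is indicated."""
    if status_code != 429:
        return None

    # Preferred source: the Retry-After header, expressed in whole seconds.
    header_value = headers.get("Retry-After")
    if header_value is not None and header_value.strip().isdigit():
        return int(header_value)

    # Assumed fallback: the retry_after field in the error body.
    try:
        return int(json.loads(body)["error"]["details"]["retry_after"])
    except (ValueError, KeyError, TypeError):
        return None
```

A `None` result for a 429 means neither source was usable, in which case a client-side backoff schedule is the remaining option.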

Best practices

1. Monitor your remaining quota

Check X-RateLimit-Remaining after each request. If it drops below 10% of your limit, slow down or queue requests.

const response = await fetch("https://api.svarapi.io/v1/send", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.SVARA_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify(payload),
});

const remaining = Number.parseInt(response.headers.get("X-RateLimit-Remaining"), 10);
const limit = Number.parseInt(response.headers.get("X-RateLimit-Limit"), 10);

if (remaining < limit * 0.1) {
  // Slow down: less than 10% of your per-minute limit is left
  await new Promise((resolve) => setTimeout(resolve, 1000));
}

2. Use exponential backoff on 429 responses

Never retry immediately after a rate limit error. Use the Retry-After header or exponential backoff:

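import os
import time
import requests

# Read the API key from the environment, as in the JavaScript examples
API_KEY = os.environ["SVARA_API_KEY"]

def send_with_backoff(payload, max_retries=5):
    for attempt in range(max_retries):
        response = requests.post(
            "https://api.svarapi.io/v1/send",
            headers={"Authorization": f"Bearer {API_KEY}"},
            json=payload,
        )

        # Anything other than 429 (success or a different error) goes back to the caller
        if response.status_code != 429:
            return response

        # Prefer the server's Retry-After value; otherwise back off 1s, 2s, 4s, ...
        retry_after = int(response.headers.get("Retry-After", 2 ** attempt))
        print(f"Rate limited. Waiting {retry_after}s (attempt {attempt + 1})")
        time.sleep(retry_after)

    raise RuntimeError("Max retries exceeded")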
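When many clients are rate limited at the same moment, a fixed backoff schedule makes them all retry together. A common refinement, not something the Svara documentation prescribes, is to add random jitter to the fallback delay. The sketch below uses the "full jitter" variant (a uniform random delay between 0 and the exponential ceiling) and still honors Retry-After when it is present. The name `backoff_delay` and its `base`, `cap`, and `rng` parameters are hypothetical.

```python
import random
from typing import Callable, Optional


def backoff_delay(
    attempt: int,
    retry_after: Optional[float] = None,
    base: float = 1.0,
    cap: float = 60.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Seconds to sleep before retry number `attempt` (0-based)."""
    # If the server told us how long to wait, use that value as-is.
    if retry_after is not None:
        return float(retry_after)

    # Exponential ceiling: base, 2*base, 4*base, ... capped at `cap` seconds.
    ceiling = min(cap, base * (2 ** attempt))

    # Full jitter: pick a uniform random delay in [0, ceiling).
    return rng() * ceiling
```

The `rng` parameter exists only so the randomness can be replaced with a fixed value in tests; the default is the standard library's `random.random`.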

3. Batch requests with a queue

For high-volume sending, use a local queue that respects your rate limit:

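class RateLimitedQueue {
  constructor(maxPerMinute) {
    this.maxPerMinute = maxPerMinute;
    this.queue = [];
    this.sentThisWindow = 0;
    this.windowStart = Date.now();
    this.processing = false; // True while a drain loop is running
  }

  add(payload) {
    this.queue.push(payload);
    // Start a drain loop only if one is not already running; otherwise
    // several loops would send concurrently and overshoot the limit
    if (!this.processing) {
      this.process().catch((err) => console.error("Queue processing failed:", err));
    }
  }

  async process() {
    this.processing = true;
    try {
      while (this.queue.length > 0) {
        // Reset window if needed
        if (Date.now() - this.windowStart >= 60000) {
          this.sentThisWindow = 0;
          this.windowStart = Date.now();
        }

        // Wait if at capacity
        if (this.sentThisWindow >= this.maxPerMinute) {
          const waitMs = 60000 - (Date.now() - this.windowStart);
          await new Promise((r) => setTimeout(r, waitMs));
          continue;
        }

        const payload = this.queue.shift();
        this.sentThisWindow++; // Count the request before sending it
        await sendVoiceNote(payload); // sendVoiceNote is assumed to be defined elsewhere
      }
    } finally {
      this.processing = false;
    }
  }
}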

// Usage
const queue = new RateLimitedQueue(450); // Leave 10% headroom
for (const recipient of recipients) {
  queue.add({ platform: "telegram", recipient, audio_url, session });
}
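The queue above counts requests in fixed 60-second windows that start when the queue is created, so its windows may not line up with the server's. A more conservative alternative, offered here only as a sketch, is a sliding-window limiter that never allows more than the configured number of sends in any trailing 60 seconds. The class name `SlidingWindowLimiter` and its methods are hypothetical, and the injectable `clock` exists purely to make the behavior testable.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    def __init__(
        self,
        max_per_window: int,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        self.clock = clock
        self.sent: Deque[float] = deque()  # Timestamps of recent sends, oldest first

    def wait_time(self) -> float:
        """Seconds to wait before the next send is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the trailing window.
        while self.sent and now - self.sent[0] >= self.window_seconds:
            self.sent.popleft()
        if len(self.sent) < self.max_per_window:
            return 0.0
        # At capacity: wait until the oldest send leaves the window.
        return self.window_seconds - (now - self.sent[0])

    def record(self) -> None:
        """Call once for every request actually sent."""
        self.sent.append(self.clock())
```

A sender would call `wait_time()`, sleep for the returned duration if it is non-zero, send the request, and then call `record()`.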

4. Use webhooks instead of polling

Instead of polling /v1/status for delivery updates, set up webhooks to receive real-time notifications. This eliminates unnecessary API calls and reduces your rate limit consumption.

5. Cache usage data

The /v1/usage endpoint returns billing period data that changes slowly. Cache the response for at least 60 seconds to avoid unnecessary requests:

let usageCache = null;
let usageCacheTime = 0;

async function getUsage() {
  if (usageCache && Date.now() - usageCacheTime < 60000) {
    return usageCache;
  }

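  const response = await fetch("https://api.svarapi.io/v1/usage", {
    headers: { "Authorization": `Bearer ${process.env.SVARA_API_KEY}` },
  });

  // Don't cache error responses such as 429 or 5xx
  if (!response.ok) {
    throw new Error(`Usage request failed with status ${response.status}`);
  }

  usageCache = await response.json();
  usageCacheTime = Date.now();
  return usageCache;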
}