API Rate-Limiting as a Multi-Region Service
Design a globally consistent rate limiting service with low latency and multi-region enforcement.
What You'll Learn
- •Token bucket and sliding window log algorithms: implementation trade-offs, burst handling, and precision guarantees
- •Distributed counter synchronization using gossip and windowed CRDT consumed counters for bounded global quota drift
- •Edge-cached policy evaluation with push-based invalidation for sub-5ms decision latency
- •Hot-key mitigation through counter sharding, local pre-aggregation, and adaptive splitting
- •Lua scripting in Redis for atomic token bucket operations (check-and-decrement in single round-trip)
- •Graceful degradation strategies: local-only mode, fail-open vs fail-closed policies during network partitions
- •Multi-tier rate limiting: per-user, per-API-key, per-IP, per-organization with hierarchical quota inheritance
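To make the first bullet concrete, here is a minimal sliding window log sketch in Python (class and parameter names are illustrative, not taken from any particular implementation):

```python
from collections import deque


class SlidingWindowLog:
    """Exact sliding-window limiter: stores one timestamp per admitted request.
    Precise (no boundary bursts), but O(limit) memory per key."""

    def __init__(self, limit: int, window: float):
        self.limit = limit      # max requests per window
        self.window = window    # window length in seconds
        self.log = deque()      # timestamps of admitted requests

    def allow(self, now: float) -> bool:
        # Evict timestamps that have aged out of the window.
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False
```

The deque makes the precision trade-off explicit: every admitted request costs one stored timestamp, which is why this algorithm suits exact abuse-prevention counting rather than high-volume API quotas.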
Quick Context
A multi-region rate limiting service sits on the critical path of every API request, making sub-millisecond decisions about whether to allow or throttle traffic based on configurable quotas. The core tension is between global accuracy (enforcing exact quotas across all regions) and local latency (making decisions without cross-region round trips). At companies like Cloudflare, Stripe, and AWS API Gateway, rate limiters handle millions of decisions per second across dozens of geographic regions. The key challenges are: (1) choosing the right algorithm that balances burst tolerance with precision, (2) synchronizing distributed counters across regions without adding latency to the hot path, (3) handling hot keys where a single tenant generates disproportionate traffic, (4) gracefully degrading when regions lose connectivity while preventing quota abuse, and (5) supporting hierarchical rate limits (per-user within per-org within global) with efficient evaluation. Success metrics: p99 decision latency <5ms, global quota accuracy within 1%, zero false rejections during regional failover, and <100ms policy propagation for kill-switch scenarios.
- •Handle 2 million rate checks per second globally with p99 decision latency under 5ms
- •Enforce global quotas across 8 regions with at most 1% overshoot during normal operation
- •Support 1 million active rate limit policies with hierarchical inheritance (user < org < global)
- •Propagate policy changes (especially kill switches) to all regions within 100ms
Key Numbers
Requirements
Support token bucket (with configurable burst), sliding window log (exact counting), sliding window counter (approximate but memory-efficient), and fixed window algorithms
Why it matters: Different use cases need different precision/performance trade-offs - API quotas need burst tolerance while abuse prevention needs exact counting
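The memory trade-off behind this requirement can be illustrated with the sliding window counter approximation, which stores just two integers per key and weights the previous fixed window by its overlap with the sliding window. A simplified sketch (window alignment at time zero is assumed; a production version would align the first window to the first request):

```python
class SlidingWindowCounter:
    """Approximate sliding window: weights the previous fixed window by overlap.
    Stores two integers per key instead of one timestamp per request."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.curr_start = 0.0   # start of the current fixed window
        self.curr = 0           # count in the current fixed window
        self.prev = 0           # count in the previous fixed window

    def allow(self, now: float) -> bool:
        # Roll fixed windows forward as time passes.
        while now - self.curr_start >= self.window:
            self.curr_start += self.window
            self.prev, self.curr = self.curr, 0
        # Fraction of the previous window still inside the sliding window.
        overlap = 1.0 - (now - self.curr_start) / self.window
        estimated = self.prev * overlap + self.curr
        if estimated < self.limit:
            self.curr += 1
            return True
        return False
```

The estimate assumes requests in the previous window were evenly spread, which is why the algorithm is approximate but memory-efficient.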
Enforce rate limits at multiple levels: per-user, per-API-key, per-IP, per-organization, and global, with configurable inheritance and override rules
Why it matters: Enterprise customers need organization-level quotas that subdivide among their users while respecting global platform limits
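One way to enforce such a hierarchy is an all-or-nothing check across the resolved policy chain, sketched here in Python under the assumption that each level's remaining quota is available locally (function and key names are illustrative):

```python
def check_hierarchy(buckets: dict, chain: list) -> bool:
    """Allow only if every level (user -> org -> global) has quota.
    All-or-nothing: tokens are consumed only when the whole chain passes,
    so a denial at the org level does not burn the user's tokens."""
    # First pass: verify every level can afford the request.
    if any(buckets[key] < 1 for key in chain):
        return False
    # Second pass: consume one unit at each level.
    for key in chain:
        buckets[key] -= 1
    return True
```

The two-pass structure matters: consuming user-level tokens before discovering the org is exhausted would silently penalize users for their organization's limit.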
Synchronize counters across all 8 regions so that a tenant consuming quota in US-East cannot exceed their global limit by also consuming in EU-West
Why it matters: Without cross-region sync, a tenant with a 1000/min limit could effectively get 8000/min by distributing requests across regions
Allow configurable burst sizes above the sustained rate, with automatic recovery based on token replenishment rate
Why it matters: Legitimate traffic is bursty - denying short spikes degrades user experience and causes unnecessary retry storms
CRUD operations for rate limit policies with versioning, audit trail, approval workflows, and bulk import/export
Why it matters: Operations teams need to quickly adjust limits during incidents or onboard new customers with custom quotas
Return standard rate limit headers (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, Retry-After) on every response
Why it matters: Clients need visibility into their quota usage to implement proper backoff and retry logic
Support per-request bypass tokens for internal services and temporary limit overrides for incident response
Why it matters: Internal health checks and monitoring should not count against quotas; incident response may need temporary limit increases
Architecture Evolution
Single-region rate limiter with Redis backend. Token bucket implemented via Lua script for atomic check-and-decrement. Suitable for <100K checks/sec with single-region deployment.
What Changed & Why
- •Single Redis instance with token bucket Lua script
- •Policy stored in PostgreSQL, cached in application memory
- •Fixed window counters as fallback for simple limits
- •Standard rate limit response headers on every response
- •Basic admin API for CRUD on policies
Key Decisions
Token bucket with configurable burst size and refill rate, with sliding window counter as secondary algorithm
Redis Cluster with hash-slot sharding and Lua scripting for atomic operations
Gossip-based windowed CRDT consumed counters with adaptive sync frequency per counter velocity
Push-based via Kafka with local edge cache and pull-based fallback
Automatic counter sharding with local pre-aggregation for detected hot keys
API Design
The Rate Limiting API provides endpoints for rate check decisions, policy management, and quota monitoring. The check endpoint is designed for ultra-low latency on the hot path, while management endpoints support comprehensive CRUD operations with audit trails.
Base URL
https://api.ratelimit.io/v1
Authentication
Rate check endpoints use mTLS for zero-overhead authentication. Management endpoints use API keys with RBAC. Admin operations require MFA-verified tokens.
Endpoints
/check
Webhooks
Fired when a rate limit policy is created, updated, or deleted
Payload
{
"event": "policy.updated",
"policy_id": "pol_xyz789",
"action": "updated",
"version": 2,
"changed_by": "admin@example.com",
"timestamp": "2024-01-15T10:30:00Z"
}
Fired when a key exhausts its quota (useful for alerting customers)
Payload
{
"event": "quota.exhausted",
"key": "user:usr_abc123:api:/v1/orders",
"policy_id": "pol_xyz789",
"limit": 1000,
"consumed": 1000,
"reset_at": "2024-01-15T10:31:00Z",
"timestamp": "2024-01-15T10:30:42Z"
}
Fired when automatic hot-key detection triggers counter sharding
Payload
{
"event": "hotkey.detected",
"key": "org:org_456:api:*",
"checks_per_second": 15000,
"shard_count": 4,
"region": "us-east-1",
"timestamp": "2024-01-15T10:30:00Z"
}
Code Samples
Lua script implementing token bucket with atomic check-and-consume. Runs entirely within Redis for single round-trip latency.
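The script body itself is not reproduced here; the following Python sketch mirrors the logic such a script would run, using the documented `rate:{level}:{id}` HASH fields (`tokens`, `last_refill`) and the argument order of the later EVALSHA example (limit, refill rate, current time in microseconds, cost, burst). This is an illustrative stand-in: in production the read-modify-write must execute inside Redis as a Lua script to stay atomic.

```python
def lua_token_bucket(state: dict, limit: int, refill_rate: float,
                     now_us: int, cost: int, burst: int) -> tuple:
    """Python rendering of the logic the Redis Lua script would run atomically.
    `state` mirrors the rate:{level}:{id} HASH: {tokens, last_refill (us)}."""
    capacity = limit + burst
    # A missing key means a fresh bucket: start full.
    tokens = state.get("tokens", capacity)
    last = state.get("last_refill", now_us)
    # Refill proportional to elapsed microseconds, capped at capacity.
    tokens = min(capacity, tokens + (now_us - last) / 1e6 * refill_rate)
    allowed = tokens >= cost
    if allowed:
        tokens -= cost
    state["tokens"] = tokens
    state["last_refill"] = now_us
    # Remaining tokens feed the X-RateLimit-Remaining response header.
    return allowed, tokens
```

Keeping both fields in one HASH is what lets the Lua script read, compute, and write in a single key access, giving the single round-trip latency described above.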
Data Model & Queries
-- Rate Limiting Platform Schema
-- Rate limit policies with hierarchical inheritance
CREATE TABLE rate_policies (
policy_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(255) NOT NULL,
description TEXT,
algorithm VARCHAR(50) NOT NULL DEFAULT 'token_bucket',
-- CHECK (algorithm IN ('token_bucket', 'sliding_window_log', 'sliding_window_counter', 'fixed_window')),
limit_value INT NOT NULL,
window_seconds INT NOT NULL,
burst_size INT NOT NULL DEFAULT 0,
refill_rate DECIMAL(10,4), -- tokens per second for token bucket
parent_policy_id UUID REFERENCES rate_policies(policy_id),
hierarchy_level VARCHAR(20) NOT NULL DEFAULT 'user',
-- CHECK (hierarchy_level IN ('user', 'api_key', 'organization', 'global')),
targeting_rules JSONB DEFAULT '{}',
is_active BOOLEAN NOT NULL DEFAULT true,
version INT NOT NULL DEFAULT 1,
created_by VARCHAR(255) NOT NULL,
created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_policies_active ON rate_policies (is_active, hierarchy_level);
CREATE INDEX idx_policies_parent ON rate_policies (parent_policy_id) WHERE parent_policy_id IS NOT NULL;
CREATE INDEX idx_policies_targeting ON rate_policies USING GIN (targeting_rules);
-- Policy audit log for compliance and debugging
CREATE TABLE policy_audit_log (
audit_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
policy_id UUID NOT NULL REFERENCES rate_policies(policy_id),
action VARCHAR(20) NOT NULL, -- 'created', 'updated', 'deleted', 'activated', 'deactivated'
old_values JSONB,
new_values JSONB,
changed_by VARCHAR(255) NOT NULL,
change_reason TEXT,
created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_audit_policy ON policy_audit_log (policy_id, created_at DESC);
CREATE INDEX idx_audit_time ON policy_audit_log (created_at DESC);
-- Counter sync state for cross-region reconciliation
CREATE TABLE counter_sync_state (
counter_key TEXT NOT NULL,
region VARCHAR(50) NOT NULL,
local_count BIGINT NOT NULL DEFAULT 0,
last_sync_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
sync_generation BIGINT NOT NULL DEFAULT 0,
PRIMARY KEY (counter_key, region)
);
-- NOW() is not immutable, so it cannot appear in a partial index predicate;
-- index last_sync_at and filter for staleness at query time instead.
CREATE INDEX idx_sync_stale ON counter_sync_state (last_sync_at);
-- Hot key tracking for automatic sharding decisions
CREATE TABLE hot_key_events (
event_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
counter_key TEXT NOT NULL,
region VARCHAR(50) NOT NULL,
checks_per_second INT NOT NULL,
shard_count INT NOT NULL DEFAULT 1,
action VARCHAR(20) NOT NULL, -- 'detected', 'sharded', 'unsharded'
created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_hotkey_recent ON hot_key_events (counter_key, created_at DESC);
-- Sampled decision log for troubleshooting and quota analytics
CREATE TABLE rate_check_log (
check_id BIGSERIAL PRIMARY KEY,
checked_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
counter_key TEXT NOT NULL,
policy_id UUID REFERENCES rate_policies(policy_id),
region VARCHAR(50) NOT NULL,
decision VARCHAR(16) NOT NULL, -- 'allowed', 'denied', 'degraded_allow'
checks_per_second INT,
observed_latency_ms DECIMAL(10,3)
);
CREATE INDEX idx_rate_check_denied_recent ON rate_check_log (checked_at DESC, counter_key)
WHERE decision = 'denied';
-- Redis key schemas (documented for operations)
-- rate:{level}:{id} => HASH {tokens: float, last_refill: timestamp}
-- rate:{level}:{id}:shard:{n} => HASH (sharded counter)
-- policy:{policy_id} => JSON (cached policy)
-- hotkey:{key} => INT (check count in current window)
- •Hierarchical rate_policies with parent_policy_id enables user -> org -> global quota inheritance
- •JSONB targeting_rules with GIN index allows flexible policy matching without schema changes
- •Separate policy_audit_log provides immutable compliance trail for every policy change
- •counter_sync_state tracks consumed counter state per region and generation for cross-region reconciliation
- •hot_key_events captures sharding decisions for post-incident analysis
- •Sampled rate_check_log supports short-horizon troubleshooting without storing every decision event
- •Redis HASH for token bucket state enables atomic Lua script operations in single key
- •Auto-TTL on Redis keys prevents memory bloat from expired counters
Resolve policy hierarchy for a rate check
WITH RECURSIVE policy_chain AS (
SELECT policy_id, parent_policy_id, limit_value, burst_size, hierarchy_level, 1 as depth
FROM rate_policies WHERE policy_id = $1 AND is_active = true
UNION ALL
SELECT p.policy_id, p.parent_policy_id, p.limit_value, p.burst_size, p.hierarchy_level, pc.depth + 1
FROM rate_policies p JOIN policy_chain pc ON p.policy_id = pc.parent_policy_id
WHERE p.is_active = true
)
SELECT * FROM policy_chain ORDER BY depth DESC;
Atomic token bucket check via Redis Lua
EVALSHA <sha> 1 rate:user:usr_abc123 1000 16.67 <now_us> 1 50
Find stale counter sync state (potential drift)
SELECT counter_key, region,
EXTRACT(EPOCH FROM NOW() - last_sync_at) as seconds_stale,
local_count
FROM counter_sync_state
WHERE last_sync_at < NOW() - INTERVAL '10 seconds'
ORDER BY last_sync_at ASC
LIMIT 100;
Top over-limit keys in last hour
SELECT counter_key, COUNT(*) as deny_count,
MAX(checks_per_second) as peak_rate
FROM rate_check_log
WHERE decision = 'denied' AND checked_at > NOW() - INTERVAL '1 hour'
GROUP BY counter_key
ORDER BY deny_count DESC
LIMIT 20;
idx_policies_active: Fast lookup of active policies by hierarchy level during rate checks
idx_policies_parent: Traverse policy hierarchy for inherited limits (partial index excludes root)
idx_policies_targeting: Match requests to policies using JSONB containment queries
idx_audit_policy: Retrieve audit history for a specific policy (compliance, debugging)
idx_sync_stale: Find counter keys with stale sync state for reconciliation
idx_hotkey_recent: Track hot-key sharding history for capacity planning
idx_rate_check_denied_recent: Investigate top denied keys quickly during incidents
Scaling & Bottlenecks
Primary Bottleneck
Hot-key counters cause single Redis node to saturate while other nodes are idle
Mitigation
Automatic counter sharding distributes hot keys across multiple Redis hash slots
What You'd Change
- •Deploy hot-key detector monitoring checks/sec per counter key
- •Implement automatic counter splitting when key exceeds 1000 checks/sec
- •Add in-process L1 cache with 100ms TTL to absorb repeated checks
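A hedged sketch of what the detector and shard router above could look like (the 1000 checks/sec threshold comes from the bullets; the md5-based routing and the shard cap are illustrative assumptions):

```python
import hashlib

SHARD_THRESHOLD = 1000  # checks/sec above which a key is split


def shard_for(key: str, shard_count: int, request_id: str) -> str:
    """Route a check to one of N sub-counters once a key is marked hot.
    Each shard gets limit/shard_count of the budget; local pre-aggregation
    merges shard totals back before cross-region sync."""
    if shard_count <= 1:
        return key
    h = int(hashlib.md5(request_id.encode()).hexdigest(), 16)
    return f"{key}:shard:{h % shard_count}"


class HotKeyDetector:
    """Counts checks per key in the current 1-second window and
    recommends a shard count when the rate crosses the threshold."""

    def __init__(self):
        self.window_start = 0.0
        self.counts = {}

    def observe(self, key: str, now: float) -> int:
        if now - self.window_start >= 1.0:
            self.counts.clear()
            self.window_start = now
        self.counts[key] = self.counts.get(key, 0) + 1
        rate = self.counts[key]
        # Double the shard count while the rate exceeds threshold * shards,
        # capped to avoid unbounded splitting.
        shards = 1
        while rate >= SHARD_THRESHOLD * shards and shards < 16:
            shards *= 2
        return shards
```

Routing on a per-request value (rather than the key) is what spreads a single hot tenant across multiple Redis hash slots.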
Primary Bottleneck
Cross-region counter drift allows up to 15% overshoot during sync windows
Mitigation
Reduce sync interval for high-velocity counters and implement generation-aware merge of per-region consumed deltas
What You'd Change
- •Migrate from fixed-interval gossip to adaptive sync based on counter velocity
- •Implement windowed consumed counters with generation IDs for conflict-free cross-region merging
- •Add priority sync channel for counters approaching their limits (>80% utilization)
- •Deploy regional counter aggregators to reduce fan-out overhead
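The windowed, generation-aware merge described above behaves like a per-region G-counter: within a window generation, each region's consumed count only grows, so taking the per-region maximum is conflict-free regardless of message ordering. A minimal sketch (the data shapes are assumptions):

```python
def merge_consumed(views: list) -> dict:
    """Generation-aware merge of per-region consumed counters.
    Each view maps region -> (generation, consumed). Within a generation
    counts only grow, and a higher generation means the window rolled,
    so tuple-wise max per region is a conflict-free merge."""
    merged = {}
    for view in views:
        for region, (gen, consumed) in view.items():
            cur = merged.get(region)
            if cur is None or (gen, consumed) > cur:
                merged[region] = (gen, consumed)
    return merged


def global_consumed(merged: dict) -> int:
    """Global consumption in the current window: sum of merged regional maxima."""
    return sum(consumed for _, consumed in merged.values())
```

Because the merge is order-independent and idempotent, regions can gossip views at different frequencies (the adaptive sync above) without coordination.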
Primary Bottleneck
Counter store memory exceeds single-cluster capacity with 1B+ active keys
Mitigation
Implement tiered counter storage with hot/warm partitioning and probabilistic counting for long-tail
What You'd Change
- •Partition counters: hot (>10 checks/sec) in Redis, warm (<10/sec) in DynamoDB
- •Use Count-Min Sketch for approximate counting of long-tail keys
- •Deploy counter pre-allocation for predictable daily traffic patterns
- •Implement counter compaction: merge idle sub-counters back to single key
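The Count-Min Sketch mentioned above trades a bounded overestimate for fixed memory: it never underestimates, so errors bias toward denying, which is acceptable for cold long-tail keys. A minimal sketch (the width/depth values and md5-derived hashing are illustrative):

```python
import hashlib


class CountMinSketch:
    """Approximate counter with fixed memory regardless of key cardinality.
    Estimates are always >= the true count (one-sided error)."""

    def __init__(self, width: int = 2048, depth: int = 4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _indexes(self, key: str):
        # One independent-ish hash per row, derived by salting with the row id.
        for row in range(self.depth):
            h = hashlib.md5(f"{row}:{key}".encode()).hexdigest()
            yield row, int(h, 16) % self.width

    def add(self, key: str, n: int = 1):
        for row, col in self._indexes(key):
            self.table[row][col] += n

    def estimate(self, key: str) -> int:
        # Collisions only inflate cells, so the minimum across rows
        # is the tightest (still never-under) estimate.
        return min(self.table[row][col] for row, col in self._indexes(key))
```

Memory here is width * depth counters total, versus one counter per key: exactly the property that makes it viable for a billion-key long tail.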
Failure Scenarios
Monitoring & Observability
Comprehensive monitoring should tie serving latency, quota correctness, sync freshness, and region-level resilience into one operational view with fast diagnosis paths.
ratelimit_check_latency (histogram): End-to-end latency of rate check decisions
ratelimit_decision_total (counter): Total rate limit decisions by result (allowed, denied, degraded_allow, error)
policy_cache_hit_rate (gauge): Percentage of policy lookups served from cache
counter_sync_lag (gauge): Global p99 lag between local consumed counters and merged cross-region view
sync_lag_per_region (gauge): Per-region lag to latest sync generation
quota_overshoot_pct (gauge): Percentage by which global consumption exceeds quota limit
hot_key_count (gauge): Number of counter keys classified as hot (>1K checks/sec)
checks_per_region (gauge): Request checks served per second in each region
redis_health_per_region (gauge): Composite region score from primary health, replica lag, and failover state
deny_ratio_by_policy (gauge): Per-policy denied/total ratio used to detect misconfigured quotas
Rate check latency exceeding SLO, may impact API response times
ratelimit_check_latency_p99 > 10ms for 2m
Runbook: Check Redis cluster health, policy cache hit rate, and counter shard balance
Cross-region sync stale, global quotas may be inaccurate
counter_sync_lag > 5000ms for 1m
Runbook: Inspect sync transport, regional gossip fanout, and generation advancement
One or more regions are lagging badly and can over-consume quota
max(sync_lag_per_region) > 8000ms for 3m
Runbook: Isolate lagging region, force full sync replay, temporarily tighten local quota budget
Global quota overshoot exceeding tolerance, possible abuse or sync issue
quota_overshoot_pct > 10% for 5m
Runbook: Trigger immediate counter reconciliation, check hot-key traffic, and escalate high-priority sync lane
Redis counter store unavailable, rate limiter in degraded mode
redis_connection_errors > 10 in 1m
Runbook: Check Redis failover status, enable conservative in-process fallback counters, and watch degraded_allow ratio
Policy cache hit rate collapse likely causing DB thundering herd
policy_cache_hit_rate < 90 for 2m
Runbook: Enable stale-while-revalidate, inject cache jitter, and rate-limit bulk invalidations
One or more policies are denying unexpectedly high traffic
max(deny_ratio_by_policy) > 0.7 for 10m
Runbook: Compare current policy version to previous, review rollout intent, and simulate policy impact
Serving SLO
Quota Correctness
Regional Reliability
Hot Key Pressure
Scale Calculator
Estimate quota-check throughput, Redis memory footprint, sync bandwidth, and monthly spend for multi-region rate limiting. Inputs separate key cardinality from active traffic so hot-key cost behavior is realistic.
* Estimates based on simplified AWS pricing. Actual costs may vary.
Cost & Capacity
Test Your Understanding
Failure Diagnosis
Architecture Decisions
Summary & Takeaways
- 1.Token bucket is the best general-purpose algorithm: it naturally handles bursts, is implementable atomically via Lua script, and has O(1) memory per key
- 2.The core tension in distributed rate limiting is global accuracy vs local latency - windowed consumed counters with adaptive sync offer a practical bounded-drift trade-off
- 3.Hot-key mitigation through automatic counter sharding is essential - a single large tenant should not degrade the experience for others
- 4.Graceful degradation must be per-policy configurable: billing quotas should fail closed while abuse prevention can fail open
- 5.Cross-region sync lag means global quotas are inherently approximate - design for 1-5% overshoot tolerance rather than exact enforcement
- 6.Rate limiting is on the critical path of every API request - the rate limiter itself must never become the bottleneck or single point of failure
- •Implement adaptive quotas that learn from traffic patterns and automatically adjust limits based on usage trends
- •Add fairness controls using weighted fair queuing to prevent quota starvation among sub-limits
- •Build a self-service quota management portal where customers can view usage, request increases, and configure alerts
- •Implement rate limit simulation that predicts the impact of policy changes before deployment
- •Add cost-based rate limiting where different endpoints consume different amounts of quota based on resource usage