Why Rate Limiting Is Hard
The naive approach, counting requests in a fixed window, has a boundary burst problem. With 100 requests allowed per minute, a client can fire 100 at 00:59 and 100 at 01:01. Each batch lands in a different window, so all 200 requests get through in 2 seconds.
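To make the boundary issue concrete, here is a minimal sketch of a fixed-window counter. The class name `FixedWindowCounter`, its parameters, and the choice to pass the clock in as `nowMs` are assumptions made for illustration, not part of any particular library.

```typescript
// Hypothetical fixed-window counter; all names here are illustrative.
class FixedWindowCounter {
  private readonly limit: number
  private readonly windowMs: number
  private windowStart = 0
  private count = 0

  constructor(limit: number, windowMs: number) {
    this.limit = limit
    this.windowMs = windowMs
  }

  // `nowMs` is passed in (instead of read from a clock) so a burst can be replayed.
  allow(nowMs: number): boolean {
    // Align windows to multiples of windowMs, like wall-clock minutes.
    const currentWindow = Math.floor(nowMs / this.windowMs) * this.windowMs
    if (currentWindow !== this.windowStart) {
      this.windowStart = currentWindow
      this.count = 0 // a new window forgets everything that came before
    }
    if (this.count >= this.limit) return false
    this.count += 1
    return true
  }
}

// Replay the burst: 100 requests at 00:59, then 100 more at 01:01.
const fixedLimiter = new FixedWindowCounter(100, 60000)
let fixedAccepted = 0
for (let i = 0; i < 100; i++) if (fixedLimiter.allow(59000)) fixedAccepted++
for (let i = 0; i < 100; i++) if (fixedLimiter.allow(61000)) fixedAccepted++
console.log(fixedAccepted) // 200
```

Every request is accepted because the counter resets at the minute boundary, even though the two batches are only 2 seconds apart.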
Token Bucket
A bucket holds up to N tokens. Each request consumes one token, and tokens refill at a fixed rate until the bucket is full again. A burst is absorbed until the bucket runs empty; after that, requests are throttled to the refill rate.
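class TokenBucket {
  private tokens: number
  private lastRefill: number

  // capacity: maximum tokens the bucket can hold (the largest allowed burst)
  // refillRate: tokens added per second
  constructor(private capacity: number, private refillRate: number) {
    this.tokens = capacity // start full so an initial burst is allowed
    this.lastRefill = Date.now()
  }

  // Returns true and takes one token if the request is allowed.
  consume(): boolean {
    this.refill()
    if (this.tokens < 1) return false
    this.tokens -= 1
    return true
  }

  // Adds the tokens earned since the last call, capped at capacity.
  private refill() {
    const now = Date.now()
    // Date.now() is not monotonic; clamp so a backwards clock jump cannot remove tokens.
    const elapsed = Math.max(0, now - this.lastRefill) / 1000
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillRate)
    this.lastRefill = now
  }
}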
Sliding Window
A sliding window log tracks the exact timestamp of each recent request and counts only those inside the trailing window. It is accurate but memory-heavy at scale: each user needs a sorted set of timestamps.
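A minimal sketch of the idea follows, assuming a single user, an in-memory array, and timestamps that arrive in non-decreasing order. The name `SlidingWindowLog` and the `nowMs` parameter are hypothetical, chosen so the earlier burst can be replayed against it.

```typescript
// Hypothetical sliding window log for one user; all names are illustrative.
class SlidingWindowLog {
  private readonly limit: number
  private readonly windowMs: number
  private timestamps: number[] = []

  constructor(limit: number, windowMs: number) {
    this.limit = limit
    this.windowMs = windowMs
  }

  // `nowMs` is passed in so the example is deterministic.
  allow(nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs
    // Keep only requests inside the trailing window (linear scan, for simplicity).
    this.timestamps = this.timestamps.filter((t) => t > cutoff)
    if (this.timestamps.length >= this.limit) return false
    this.timestamps.push(nowMs) // one stored timestamp per accepted request
    return true
  }
}

// Same burst as before: 100 requests at 00:59, then 100 more at 01:01.
const slidingLimiter = new SlidingWindowLog(100, 60000)
let slidingAccepted = 0
for (let i = 0; i < 100; i++) if (slidingLimiter.allow(59000)) slidingAccepted++
for (let i = 0; i < 100; i++) if (slidingLimiter.allow(61000)) slidingAccepted++
console.log(slidingAccepted) // 100
```

The second batch is rejected because the first 100 timestamps are still inside the trailing 60 seconds. The cost is visible in the `timestamps` array: it can hold up to `limit` entries per user.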
What Real Systems Use
Redis has no built-in rate limiter of its own, but it is a common backing store for one. Nginx's limit_req module uses leaky bucket. Stripe has described using token bucket, backed by Redis, for its API rate limits.