Rate Limiting at Scale: Token Bucket vs Sliding Window

Why Rate Limiting Is Hard

The naive approach, counting requests in a fixed window, suffers from bursts at window boundaries. With 100 requests allowed per minute, a client can fire 100 at 00:59 and another 100 at 01:01. Because the counter resets at 01:00, all 200 requests get through within 2 seconds.
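
The boundary burst can be reproduced with a small sketch. The class name `FixedWindowCounter`, its parameters, and the explicit `nowMs` argument are assumptions made for illustration (the timestamp is passed in so the run is deterministic); this is not taken from any particular library.

```typescript
// Hypothetical fixed-window counter; names and parameters are illustrative.
class FixedWindowCounter {
  private windowStart = 0
  private count = 0

  constructor(private limit: number, private windowMs: number) {}

  // nowMs is supplied by the caller so the example does not depend on real time.
  allow(nowMs: number): boolean {
    // Align windows to multiples of windowMs, like wall-clock minutes.
    const start = Math.floor(nowMs / this.windowMs) * this.windowMs
    if (start !== this.windowStart) {
      this.windowStart = start
      this.count = 0 // the counter resets at every window boundary
    }
    if (this.count >= this.limit) return false
    this.count += 1
    return true
  }
}

// 100 requests per minute; fire 100 at 00:59 and 100 at 01:01.
const limiter = new FixedWindowCounter(100, 60000)
let accepted = 0
for (let i = 0; i < 100; i++) if (limiter.allow(59000)) accepted++
for (let i = 0; i < 100; i++) if (limiter.allow(61000)) accepted++
console.log(accepted) // 200, all within a 2-second span
```

Each window on its own respects the limit; the problem only appears when a burst straddles the reset.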

Token Bucket

A bucket holds up to N tokens. Each request consumes one token, and tokens refill at a fixed rate, capped at the bucket's capacity. Burst traffic is absorbed up to that capacity; beyond it, requests are throttled to the refill rate.

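// In-memory token bucket for a single client.
class TokenBucket {
  private tokens: number      // current token count; may be fractional
  private lastRefill: number  // time of the last refill, in ms since epoch

  // capacity: maximum tokens held; refillRate: tokens added per second
  constructor(private capacity: number, private refillRate: number) {
    this.tokens = capacity    // start full so an initial burst is allowed
    this.lastRefill = Date.now()
  }

  // Takes one token and returns true if one is available, otherwise returns false.
  consume(): boolean {
    this.refill()
    if (this.tokens < 1) return false
    this.tokens -= 1
    return true
  }

  // Adds tokens for the time elapsed since the last refill, capped at capacity.
  private refill(): void {
    const now = Date.now()
    // Seconds elapsed; clamped to 0 in case the system clock steps backward.
    const elapsed = Math.max(0, now - this.lastRefill) / 1000
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillRate)
    this.lastRefill = now
  }
}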
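
To see the burst-then-throttle behavior without waiting on real time, here is a sketch of the same logic with an injectable clock. The class name `ClockedTokenBucket` and the `now` parameter are assumptions added for this example; the refill arithmetic mirrors the class above.

```typescript
// Variant of the token bucket with an injectable clock (an assumption made
// here so the behavior can be checked deterministically).
class ClockedTokenBucket {
  private tokens: number
  private lastRefill: number

  constructor(
    private capacity: number,
    private refillRate: number, // tokens per second
    private now: () => number   // returns the current time in milliseconds
  ) {
    this.tokens = capacity
    this.lastRefill = now()
  }

  consume(): boolean {
    // Refill for the elapsed time, capped at capacity, then try to take a token.
    const t = this.now()
    const elapsed = (t - this.lastRefill) / 1000
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillRate)
    this.lastRefill = t
    if (this.tokens < 1) return false
    this.tokens -= 1
    return true
  }
}

let fakeNow = 0
const bucket = new ClockedTokenBucket(5, 1, () => fakeNow)

// Burst of 7 requests at t=0: the first 5 drain the bucket, the rest are rejected.
const burst = Array.from({ length: 7 }, () => bucket.consume())
console.log(burst.filter(Boolean).length) // 5

// Two seconds later, 2 tokens have been refilled.
fakeNow = 2000
const later = [bucket.consume(), bucket.consume(), bucket.consume()]
console.log(later) // true, true, false
```

With a capacity of 5 and a refill rate of 1 token per second, the long-run rate is capped at 1 request per second while short bursts of up to 5 still pass.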

Sliding Window

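A sliding window log tracks the exact timestamp of each recent request and counts those that fall within the trailing window. It is accurate but memory-heavy at scale: each user needs a sorted set holding up to one timestamp per allowed request.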
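
A minimal in-memory sketch of the idea follows. The class name `SlidingWindowLog`, the `userId` key, and the explicit `nowMs` argument are assumptions for illustration; a plain array per user stands in for the sorted set, and a production version would more likely keep the log in a shared store.

```typescript
// Hypothetical sliding-window-log limiter; an in-memory array per user
// stands in for the sorted set of timestamps.
class SlidingWindowLog {
  private log = new Map<string, number[]>()

  constructor(private limit: number, private windowMs: number) {}

  allow(userId: string, nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs
    // Keep only timestamps inside the trailing window.
    const recent = (this.log.get(userId) ?? []).filter((t) => t > cutoff)
    const permitted = recent.length < this.limit
    if (permitted) recent.push(nowMs) // only accepted requests are logged
    this.log.set(userId, recent)
    return permitted
  }
}

// Same scenario as the fixed-window case: 100 at 00:59, then 100 at 01:01.
const swl = new SlidingWindowLog(100, 60000)
let allowed = 0
for (let i = 0; i < 100; i++) if (swl.allow("alice", 59000)) allowed++
for (let i = 0; i < 100; i++) if (swl.allow("alice", 61000)) allowed++
console.log(allowed) // 100: the second burst falls inside the same trailing minute
```

The memory cost is visible here: each active user holds up to `limit` timestamps, where a token bucket holds two numbers.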

What Real Systems Use

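Redis has no built-in rate limiter, but it is a common backing store for one; the redis-cell module adds a GCRA (leaky-bucket-style) limiter as a command. Nginx's limit_req module uses leaky bucket. Stripe has described its API request rate limiter as a token bucket backed by Redis.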