API Rate Limiting Implementation Guide
Implement Token Bucket and Sliding Window algorithms with Redis, and apply rate limiting in Express middleware and Nginx for production API protection
Problem
A public API without rate limiting is exposed to traffic spikes, abusive clients, and brute-force attempts: a single caller can monopolize capacity and degrade service for everyone else. The goal is to cap request rates per user or API key, both in the application layer and at the reverse proxy.
Required Tools
Redis — an in-memory data store. Manages rate limiting counters and timestamps atomically, ensuring consistency in distributed environments.
Express — a Node.js web framework. Insert a rate limiter into the middleware chain to block requests before they reach route handlers.
Nginx — apply rate limiting at the reverse proxy level to throttle requests before they reach the application server.
Redis Lua scripts — scripts that run atomically inside Redis, enabling race-condition-free rate limiting logic.
Solution Steps
Understanding Fixed Window Counter (basic approach)
The simplest rate limiting approach: increment a counter for each fixed time window (e.g., 1 minute) and block when the limit is exceeded. It is simple to implement, but allows up to double the limit at window boundaries (the boundary burst problem): with a limit of 60/minute, a client can send 60 requests at 0:59 and another 60 at 1:01, i.e. 120 requests in about two seconds. Understanding this limitation is the foundation for moving to Sliding Window.
import Redis from 'ioredis';
const redis = new Redis();
// Fixed Window Counter implementation
async function fixedWindowRateLimit(
key: string,
limit: number,
windowSec: number
): Promise<{ allowed: boolean; remaining: number; resetAt: number }> {
const now = Math.floor(Date.now() / 1000);
const windowKey = `ratelimit:${key}:${Math.floor(now / windowSec)}`;
// Batch the commands in a pipeline (a pipeline saves round trips but is not
// transactional; use redis.multi() or a Lua script for strict atomicity)
const pipeline = redis.pipeline();
pipeline.incr(windowKey);
pipeline.ttl(windowKey);
const results = await pipeline.exec();
const count = results![0][1] as number;
const ttl = results![1][1] as number;
// Set expiration on first request (INCR and EXPIRE run as separate commands
// here; the Lua scripts below avoid this gap)
if (ttl === -1) {
await redis.expire(windowKey, windowSec);
}
const resetAt = (Math.floor(now / windowSec) + 1) * windowSec;
return {
allowed: count <= limit,
remaining: Math.max(0, limit - count),
resetAt,
};
}
// Usage example
const result = await fixedWindowRateLimit('user:123', 60, 60);
// { allowed: true, remaining: 59, resetAt: 1705282260 }
Implement Sliding Window Log algorithm
Store each request's timestamp in a Redis Sorted Set, remove timestamps outside the window, then count. This solves the Fixed Window boundary burst problem, but uses more memory because it stores one entry per request. Execute ZREMRANGEBYSCORE, ZCARD, and ZADD atomically using a Redis Lua script.
// Sliding Window Log - Redis Lua Script
const SLIDING_WINDOW_SCRIPT = `
local key = KEYS[1]
local now = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
-- Remove old requests outside the window
redis.call('ZREMRANGEBYSCORE', key, 0, now - window * 1000)
-- Count requests in current window
local count = redis.call('ZCARD', key)
if count < limit then
-- Allowed: add current timestamp
redis.call('ZADD', key, now, now .. '-' .. math.random(1000000))
redis.call('EXPIRE', key, window)
return {1, limit - count - 1}
else
-- Rejected
return {0, 0}
end
`;
async function slidingWindowLog(
userId: string,
limit: number,
windowSec: number
): Promise<{ allowed: boolean; remaining: number }> {
const key = `ratelimit:swl:${userId}`;
const now = Date.now();
const result = await redis.eval(
SLIDING_WINDOW_SCRIPT,
1,
key,
now.toString(),
windowSec.toString(),
limit.toString()
) as [number, number];
return {
allowed: result[0] === 1,
remaining: result[1],
};
}
Implement Token Bucket algorithm
Token Bucket refills tokens at a constant rate, and each request consumes a token. It allows burst traffic while limiting the average rate, making it the most widely used algorithm for API rate limiting. Handle token refill and consumption atomically with a Redis Lua script.
// Token Bucket - Redis Lua Script
const TOKEN_BUCKET_SCRIPT = `
local key = KEYS[1]
local now = tonumber(ARGV[1])
local rate = tonumber(ARGV[2]) -- tokens refilled per second
local capacity = tonumber(ARGV[3]) -- max bucket capacity
local requested = tonumber(ARGV[4]) -- tokens to consume
-- Query current bucket state
local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1]) or capacity
local last_refill = tonumber(bucket[2]) or now
-- Refill tokens based on elapsed time
local elapsed = math.max(0, now - last_refill) / 1000
local new_tokens = math.min(capacity, tokens + elapsed * rate)
-- Check if tokens are available
local allowed = 0
local remaining = new_tokens
if new_tokens >= requested then
new_tokens = new_tokens - requested
allowed = 1
remaining = new_tokens
end
-- Update state
redis.call('HMSET', key, 'tokens', new_tokens, 'last_refill', now)
redis.call('EXPIRE', key, math.ceil(capacity / rate) * 2)
return {allowed, math.floor(remaining)}
`;
async function tokenBucket(
userId: string,
rate: number,
capacity: number,
tokensRequested = 1
): Promise<{ allowed: boolean; remaining: number }> {
const key = `ratelimit:tb:${userId}`;
const now = Date.now();
const result = await redis.eval(
TOKEN_BUCKET_SCRIPT, 1, key,
now.toString(), rate.toString(),
capacity.toString(), tokensRequested.toString()
) as [number, number];
return { allowed: result[0] === 1, remaining: result[1] };
}
// Usage: 10 tokens/sec refill, 60 max bucket capacity
const result = await tokenBucket('user:123', 10, 60);
Apply Rate Limiter as Express middleware
Wrap the implemented Rate Limiter as Express middleware and apply it to routes. Include X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset response headers so clients can manage their quota. Implement the pattern of applying different limits based on user tiers.
import { Request, Response, NextFunction } from 'express';
// Rate limit config type
interface RateLimitConfig {
windowSec: number;
limit: number;
keyGenerator?: (req: Request) => string;
}
// Tier-based default settings
const TIER_LIMITS: Record<string, RateLimitConfig> = {
free: { windowSec: 60, limit: 60 },
basic: { windowSec: 60, limit: 300 },
pro: { windowSec: 60, limit: 1000 },
enterprise: { windowSec: 60, limit: 5000 },
};
// Rate Limiter middleware factory
function createRateLimiter(defaultConfig: RateLimitConfig) {
return async (req: Request, res: Response, next: NextFunction) => {
const apiKey = req.headers['x-api-key'] as string;
const identifier = apiKey || req.ip || 'anonymous';
const userTier = await getUserTier(apiKey); // getUserTier: app-specific tier lookup (e.g., DB or cache), not defined here
const config = TIER_LIMITS[userTier] || defaultConfig;
const key = defaultConfig.keyGenerator
? defaultConfig.keyGenerator(req)
: identifier;
const result = await tokenBucket(key, config.limit / config.windowSec, config.limit);
// Set rate limit headers
res.set({
'X-RateLimit-Limit': config.limit.toString(),
'X-RateLimit-Remaining': result.remaining.toString(),
'X-RateLimit-Reset': Math.ceil(Date.now() / 1000 + config.windowSec).toString(), // approximate; the token bucket refills continuously
});
if (!result.allowed) {
const retryAfter = Math.ceil(1 / (config.limit / config.windowSec));
res.set('Retry-After', retryAfter.toString());
return res.status(429).json({
error: 'Too Many Requests',
message: `Rate limit exceeded. Try again in ${retryAfter} seconds.`,
});
}
next();
};
}
// Apply to routes (app is an existing Express application instance)
const apiLimiter = createRateLimiter({ windowSec: 60, limit: 60 });
app.use('/api/', apiLimiter);
// Stricter limits for auth endpoints
const authLimiter = createRateLimiter({
windowSec: 900,
limit: 5,
keyGenerator: (req) => `auth:${req.ip}`,
});
app.post('/api/auth/login', authLimiter, loginHandler);
Nginx-level Rate Limiting configuration
Applying rate limiting at the Nginx level before requests reach the application server significantly reduces server load. Define shared memory zones with limit_req_zone and apply them to specific locations with limit_req. Use burst and nodelay options to configure policies for sudden traffic spikes.
# /etc/nginx/nginx.conf (inside http block)
# Rate Limit Zone definitions
# $binary_remote_addr: client IP based (IPv4: 4 bytes, memory efficient)
# zone=api_limit:10m: 10MB shared memory (~160,000 IP states)
# rate=10r/s: 10 requests per second limit
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;
limit_req_zone $binary_remote_addr zone=auth_limit:5m rate=1r/s;
# API Key based Rate Limiting (extracted from header)
map $http_x_api_key $api_key_or_ip {
default $binary_remote_addr;
"~.+" $http_x_api_key;
}
limit_req_zone $api_key_or_ip zone=api_key_limit:20m rate=60r/m;
server {
listen 443 ssl;
server_name api.example.com;
# General API endpoints
location /api/ {
# burst=20: queue up to 20 requests
# nodelay: process immediately (within burst range)
limit_req zone=api_limit burst=20 nodelay;
limit_req_status 429;
proxy_pass http://backend;
}
# Auth endpoints (stricter limits)
location /api/auth/ {
limit_req zone=auth_limit burst=5;
limit_req_status 429;
proxy_pass http://backend;
}
# Custom 429 error page
error_page 429 = @rate_limited;
location @rate_limited {
default_type application/json;
return 429 '{"error":"Too Many Requests","retry_after":1}';
}
}
Algorithm comparison and monitoring setup
Compare algorithm characteristics to make the right choice for your situation, and monitor rate limiting behavior to tune thresholds. Track the 429 response ratio, per-user request patterns, and Redis memory usage.
// === Algorithm Comparison Table ===
//
// | Algorithm          | Accuracy | Memory    | Burst Allowed    | Complexity |
// |--------------------|----------|-----------|------------------|------------|
// | Fixed Window       | Low      | Very low  | At boundaries    | Easy       |
// | Sliding Window Log | High     | High      | No               | Medium     |
// | Token Bucket       | High     | Low       | Yes              | Medium     |
// | Leaky Bucket       | High     | Low       | No               | Medium     |
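Leaky Bucket appears in the table but is not implemented above. For comparison, a minimal single-process sketch (no Redis; the class name and API are illustrative, not from this guide):

```typescript
// Leaky Bucket (in-process sketch): requests fill the bucket, which drains
// ("leaks") at a constant rate; a full bucket rejects new requests. Unlike
// Token Bucket, bursts above the drain rate are not absorbed beyond capacity.
class LeakyBucket {
  private level = 0;        // current number of queued requests
  private lastLeak: number; // timestamp of last drain calculation (ms)

  constructor(
    private capacity: number, // max queued requests
    private leakRate: number, // requests drained per second
    now: number = Date.now()
  ) {
    this.lastLeak = now;
  }

  tryAdd(now: number = Date.now()): boolean {
    // Drain based on elapsed time since the last call
    const elapsedSec = Math.max(0, now - this.lastLeak) / 1000;
    this.level = Math.max(0, this.level - elapsedSec * this.leakRate);
    this.lastLeak = now;
    if (this.level + 1 <= this.capacity) {
      this.level += 1;
      return true;
    }
    return false;
  }
}
```

With capacity 5 and a leak rate of 1/sec, six back-to-back requests see the sixth rejected; after two seconds of draining, requests are accepted again.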
// === Monitoring: Rate Limit metrics collection ===
import { Counter, Histogram } from 'prom-client';
const rateLimitCounter = new Counter({
name: 'api_rate_limit_total',
help: 'Total rate limit events',
labelNames: ['status', 'tier', 'endpoint'],
});
const requestDuration = new Histogram({
name: 'api_request_duration_seconds',
help: 'API request duration',
labelNames: ['method', 'path', 'status'],
});
// Record metrics in middleware
function rateLimitMetrics(req: Request, res: Response, next: NextFunction) {
const end = requestDuration.startTimer();
res.on('finish', () => {
end({ method: req.method, path: req.route?.path || req.path, status: res.statusCode });
if (res.statusCode === 429) {
rateLimitCounter.inc({
status: 'rejected',
tier: (req as any).userTier || 'free',
endpoint: req.route?.path || req.path,
});
}
});
next();
}
Core Code
Token Bucket Rate Limiter with atomicity guaranteed by Redis Lua script. 10 tokens/sec refill, 60 max capacity allows bursts while limiting average rate.
// === Core: Token Bucket Rate Limiter (Redis Lua) ===
const SCRIPT = `
local key = KEYS[1]
local now = tonumber(ARGV[1])
local rate = tonumber(ARGV[2])
local capacity = tonumber(ARGV[3])
local bucket = redis.call('HMGET', key, 'tokens', 'ts')
local tokens = tonumber(bucket[1]) or capacity
local last = tonumber(bucket[2]) or now
-- Refill tokens
local elapsed = math.max(0, now - last) / 1000
tokens = math.min(capacity, tokens + elapsed * rate)
-- Consume token
if tokens >= 1 then
tokens = tokens - 1
redis.call('HMSET', key, 'tokens', tokens, 'ts', now)
redis.call('EXPIRE', key, math.ceil(capacity / rate) * 2)
return {1, math.floor(tokens)}
end
return {0, 0}
`;
// Express middleware
async function rateLimiter(req, res, next) {
const key = req.headers['x-api-key'] || req.ip;
const [allowed, remaining] = await redis.eval(SCRIPT, 1,
`rl:${key}`, Date.now(), 10, 60);
res.set('X-RateLimit-Remaining', String(remaining));
if (!allowed) return res.status(429).json({ error: 'Too Many Requests' });
next();
}
Common Mistakes
Executing Redis commands individually causing race conditions (GET -> compare -> SET gap allows interleaved requests)
Use Redis Lua scripts (EVAL) so the entire check-and-update runs as one atomic operation. Alternatively, use MULTI/EXEC, noting that a plain transaction cannot branch on a value it reads; pair it with WATCH for optimistic locking.
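The race is easy to reproduce without Redis. A small sketch (hypothetical in-memory store and helper names, for illustration only) in which two concurrent requests both read the old count before either writes, so one increment is lost:

```typescript
// Simulated non-atomic GET -> compare -> SET against an in-memory "store".
// The await between read and write is where a concurrent request interleaves,
// exactly like a network round trip to Redis.
const store = new Map<string, number>();
const tick = () => new Promise<void>((resolve) => setImmediate(resolve));

async function racyIncrement(key: string): Promise<number> {
  const current = store.get(key) ?? 0; // GET
  await tick();                        // round trip; other requests run here
  store.set(key, current + 1);         // SET (may overwrite a concurrent write)
  return current + 1;
}

async function demo(): Promise<number | undefined> {
  await Promise.all([racyIncrement('count'), racyIncrement('count')]);
  return store.get('count'); // 1, not 2: one increment was lost
}
```

A Lua script collapses the read-check-write into a single server-side step, so no other command can interleave.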
Not setting TTL on rate limit keys causing unbounded Redis memory growth
Always set EXPIRE on all rate limit keys. Setting TTL to approximately twice the window size is a safe practice.
Not including Retry-After header in 429 responses causing clients to retry indefinitely
Always include a Retry-After header (in seconds) with 429 responses, along with X-RateLimit-Remaining and X-RateLimit-Reset headers so clients can proactively adjust their request rate.
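On the client side, a small helper (hypothetical name retryDelayMs; assumes Retry-After carries delay-seconds rather than an HTTP-date) turns these headers into a polite retry delay:

```typescript
// Compute how long a client should wait before retrying a 429 response.
// Falls back to exponential backoff with jitter when the header is absent
// or unparsable.
function retryDelayMs(
  retryAfterHeader: string | null,
  attempt: number, // 0-based retry attempt
  baseMs = 1000,
  maxMs = 60_000
): number {
  if (retryAfterHeader !== null) {
    const seconds = Number(retryAfterHeader);
    if (Number.isFinite(seconds) && seconds >= 0) {
      return Math.min(seconds * 1000, maxMs);
    }
  }
  // Exponential backoff: base * 2^attempt, capped, plus up to 25% jitter
  const backoff = Math.min(baseMs * 2 ** attempt, maxMs);
  return backoff + Math.floor(Math.random() * backoff * 0.25);
}
```

For example, retryDelayMs('30', 0) returns 30000, while retryDelayMs(null, 2) falls back to roughly 4 to 5 seconds of jittered backoff.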