SystemDesign Pro
© 2026 SystemDesign Pro. All rights reserved.
intermediate · url-shortener · caching · edge · id-generation · analytics · distributed-systems

URL Shortener at Global Scale

Design a globally distributed URL shortener that handles billions of redirects per day with low latency, abuse controls, and reliable analytics.

What You'll Learn

  • How to split create-link control plane from ultra-low-latency redirect data plane
  • ID generation strategies for short-code uniqueness without central bottlenecks
  • Cache hierarchy design: edge cache, regional cache, and metadata fallback
  • Negative caching and tombstones to enforce deletions and abuse takedowns
  • Asynchronous click analytics ingestion with backpressure and sampling
  • Partitioning and indexing patterns for high-cardinality link metadata
  • Safety patterns for custom alias collisions and idempotent create requests
  • Cost modeling for compute, storage, and cross-region traffic

Quick Context

Problem

A URL shortener is mostly a read-heavy redirect service with strict latency requirements and untrusted input. The system must map short codes to long URLs, support custom aliases, enforce abuse policies, and propagate deletes quickly. Unlike toy designs, production systems cannot put analytics on the critical path, cannot rely on a single sequence generator, and cannot tolerate stale or deleted links being served for long periods. The main challenge is preserving low-latency global redirects while maintaining correctness, safety, and predictable operational costs.

Constraints & Assumptions (8 items)
  • Serve 12,000,000,000 redirects/day with peak 180,000 req/s
  • Keep redirect path p99 under 20ms globally and p95 under 12ms in-region
  • Support 70M+ link creations/day with idempotent create semantics
  • Guarantee short-code uniqueness and custom alias collision detection

Key Numbers

12B/day
Redirect Volume
Average global redirect requests
180K/s
Peak Redirects
Peak read-path throughput
70M/day
Create Throughput
New shortened links created
p99 <20ms
Latency
Global redirect latency target
99.99%
Availability
Redirect service objective
8
Regions
Active production regions
160K/s
Event Ingest
Click analytics event throughput
$165K
Monthly Cost
Production baseline for global setup

Requirements

Create Short URL

Create a short code for a long URL with optional expiration, tags, and tenant metadata.

Why it matters: Core feature for product value and API integration.

Custom Alias

Allow users to request custom aliases with uniqueness checks and reserved-word policies.

Why it matters: Enterprise and marketing workflows depend on branded URLs.
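
The alias checks described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the page's implementation: the reserved-word list, the allowed character set, and the `AliasRegistry` name are all assumptions, and the `Set` only stands in for the `(tenant_id, alias)` primary key that the schema places on `alias_claims`.

```typescript
// Hypothetical reserved words; a real deployment would load its own policy list.
const RESERVED_ALIASES = new Set(["api", "admin", "login", "links"]);

// 3 to 64 characters to fit the alias VARCHAR(64) column; the character set is an assumption.
const ALIAS_PATTERN = /^[a-z0-9][a-z0-9_-]{2,63}$/;

type AliasResult =
  | { ok: true; alias: string }
  | { ok: false; reason: "invalid_format" | "reserved" | "taken" };

class AliasRegistry {
  // Stands in for the (tenant_id, alias) primary key on alias_claims.
  private readonly claims = new Set<string>();

  claim(tenantId: string, rawAlias: string): AliasResult {
    // Normalize first so "Spring-Sale" and "spring-sale" collide.
    const alias = rawAlias.trim().toLowerCase();
    if (!ALIAS_PATTERN.test(alias)) return { ok: false, reason: "invalid_format" };
    if (RESERVED_ALIASES.has(alias)) return { ok: false, reason: "reserved" };
    const key = `${tenantId}/${alias}`;
    if (this.claims.has(key)) return { ok: false, reason: "taken" };
    this.claims.add(key);
    return { ok: true, alias };
  }
}
```

In a database-backed version the uniqueness check and the insert would need to happen in one transaction, so that two concurrent claims cannot both pass the `has` check.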

Low-Latency Redirect

Resolve short code and issue HTTP redirect quickly from edge locations.

Why it matters: Redirect path is the user-facing SLO-critical workflow.

Link Update and Disable

Update destination URL, set expiry, disable, or hard-delete links safely.

Why it matters: Operations and compliance require lifecycle controls.

Click Analytics

Capture click events with geo/device/referrer metadata and provide aggregates.

Why it matters: Analytics is a key monetization and customer retention feature.
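
One way to keep click capture off the redirect path is to sample events deterministically and place them in a bounded buffer that sheds load when full. The sketch below assumes a hash-based sampler (FNV-1a over the event ID) and a fixed-capacity in-memory queue; `ClickBuffer`, `shouldSample`, and the event shape are illustrative names, not part of the page's design.

```typescript
interface ClickEvent {
  eventId: string;
  shortCode: string;
  clickedAtMs: number;
}

// 32-bit FNV-1a over UTF-16 code units (identical to the byte version for ASCII input).
function fnv1a32(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Deterministic: the same event ID always gets the same decision for a given rate.
function shouldSample(eventId: string, rate: number): boolean {
  return fnv1a32(eventId) / 0x100000000 < rate;
}

class ClickBuffer {
  private readonly queue: ClickEvent[] = [];
  private readonly capacity: number;
  private readonly sampleRate: number;
  dropped = 0;

  constructor(capacity: number, sampleRate: number) {
    this.capacity = capacity;
    this.sampleRate = sampleRate;
  }

  // Never blocks the caller: an unsampled or overflowing event is simply not enqueued.
  offer(event: ClickEvent): boolean {
    if (!shouldSample(event.eventId, this.sampleRate)) return false;
    if (this.queue.length >= this.capacity) {
      this.dropped++;
      return false;
    }
    this.queue.push(event);
    return true;
  }

  // Called by a background shipper that forwards batches to the event stream.
  drain(maxBatch: number): ClickEvent[] {
    return this.queue.splice(0, maxBatch);
  }
}
```

The `dropped` counter is worth exporting as a metric, since silent shedding would otherwise look like a drop in traffic.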

Abuse Prevention

Block malicious domains and enforce per-tenant rate limits and quota policies.

Why it matters: Prevents platform abuse and protects sender reputation.
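
Per-tenant rate limits are commonly implemented with a token bucket. The sketch below is one possible shape, assuming an in-process limiter with caller-supplied timestamps; the page does not say which algorithm it uses, and `TokenBucket` and `TenantRateLimiter` are hypothetical names.

```typescript
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerSec: number;

  constructor(capacity: number, refillPerSec: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity; // start full so a new tenant can burst once
    this.lastRefillMs = nowMs;
  }

  tryConsume(nowMs: number, cost: number = 1): boolean {
    // Refill in proportion to elapsed time, capped at capacity.
    const elapsedSec = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefillMs = Math.max(this.lastRefillMs, nowMs);
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}

class TenantRateLimiter {
  private readonly buckets = new Map<string, TokenBucket>();
  private readonly capacity: number;
  private readonly refillPerSec: number;

  constructor(capacity: number, refillPerSec: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
  }

  allow(tenantId: string, nowMs: number): boolean {
    let bucket = this.buckets.get(tenantId);
    if (bucket === undefined) {
      bucket = new TokenBucket(this.capacity, this.refillPerSec, nowMs);
      this.buckets.set(tenantId, bucket);
    }
    return bucket.tryConsume(nowMs);
  }
}
```

A multi-node deployment would need shared or approximately synchronized bucket state; this sketch only covers the single-process logic.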

Idempotent Create API

Support idempotency keys so retries do not create duplicate short links.

Why it matters: Client retries are common under network turbulence.
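
The retry behavior can be illustrated with a small in-memory model. This is a sketch under assumptions: the `Map` stands in for the `idempotency_keys` table, the long URL itself stands in for a real request hash, and the generated short codes are placeholders rather than real IDs.

```typescript
type CreateResult = { shortCode: string; longUrl: string };

type CreateOutcome =
  | { kind: "created"; result: CreateResult }
  | { kind: "replayed"; result: CreateResult }
  | { kind: "conflict" };

class IdempotentCreator {
  // Keyed by tenant and idempotency key, like the table's composite primary key.
  private readonly seen = new Map<string, { requestHash: string; result: CreateResult }>();
  private nextId = 1;

  create(tenantId: string, idempotencyKey: string, longUrl: string): CreateOutcome {
    const mapKey = `${tenantId}:${idempotencyKey}`;
    const requestHash = longUrl; // assumption: a real system would hash the full request body
    const prior = this.seen.get(mapKey);
    if (prior !== undefined) {
      // Same key and same request: return the stored response instead of creating again.
      // Same key but different request: reject, because the key is being reused.
      return prior.requestHash === requestHash
        ? { kind: "replayed", result: prior.result }
        : { kind: "conflict" };
    }
    const result: CreateResult = { shortCode: `code${this.nextId++}`, longUrl };
    this.seen.set(mapKey, { requestHash, result });
    return { kind: "created", result };
  }
}
```

The sketch omits the `in_progress` state from the schema, which matters when two retries for the same key arrive concurrently.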

Bulk Link Create

Allow batch creation jobs for campaign pipelines with status tracking.

Why it matters: Large customers need operational throughput beyond single-item calls.

Architecture Evolution

Single-region API service with PostgreSQL and Redis cache. Suitable for 100-1,000 users and 2K-5K redirects/sec at $120-400/month.


What Changed & Why

  • One API service handles create and redirect requests
  • Redis cache for hot short-code lookups
  • PostgreSQL stores link metadata and click counters
  • Background worker computes simple daily analytics
  • Basic rate limiting per API key

Key Decisions

5 decisions

ID Generation: DB Sequence vs Snowflake vs Random Tokens (3 alternatives)
Chosen: Snowflake-style 64-bit IDs encoded to Base62 short codes

Redirect Path: Direct DB Reads vs Cache-First Edge (3 alternatives)
Chosen: Edge redirect service with regional cache-first lookup

Delete Semantics: Hard Delete vs Tombstones (3 alternatives)
Chosen: Tombstone-first delete propagation with delayed hard purge

Analytics Path: Inline Write vs Async Stream (3 alternatives)
Chosen: Async event stream with sampled click events and rollups

Custom Aliases: Immediate Reservation vs Best-Effort Upsert (3 alternatives)
Chosen: Transactional alias reservation with unique index + idempotency key
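
To make the cache-first redirect and tombstone decisions concrete, here is a minimal single-process sketch. It assumes a synchronous store lookup and illustrative TTLs; `RedirectResolver`, `applyTombstone`, and the TTL defaults are hypothetical and not taken from the page.

```typescript
type CacheEntry =
  | { kind: "hit"; longUrl: string; expiresAtMs: number }
  | { kind: "negative"; expiresAtMs: number }; // tombstoned or known-missing code

type StoreLookup = (shortCode: string) => string | null;

class RedirectResolver {
  private readonly cache = new Map<string, CacheEntry>();
  private readonly lookup: StoreLookup;
  private readonly positiveTtlMs: number;
  private readonly negativeTtlMs: number;

  constructor(lookup: StoreLookup, positiveTtlMs: number = 60000, negativeTtlMs: number = 10000) {
    this.lookup = lookup;
    this.positiveTtlMs = positiveTtlMs;
    this.negativeTtlMs = negativeTtlMs;
  }

  resolve(shortCode: string, nowMs: number): string | null {
    const entry = this.cache.get(shortCode);
    if (entry !== undefined && entry.expiresAtMs > nowMs) {
      // A fresh negative entry answers "not found" without touching the store.
      return entry.kind === "hit" ? entry.longUrl : null;
    }
    const longUrl = this.lookup(shortCode);
    this.cache.set(
      shortCode,
      longUrl === null
        ? { kind: "negative", expiresAtMs: nowMs + this.negativeTtlMs }
        : { kind: "hit", longUrl, expiresAtMs: nowMs + this.positiveTtlMs },
    );
    return longUrl;
  }

  // Applied when a delete or takedown event arrives: overwrite any cached hit
  // so the link stops resolving even if a lagging replica still returns it.
  applyTombstone(shortCode: string, nowMs: number, ttlMs: number): void {
    this.cache.set(shortCode, { kind: "negative", expiresAtMs: nowMs + ttlMs });
  }
}
```

The tombstone TTL would normally be chosen to outlast the expected replica lag, so that a stale store read cannot resurrect the link once the negative entry expires.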

API Design

The API separates redirect serving from management operations. The redirect endpoint is unauthenticated but protected by abuse, bot, and rate controls. Management APIs require OAuth2/JWT and tenant scopes.

Base URL

https://api.shortly.dev/v1

Authentication

OAuth2 Bearer Token + HMAC key for server-to-server bulk jobs

Management endpoints require bearer tokens with tenant scopes (`links:write`, `links:read`, `analytics:read`). Bulk imports can use signed HMAC jobs for high-throughput ingestion.

Endpoints

POST /links - Create a short link
POST /links/bulk - Create links in bulk
GET /links/{shortCode}/analytics - Read analytics aggregates
DELETE /links/{shortCode} - Disable or delete link

Webhooks

EVENT
link.deleted

Triggered when a link is disabled or hard-deleted.

Payload

{
  "event": "link.deleted",
  "tenantId": "tnt_42",
  "shortCode": "b9Q4kT",
  "deletedAt": "2026-02-08T11:07:22Z",
  "actor": "user_91"
}
EVENT
abuse.detected

Triggered when destination or behavior exceeds abuse thresholds.

Payload

{
  "event": "abuse.detected",
  "shortCode": "x2ab91",
  "riskScore": 0.93,
  "action": "auto_disabled",
  "reason": "malware_domain"
}
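
A webhook consumer might model the two payloads above as a discriminated union and validate them before use. The field names come from the sample payloads; the `parseWebhook` helper and its error messages are assumptions for illustration, and the sketch does not cover signature verification.

```typescript
interface LinkDeletedEvent {
  event: "link.deleted";
  tenantId: string;
  shortCode: string;
  deletedAt: string; // ISO 8601 timestamp, as in the sample payload
  actor: string;
}

interface AbuseDetectedEvent {
  event: "abuse.detected";
  shortCode: string;
  riskScore: number;
  action: string;
  reason: string;
}

type WebhookEvent = LinkDeletedEvent | AbuseDetectedEvent;

function parseWebhook(body: string): WebhookEvent {
  const data: unknown = JSON.parse(body);
  if (typeof data !== "object" || data === null) {
    throw new Error("payload must be a JSON object");
  }
  const rec = data as Record<string, unknown>;
  // Small helper that rejects missing or non-string fields.
  const str = (key: string): string => {
    const value = rec[key];
    if (typeof value !== "string") throw new Error(`missing string field: ${key}`);
    return value;
  };
  switch (rec["event"]) {
    case "link.deleted":
      return {
        event: "link.deleted",
        tenantId: str("tenantId"),
        shortCode: str("shortCode"),
        deletedAt: str("deletedAt"),
        actor: str("actor"),
      };
    case "abuse.detected": {
      const riskScore = rec["riskScore"];
      if (typeof riskScore !== "number") throw new Error("missing number field: riskScore");
      return {
        event: "abuse.detected",
        shortCode: str("shortCode"),
        riskScore,
        action: str("action"),
        reason: str("reason"),
      };
    }
    default:
      throw new Error("unknown event type");
  }
}
```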

Code Samples

Snowflake ID to Base62 Short Code (Production)

TypeScript utility converting monotonically increasing 64-bit IDs to compact Base62 codes.
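
The utility itself is not reproduced here, so the following is only a sketch of what such a conversion could look like. It assumes IDs are held as `bigint` and that the alphabet is ordered digits, lowercase, uppercase; the actual sample may use a different ordering or ID layout.

```typescript
// Alphabet order is an assumption; any fixed 62-character ordering works
// as long as encode and decode agree on it.
const ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
const BASE = 62n;

function encodeBase62(id: bigint): string {
  if (id < 0n) throw new RangeError("id must be non-negative");
  if (id === 0n) return ALPHABET.charAt(0);
  let out = "";
  let n = id;
  while (n > 0n) {
    // Peel off the least significant Base62 digit and prepend it.
    out = ALPHABET.charAt(Number(n % BASE)) + out;
    n = n / BASE; // bigint division truncates
  }
  return out;
}

function decodeBase62(code: string): bigint {
  if (code.length === 0) throw new RangeError("code must not be empty");
  let n = 0n;
  for (const ch of code) {
    const digit = ALPHABET.indexOf(ch);
    if (digit < 0) throw new RangeError(`invalid Base62 character: ${ch}`);
    n = n * BASE + BigInt(digit);
  }
  return n;
}
```

Because 62^11 exceeds 2^64, any unsigned 64-bit ID encodes to at most 11 characters, which fits the `short_code VARCHAR(16)` column in the schema.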

Data Model & Queries

Schema
SQL
-- URL Shortener Core Schema (PostgreSQL)

CREATE TABLE tenants (
  tenant_id UUID PRIMARY KEY,
  name VARCHAR(128) NOT NULL,
  plan VARCHAR(32) NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE links (
  link_id BIGINT PRIMARY KEY,
  tenant_id UUID NOT NULL REFERENCES tenants(tenant_id),
  short_code VARCHAR(16) NOT NULL,
  long_url TEXT NOT NULL,
  long_url_hash BYTEA NOT NULL,
  canonical_domain VARCHAR(255) NOT NULL,
  status VARCHAR(24) NOT NULL CHECK (status IN ('active', 'disabled', 'deleted', 'expired')),
  is_custom_alias BOOLEAN NOT NULL DEFAULT FALSE,
  created_by UUID,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  expires_at TIMESTAMPTZ,
  deleted_at TIMESTAMPTZ,
  metadata JSONB NOT NULL DEFAULT '{}',
  UNIQUE (tenant_id, short_code)
);

CREATE TABLE alias_claims (
  tenant_id UUID NOT NULL REFERENCES tenants(tenant_id),
  alias VARCHAR(64) NOT NULL,
  short_code VARCHAR(16) NOT NULL,
  claimed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  PRIMARY KEY (tenant_id, alias)
);

CREATE TABLE idempotency_keys (
  tenant_id UUID NOT NULL REFERENCES tenants(tenant_id),
  idempotency_key UUID NOT NULL,
  request_hash BYTEA NOT NULL,
  status VARCHAR(16) NOT NULL CHECK (status IN ('in_progress', 'completed', 'failed')),
  response_payload JSONB,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  completed_at TIMESTAMPTZ,
  last_seen_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  PRIMARY KEY (tenant_id, idempotency_key)
);

CREATE TABLE deletion_tombstones (
  tenant_id UUID NOT NULL REFERENCES tenants(tenant_id),
  short_code VARCHAR(16) NOT NULL,
  tombstone_version BIGINT NOT NULL,
  reason VARCHAR(64) NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  PRIMARY KEY (tenant_id, short_code)
);

-- High-volume click stream landing table (range-partitioned on clicked_at; a monthly partition is shown below)
CREATE TABLE click_events (
  event_id UUID NOT NULL,
  tenant_id UUID NOT NULL,
  short_code VARCHAR(16) NOT NULL,
  clicked_at TIMESTAMPTZ NOT NULL,
  country_code CHAR(2),
  device_type VARCHAR(24),
  referrer_domain VARCHAR(255),
  ip_hash BYTEA,
  user_agent_hash BYTEA,
  edge_region VARCHAR(24),
  PRIMARY KEY (event_id, clicked_at)
) PARTITION BY RANGE (clicked_at);

CREATE TABLE click_events_2026_02 PARTITION OF click_events
FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');

CREATE TABLE daily_link_metrics (
  tenant_id UUID NOT NULL,
  short_code VARCHAR(16) NOT NULL,
  metric_date DATE NOT NULL,
  country_code CHAR(2) NOT NULL DEFAULT 'ZZ',
  device_type VARCHAR(24) NOT NULL DEFAULT 'unknown',
  clicks BIGINT NOT NULL,
  uniques BIGINT,
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  PRIMARY KEY (tenant_id, short_code, metric_date, country_code, device_type)
);

CREATE TABLE blocked_domains (
  domain VARCHAR(255) PRIMARY KEY,
  risk_score NUMERIC(4,3) NOT NULL,
  source VARCHAR(64) NOT NULL,
  active BOOLEAN NOT NULL DEFAULT TRUE,
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_links_tenant_status_created
  ON links (tenant_id, status, created_at DESC);

CREATE INDEX idx_links_active_lookup
  ON links (tenant_id, short_code)
  WHERE status = 'active';

CREATE INDEX idx_links_expires_at
  ON links (expires_at)
  WHERE expires_at IS NOT NULL;

CREATE INDEX idx_alias_claims_short_code
  ON alias_claims (short_code);

CREATE INDEX idx_idempotency_last_seen
  ON idempotency_keys (tenant_id, last_seen_at DESC);

CREATE INDEX idx_tombstones_version
  ON deletion_tombstones (tombstone_version DESC);

CREATE INDEX idx_click_events_tenant_code_time
  ON click_events (tenant_id, short_code, clicked_at DESC);

CREATE INDEX idx_click_events_country_time
  ON click_events (country_code, clicked_at DESC);

CREATE INDEX idx_daily_metrics_date
  ON daily_link_metrics (metric_date DESC, tenant_id);
Why This Schema
  • Separates link metadata from click event firehose so redirect lookups stay lightweight.
  • Uses dedicated tombstone table to enforce deletion semantics across eventually consistent caches.
  • Stores idempotency keys explicitly to guarantee safe client retries.
  • Keeps alias claims isolated to simplify uniqueness and conflict management.
  • Partitions click events by time for retention, pruning, and low-cost scans.
  • Pre-aggregates daily metrics for dashboard performance and cost control.
Common Queries

Resolve active link for redirect path

SQL
EXPLAIN ANALYZE
SELECT long_url, expires_at
FROM links
WHERE tenant_id = $1
  AND short_code = $2
  AND status = 'active'
LIMIT 1;

Find links nearing expiration in next 24 hours

SQL
EXPLAIN ANALYZE
SELECT short_code, expires_at
FROM links
WHERE tenant_id = $1
  AND status = 'active'
  AND expires_at IS NOT NULL
  AND expires_at >= NOW()
  AND expires_at < NOW() + INTERVAL '24 hours'
ORDER BY expires_at ASC
LIMIT 2000;

Top links by clicks in past 7 days

SQL
EXPLAIN ANALYZE
SELECT short_code, SUM(clicks) AS total_clicks
FROM daily_link_metrics
WHERE tenant_id = $1
  AND metric_date >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY short_code
ORDER BY total_clicks DESC
LIMIT 100;

Identify unresolved tombstone propagation gaps

SQL
EXPLAIN ANALYZE
SELECT t.short_code, t.tombstone_version, t.created_at
FROM deletion_tombstones t
LEFT JOIN links l
  ON l.tenant_id = t.tenant_id
 AND l.short_code = t.short_code
WHERE t.tenant_id = $1
  AND (l.short_code IS NULL OR l.status <> 'deleted')
ORDER BY t.tombstone_version DESC
LIMIT 500;
Index Rationale
idx_links_active_lookup (tenant_id, short_code) WHERE status='active': Fast redirect path lookup for active links without scanning inactive rows.
idx_links_tenant_status_created (tenant_id, status, created_at DESC): Supports management UI and latest-link listing per tenant.
idx_tombstones_version (tombstone_version DESC): Efficient replication/invalidation catch-up by version.
idx_click_events_tenant_code_time (tenant_id, short_code, clicked_at DESC): Optimizes time-range analytics for a specific link.
idx_daily_metrics_date (metric_date DESC, tenant_id): Accelerates dashboard windows and billing-period summaries.

Scaling & Bottlenecks

Now

Primary Bottleneck

Regional cache miss spikes during campaign traffic and failover events

Mitigation

Increase hot-key replication and introduce adaptive prewarming for scheduled campaigns.

What You'd Change

  • Maintain top-N global hot set in every region
  • Prewarm campaign aliases based on scheduled launch windows
  • Add cache miss circuit-breakers to protect metadata store
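
A cache-miss circuit breaker of the kind listed above could look roughly like this. It is a simplified sketch: it opens after a number of consecutive metadata-store failures and stays open for a fixed cooldown. The class name, thresholds, and the choice of consecutive-failure counting are assumptions; production breakers often use rolling error rates and a half-open probe state instead.

```typescript
class MissCircuitBreaker {
  private consecutiveFailures = 0;
  private openUntilMs = 0;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;

  constructor(failureThreshold: number, cooldownMs: number) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
  }

  // Ask before sending a cache miss to the metadata store.
  allow(nowMs: number): boolean {
    return nowMs >= this.openUntilMs;
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0;
  }

  recordFailure(nowMs: number): void {
    this.consecutiveFailures++;
    if (this.consecutiveFailures >= this.failureThreshold) {
      // Open the breaker: misses are rejected until the cooldown elapses.
      this.openUntilMs = nowMs + this.cooldownMs;
      this.consecutiveFailures = 0;
    }
  }
}
```

While the breaker is open, the redirect path would need a fallback such as serving a stale cache entry or returning a fast error, which this sketch leaves to the caller.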
2× Scale

Primary Bottleneck

Metadata read replica lag and analytics consumer backlog

Mitigation

Split metadata read/write pools and scale stream partitions with tenant-aware balancing.

What You'd Change

  • Promote read replica pools by tenant tier
  • Increase stream partitions and isolate high-volume tenants
  • Use pre-aggregated hourly rollups to reduce query scan volume
10× Scale

Primary Bottleneck

Cross-region replication egress and global invalidation consistency

Mitigation

Move to regional ownership cells with selective replication and compacted invalidation topics.

What You'd Change

  • Adopt per-tenant region ownership and bounded-staleness failover
  • Replicate control/tombstone state globally; keep raw events region-local
  • Deploy consistency watchdog jobs for cache-vs-store parity

Failure Scenarios

Monitoring & Observability

Monitoring must separately track redirect SLO, metadata health, invalidation correctness, abuse pipeline latency, and analytics freshness. The key anti-pattern is relying only on API latency without observing cache parity and deletion propagation.

Key Metrics
redirect_p99_ms (histogram)

P99 end-to-end redirect latency from edge POP.

Good: <20ms
Warning: 20-35ms
Critical: >35ms
cache_hit_rate_pct (gauge)

Regional cache hit rate for redirect lookups.

Good: >95%
Warning: 90-95%
Critical: <90%
metadata_read_qps (gauge)

Read QPS against metadata store from cache misses.

Good: <8K
Warning: 8K-15K
Critical: >15K
metadata_replica_lag_ms (histogram)

Replication lag for metadata read replicas.

Good: <800ms
Warning: 800-2000ms
Critical: >2000ms
tombstone_replication_lag_seconds (gauge)

Delay for deletion/takedown propagation across regions.

Good: <5s
Warning: 5-10s
Critical: >10s
create_api_p99_ms (histogram)

P99 latency of create-link control-plane API.

Good: <90ms
Warning: 90-180ms
Critical: >180ms
abuse_check_p95_ms (histogram)

Latency for domain safety/abuse checks in create path.

Good: <40ms
Warning: 40-120ms
Critical: >120ms
click_stream_lag_seconds (gauge)

Consumer lag of click analytics stream.

Good: <15s
Warning: 15-90s
Critical: >90s
analytics_freshness_delay_seconds (gauge)

Delay between click ingest and dashboard visibility.

Good: <60s
Warning: 60-300s
Critical: >300s
alias_conflict_rate_pct (gauge)

Custom alias conflict ratio over create attempts.

Good: <2%
Warning: 2-6%
Critical: >6%
Alert Rules
RedirectLatencySLOBreach (critical)

Redirect latency SLO is violated globally or in one major region.

redirect_p99_ms > 35 for 10m

Runbook: Inspect cache hit drop, DB miss amplification, and edge saturation. Activate hotset prewarm + traffic shedding if needed.

CacheHitCollapse (critical)

Cache hit rate collapse likely causing metadata overload.

cache_hit_rate_pct < 90 for 5m

Runbook: Check recent invalidation events, shard health, and failover state. Temporarily extend TTL for top stable keys.

DeletePropagationLag (critical)

Deleted links may still be resolvable in some regions.

tombstone_replication_lag_seconds > 10 for 3m

Runbook: Force global invalidate for affected prefixes, replay invalidation topic, and run parity check job.

AnalyticsStaleness (warning)

Dashboard freshness degraded; customers may see delayed analytics.

analytics_freshness_delay_seconds > 300 for 15m

Runbook: Scale stream consumers, inspect partition skew, and shift heavy tenants to dedicated partitions.

AbuseDependencyDegraded (warning)

Create path safety checks are slow and risk API degradation.

abuse_check_p95_ms > 120 for 10m

Runbook: Enable cached risk fallback, throttle bulk create traffic, and investigate upstream threat intel service.

ReplicaLagHigh (critical)

Metadata replicas too stale for safe redirect miss handling.

metadata_replica_lag_ms > 2000 for 10m

Runbook: Reduce miss pressure, promote healthier replica, and tune replication bandwidth/QoS.
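
The alert conditions above all have the shape "metric beyond threshold for a duration". A minimal evaluator for the greater-than case might look like the sketch below, which assumes the metric is sampled periodically and that a single in-range sample resets the timer; `SustainedThresholdAlert` is a hypothetical name and real alerting systems may treat missing samples differently.

```typescript
class SustainedThresholdAlert {
  private breachStartMs: number | null = null;
  private readonly threshold: number;
  private readonly forMs: number;

  constructor(threshold: number, forMs: number) {
    this.threshold = threshold;
    this.forMs = forMs;
  }

  // Feed one sample; returns true while the alert should be firing.
  observe(nowMs: number, value: number): boolean {
    if (value <= this.threshold) {
      // Back in range: forget the breach so the duration starts over next time.
      this.breachStartMs = null;
      return false;
    }
    if (this.breachStartMs === null) this.breachStartMs = nowMs;
    return nowMs - this.breachStartMs >= this.forMs;
  }
}

// Example wiring for "redirect_p99_ms > 35 for 10m".
const redirectLatencyAlert = new SustainedThresholdAlert(35, 10 * 60 * 1000);
```

A rule such as `cache_hit_rate_pct < 90 for 5m` would use the mirrored comparison, which this sketch does not implement.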

Dashboard Layout

Redirect SLO

redirect_p99_ms, cache_hit_rate_pct, metadata_read_qps

Consistency & Safety

tombstone_replication_lag_seconds, metadata_replica_lag_ms, abuse_check_p95_ms

Control Plane

create_api_p99_ms, alias_conflict_rate_pct

Analytics Pipeline

click_stream_lag_seconds, analytics_freshness_delay_seconds

Scale Calculator

Estimate redirect capacity, storage growth, event pipeline load, and monthly compute/storage/network cost for a multi-region URL shortener.

Configuration
12.00B req/day (range: 1.00M to 200.00B req/day)
70.00M links/day (range: 10.00K to 2.00B links/day)
320.00 bytes (range: 64.00 to 4.10K bytes)
96.00 % (range: 50.00 to 99.90 %)
8.00 regions (range: 1.00 to 40.00 regions)
365.00 days (range: 30.00 to 3.65K days)
85.00 % (range: 1.00 to 100.00 %)
3.00 copies (range: 1.00 to 6.00 copies)
1.80K req/s/node (range: 200.00 to 10.00K req/s/node)
1.50K bytes (range: 200.00 to 10.00K bytes)
Calculated Results
Average Redirect Throughput
Average redirects per second.
138.89K req/s
Metadata DB Read QPS
Redirect reads that bypass cache and hit metadata store.
5.56K qps
Redirect Serving Nodes
Required nodes for average load with no headroom applied.
78
Analytics Event Throughput
Events/sec emitted into stream after sampling.
118.06K events/s
Analytics Ingest Volume
Raw event ingest into analytics pipeline each month.
61.23 TB/month
Metadata Storage
Primary metadata + tombstone footprint (before replicas).
12.55 TB
Replicated Metadata Storage
Metadata footprint after applying replication factor.
37.64 TB
Redirect Egress Volume
Monthly redirect response egress.
491.13 TB/month
Stream Consumer Instances
Approximate analytics consumer instances for current volume.
5
Monthly Compute Cost
Edge, API, stream, and control-plane compute cost.
$67,110
Monthly Storage Cost
Metadata + analytics storage cost estimate.
$5,682
Monthly Network Cost
Estimated redirect and replication network spend.
$25,146
Total Monthly Cost
All-in estimated monthly infrastructure cost.
$97,938
Cost per 1M Redirects
Normalized cost efficiency metric for redirect traffic.
0.27 USD/1M
Estimated Monthly Cost (AWS)
$97,938/month
Monthly Compute Cost: $67,110
Monthly Storage Cost: $5,682
Monthly Network Cost: $25,146

* Estimates based on simplified AWS pricing. Actual costs may vary.
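
Several of the calculated results can be reproduced with simple arithmetic. The sketch below is a reconstruction, not the calculator's actual code: the pairing of inputs to outputs is inferred from the numbers because the page does not label the configuration values, and it assumes a 30-day month and binary terabytes (2^40 bytes), which is what makes the egress figure match.

```typescript
interface CalculatorInput {
  redirectsPerDay: number;
  cacheHitRate: number; // fraction, e.g. 0.96
  samplingRate: number; // fraction, e.g. 0.85
  nodeCapacityRps: number;
  responseBytes: number; // assumed to be the 1.50K bytes setting
  totalMonthlyCostUsd: number;
}

interface CalculatorResult {
  avgRedirectRps: number;
  metadataReadQps: number;
  servingNodes: number;
  analyticsEventsPerSec: number;
  egressTbPerMonth: number;
  costPerMillionRedirects: number;
}

const SECONDS_PER_DAY = 86400;
const DAYS_PER_MONTH = 30; // assumption
const BYTES_PER_TB = 2 ** 40; // assumption: binary terabytes

function estimate(input: CalculatorInput): CalculatorResult {
  const avgRedirectRps = input.redirectsPerDay / SECONDS_PER_DAY;
  const redirectsPerMonth = input.redirectsPerDay * DAYS_PER_MONTH;
  return {
    avgRedirectRps,
    // Only cache misses reach the metadata store.
    metadataReadQps: avgRedirectRps * (1 - input.cacheHitRate),
    // Average load with no headroom, rounded up to whole nodes.
    servingNodes: Math.ceil(avgRedirectRps / input.nodeCapacityRps),
    analyticsEventsPerSec: avgRedirectRps * input.samplingRate,
    egressTbPerMonth: (redirectsPerMonth * input.responseBytes) / BYTES_PER_TB,
    costPerMillionRedirects: input.totalMonthlyCostUsd / (redirectsPerMonth / 1e6),
  };
}

const baseline = estimate({
  redirectsPerDay: 12e9,
  cacheHitRate: 0.96,
  samplingRate: 0.85,
  nodeCapacityRps: 1800,
  responseBytes: 1500,
  totalMonthlyCostUsd: 97938,
});
```

With these inputs the sketch yields about 138.89K req/s, 5.56K qps, 78 nodes, 118.06K events/s, 491.13 TB/month, and 0.27 USD per 1M redirects, in line with the figures shown above.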

Cost & Capacity

Traffic Estimates
Redirect Reads
Global redirect request throughput
138.9K/s avg
Peak Redirects
Burst throughput at campaign peaks
180K/s
Create Writes
Average create requests per second
810/s avg
Click Events
Sampled analytics events entering stream
160K/s
Cache Miss Reads
Metadata reads at 96% cache hit rate
~5.6K/s
Storage Estimates
Link Metadata
365-day active + disabled metadata footprint
52 TB
Tombstones
Delete/takedown records with 180-day retention
2.8 TB
Raw Click Events
Before compaction and rollup pruning
520 TB/month
Aggregated Analytics
Daily/hourly aggregate tables
38 TB/month

Summary & Takeaways

Key Takeaways
  1. The core challenge is keeping redirect data plane fast while control and analytics planes evolve independently.
  2. Snowflake-style IDs + Base62 give high-throughput uniqueness without global DB contention.
  3. Deletion correctness requires tombstones and replication parity checks, not just cache TTL expiration.
  4. Analytics must be asynchronous and sampled to preserve redirect SLO and cost control.
  5. Negative caching and hotset prewarming are practical defenses against miss storms.
  6. At scale, network and analytics storage often dominate cost over raw compute.
If I Had More Time
  • Add tenant-level routing policies and geo-fencing for data residency requirements.
  • Implement online abuse model with feature store and near-real-time scoring.
  • Introduce active-active metadata conflict resolution with CRDT-inspired merge policy.
  • Build dynamic cache TTL policy based on link popularity decay curves.
  • Add campaign launch scheduler that prewarms caches based on expected traffic envelopes.