Tags: intermediate, url-shortener, caching, edge, id-generation, analytics, distributed-systems

URL Shortener at Global Scale

Design a globally distributed URL shortener that handles billions of redirects per day with low latency, abuse controls, and reliable analytics.

What You'll Learn

• How to split the create-link control plane from the ultra-low-latency redirect data plane
• ID generation strategies for short-code uniqueness without central bottlenecks
• Cache hierarchy design: edge cache, regional cache, and metadata fallback
• Negative caching and tombstones to enforce deletions and abuse takedowns
• Asynchronous click analytics ingestion with backpressure and sampling
• Partitioning and indexing patterns for high-cardinality link metadata
• Safety patterns for custom alias collisions and idempotent create requests
• Cost modeling for compute, storage, and cross-region traffic

Quick Context

Problem

A URL shortener is mostly a read-heavy redirect service with strict latency requirements and untrusted input. The system must map short codes to long URLs, support custom aliases, enforce abuse policies, and propagate deletes quickly. Unlike toy designs, production systems cannot put analytics on the critical path, cannot rely on a single sequence generator, and cannot tolerate stale or deleted links being served for long periods. The main challenge is preserving low-latency global redirects while maintaining correctness, safety, and predictable operational costs.

Constraints & Assumptions

• Serve 12,000,000,000 redirects/day with a peak of 180,000 req/s
• Keep redirect-path p99 under 20ms globally and p95 under 12ms in-region
• Support 70M+ link creations/day with idempotent create semantics
• Guarantee short-code uniqueness and custom alias collision detection

Key Numbers

• Redirect Volume: 12B/day (average global redirect requests)
• Peak Redirects: 180K/s (peak read-path throughput)
• Create Throughput: 70M/day (new shortened links created)
• Latency: p99 <20ms (global redirect latency target)
• Availability: 99.99% (redirect service objective)
• Regions: active production regions
• Event Ingest: 160K/s (click analytics event throughput)
• Monthly Cost: $165K (production baseline for global setup)

Requirements

Create Short URL

Create a short code for a long URL with optional expiration, tags, and tenant metadata.

Why it matters: Core feature for product value and API integration.

Custom Alias

Allow users to request custom aliases with uniqueness checks and reserved-word policies.

Why it matters: Enterprise and marketing workflows depend on branded URLs.

Low-Latency Redirect

Resolve short code and issue HTTP redirect quickly from edge locations.

Why it matters: Redirect path is the user-facing SLO-critical workflow.

Link Update and Disable

Update destination URL, set expiry, disable, or hard-delete links safely.

Why it matters: Operations and compliance require lifecycle controls.

Click Analytics

Capture click events with geo/device/referrer metadata and provide aggregates.

Why it matters: Analytics is a key monetization and customer retention feature.

Abuse Prevention

Block malicious domains and enforce per-tenant rate limits and quota policies.

Why it matters: Prevents platform abuse and protects sender reputation.

Idempotent Create API

Support idempotency keys so retries do not create duplicate short links.

Why it matters: Client retries are common under network turbulence.

Bulk Link Create

Allow batch creation jobs for campaign pipelines with status tracking.

Why it matters: Large customers need operational throughput beyond single-item calls.

Architecture Evolution

Single-region API service with PostgreSQL and Redis cache. Suitable for 100-1,000 users and 2K-5K redirects/sec at $120-400/month.

Diagram components: Clients, API, Cache, Database.

What Changed & Why

• One API service handles create and redirect requests
• Redis cache for hot short-code lookups
• PostgreSQL stores link metadata and click counters
• Background worker computes simple daily analytics
• Basic rate limiting per API key

Key Decisions

ID Generation: DB Sequence vs Snowflake vs Random Tokens (3 alternatives)
Decision: Snowflake-style 64-bit IDs encoded to Base62 short codes

Redirect Path: Direct DB Reads vs Cache-First Edge (3 alternatives)
Decision: Edge redirect service with regional cache-first lookup

Delete Semantics: Hard Delete vs Tombstones (3 alternatives)
Decision: Tombstone-first delete propagation with delayed hard purge

Analytics Path: Inline Write vs Async Stream (3 alternatives)
Decision: Async event stream with sampled click events and rollups

Custom Aliases: Immediate Reservation vs Best-Effort Upsert (3 alternatives)
Decision: Transactional alias reservation with unique index + idempotency key
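
The cache-first redirect path and tombstone-first delete decisions can be sketched together as follows. This is a minimal in-memory illustration, not the course's implementation: `RedirectResolver`, `MetadataLookup`, the TTL parameters, the synchronous lookup, and the 302/404/410 status choices are all assumptions made for the example.

```typescript
type CacheEntry =
  | { kind: "link"; longUrl: string; freshUntilMs: number }
  | { kind: "negative"; freshUntilMs: number } // short code known to be absent
  | { kind: "tombstone"; version: number }; // deleted or taken down

type Resolution =
  | { status: 302; location: string }
  | { status: 404 }
  | { status: 410 };

// Hypothetical metadata-store lookup; returns null when no active link exists.
type MetadataLookup = (shortCode: string) => string | null;

class RedirectResolver {
  private readonly cache = new Map<string, CacheEntry>();
  private readonly lookup: MetadataLookup;
  private readonly now: () => number;
  private readonly linkTtlMs: number;
  private readonly negativeTtlMs: number;

  constructor(lookup: MetadataLookup, now: () => number, linkTtlMs: number, negativeTtlMs: number) {
    this.lookup = lookup;
    this.now = now;
    this.linkTtlMs = linkTtlMs;
    this.negativeTtlMs = negativeTtlMs;
  }

  resolve(shortCode: string): Resolution {
    const entry = this.cache.get(shortCode);
    if (entry !== undefined) {
      // Tombstones win over everything and never expire in this sketch.
      if (entry.kind === "tombstone") return { status: 410 };
      if (entry.freshUntilMs > this.now()) {
        return entry.kind === "link"
          ? { status: 302, location: entry.longUrl }
          : { status: 404 };
      }
    }
    // Cache miss or stale entry: fall back to the metadata store.
    const longUrl = this.lookup(shortCode);
    if (longUrl === null) {
      // Negative caching: remember the miss briefly to absorb repeated lookups.
      this.cache.set(shortCode, { kind: "negative", freshUntilMs: this.now() + this.negativeTtlMs });
      return { status: 404 };
    }
    this.cache.set(shortCode, { kind: "link", longUrl, freshUntilMs: this.now() + this.linkTtlMs });
    return { status: 302, location: longUrl };
  }

  // Apply a replicated tombstone; ignore it if an equal or newer one is present.
  applyTombstone(shortCode: string, version: number): void {
    const entry = this.cache.get(shortCode);
    if (entry !== undefined && entry.kind === "tombstone" && entry.version >= version) return;
    this.cache.set(shortCode, { kind: "tombstone", version });
  }
}
```

Because the tombstone replaces the cached link rather than waiting for its TTL, a deleted code stops resolving as soon as the tombstone arrives, and the short negative TTL keeps unknown codes from reaching the metadata store on every request.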

API Design

The API separates redirect serving from management operations. The redirect endpoint is unauthenticated but protected by abuse, bot, and rate controls. Management APIs require OAuth2/JWT and tenant scopes.

Base URL

https://api.shortly.dev/v1

Authentication

OAuth2 Bearer Token + HMAC key for server-to-server bulk jobs

Management endpoints require bearer tokens with tenant scopes (`links:write`, `links:read`, `analytics:read`). Bulk imports can use signed HMAC jobs for high-throughput ingestion.

Endpoints

POST /links: Create a short link

POST /links/bulk: Create links in bulk

GET /links/{shortCode}/analytics: Read analytics aggregates

DELETE /links/{shortCode}: Disable or delete link

Webhooks

EVENT

link.deleted

Triggered when a link is disabled or hard-deleted.

Payload

{
  "event": "link.deleted",
  "tenantId": "tnt_42",
  "shortCode": "b9Q4kT",
  "deletedAt": "2026-02-08T11:07:22Z",
  "actor": "user_91"
}

EVENT

abuse.detected

Triggered when destination or behavior exceeds abuse thresholds.

Payload

{
  "event": "abuse.detected",
  "shortCode": "x2ab91",
  "riskScore": 0.93,
  "action": "auto_disabled",
  "reason": "malware_domain"
}

Code Samples

Snowflake ID to Base62 Short Code (Production)

TypeScript utility converting monotonically increasing 64-bit IDs to compact Base62 codes.
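
The utility itself is not reproduced on this page, so the following is only a sketch of what such a conversion could look like. The alphabet ordering, the 41/10/12 bit layout, the custom epoch, and the names `composeSnowflake`, `toBase62`, and `fromBase62` are assumptions for illustration.

```typescript
// Assumed alphabet ordering: digits, uppercase, lowercase.
const ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
const BASE = BigInt(ALPHABET.length);
const ZERO = BigInt(0);
const ONE = BigInt(1);

// Hypothetical Snowflake layout: 41-bit millisecond timestamp, 10-bit worker, 12-bit sequence.
const TIMESTAMP_BITS = BigInt(41);
const WORKER_BITS = BigInt(10);
const SEQUENCE_BITS = BigInt(12);
const CUSTOM_EPOCH_MS = BigInt(1704067200000); // 2024-01-01T00:00:00Z, an arbitrary choice

// Pack the three fields into one non-negative 63-bit integer.
function composeSnowflake(timestampMs: bigint, workerId: bigint, sequence: bigint): bigint {
  const elapsed = timestampMs - CUSTOM_EPOCH_MS;
  if (elapsed < ZERO || elapsed >= (ONE << TIMESTAMP_BITS)) throw new RangeError("timestamp out of range");
  if (workerId < ZERO || workerId >= (ONE << WORKER_BITS)) throw new RangeError("workerId out of range");
  if (sequence < ZERO || sequence >= (ONE << SEQUENCE_BITS)) throw new RangeError("sequence out of range");
  return (elapsed << (WORKER_BITS + SEQUENCE_BITS)) | (workerId << SEQUENCE_BITS) | sequence;
}

// Encode a non-negative integer as Base62, most significant digit first.
function toBase62(id: bigint): string {
  if (id < ZERO) throw new RangeError("id must be non-negative");
  if (id === ZERO) return ALPHABET.charAt(0);
  let out = "";
  let n = id;
  while (n > ZERO) {
    out = ALPHABET.charAt(Number(n % BASE)) + out;
    n = n / BASE; // bigint division truncates
  }
  return out;
}

// Decode a Base62 string back to the integer it encodes.
function fromBase62(code: string): bigint {
  let n = ZERO;
  for (let i = 0; i < code.length; i++) {
    const digit = ALPHABET.indexOf(code.charAt(i));
    if (digit < 0) throw new RangeError("invalid Base62 character: " + code.charAt(i));
    n = n * BASE + BigInt(digit);
  }
  return n;
}
```

A full 63-bit ID encodes to at most 11 Base62 characters, which fits the `short_code VARCHAR(16)` column in the schema. Uniqueness in this layout depends on each generator holding a distinct worker ID, which the sketch does not enforce.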

Data Model & Queries

Schema

SQL

-- URL Shortener Core Schema (PostgreSQL)

CREATE TABLE tenants (
  tenant_id UUID PRIMARY KEY,
  name VARCHAR(128) NOT NULL,
  plan VARCHAR(32) NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE links (
  link_id BIGINT PRIMARY KEY,
  tenant_id UUID NOT NULL REFERENCES tenants(tenant_id),
  short_code VARCHAR(16) NOT NULL,
  long_url TEXT NOT NULL,
  long_url_hash BYTEA NOT NULL,
  canonical_domain VARCHAR(255) NOT NULL,
  status VARCHAR(24) NOT NULL CHECK (status IN ('active', 'disabled', 'deleted', 'expired')),
  is_custom_alias BOOLEAN NOT NULL DEFAULT FALSE,
  created_by UUID,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  expires_at TIMESTAMPTZ,
  deleted_at TIMESTAMPTZ,
  metadata JSONB NOT NULL DEFAULT '{}',
  UNIQUE (tenant_id, short_code)
);

CREATE TABLE alias_claims (
  tenant_id UUID NOT NULL REFERENCES tenants(tenant_id),
  alias VARCHAR(64) NOT NULL,
  short_code VARCHAR(16) NOT NULL,
  claimed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  PRIMARY KEY (tenant_id, alias)
);

CREATE TABLE idempotency_keys (
  tenant_id UUID NOT NULL REFERENCES tenants(tenant_id),
  idempotency_key UUID NOT NULL,
  request_hash BYTEA NOT NULL,
  status VARCHAR(16) NOT NULL CHECK (status IN ('in_progress', 'completed', 'failed')),
  response_payload JSONB,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  completed_at TIMESTAMPTZ,
  last_seen_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  PRIMARY KEY (tenant_id, idempotency_key)
);

CREATE TABLE deletion_tombstones (
  tenant_id UUID NOT NULL REFERENCES tenants(tenant_id),
  short_code VARCHAR(16) NOT NULL,
  tombstone_version BIGINT NOT NULL,
  reason VARCHAR(64) NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  PRIMARY KEY (tenant_id, short_code)
);

-- High-volume click stream landing table (range-partitioned on clicked_at; the example partition below covers one month)
CREATE TABLE click_events (
  event_id UUID NOT NULL,
  tenant_id UUID NOT NULL,
  short_code VARCHAR(16) NOT NULL,
  clicked_at TIMESTAMPTZ NOT NULL,
  country_code CHAR(2),
  device_type VARCHAR(24),
  referrer_domain VARCHAR(255),
  ip_hash BYTEA,
  user_agent_hash BYTEA,
  edge_region VARCHAR(24),
  PRIMARY KEY (event_id, clicked_at)
) PARTITION BY RANGE (clicked_at);

CREATE TABLE click_events_2026_02 PARTITION OF click_events
FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');

CREATE TABLE daily_link_metrics (
  tenant_id UUID NOT NULL,
  short_code VARCHAR(16) NOT NULL,
  metric_date DATE NOT NULL,
  country_code CHAR(2) NOT NULL DEFAULT 'ZZ',
  device_type VARCHAR(24) NOT NULL DEFAULT 'unknown',
  clicks BIGINT NOT NULL,
  uniques BIGINT,
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  PRIMARY KEY (tenant_id, short_code, metric_date, country_code, device_type)
);

CREATE TABLE blocked_domains (
  domain VARCHAR(255) PRIMARY KEY,
  risk_score NUMERIC(4,3) NOT NULL,
  source VARCHAR(64) NOT NULL,
  active BOOLEAN NOT NULL DEFAULT TRUE,
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_links_tenant_status_created
  ON links (tenant_id, status, created_at DESC);

CREATE INDEX idx_links_active_lookup
  ON links (tenant_id, short_code)
  WHERE status = 'active';

CREATE INDEX idx_links_expires_at
  ON links (expires_at)
  WHERE expires_at IS NOT NULL;

CREATE INDEX idx_alias_claims_short_code
  ON alias_claims (short_code);

CREATE INDEX idx_idempotency_last_seen
  ON idempotency_keys (tenant_id, last_seen_at DESC);

CREATE INDEX idx_tombstones_version
  ON deletion_tombstones (tombstone_version DESC);

CREATE INDEX idx_click_events_tenant_code_time
  ON click_events (tenant_id, short_code, clicked_at DESC);

CREATE INDEX idx_click_events_country_time
  ON click_events (country_code, clicked_at DESC);

CREATE INDEX idx_daily_metrics_date
  ON daily_link_metrics (metric_date DESC, tenant_id);

Why This Schema

• Separates link metadata from click event firehose so redirect lookups stay lightweight.
• Uses dedicated tombstone table to enforce deletion semantics across eventually consistent caches.
• Stores idempotency keys explicitly to guarantee safe client retries.
• Keeps alias claims isolated to simplify uniqueness and conflict management.
• Partitions click events by time for retention, pruning, and low-cost scans.
• Pre-aggregates daily metrics for dashboard performance and cost control.
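
To make the idempotency-key point concrete, here is a small in-memory sketch of the retry behavior the `idempotency_keys` table is meant to support. `IdempotentLinkCreator`, the string fingerprint (a stand-in for `request_hash`), and the counter-based short codes are hypothetical; a real service would rely on the table's primary key and the `in_progress` status to handle concurrent requests.

```typescript
interface CreateRequest {
  longUrl: string;
  alias?: string;
}

interface CreateResponse {
  shortCode: string;
  longUrl: string;
}

type CreateOutcome =
  | { kind: "created"; response: CreateResponse }
  | { kind: "replayed"; response: CreateResponse }
  | { kind: "conflict"; reason: string };

class IdempotentLinkCreator {
  // Keyed by tenant and idempotency key, mirroring PRIMARY KEY (tenant_id, idempotency_key).
  private readonly records = new Map<string, { fingerprint: string; response: CreateResponse }>();
  private nextId = 1;

  create(tenantId: string, idempotencyKey: string, req: CreateRequest): CreateOutcome {
    const recordKey = JSON.stringify([tenantId, idempotencyKey]);
    // Stand-in for request_hash: a canonical string of the fields that define the request.
    const fingerprint = JSON.stringify([req.longUrl, req.alias ?? null]);

    const existing = this.records.get(recordKey);
    if (existing !== undefined) {
      if (existing.fingerprint !== fingerprint) {
        // Same key, different payload: refuse rather than guess which one the client meant.
        return { kind: "conflict", reason: "idempotency key reused with a different request" };
      }
      // Retry of a completed request: return the stored response, create nothing.
      return { kind: "replayed", response: existing.response };
    }

    // Placeholder code generation; the real design derives codes from Snowflake-style IDs.
    const shortCode = req.alias ?? "c" + String(this.nextId++);
    const response: CreateResponse = { shortCode, longUrl: req.longUrl };
    this.records.set(recordKey, { fingerprint, response });
    return { kind: "created", response };
  }
}
```

The fingerprint comparison is what separates a harmless retry from accidental key reuse: the first returns the original short code, the second is rejected.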

Common Queries

Resolve active link for redirect path

SQL

EXPLAIN ANALYZE
SELECT long_url, expires_at
FROM links
WHERE tenant_id = $1
  AND short_code = $2
  AND status = 'active'
LIMIT 1;

Find links nearing expiration in next 24 hours

SQL

EXPLAIN ANALYZE
SELECT short_code, expires_at
FROM links
WHERE tenant_id = $1
  AND status = 'active'
  AND expires_at IS NOT NULL
  AND expires_at < NOW() + INTERVAL '24 hours'
ORDER BY expires_at ASC
LIMIT 2000;

Scaling & Bottlenecks

Now

Primary Bottleneck

Regional cache miss spikes during campaign traffic and failover events

Mitigation

Increase hot-key replication and introduce adaptive prewarming for scheduled campaigns.

What You'd Change

• Maintain top-N global hot set in every region
• Prewarm campaign aliases based on scheduled launch windows
• Add cache miss circuit-breakers to protect metadata store
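
A cache-miss circuit breaker of the kind listed above might look like the following sketch. The class name, the consecutive-failure threshold, and the cooldown policy are assumptions chosen for illustration; production breakers often use error-rate windows instead.

```typescript
type BreakerState = "closed" | "open" | "half_open";

class MissCircuitBreaker {
  private state: BreakerState = "closed";
  private consecutiveFailures = 0;
  private openedAtMs = 0;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, cooldownMs: number, now: () => number) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // Should this cache miss be allowed to reach the metadata store?
  allowRequest(): boolean {
    if (this.state === "open" && this.now() - this.openedAtMs >= this.cooldownMs) {
      // Cooldown elapsed: let probe requests through to test recovery.
      this.state = "half_open";
    }
    return this.state !== "open";
  }

  // A metadata read succeeded: close the breaker and reset the failure count.
  recordSuccess(): void {
    this.state = "closed";
    this.consecutiveFailures = 0;
  }

  // A metadata read failed or timed out.
  recordFailure(): void {
    this.consecutiveFailures += 1;
    // A failed probe reopens immediately; otherwise open once the threshold is hit.
    if (this.state === "half_open" || this.consecutiveFailures >= this.failureThreshold) {
      this.state = "open";
      this.openedAtMs = this.now();
    }
  }

  getState(): BreakerState {
    return this.state;
  }
}
```

While the breaker is open, the redirect path would need a fallback for misses, such as serving a stale cached entry or returning an error quickly, so that a miss storm does not pile onto an already struggling metadata store.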

2× Scale

Primary Bottleneck

Metadata read replica lag and analytics consumer backlog

Mitigation

Split metadata read/write pools and scale stream partitions with tenant-aware balancing.

What You'd Change

• Promote read replica pools by tenant tier
• Increase stream partitions and isolate high-volume tenants
• Use pre-aggregated hourly rollups to reduce query scan volume

10× Scale

Primary Bottleneck

Cross-region replication egress and global invalidation consistency

Mitigation

Move to regional ownership cells with selective replication and compacted invalidation topics.

What You'd Change

• Adopt per-tenant region ownership and bounded-staleness failover
• Replicate control/tombstone state globally; keep raw events region-local
• Deploy consistency watchdog jobs for cache-vs-store parity

Monitoring & Observability

Monitoring must separately track redirect SLO, metadata health, invalidation correctness, abuse pipeline latency, and analytics freshness. The key anti-pattern is relying only on API latency without observing cache parity and deletion propagation.

Key Metrics

redirect_p99_ms (histogram)

P99 end-to-end redirect latency from edge POP.

Good: <20ms

Warning: 20-35ms

Critical: >35ms

cache_hit_rate_pct (gauge)

Regional cache hit rate for redirect lookups.

Good: >95%

Warning: 90-95%

Critical: <90%

metadata_read_qps (gauge)

Read QPS against metadata store from cache misses.

Good: <8K

Warning: 8K-15K

Critical: >15K

metadata_replica_lag_ms (histogram)

Replication lag for metadata read replicas.

Good: <800ms

Warning: 800-2000ms

Critical: >2000ms

tombstone_replication_lag_seconds (gauge)

Delay for deletion/takedown propagation across regions.

Good: <5s

Warning: 5-10s

Critical: >10s

create_api_p99_ms (histogram)

P99 latency of create-link control-plane API.

Good: <90ms

Warning: 90-180ms

Critical: >180ms

abuse_check_p95_ms (histogram)

Latency for domain safety/abuse checks in create path.

Good: <40ms

Warning: 40-120ms

Critical: >120ms

click_stream_lag_seconds (gauge)

Consumer lag of click analytics stream.

Good: <15s

Warning: 15-90s

Critical: >90s

analytics_freshness_delay_seconds (gauge)

Delay between click ingest and dashboard visibility.

Good: <60s

Warning: 60-300s

Critical: >300s

alias_conflict_rate_pct (gauge)

Custom alias conflict ratio over create attempts.

Good: <2%

Warning: 2-6%

Critical: >6%

Alert Rules

RedirectLatencySLOBreach (critical)

Redirect latency SLO is violated globally or in one major region.

redirect_p99_ms > 35 for 10m

Runbook: Inspect cache hit drop, DB miss amplification, and edge saturation. Activate hotset prewarm + traffic shedding if needed.

CacheHitCollapse (critical)

Cache hit rate collapse likely causing metadata overload.

cache_hit_rate_pct < 90 for 5m

Runbook: Check recent invalidation events, shard health, and failover state. Temporarily extend TTL for top stable keys.

DeletePropagationLag (critical)

Deleted links may still be resolvable in some regions.

tombstone_replication_lag_seconds > 10 for 3m

Runbook: Force global invalidate for affected prefixes, replay invalidation topic, and run parity check job.

AnalyticsStaleness (warning)

Dashboard freshness degraded; customers may see delayed analytics.

analytics_freshness_delay_seconds > 300 for 15m

Runbook: Scale stream consumers, inspect partition skew, and shift heavy tenants to dedicated partitions.

AbuseDependencyDegraded (warning)

Create path safety checks are slow and risk API degradation.

abuse_check_p95_ms > 120 for 10m

Runbook: Enable cached risk fallback, throttle bulk create traffic, and investigate upstream threat intel service.

ReplicaLagHigh (critical)

Metadata replicas too stale for safe redirect miss handling.

metadata_replica_lag_ms > 2000 for 10m

Runbook: Reduce miss pressure, promote healthier replica, and tune replication bandwidth/QoS.
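
Each rule above has the shape "metric, comparator, threshold, for a duration". One plausible reading, sketched below, is that the alert fires only once the condition has held continuously for the full window and resets on the first healthy sample. `AlertRule` and `SustainedAlert` are hypothetical names, and the exact semantics of "for 10m" in the course's alerting system are an assumption here.

```typescript
interface AlertRule {
  metric: string;
  comparator: ">" | "<";
  threshold: number;
  forMs: number; // how long the condition must hold before firing
}

class SustainedAlert {
  private breachStartMs: number | null = null;
  private readonly rule: AlertRule;

  constructor(rule: AlertRule) {
    this.rule = rule;
  }

  // Feed one sample; returns true while the alert is firing.
  observe(timestampMs: number, value: number): boolean {
    const breaching =
      this.rule.comparator === ">" ? value > this.rule.threshold : value < this.rule.threshold;
    if (!breaching) {
      // Any healthy sample resets the window.
      this.breachStartMs = null;
      return false;
    }
    if (this.breachStartMs === null) {
      this.breachStartMs = timestampMs;
    }
    return timestampMs - this.breachStartMs >= this.rule.forMs;
  }
}

// RedirectLatencySLOBreach expressed in this shape: redirect_p99_ms > 35 for 10m.
const redirectLatencyBreach = new SustainedAlert({
  metric: "redirect_p99_ms",
  comparator: ">",
  threshold: 35,
  forMs: 10 * 60 * 1000,
});
```

The duration guard is what keeps a single slow scrape from paging anyone, at the cost of delaying detection by up to the window length.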

Dashboard Layout

Redirect SLO: redirect_p99_ms, cache_hit_rate_pct, metadata_read_qps

Consistency & Safety: tombstone_replication_lag_seconds, metadata_replica_lag_ms, abuse_check_p95_ms

Control Plane: create_api_p99_ms, alias_conflict_rate_pct

Analytics Pipeline: click_stream_lag_seconds, analytics_freshness_delay_seconds

Scale Calculator

Estimate redirect capacity, storage growth, event pipeline load, and monthly compute/storage/network cost for a multi-region URL shortener.

Configuration

• Redirects per Day: 12.00B req/day (range: 1.00M to 200.00B)
• New Links per Day: 70.00M links/day (range: 10.00K to 2.00B)
• Avg Long URL Size: 320 bytes (range: 64 bytes to 4.10K bytes)
• Regional Cache Hit Rate: 96% (range: 50% to 99.9%)
• Active Regions: 8 regions (range: 1 to 40)
• Metadata Retention: 365 days (range: 30 to 3.65K days)
• Analytics Sampling: 85% (range: 1% to 100%)
• Metadata Replication Factor: 3 copies (range: 1 to 6)
• Redirect Node Capacity: 1.80K req/s/node (range: 200 to 10.00K)
• Avg Redirect Egress: 1.50K bytes (range: 200 bytes to 10.00K bytes)

Calculated Results

• Average Redirect Throughput: 138.89K req/s. Average redirects per second.
• Metadata DB Read QPS: 5.56K qps. Redirect reads that bypass cache and hit metadata store.
• Redirect Serving Nodes: Required nodes for average load with no headroom applied.
• Analytics Event Throughput: 118.06K events/s. Events/sec emitted into stream after sampling.
• Analytics Ingest Volume: 61.23 TB/month. Raw event ingest into analytics pipeline each month.
• Metadata Storage: 12.55 TB. Primary metadata + tombstone footprint (before replicas).
• Replicated Metadata Storage: 37.64 TB. Metadata footprint after applying replication factor.
• Redirect Egress Volume: 491.13 TB/month. Monthly redirect response egress.
• Stream Consumer Instances: Approximate analytics consumer instances for current volume.
• Monthly Compute Cost: $67,110. Edge, API, stream, and control-plane compute cost.
• Monthly Storage Cost: $5,682. Metadata + analytics storage cost estimate.
• Monthly Network Cost: $25,146. Estimated redirect and replication network spend.
• Total Monthly Cost: $97,938. All-in estimated monthly infrastructure cost.
• Cost per 1M Redirects: 0.27 USD/1M. Normalized cost efficiency metric for redirect traffic.
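
Several of these results follow directly from the configuration values. The sketch below shows one way the arithmetic could work; the calculator's actual formulas are not given, so the 30-day month, the binary terabyte (2^40 bytes), and the ceiling rounding for node count are assumptions, and `estimateCapacity` is a hypothetical name.

```typescript
interface CalculatorInput {
  redirectsPerDay: number;
  cacheHitRate: number; // fraction between 0 and 1
  samplingRate: number; // fraction between 0 and 1
  nodeCapacityRps: number;
  avgEgressBytes: number;
  totalMonthlyCostUsd: number;
}

interface CalculatorResult {
  avgRps: number;
  metadataReadQps: number;
  redirectNodes: number;
  analyticsEventsPerSec: number;
  egressTbPerMonth: number;
  costPerMillionUsd: number;
}

const SECONDS_PER_DAY = 86400;
const DAYS_PER_MONTH = 30; // assumption
const BYTES_PER_TB = Math.pow(2, 40); // assumption: binary terabytes

function estimateCapacity(input: CalculatorInput): CalculatorResult {
  const avgRps = input.redirectsPerDay / SECONDS_PER_DAY;
  const redirectsPerMonth = input.redirectsPerDay * DAYS_PER_MONTH;
  return {
    avgRps,
    // Only cache misses reach the metadata store.
    metadataReadQps: avgRps * (1 - input.cacheHitRate),
    // Average load only, no headroom applied.
    redirectNodes: Math.ceil(avgRps / input.nodeCapacityRps),
    analyticsEventsPerSec: avgRps * input.samplingRate,
    egressTbPerMonth: (redirectsPerMonth * input.avgEgressBytes) / BYTES_PER_TB,
    costPerMillionUsd: input.totalMonthlyCostUsd / (redirectsPerMonth / 1e6),
  };
}

const baseline = estimateCapacity({
  redirectsPerDay: 12e9,
  cacheHitRate: 0.96,
  samplingRate: 0.85,
  nodeCapacityRps: 1800,
  avgEgressBytes: 1500,
  totalMonthlyCostUsd: 97938,
});
```

Under these assumptions `baseline` comes out at roughly 138.89K req/s, 5.56K qps, 118.06K events/s, 491.13 TB/month, and 0.27 USD per 1M redirects, which agrees with the listed results. The node count is not shown on the page; this formula gives 78 at average load, and sizing for the 180K/s peak would need more.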

Estimated Monthly Cost (AWS)

$97,938/month

Monthly Compute Cost: $67,110

Monthly Storage Cost: $5,682

Monthly Network Cost: $25,146

* Estimates based on simplified AWS pricing. Actual costs may vary.

Cost & Capacity

Traffic Estimates

• Redirect Reads: 138.9K/s avg (global redirect request throughput)
• Peak Redirects: 180K/s (burst throughput at campaign peaks)
• Create Writes: 810/s avg (average create requests per second)
• Click Events: 160K/s (sampled analytics events entering stream)
• Cache Miss Reads: ~5.6K/s (metadata reads at 96% cache hit rate)

Storage Estimates

• Link Metadata: 52 TB (365-day active + disabled metadata footprint)
• Tombstones: 2.8 TB (delete/takedown records with 180-day retention)
• Raw Click Events: 520 TB/month (before compaction and rollup pruning)
• Aggregated Analytics: 38 TB/month (daily/hourly aggregate tables)

Summary & Takeaways

Key Takeaways

1. The core challenge is keeping the redirect data plane fast while the control and analytics planes evolve independently.
2. Snowflake-style IDs + Base62 give high-throughput uniqueness without global DB contention.
3. Deletion correctness requires tombstones and replication parity checks, not just cache TTL expiration.
4. Analytics must be asynchronous and sampled to preserve the redirect SLO and control cost.
5. Negative caching and hotset prewarming are practical defenses against miss storms.
6. At scale, network and analytics storage often dominate cost over raw compute.

If I Had More Time

• Add tenant-level routing policies and geo-fencing for data residency requirements.
• Implement online abuse model with feature store and near-real-time scoring.
• Introduce active-active metadata conflict resolution with CRDT-inspired merge policy.
• Build dynamic cache TTL policy based on link popularity decay curves.
• Add campaign launch scheduler that prewarms caches based on expected traffic envelopes.

0/10

intermediateurl-shortenercachingedgeid-generationanalyticsdistributed-systems

URL Shortener at Global Scale

Design a globally distributed URL shortener that handles billions of redirects per day with low latency, abuse controls, and reliable analytics.

What You'll Learn

•How to split create-link control plane from ultra-low-latency redirect data plane
•ID generation strategies for short-code uniqueness without central bottlenecks
•Cache hierarchy design: edge cache, regional cache, and metadata fallback
•Negative caching and tombstones to enforce deletions and abuse takedowns
•Asynchronous click analytics ingestion with backpressure and sampling
•Partitioning and indexing patterns for high-cardinality link metadata
•Safety patterns for custom alias collisions and idempotent create requests
•Cost modeling for compute, storage, and cross-region traffic

Interview Simulation

Run a timed mock interview for this project and get a scored debrief.

Quick Context

Problem

Constraints & Assumptions8 items

•
Serve 12,000,000,000 redirects/day with peak 180,000 req/s
Why?
•
Keep redirect path p99 under 20ms globally and p95 under 12ms in-region
Why?
•
Support 70M+ link creations/day with idempotent create semantics
Why?
•
Guarantee short-code uniqueness and custom alias collision detection

Key Numbers(hover for details)

12B/day

Redirect Volume

Average global redirect requests

180K/s

Peak Redirects

Peak read-path throughput

70M/day

Create Throughput

New shortened links created

p99 <20ms

Latency

Global redirect latency target

99.99%

Availability

Redirect service objective

Regions

Active production regions

160K/s

Event Ingest

Click analytics event throughput

$165K

Monthly Cost

Production baseline for global setup

Requirements

Create Short URL

Create a short code for a long URL with optional expiration, tags, and tenant metadata.

Why it matters: Core feature for product value and API integration.

Custom Alias

Allow users to request custom aliases with uniqueness checks and reserved-word policies.

Why it matters: Enterprise and marketing workflows depend on branded URLs.

Low-Latency Redirect

Resolve short code and issue HTTP redirect quickly from edge locations.

Why it matters: Redirect path is the user-facing SLO-critical workflow.

Link Update and Disable

Update destination URL, set expiry, disable, or hard-delete links safely.

Why it matters: Operations and compliance require lifecycle controls.

Click Analytics

Capture click events with geo/device/referrer metadata and provide aggregates.

Why it matters: Analytics is a key monetization and customer retention feature.

Abuse Prevention

Block malicious domains and enforce per-tenant rate limits and quota policies.

Why it matters: Prevents platform abuse and protects sender reputation.

Idempotent Create API

Support idempotency keys so retries do not create duplicate short links.

Why it matters: Client retries are common under network turbulence.

Bulk Link Create

Allow batch creation jobs for campaign pipelines with status tracking.

Why it matters: Large customers need operational throughput beyond single-item calls.

Architecture Evolution

Show deltas

Single-region API service with PostgreSQL and Redis cache. Suitable for 100-1,000 users and 2K-5K redirects/sec at $120-400/month.

Click any component to inspect details, or use Trace Flow to animate redirect and write paths.

Legend

Clients

API

Cache

Database

What Changed & Why

•One API service handles create and redirect requests
•Redis cache for hot short-code lookups
•PostgreSQL stores link metadata and click counters
•Background worker computes simple daily analytics
•Basic rate limiting per API key

Key Decisions

5 decisions

ID Generation: DB Sequence vs Snowflake vs Random Tokens3 alternatives

Snowflake-style 64-bit IDs encoded to Base62 short codes

Redirect Path: Direct DB Reads vs Cache-First Edge3 alternatives

Edge redirect service with regional cache-first lookup

Delete Semantics: Hard Delete vs Tombstones3 alternatives

Tombstone-first delete propagation with delayed hard purge

Analytics Path: Inline Write vs Async Stream3 alternatives

Async event stream with sampled click events and rollups

Custom Aliases: Immediate Reservation vs Best-Effort Upsert3 alternatives

Transactional alias reservation with unique index + idempotency key

API Design

Base URL

https://api.shortly.dev/v1

Authentication

OAuth2 Bearer Token + HMAC key for server-to-server bulk jobs

Management endpoints require bearer tokens with tenant scopes (`links:write`, `links:read`, `analytics:read`). Bulk imports can use signed HMAC jobs for high-throughput ingestion.

Endpoints

POST/linksCreate a short link

POST/links/bulkCreate links in bulk

GET/links/{shortCode}/analyticsRead analytics aggregates

DELETE/links/{shortCode}Disable or delete link

Webhooks

EVENT

link.deleted

Triggered when a link is disabled or hard-deleted.

Payload

{
  "event": "link.deleted",
  "tenantId": "tnt_42",
  "shortCode": "b9Q4kT",
  "deletedAt": "2026-02-08T11:07:22Z",
  "actor": "user_91"
}

EVENT

abuse.detected

Triggered when destination or behavior exceeds abuse thresholds.

Payload

{
  "event": "abuse.detected",
  "shortCode": "x2ab91",
  "riskScore": 0.93,
  "action": "auto_disabled",
  "reason": "malware_domain"
}

Code Samples

Snowflake ID to Base62 Short CodeProduction

TypeScript utility converting monotonically increasing 64-bit IDs to compact Base62 codes.

Data Model & Queries

Schema

SQL

-- URL Shortener Core Schema (PostgreSQL)

CREATE TABLE tenants (
  tenant_id UUID PRIMARY KEY,
  name VARCHAR(128) NOT NULL,
  plan VARCHAR(32) NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE links (
  link_id BIGINT PRIMARY KEY,
  tenant_id UUID NOT NULL REFERENCES tenants(tenant_id),
  short_code VARCHAR(16) NOT NULL,
  long_url TEXT NOT NULL,
  long_url_hash BYTEA NOT NULL,
  canonical_domain VARCHAR(255) NOT NULL,
  status VARCHAR(24) NOT NULL CHECK (status IN ('active', 'disabled', 'deleted', 'expired')),
  is_custom_alias BOOLEAN NOT NULL DEFAULT FALSE,
  created_by UUID,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  expires_at TIMESTAMPTZ,
  deleted_at TIMESTAMPTZ,
  metadata JSONB NOT NULL DEFAULT '{}',
  UNIQUE (tenant_id, short_code)
);

CREATE TABLE alias_claims (
  tenant_id UUID NOT NULL REFERENCES tenants(tenant_id),
  alias VARCHAR(64) NOT NULL,
  short_code VARCHAR(16) NOT NULL,
  claimed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  PRIMARY KEY (tenant_id, alias)
);

CREATE TABLE idempotency_keys (
  tenant_id UUID NOT NULL REFERENCES tenants(tenant_id),
  idempotency_key UUID NOT NULL,
  request_hash BYTEA NOT NULL,
  status VARCHAR(16) NOT NULL CHECK (status IN ('in_progress', 'completed', 'failed')),
  response_payload JSONB,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  completed_at TIMESTAMPTZ,
  last_seen_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  PRIMARY KEY (tenant_id, idempotency_key)
);

CREATE TABLE deletion_tombstones (
  tenant_id UUID NOT NULL REFERENCES tenants(tenant_id),
  short_code VARCHAR(16) NOT NULL,
  tombstone_version BIGINT NOT NULL,
  reason VARCHAR(64) NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  PRIMARY KEY (tenant_id, short_code)
);

-- High-volume click stream landing table (partition by day)
CREATE TABLE click_events (
  event_id UUID NOT NULL,
  tenant_id UUID NOT NULL,
  short_code VARCHAR(16) NOT NULL,
  clicked_at TIMESTAMPTZ NOT NULL,
  country_code CHAR(2),
  device_type VARCHAR(24),
  referrer_domain VARCHAR(255),
  ip_hash BYTEA,
  user_agent_hash BYTEA,
  edge_region VARCHAR(24),
  PRIMARY KEY (event_id, clicked_at)
) PARTITION BY RANGE (clicked_at);

CREATE TABLE click_events_2026_02 PARTITION OF click_events
FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');

CREATE TABLE daily_link_metrics (
  tenant_id UUID NOT NULL,
  short_code VARCHAR(16) NOT NULL,
  metric_date DATE NOT NULL,
  country_code CHAR(2) NOT NULL DEFAULT 'ZZ',
  device_type VARCHAR(24) NOT NULL DEFAULT 'unknown',
  clicks BIGINT NOT NULL,
  uniques BIGINT,
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  PRIMARY KEY (tenant_id, short_code, metric_date, country_code, device_type)
);

CREATE TABLE blocked_domains (
  domain VARCHAR(255) PRIMARY KEY,
  risk_score NUMERIC(4,3) NOT NULL,
  source VARCHAR(64) NOT NULL,
  active BOOLEAN NOT NULL DEFAULT TRUE,
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_links_tenant_status_created
  ON links (tenant_id, status, created_at DESC);

CREATE INDEX idx_links_active_lookup
  ON links (tenant_id, short_code)
  WHERE status = 'active';

CREATE INDEX idx_links_expires_at
  ON links (expires_at)
  WHERE expires_at IS NOT NULL;

CREATE INDEX idx_alias_claims_short_code
  ON alias_claims (short_code);

CREATE INDEX idx_idempotency_last_seen
  ON idempotency_keys (tenant_id, last_seen_at DESC);

CREATE INDEX idx_tombstones_version
  ON deletion_tombstones (tombstone_version DESC);

CREATE INDEX idx_click_events_tenant_code_time
  ON click_events (tenant_id, short_code, clicked_at DESC);

CREATE INDEX idx_click_events_country_time
  ON click_events (country_code, clicked_at DESC);

CREATE INDEX idx_daily_metrics_date
  ON daily_link_metrics (metric_date DESC, tenant_id);

Why This Schema

•Separates link metadata from click event firehose so redirect lookups stay lightweight.
•Uses dedicated tombstone table to enforce deletion semantics across eventually consistent caches.
•Stores idempotency keys explicitly to guarantee safe client retries.
•Keeps alias claims isolated to simplify uniqueness and conflict management.
•Partitions click events by time for retention, pruning, and low-cost scans.
•Pre-aggregates daily metrics for dashboard performance and cost control.

Common Queries

Resolve active link for redirect path

SQL

EXPLAIN ANALYZE
SELECT long_url, expires_at
FROM links
WHERE tenant_id = $1
  AND short_code = $2
  AND status = 'active'
LIMIT 1;

Find links nearing expiration in next 24 hours

SQL

EXPLAIN ANALYZE
SELECT short_code, expires_at
FROM links
WHERE tenant_id = $1
  AND status = 'active'
  AND expires_at IS NOT NULL
  AND expires_at < NOW() + INTERVAL '24 hours'
ORDER BY expires_at ASC
LIMIT 2000;

Scaling & Bottlenecks

Now

Primary Bottleneck

Regional cache miss spikes during campaign traffic and failover events

Mitigation

Increase hot-key replication and introduce adaptive prewarming for scheduled campaigns.

What You'd Change

•Maintain top-N global hot set in every region
•Prewarm campaign aliases based on scheduled launch windows
•Add cache miss circuit-breakers to protect metadata store

2× Scale

Primary Bottleneck

Metadata read replica lag and analytics consumer backlog

Mitigation

Split metadata read/write pools and scale stream partitions with tenant-aware balancing.

What You'd Change

•Promote read replica pools by tenant tier
•Increase stream partitions and isolate high-volume tenants
•Use pre-aggregated hourly rollups to reduce query scan volume

10× Scale

Primary Bottleneck

Cross-region replication egress and global invalidation consistency

Mitigation

Move to regional ownership cells with selective replication and compacted invalidation topics.

What You'd Change

•Adopt per-tenant region ownership and bounded-staleness failover
•Replicate control/tombstone state globally; keep raw events region-local
•Deploy consistency watchdog jobs for cache-vs-store parity

Failure Scenarios

Monitoring & Observability

Key Metrics

redirect_p99_mshistogram

P99 end-to-end redirect latency from edge POP.

Good: <20ms

Warning: 20-35ms

Critical: >35ms

cache_hit_rate_pctgauge

Regional cache hit rate for redirect lookups.

Good: >95%

Warning: 90-95%

Critical: <90%

metadata_read_qpsgauge

Read QPS against metadata store from cache misses.

Good: <8K

Warning: 8K-15K

Critical: >15K

metadata_replica_lag_mshistogram

Replication lag for metadata read replicas.

Good: <800ms

Warning: 800-2000ms

Critical: >2000ms

tombstone_replication_lag_secondsgauge

Delay for deletion/takedown propagation across regions.

Good: <5s

Warning: 5-10s

Critical: >10s

create_api_p99_mshistogram

P99 latency of create-link control-plane API.

Good: <90ms

Warning: 90-180ms

Critical: >180ms

abuse_check_p95_ms (histogram)

Latency for domain safety/abuse checks in create path.

Good: <40ms

Warning: 40-120ms

Critical: >120ms

click_stream_lag_seconds (gauge)

Consumer lag of click analytics stream.

Good: <15s

Warning: 15-90s

Critical: >90s

analytics_freshness_delay_seconds (gauge)

Delay between click ingest and dashboard visibility.

Good: <60s

Warning: 60-300s

Critical: >300s

alias_conflict_rate_pct (gauge)

Custom alias conflict ratio over create attempts.

Good: <2%

Warning: 2-6%

Critical: >6%
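
The Good/Warning/Critical bands above reduce to a small classifier. The sketch below is one way to encode them, assuming boundary values (for example exactly 20ms or exactly 95%) fall into the warning band as the listed ranges suggest. `classify` and its parameters are hypothetical names.

```python
def classify(value: float, warn: float, crit: float,
             higher_is_better: bool = False) -> str:
    """Map a metric value to 'good', 'warning', or 'critical'.

    warn is the edge of the good band and crit is the edge of the critical
    band. For metrics where higher is better (such as cache hit rate), all
    three numbers are negated so the same comparisons apply.
    """
    if higher_is_better:
        value, warn, crit = -value, -warn, -crit
    if value > crit:
        return "critical"
    if value >= warn:
        return "warning"
    return "good"


if __name__ == "__main__":
    # redirect_p99_ms: Good <20, Warning 20-35, Critical >35
    print(classify(18, warn=20, crit=35))
    # cache_hit_rate_pct: Good >95, Warning 90-95, Critical <90
    print(classify(93, warn=95, crit=90, higher_is_better=True))
```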

Alert Rules

RedirectLatencySLOBreach (critical)

Redirect latency SLO is violated globally or in one major region.

redirect_p99_ms > 35 for 10m

Runbook: Check for a cache hit rate drop, DB miss amplification, and edge saturation. Activate hotset prewarming and traffic shedding if needed.

CacheHitCollapse (critical)

Cache hit rate has collapsed and is likely overloading the metadata store.

cache_hit_rate_pct < 90 for 5m

Runbook: Check recent invalidation events, shard health, and failover state. Temporarily extend TTL for top stable keys.

DeletePropagationLag (critical)

Deleted links may still be resolvable in some regions.

tombstone_replication_lag_seconds > 10 for 3m

Runbook: Force global invalidate for affected prefixes, replay invalidation topic, and run parity check job.

AnalyticsStaleness (warning)

Dashboard freshness degraded; customers may see delayed analytics.

analytics_freshness_delay_seconds > 300 for 15m

Runbook: Scale stream consumers, inspect partition skew, and shift heavy tenants to dedicated partitions.

AbuseDependencyDegraded (warning)

Create path safety checks are slow and risk API degradation.

abuse_check_p95_ms > 120 for 10m

Runbook: Enable cached risk fallback, throttle bulk create traffic, and investigate upstream threat intel service.

ReplicaLagHigh (critical)

Metadata replicas are too stale to safely serve redirect cache misses.

metadata_replica_lag_ms > 2000 for 10m

Runbook: Reduce miss pressure, promote healthier replica, and tune replication bandwidth/QoS.
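
Each rule above has the form "condition for N minutes", meaning the alert fires only when the breach is sustained. A minimal sketch of that evaluation, assuming evenly scraped `(timestamp_seconds, value)` samples and a hypothetical `fires` helper:

```python
from typing import List, Tuple


def fires(samples: List[Tuple[float, float]], threshold: float,
          for_s: float, above: bool = True) -> bool:
    """True if the condition has held continuously for at least for_s
    seconds, ending at the most recent sample.

    samples must be sorted by timestamp in ascending order.
    """
    breach_start = None
    for ts, value in samples:
        breaching = value > threshold if above else value < threshold
        if breaching:
            if breach_start is None:
                breach_start = ts  # first sample of the current breach run
        else:
            breach_start = None    # any healthy sample resets the timer
    if breach_start is None:
        return False
    return samples[-1][0] - breach_start >= for_s


if __name__ == "__main__":
    # redirect_p99_ms > 35 for 10m, scraped once per minute
    latency = [(t * 60, 40.0) for t in range(11)]
    print(fires(latency, threshold=35, for_s=600))
```

Resetting on a single healthy sample keeps brief spikes from paging anyone, at the cost of delaying alerts on flapping metrics.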

Dashboard Layout

Redirect SLO

redirect_p99_ms, cache_hit_rate_pct, metadata_read_qps

Consistency & Safety

tombstone_replication_lag_seconds, metadata_replica_lag_ms, abuse_check_p95_ms

Control Plane

create_api_p99_ms, alias_conflict_rate_pct

Analytics Pipeline

click_stream_lag_seconds, analytics_freshness_delay_seconds

Scale Calculator

Estimate redirect capacity, storage growth, event pipeline load, and monthly compute/storage/network cost for a multi-region URL shortener.

Configuration

Redirects per Day: 12.00B req/day (range: 1.00M to 200.00B req/day)

New Links per Day: 70.00M links/day (range: 10.00K to 2.00B links/day)

Avg Long URL Size: 320.00 bytes (range: 64.00 to 4.10K bytes)

Regional Cache Hit Rate: 96.00 % (range: 50.00 to 99.90 %)

Active Regions: 8.00 regions (range: 1.00 to 40.00 regions)

Metadata Retention: 365.00 days (range: 30.00 to 3.65K days)

Analytics Sampling: 85.00 % (range: 1.00 to 100.00 %)

Metadata Replication Factor: 3.00 copies (range: 1.00 to 6.00 copies)

Redirect Node Capacity: 1.80K req/s/node (range: 200.00 to 10.00K req/s/node)

Avg Redirect Egress: 1.50K bytes (range: 200.00 to 10.00K bytes)

Calculated Results

Average Redirect Throughput

Average redirects per second.

138.89K req/s

Metadata DB Read QPS

Redirect reads that bypass cache and hit metadata store.

5.56K qps

Redirect Serving Nodes

Required nodes for average load with no headroom applied.

Analytics Event Throughput

Events/sec emitted into stream after sampling.

118.06K events/s

Analytics Ingest Volume

Raw event ingest into analytics pipeline each month.

61.23 TB/month

Metadata Storage

Primary metadata + tombstone footprint (before replicas).

12.55 TB

Replicated Metadata Storage

Metadata footprint after applying replication factor.

37.64 TB

Redirect Egress Volume

Monthly redirect response egress.

491.13 TB/month

Stream Consumer Instances

Approximate analytics consumer instances for current volume.

Monthly Compute Cost

Edge, API, stream, and control-plane compute cost.

$67,110

Monthly Storage Cost

Metadata + analytics storage cost estimate.

$5,682

Monthly Network Cost

Estimated redirect and replication network spend.

$25,146

Total Monthly Cost

All-in estimated monthly infrastructure cost.

$97,938

Cost per 1M Redirects

Normalized cost efficiency metric for redirect traffic.

0.27 USD/1M

Estimated Monthly Cost (AWS)

$97,938/month

Monthly Compute Cost: $67,110

Monthly Storage Cost: $5,682

Monthly Network Cost: $25,146

* Estimates based on simplified AWS pricing. Actual costs may vary.
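
The throughput and volume figures in the calculator can be reproduced with back-of-envelope arithmetic. The sketch below is an approximation under stated assumptions, not the calculator's actual formulas: it assumes a 30-day month, assumes volumes labeled TB are binary (2^40 bytes), and uses two back-solved constants (`row_overhead_bytes` and `event_bytes`, both 220) that happen to reproduce the displayed storage and ingest numbers. All function and parameter names are hypothetical.

```python
TIB = 2 ** 40          # assumption: displayed "TB" values are binary
DAYS_PER_MONTH = 30    # assumption
SECONDS_PER_DAY = 86_400


def estimate(redirects_per_day: float = 12e9, links_per_day: float = 70e6,
             url_bytes: int = 320, hit_rate: float = 0.96,
             retention_days: int = 365, sampling: float = 0.85,
             replication: int = 3, egress_bytes: int = 1500,
             row_overhead_bytes: int = 220,  # assumed, back-solved
             event_bytes: int = 220) -> dict:  # assumed, back-solved
    avg_rps = redirects_per_day / SECONDS_PER_DAY
    row_bytes = url_bytes + row_overhead_bytes
    metadata_tib = links_per_day * retention_days * row_bytes / TIB
    events_per_month = redirects_per_day * sampling * DAYS_PER_MONTH
    return {
        "avg_redirect_rps": avg_rps,
        "metadata_read_qps": avg_rps * (1 - hit_rate),
        "analytics_events_per_s": avg_rps * sampling,
        "analytics_ingest_tib_month": events_per_month * event_bytes / TIB,
        "metadata_tib": metadata_tib,
        "replicated_metadata_tib": metadata_tib * replication,
        "egress_tib_month":
            redirects_per_day * egress_bytes * DAYS_PER_MONTH / TIB,
    }


def cost_per_million(total_monthly_cost: float,
                     redirects_per_day: float = 12e9) -> float:
    """Monthly cost divided by millions of redirects served per month."""
    millions = redirects_per_day * DAYS_PER_MONTH / 1e6
    return total_monthly_cost / millions


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:,.2f}")
    print(f"cost per 1M redirects: {cost_per_million(97_938):.2f} USD")
```

Changing one input at a time (for example dropping `hit_rate` to 0.90) is a quick way to see which estimates are most sensitive: metadata read QPS scales with the miss rate, while egress depends only on redirect volume and response size.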

Cost & Capacity

Traffic Estimates

Redirect Reads

Global redirect request throughput

138.9K/s avg

Peak Redirects

Burst throughput at campaign peaks

180K/s

Create Writes

Average create requests per second

810/s avg

Click Events

Sampled analytics events entering stream

160K/s

Cache Miss Reads

Metadata reads at 96% cache hit rate

~5.6K/s

Storage Estimates

Link Metadata

365-day active + disabled metadata footprint

52 TB

Tombstones

Delete/takedown records with 180-day retention

2.8 TB

Raw Click Events

Before compaction and rollup pruning

520 TB/month

Aggregated Analytics

Daily/hourly aggregate tables

38 TB/month

Test Your Understanding

Knowledge Check

Test your understanding of low-latency redirects, correctness semantics, and scale trade-offs.

Failure Diagnosis

Architecture Decisions

Summary & Takeaways

Key Takeaways

1. The core challenge is keeping redirect data plane fast while control and analytics planes evolve independently.
2. Snowflake-style IDs + Base62 give high-throughput uniqueness without global DB contention.
3. Deletion correctness requires tombstones and replication parity checks, not just cache TTL expiration.
4. Analytics must be asynchronous and sampled to preserve redirect SLO and cost control.
5. Negative caching and hotset prewarming are practical defenses against miss storms.
6. At scale, network and analytics storage often dominate cost over raw compute.
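
As an illustration of the second takeaway, the sketch below generates Snowflake-style 64-bit IDs and encodes them in Base62. The bit layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit sequence) follows the commonly used Snowflake scheme and is an assumption here, as are the custom epoch, the alphabet order, and all names.

```python
import threading
import time
from typing import Callable

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
EPOCH_MS = 1_600_000_000_000  # hypothetical custom epoch


def base62_encode(n: int) -> str:
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def base62_decode(s: str) -> int:
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n


class SnowflakeGenerator:
    """Per-worker ID generator: no coordination needed between workers."""

    def __init__(self, worker_id: int,
                 clock_ms: Callable[[], int] = lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.clock_ms = clock_ms
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = self.clock_ms()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF
                if self.seq == 0:
                    # Sequence exhausted: spin until the next millisecond.
                    while now <= self.last_ms:
                        now = self.clock_ms()
            else:
                self.seq = 0
            self.last_ms = now
            return ((now - EPOCH_MS) << 22) | (self.worker_id << 12) | self.seq


if __name__ == "__main__":
    gen = SnowflakeGenerator(worker_id=5)
    print(base62_encode(gen.next_id()))
```

Uniqueness depends on every worker holding a distinct `worker_id`, so assigning those IDs is the one piece that still needs coordination, though only at startup rather than per request. Raw Snowflake IDs are roughly time-ordered and therefore guessable, so a production system may permute or salt them before encoding.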

If I Had More Time

•Add tenant-level routing policies and geo-fencing for data residency requirements.
•Implement online abuse model with feature store and near-real-time scoring.
•Introduce active-active metadata conflict resolution with CRDT-inspired merge policy.
•Build dynamic cache TTL policy based on link popularity decay curves.
•Add campaign launch scheduler that prewarms caches based on expected traffic envelopes.

Failure Scenarios

Redirect latency spikes globally and p99 breaches 20ms for 15+ minutes

Deleted malicious links continue resolving in some regions

Custom alias creation fails intermittently with conflict errors

Analytics dashboards fall behind by hours

Create API error rate rises with 429 and domain validation timeouts
