System Design Series

How to Design a URL Shortener Like TinyURL:
Complete System Design Guide

25 min read · Intermediate – Advanced · Updated 2025

Learn how to design a scalable URL shortener like TinyURL or Bitly — covering distributed architecture, Redis caching, Snowflake ID generation, Kafka analytics pipelines, database sharding, and real-world engineering trade-offs.

System Design · Distributed Systems · Backend Engineering · Interview Prep

Table of Contents

1. Introduction
2. Requirements
3. Scale Estimation
4. High-Level Architecture
5. Database Design
6. API Design
7. ID Generation
8. Caching Strategy
9. Message Queues
10. Scalability
11. Security
12. Monitoring
13. Trade-offs
14. Real-World Stack
15. Interview Guide
16. FAQs
17. Conclusion

Section 01

Introduction: The Humble Link That Scaled to Billions

Picture this: it’s 2002. You want to share a research paper over email. The URL is 180 characters long, wraps across three lines, and breaks when clicked. A student named Kevin Gilbertson had the same frustration — and TinyURL was born.

Fast forward to today. Bitly handles 10 billion clicks per month. Twitter’s t.co redirects every shared link in real time. What started as a convenience tool became one of the most quietly critical pieces of internet infrastructure.

Designing a URL shortener looks deceptively simple — “just store a short code mapping to a long URL” — but it touches nearly every hard problem in distributed systems: high-throughput reads, collision-free ID generation, cache stampedes, global latency, and abuse prevention at massive scale.

Architecture Insight

URL shorteners are fundamentally read-heavy systems with a ~100:1 read/write ratio. This single characteristic shapes every architectural decision — from caching to database sharding to CDN strategy.

Section 02

What Are We Building?

Functional Requirements

Feature	Description
URL Shortening	Generate unique short codes (e.g., tny.io/aB3xKz) from long URLs
URL Redirection	Visiting a short URL redirects to the original destination
Custom Aliases	Users can optionally choose their own short codes
Expiry Support	Links expire after a given date or click threshold
Analytics	Track clicks, geographic origin, device type, referrer
User Accounts	Authenticated users can manage and view their links
Link Deletion	Users can deactivate a short URL

Non-Functional Requirements

Requirement	Target	Why It Matters
Availability	99.99% uptime	≈52.6 minutes downtime/year
Redirect Latency	<10ms at P99	Users perceive slow redirects as broken links
Write Throughput	10,000 new URLs/sec at peak	Viral campaigns create write bursts
Read/Write Ratio	~100:1	Drives cache-first architecture
Durability	Zero data loss	Every mapping must be permanent
Security	Phishing + DDoS prevention	URL shorteners are abuse vectors

Section 03

Scale Estimation

Every distributed system design starts here. Architecture decisions are meaningless without understanding expected traffic patterns. Let’s work through the numbers explicitly.

Metric	Value	Note
Daily Active Users	100M	per day
New URLs / day	10M	~115/sec
Redirects / day	10B	~115K RPS
Storage (5 yr)	4.75 TB	260 B/record avg
URL Namespace	3.5T	62⁷ unique codes
Read/Write Ratio	100:1	reads dominate

Traffic Estimation:
  Writes/sec  = 10M / 86,400 ≈ 115 writes/sec
  Reads/sec   = 10B / 86,400 ≈ 115,000 reads/sec
  Ratio       = 115,000 / 115 ≈ 1,000:1 with these inputs (more read-heavy than the ~100:1 rule of thumb used in the rest of this guide)

Storage per record:
  Short code   7 bytes
  Long URL   150 bytes
  Metadata   100 bytes
  Total      257 bytes, rounded to 260  →  2.6 GB/day  →  4.75 TB over 5 years

URL Namespace (Base62, 7 chars):
  62^7 ≈ 3.52 trillion codes  →  lasts ~960 years at 10M/day
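The arithmetic above is easy to re-run when the inputs change. The following is a minimal sketch that only restates the figures from this section; the variable names are illustrative. Note that dividing the two daily volumes gives a read/write ratio of 1,000:1 rather than 100:1.

```python
# Back-of-envelope check of the scale estimates (inputs taken from this section).
SECONDS_PER_DAY = 86_400

new_urls_per_day = 10_000_000
redirects_per_day = 10_000_000_000

writes_per_sec = new_urls_per_day / SECONDS_PER_DAY
reads_per_sec = redirects_per_day / SECONDS_PER_DAY
read_write_ratio = redirects_per_day / new_urls_per_day

# Short code + long URL + metadata; the estimate rounds this up to 260 bytes.
record_bytes = 7 + 150 + 100
rounded_record_bytes = 260
storage_5yr_tb = rounded_record_bytes * new_urls_per_day * 365 * 5 / 1e12

# Base62 alphabet, 7-character codes.
namespace = 62 ** 7
years_to_exhaust = namespace / new_urls_per_day / 365

print(f"writes/sec:        {writes_per_sec:,.1f}")
print(f"reads/sec:         {reads_per_sec:,.0f}")
print(f"read/write ratio:  {read_write_ratio:,.0f}:1")
print(f"record size:       {record_bytes} bytes")
print(f"storage (5 years): {storage_5yr_tb:.3f} TB")
print(f"namespace:         {namespace:,} codes")
print(f"years to exhaust:  {years_to_exhaust:,.0f}")
```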

Section 04

High-Level Architecture

The system splits into two fundamentally different paths: the write path (creating short URLs) and the read/redirect path (resolving them). Because reads outnumber writes 100 to 1, the entire architecture is optimized around the redirect path.

Redirect Path — Critical Hot Path

Every millisecond counts. Most traffic should be served from CDN or Redis — never touching the database.

Redirect read path (request flow):

1. User / Browser → CDN Edge Cache (L1 cache, 1–5ms global). On a CDN hit, the 302 is returned immediately.
2. On a CDN miss → Load Balancer (L7, NGINX / ALB) → Redirect Service (stateless × N instances) → Redis Cluster (L2 cache, 0.5–2ms).
3. On a Redis miss → DB Read Replica (5–20ms fallback).
4. Asynchronously (non-blocking) → Kafka (click event stream) → ClickHouse (analytics storage).

Write Path — URL Creation

URL creation write path:

Client → Write API (validate + auth) → ID Generator (Snowflake + Base62) → Primary DB (PostgreSQL write) → Redis Cache (write-through)

Section 05

Database Design

PostgreSQL vs Cassandra

Factor	PostgreSQL	Cassandra
Consistency	Strong (ACID)	Eventual
Scalability	Moderate (sharding needed)	Very high (built-in)
Transactions	Excellent	Limited
Operational Complexity	Lower	Higher
Best for	Start here — most scales	Billions of URLs, multi-region writes

Schema Design

-- Core mapping table
CREATE TABLE url_mappings (
    short_code   VARCHAR(12)  PRIMARY KEY,
    long_url     TEXT         NOT NULL,
    user_id      BIGINT,
    created_at   TIMESTAMPTZ  DEFAULT NOW(),
    expires_at   TIMESTAMPTZ,
    is_active    BOOLEAN      DEFAULT TRUE,
    click_count  BIGINT       DEFAULT 0
);

-- Analytics events — partitioned by month for fast range scans
CREATE TABLE click_events (
    id           BIGSERIAL,
    short_code   VARCHAR(12)  NOT NULL,
    clicked_at   TIMESTAMPTZ  DEFAULT NOW(),
    ip_hash      VARCHAR(64),
    country_code CHAR(2),
    device_type  VARCHAR(20),
    referrer     TEXT,
    PRIMARY KEY (id, clicked_at)
) PARTITION BY RANGE (clicked_at);

-- Index for user dashboard queries
CREATE INDEX idx_url_user ON url_mappings(user_id, created_at DESC);

Indexing Decision

The primary key on short_code gives O(log n) lookups. Since reads dominate massively, this index must live in memory — ensure your database has enough RAM to hold the hot index in the buffer pool.

Section 06

API Design

Create Short URL

POST /api/v1/urls
Authorization: Bearer <jwt_token>

{
  "long_url": "https://example.com/very/long/path",
  "custom_alias": "my-blog",       // optional
  "expires_at": "2026-01-01T00:00:00Z"  // optional
}

// 201 Created
{
  "short_url": "https://tny.io/aB3xKz",
  "short_code": "aB3xKz",
  "created_at": "2025-06-15T10:23:45Z"
}

// 409 Conflict — alias taken
{ "error": "ALIAS_TAKEN", "message": "The alias 'my-blog' is already in use." }

Redirect

GET /{short_code}

Response:
HTTP/1.1 302 Found
Location: https://example.com/very/long/path
Cache-Control: max-age=3600

301 vs 302 — Critical Trade-off

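A 301 Permanent Redirect can be cached by browsers indefinitely: great for bandwidth, fatal for analytics, because repeat clicks never reach your servers. A 302 Temporary Redirect is not cached unless the response explicitly allows it, which keeps clicks flowing through your infrastructure and enables accurate tracking. Note that the Cache-Control: max-age=3600 header in the example response does allow browsers and the CDN to reuse the 302 for up to an hour, so clicks served from those caches have to be counted from CDN logs, or the max-age shortened, if every click must be tracked. Analytics-first shorteners generally use 302.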

Get Stats

GET /api/v1/urls/{short_code}/stats
Authorization: Bearer <jwt_token>

{
  "total_clicks": 14823,
  "unique_visitors": 11201,
  "top_countries": ["US", "IN", "GB", "DE"],
  "clicks_last_7_days": [...]
}

Section 07

ID Generation: The Hardest Easy Problem

How do you generate a unique, collision-free short code at 115 writes/second across multiple API servers without coordination overhead? The answer is less obvious than it appears.

Option	Mechanism	Pro	Con
UUID + Base62	Random UUID, take first 7 chars	Simple	Collision risk; uniqueness check needed on every write
Counter + Base62	Global auto-increment	Guaranteed unique	Global counter = single point of failure + write bottleneck
Snowflake IDs ✓	Timestamp + machine ID + sequence	Globally unique, no coordination, sorts chronologically	Clock skew edge cases need handling
Pre-generated Pool	Background job fills a pool table	Zero write-path latency for code generation	Pool depletion under extreme load

Snowflake ID Structure (64-bit)

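Field	Bits	Meaning
Sign	1	Unused, always 0
Timestamp	41	Milliseconds since epoch
Machine ID	10	Up to 1,024 nodes
Sequence	12	Up to 4,096 IDs per millisecond per node

Max throughput: 1,024 nodes × 4,096 IDs/ms ≈ 4.2 million IDs per millisecond.
No central coordination is required: each node generates IDs autonomously, provided every node is assigned a unique machine ID.
One caveat: a full 64-bit Snowflake ID encodes to as many as 11 Base62 characters, not 7. That fits the VARCHAR(12) short_code column, but codes of 7 characters or fewer need a smaller ID space, such as a counter or a pre-generated pool.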
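To make the bit layout concrete, here is a minimal single-process sketch of a Snowflake-style generator plus Base62 encoding. It is an illustration under stated assumptions, not the implementation any particular service uses: the custom epoch, the alphabet ordering, the class name, and the choice to raise an error when the clock moves backwards are all hypothetical.

```python
import threading
import time

# Assumed alphabet ordering: digits, then uppercase, then lowercase.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
# Hypothetical custom epoch: 2020-01-01T00:00:00Z in milliseconds.
EPOCH_MS = 1_577_836_800_000


class SnowflakeGenerator:
    """41-bit timestamp | 10-bit machine ID | 12-bit sequence."""

    def __init__(self, machine_id: int) -> None:
        if not 0 <= machine_id < 1024:
            raise ValueError("machine_id must fit in 10 bits")
        self.machine_id = machine_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            if now < self.last_ms:
                # Clock skew: refuse to issue an ID rather than risk a duplicate.
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF
                if self.sequence == 0:
                    # All 4,096 sequence values used: wait for the next millisecond.
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - EPOCH_MS) << 22) | (self.machine_id << 12) | self.sequence


def base62_encode(n: int) -> str:
    """Encode a non-negative integer using the 62-character alphabet."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


generator = SnowflakeGenerator(machine_id=7)
print(base62_encode(generator.next_id()))
```

Because the timestamp sits in the high bits, IDs from one node increase over time, which is where the chronological sort order comes from.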

Section 08

Caching Strategy

Caching is the single most important performance lever in this system. With a 100:1 read/write ratio, even a 95% cache hit rate means only 5,750 of 115,000 requests/second hit the database. Design the cache layers first.

Multi-Layer Cache Hierarchy

Cache layers (a miss propagates downward):

L1: CDN Edge (1–5ms global), ~60% of traffic; requests served here never go deeper
  ↓ miss (~40%)
L2: Local LRU (<0.1ms per instance), viral / hot links
  ↓ miss (~38%)
L3: Redis Cluster (0.5–2ms), ~38% of traffic
  ↓ miss (<2%)
L4: DB Read Replica (5–20ms), <2% of traffic

Goal: >98% cache hit rate, so the database handles fewer than 2% of all redirect requests
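Inside the redirect service, the layers behind the CDN amount to a chain of lookups where each miss falls through to the next layer and each hit back-fills the layers above it. The sketch below assumes plain dictionaries as stand-ins for Redis and the read replica; `LocalLRU` and `resolve` are hypothetical names.

```python
from collections import OrderedDict
from typing import Optional


class LocalLRU:
    """Small in-process LRU cache (the per-instance layer)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key: str, value: str) -> None:
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used entry


def resolve(short_code: str, local: LocalLRU, redis_like: dict, database: dict) -> Optional[str]:
    """Return the long URL, or None when the code is unknown (a 404)."""
    key = f"url:{short_code}"
    url = local.get(key)
    if url is not None:
        return url
    url = redis_like.get(key)
    if url is None:
        url = database.get(short_code)  # last resort: read replica
        if url is None:
            return None
        redis_like[key] = url  # back-fill the shared cache
    local.put(key, url)  # back-fill the in-process cache
    return url
```

In a real deployment the Redis write would also carry a TTL; that detail is left out here to keep the lookup order in focus.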

Redis Key Structure

Key:   "url:aB3xKz"
Value: "https://example.com/very/long/path"
TTL:   86400 seconds  (24h default — tiered by activity)

TTL Strategy

URL Category	TTL Strategy	Reason
Brand-new URLs (<1 hour old)	12 hours fixed	High probability of imminent clicks
Active URLs (recent clicks)	Sliding window — extend on every hit	Keep hot links warm
Dormant (no clicks 7+ days)	1 hour	Low reuse probability; save memory
Enterprise / Pro tier	No TTL — permanent	SLA guarantee for paying customers
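The tiers in this table can be expressed as one small function. This is a sketch under assumptions: the function name and parameters are hypothetical, the active tier reuses the 24h default from the key structure above as its sliding window, and a link that has never been clicked is aged from its creation time.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def cache_ttl_seconds(
    created_at: datetime,
    last_click_at: Optional[datetime],
    is_paid_tier: bool,
    now: datetime,
) -> Optional[int]:
    """Return the Redis TTL in seconds, or None for 'no expiry'."""
    if is_paid_tier:
        return None  # Enterprise / Pro: keep the key permanently
    if now - created_at < timedelta(hours=1):
        return 12 * 3600  # brand-new URL: fixed 12 hours
    last_activity = last_click_at or created_at
    if now - last_activity >= timedelta(days=7):
        return 3600  # dormant: 1 hour
    return 24 * 3600  # active: re-applied on every hit, which slides the window


now = datetime(2025, 6, 15, tzinfo=timezone.utc)
print(cache_ttl_seconds(now - timedelta(minutes=10), None, False, now))
```

Calling the function on every cache hit and re-setting the key's expiry is what turns the active tier into a sliding window.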

Hot Key Problem

A URL shared by a celebrity can generate 100,000 RPS on a single Redis key. Solutions: replicate the key across multiple Redis slots (url:aB3xKz:1, :2…), keep top 1,000 URLs in an in-process LRU cache per instance, or push to CDN edge — bypassing Redis entirely for the most viral links.
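The key-replication option can be sketched in a few lines: write the same value under several suffixed keys and have each reader pick one at random, so the load spreads across whichever Redis nodes own those keys. The dictionary stands in for Redis, and the helper names and replica count are assumptions.

```python
import random


def replica_keys(short_code: str, replicas: int) -> list:
    """Keys of the form url:<code>:1 .. url:<code>:N."""
    return [f"url:{short_code}:{i}" for i in range(1, replicas + 1)]


def write_hot_key(cache: dict, short_code: str, long_url: str, replicas: int) -> None:
    # Every replica holds the same value; writes fan out to all of them.
    for key in replica_keys(short_code, replicas):
        cache[key] = long_url


def read_hot_key(cache: dict, short_code: str, replicas: int):
    # Each read picks one replica at random, spreading the read load.
    key = f"url:{short_code}:{random.randint(1, replicas)}"
    return cache.get(key)


cache = {}
write_hot_key(cache, "aB3xKz", "https://example.com/very/long/path", replicas=4)
print(read_hot_key(cache, "aB3xKz", replicas=4))
```

The cost is that deleting or expiring a link now has to touch every replica key.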

Section 09

Message Queues and Analytics

Analytics must never block redirect responses. A redirect waiting for click persistence adds 50–200ms of latency to every single user request. The solution is complete decoupling via Kafka.

Async analytics pipeline (the click event never blocks the redirect):

Redirect Service (non-blocking publish) → Kafka Cluster (3 brokers, RF=3) → Analytics Worker (Flink / micro-batch) → ClickHouse (columnar analytics DB)

The redirect response has already been sent by the time the event is consumed. On failure, events go to a Dead Letter Queue (at-least-once delivery).

The redirect response never waits for the Kafka publish to complete. Why ClickHouse? Analytics queries like “clicks by country, last 30 days, grouped by hour” are columnar scan operations, which ClickHouse can run 10–100× faster than PostgreSQL on the same data volume.
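The decoupling can be illustrated without a Kafka cluster. In this sketch a bounded in-memory queue stands in for the Kafka producer buffer and a list stands in for ClickHouse; the function names are hypothetical, and dropping events when the buffer is full is one possible policy chosen here for simplicity.

```python
import queue
import threading

click_events = queue.Queue(maxsize=10_000)  # stand-in for the Kafka producer buffer
stored_events = []                          # stand-in for ClickHouse


def handle_redirect(short_code: str, long_url: str):
    """Build the redirect response; the click event is enqueued without waiting."""
    try:
        click_events.put_nowait({"short_code": short_code})
    except queue.Full:
        pass  # lose one analytics event rather than delay the redirect
    return 302, {"Location": long_url}


def analytics_worker(stop: threading.Event) -> None:
    """Drain events in the background until asked to stop and the queue is empty."""
    while not (stop.is_set() and click_events.empty()):
        try:
            event = click_events.get(timeout=0.1)
        except queue.Empty:
            continue
        stored_events.append(event)
        click_events.task_done()
```

The redirect handler never touches `stored_events`; only the worker does, so a slow analytics store shows up as a growing queue rather than as slower redirects.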

Section 10

Scalability Strategies

Microservices Decomposition

Service	Responsibility	Scaling Pattern
Redirect Service	99% of all traffic — resolve short code to URL	Horizontal to 500+ instances; CPU-based autoscale
URL Service	CRUD on URL mappings	Moderate horizontal; write-limited
Analytics Service	Click ingestion and reporting	Kafka partition-based scale
User Service	Auth, accounts, billing tier	Independent; not in hot path
ID Generator	Snowflake code generation	Replicated sidecar per region

Database Sharding Strategy

When a single PostgreSQL instance can’t handle write volume (typically beyond ~10,000 writes/second), shard by consistent hashing on short_code. This produces even distribution and allows adding shards with minimal data migration — unlike prefix sharding which creates hot shards for popular prefixes.
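A minimal consistent-hash ring shows why adding a shard moves only a fraction of the keys. This is a sketch, not a production router: the class name, the use of SHA-256, the shard names, and the 100 virtual nodes per shard are all assumptions.

```python
import bisect
import hashlib


class ConsistentHashRing:
    def __init__(self, shards, vnodes: int = 100) -> None:
        self._points = []   # sorted hash positions on the ring
        self._owners = {}   # hash position -> shard name
        for shard in shards:
            self.add_shard(shard, vnodes)

    @staticmethod
    def _hash(key: str) -> int:
        # First 8 bytes of SHA-256 as a 64-bit ring position.
        return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")

    def add_shard(self, shard: str, vnodes: int = 100) -> None:
        # Each shard is placed at many positions so load spreads evenly.
        for i in range(vnodes):
            point = self._hash(f"{shard}#{i}")
            self._owners[point] = shard
            bisect.insort(self._points, point)

    def shard_for(self, short_code: str) -> str:
        # Walk clockwise to the first position at or after the key's hash.
        idx = bisect.bisect_left(self._points, self._hash(short_code)) % len(self._points)
        return self._owners[self._points[idx]]


ring = ConsistentHashRing(["shard-a", "shard-b", "shard-c", "shard-d"])
print(ring.shard_for("aB3xKz"))
```

When a fifth shard is added, the only keys that change owner are the ones the new shard claims, which is the "minimal data migration" property described above.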

Multi-Region Architecture

Deploy database clusters in at least 3 regions (US-East, EU-West, AP-Southeast). Use Anycast DNS to route users to their nearest region. URL creation can tolerate 50ms cross-region replication lag. Redirects always read from the local region’s replica — ensuring low latency regardless of where the URL was created.

Section 11

Security Considerations

Threat	Mitigation
Phishing / malware links	Google Safe Browsing API + PhishTank check at creation time
Abuse / spam URL creation	Rate limiting: 10 URLs/min anonymous, 1,000/min authenticated
DDoS on redirect path	Cloudflare WAF + Anycast absorption at edge
Custom alias race condition	Database unique constraint + Redlock for high concurrency
PII in click events	SHA-256 + salt hash all IP addresses before storage
Man-in-the-middle	TLS 1.3 for all traffic; HSTS preload on domain
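For the PII row, a salted SHA-256 digest is a one-liner with the standard library. The function name and the salt value below are placeholders; in practice the salt would be a secret loaded from configuration, since the IPv4 space is small enough to brute-force if the salt leaks.

```python
import hashlib


def hash_ip(ip_address: str, salt: bytes) -> str:
    """Return a 64-character hex digest, which fits the ip_hash VARCHAR(64) column."""
    return hashlib.sha256(salt + ip_address.encode("utf-8")).hexdigest()


# Placeholder salt for illustration only.
print(hash_ip("203.0.113.7", salt=b"example-salt"))
```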

Rate Limiting — Sliding Window Counter

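Key:   "ratelimit:write:{user_id}:{minute_bucket}"
Value: 47  (requests in this one-minute bucket)
TTL:   120 seconds  (the previous bucket must outlive its own minute so it can be weighted in)

Estimate = current bucket count + previous bucket count × (fraction of the previous minute still inside the sliding window)
If the estimate exceeds the limit  →  return 429 Too Many Requests + Retry-After header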
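A sliding window counter keeps one counter per minute bucket and weights the previous bucket by how much of it still overlaps the window. The sketch below uses an in-memory dictionary in place of Redis and takes the current time as a parameter so the behaviour is easy to follow; the class and method names are hypothetical.

```python
class SlidingWindowCounter:
    def __init__(self, limit: int, window_seconds: int = 60) -> None:
        self.limit = limit
        self.window = window_seconds
        # (user_id, bucket_index) -> count; stand-in for Redis keys with a TTL.
        self.buckets = {}

    def allow(self, user_id: str, now: float) -> bool:
        bucket = int(now // self.window)
        elapsed_fraction = (now % self.window) / self.window
        current = self.buckets.get((user_id, bucket), 0)
        previous = self.buckets.get((user_id, bucket - 1), 0)
        # The previous bucket counts only for the part still inside the window.
        estimated = current + previous * (1 - elapsed_fraction)
        if estimated + 1 > self.limit:
            return False  # caller responds with 429 Too Many Requests
        self.buckets[(user_id, bucket)] = current + 1
        return True


limiter = SlidingWindowCounter(limit=10)
print([limiter.allow("user-1", now=0.0) for _ in range(11)])
```

Compared with a plain per-minute counter, this avoids the burst at bucket boundaries, where a client could otherwise send a full quota at the end of one minute and another at the start of the next.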

Section 12

Monitoring and Observability

Layer	Tool	Key Metrics / Purpose
Metrics	Prometheus + Grafana	Redirect P99 latency, cache hit rate, error rates by status code
Distributed Tracing	Jaeger / OpenTelemetry	Cross-service span latency, slow query attribution
Centralized Logging	ELK Stack	Structured JSON logs for every write, error, and audit event
Alerting	PagerDuty	P99 >100ms, cache hit <95%, error rate >1% → page on-call
Queue Lag	Kafka Manager	Consumer lag >100K events → notify analytics team

Section 13

Bottlenecks and Trade-offs

Decision	Choice	What We Sacrifice
301 vs 302 redirect	302	Slightly higher bandwidth per redirect — worth it for analytics
Sync vs async analytics	Async (Kafka)	Up to ~30s analytics delay — completely acceptable
Cache TTL length	Sliding window, tiered	Deleted links may serve stale results for up to 1 hour
PostgreSQL vs Cassandra	PostgreSQL to start	Manual sharding at extreme scale
Read replicas	Eventual consistency	50–100ms replication lag — fine for URL reads
Cache failure recovery	Request coalescing + jitter TTL	Extra implementation complexity
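The last row deserves a concrete picture. Request coalescing means that when many requests miss the cache for the same key at once, only one of them loads from the database and the rest wait for its result. TTL jitter spreads expirations so that keys cached together do not all expire together. The sketch below is one way to do both with the standard library; `SingleFlight`, `jittered_ttl`, and the 10% spread are assumptions.

```python
import random
import threading


def jittered_ttl(base_seconds: int, spread: float = 0.1) -> int:
    """Randomize a TTL by +/- spread so cached keys do not expire in lockstep."""
    return int(base_seconds * (1 + random.uniform(-spread, spread)))


class SingleFlight:
    """Coalesce concurrent loads of the same key into a single call."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._in_flight = {}  # key -> shared entry for the load in progress

    def do(self, key: str, loader):
        with self._lock:
            entry = self._in_flight.get(key)
            leader = entry is None
            if leader:
                entry = {"done": threading.Event(), "value": None, "error": None}
                self._in_flight[key] = entry
        if leader:
            try:
                entry["value"] = loader()  # only the first caller hits the database
            except Exception as exc:
                entry["error"] = exc
            finally:
                with self._lock:
                    del self._in_flight[key]
                entry["done"].set()
        else:
            entry["done"].wait()  # followers wait for the leader's result
        if entry["error"] is not None:
            raise entry["error"]
        return entry["value"]
```

A redirect instance would call `flight.do("url:aB3xKz", load_from_db)` on a cache miss, where `load_from_db` is whatever function reads the replica, and then write the result back to Redis with `jittered_ttl(86400)`.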

CAP Theorem Stance

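The write path is a CP system. We prioritize consistency and partition tolerance over availability there: creating a link or claiming a custom alias should fail rather than risk two owners for the same short code. The redirect path leans toward availability. It reads from caches and replicas, so a brand-new link may briefly return a 404 and a deleted link may keep resolving until its cache entry expires. If a short code doesn’t exist, we must return a 404, never a redirect to a wrong URL; since this design has no endpoint for changing a link’s destination, a stale read cannot point to the wrong place. For analytics, we accept eventual consistency: a 30-second delay in click counts is completely acceptable.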

Section 14

Real-World Technology Stack

Twitter / X

t.co — Java services, Manhattan (custom distributed KV store), Snowflake IDs

Bitly

Cassandra for URL storage, Redis for caching, Go for the redirect service

LinkedIn

lnkd.in — Espresso (custom NoSQL), Kafka for analytics event streaming

Facebook

fb.me — TAO (graph database), Memcached at massive scale

Google

goo.gl (deprecated) — Bigtable, Memcache, GFE global load balancing

The pattern is consistent across all major implementations: a fast KV lookup store, a distributed cache layer, and an async event pipeline for analytics. The specific technology varies; the architectural pattern is universal.

Section 15

Interview Perspective

The URL shortener is a favorite interview question precisely because it looks simple and reveals how deeply a candidate thinks about distributed systems. Interviewers aren’t evaluating whether you know the “right answer” — they’re watching your reasoning process.

How to Structure Your 30-Minute Answer

Clarify requirements (2 min) — Ask about analytics, custom aliases, expiry, and scale expectations before drawing anything.

Estimate scale (3 min) — Show your math: DAU → RPS → storage. Derive the 100:1 ratio explicitly.

High-level design (5 min) — Draw the main components. Explain the read path vs write path separation clearly.

Deep dive (15 min) — ID generation options, caching strategy, sharding decision, 301 vs 302 trade-off.

Bottlenecks (5 min) — Hot key problem, thundering herd, CAP theorem stance, abuse prevention.

Common Mistakes

Mistake	Why It’s a Red Flag
Jumping to code before clarifying requirements	Shows inability to scope ambiguous problems
Ignoring the 100:1 read/write ratio	The most important insight — missing it means missing the architecture
Making analytics synchronous	Fundamental misunderstanding of system design priorities
Not knowing 301 vs 302 difference	Reveals lack of HTTP protocol depth
Single-server design with no scale path	Shows inability to think beyond toy systems

Section 16

Frequently Asked Questions

What is the best algorithm for generating short codes in a URL shortener?

Snowflake IDs combined with Base62 encoding. They generate in microseconds without coordination between servers, guarantee no collisions across distributed nodes, and sort chronologically — making them ideal for URL shorteners at any scale.

Should a URL shortener use 301 or 302 redirects?

Use 302 (temporary redirect) if analytics matter. A 301 redirect lets browsers cache the destination indefinitely, meaning subsequent clicks can bypass your servers entirely and go uncounted. Analytics-first shorteners generally use 302.

How does a URL shortener handle millions of requests per second?

Through a multi-layer caching strategy: CDN edge caching (~60% of traffic), in-process LRU cache per instance (viral links), and Redis distributed cache (~38% of traffic). Properly designed, over 98% of redirect requests are served entirely from cache. The database handles less than 2% of total redirect load.

How many unique short codes does a 7-character Base62 code provide?

62⁷ = approximately 3.5 trillion unique codes. At 10 million new URLs per day, this namespace would last over 950 years before exhaustion. You will never run out in practice.

What database is best for a URL shortener?

PostgreSQL is ideal for most scales — ACID compliance, mature tooling, strong consistency, and simple custom alias uniqueness enforcement via unique constraints. At extreme scale (billions of URLs, multi-region active-active writes), Cassandra or DynamoDB offer better horizontal write scalability. Start with PostgreSQL; migrate to Cassandra only if you actually need to.

How do you prevent a URL shortener from being used for phishing?

Integrate with Google Safe Browsing API and PhishTank at URL creation time. Implement rate limiting (10 URLs/min for anonymous users), require authentication for bulk creation, and offer preview pages (e.g., tny.io/aB3xKz+) that show the destination before redirecting.

Section 17

Conclusion

Designing a URL shortener well is a masterclass in distributed systems fundamentals. The problem is simple enough to understand in minutes but deep enough to reveal every weakness in how you think about scalability, consistency, and real-world engineering trade-offs.

✅ Key Takeaways

Read/write asymmetry (100:1) drives everything — design the read path first, always

Snowflake IDs solve distributed ID generation without coordination overhead

Analytics must be asynchronous — never block redirects for analytics writes

Start with PostgreSQL; scale to Cassandra only when you genuinely need multi-region writes

Security is not optional — phishing detection and rate limiting are day-one requirements

CDN + Redis + local LRU = three caching layers that keep database load under 2%

When you’ve internalized this design, the same thinking applies directly to link-in-bio platforms, QR code generators, redirect managers for A/B testing, or any system with similar read-heavy, low-latency characteristics.
