How to Design a URL Shortener Like TinyURL: A Complete System Design Guide

System Design Series

25 min read · Intermediate – Advanced · Updated 2025

Learn how to design a scalable URL shortener like TinyURL or Bitly — covering distributed architecture, Redis caching, Snowflake ID generation, Kafka analytics pipelines, database sharding, and real-world engineering trade-offs.

System Design · Distributed Systems · Backend Engineering · Interview Prep

Introduction: The Humble Link That Scaled to Billions

Picture this: it’s 2002. You want to share a research paper over email. The URL is 180 characters long, wraps across three lines, and breaks when clicked. A student named Kevin Gilbertson had the same frustration — and TinyURL was born.

Fast forward to today. Bitly handles 10 billion clicks per month. Twitter’s t.co redirects every shared link in real time. What started as a convenience tool became one of the most quietly critical pieces of internet infrastructure.

Designing a URL shortener looks deceptively simple — “just store a short code mapping to a long URL” — but it touches nearly every hard problem in distributed systems: high-throughput reads, collision-free ID generation, cache stampedes, global latency, and abuse prevention at massive scale.

Architecture Insight

URL shorteners are fundamentally read-heavy systems with a ~100:1 read/write ratio. This single characteristic shapes every architectural decision — from caching to database sharding to CDN strategy.

What Are We Building?

Functional Requirements

Feature | Description
URL Shortening | Generate unique short codes (e.g., tny.io/aB3xKz) from long URLs
URL Redirection | Visiting a short URL redirects to the original destination
Custom Aliases | Users can optionally choose their own short codes
Expiry Support | Links expire after a given date or click threshold
Analytics | Track clicks, geographic origin, device type, referrer
User Accounts | Authenticated users can manage and view their links
Link Deletion | Users can deactivate a short URL

Non-Functional Requirements

Requirement | Target | Why It Matters
Availability | 99.99% uptime | ~52.6 minutes downtime/year
Redirect Latency | <10ms at P99 | Users perceive slow redirects as broken links
Write Throughput | 10,000 new URLs/sec at peak | Viral campaigns create write bursts
Read/Write Ratio | ~100:1 | Drives cache-first architecture
Durability | Zero data loss | Every mapping must be permanent
Security | Phishing + DDoS prevention | URL shorteners are abuse vectors

Scale Estimation

Every distributed system design starts here. Architecture decisions are meaningless without understanding expected traffic patterns. Let’s work through the numbers explicitly.

Metric | Value | Note
Daily Active Users | 100M | per day
New URLs / day | 10M | ~115/sec
Redirects / day | 10B | ~115K RPS
Storage (5 yr) | 4.75 TB | 260 B/record avg
URL Namespace | 3.5T | 62⁷ unique codes
Read/Write Ratio | 100:1 | reads dominate

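Traffic Estimation:
  Writes/sec  = 10M / 86,400 ≈ 115 writes/sec
  Reads/sec   = 10B / 86,400 ≈ 115,000 reads/sec
  Ratio       = 10B / 10M    = 1,000:1 on these inputs; the ~100:1 used throughout this guide is a conservative planning figure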

Storage per record:
  Short code   7 bytes
  Long URL   150 bytes
  Metadata   100 bytes
  Total     ~260 bytes (257 rounded up)  →  2.6 GB/day  →  4.75 TB over 5 years

URL Namespace (Base62, 7 chars):
  62^7 ≈ 3.52 trillion codes  →  lasts ~965 years at 10M/day
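
The arithmetic above is easy to re-run when an assumption changes. The following is a minimal sketch that recomputes the estimates from the same inputs; the variable names are illustrative, not part of any API.

```python
SECONDS_PER_DAY = 86_400

# Inputs taken from the estimates above.
new_urls_per_day = 10_000_000
redirects_per_day = 10_000_000_000
bytes_per_record = 260            # 7 + 150 + 100 = 257, rounded up
retention_years = 5

writes_per_sec = new_urls_per_day / SECONDS_PER_DAY
reads_per_sec = redirects_per_day / SECONDS_PER_DAY

storage_per_day_gb = new_urls_per_day * bytes_per_record / 1e9
storage_total_tb = storage_per_day_gb * 365 * retention_years / 1e3

namespace = 62 ** 7               # 7-character Base62 codes
years_to_exhaust = namespace / new_urls_per_day / 365

print(f"writes/sec        : {writes_per_sec:,.1f}")
print(f"reads/sec         : {reads_per_sec:,.0f}")
print(f"storage per day   : {storage_per_day_gb:.1f} GB")
print(f"storage, 5 years  : {storage_total_tb:.2f} TB")
print(f"namespace         : {namespace:,}")
print(f"years to exhaust  : {years_to_exhaust:,.0f}")
```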

High-Level Architecture

The system splits into two fundamentally different paths: the write path (creating short URLs) and the read/redirect path (resolving them). Because reads outnumber writes 100 to 1, the entire architecture is optimized around the redirect path.

Redirect Path — Critical Hot Path

Every millisecond counts. Most traffic should be served from CDN or Redis — never touching the database.

Redirect read path (request flow):
🌐 User / Browser
  → ⚡ CDN Edge Cache (1–5ms global · L1 cache); CDN hit → instant 302
  ↓ CDN miss
  → Load Balancer (L7 · NGINX / ALB)
  → Redirect Service (stateless × N instances)
  → Redis Cluster (0.5–2ms · L2 cache)
  ↓ Redis miss
  → DB Read Replica (5–20ms fallback)
  ↓ async (non-blocking)
  → Kafka (click event stream)
  → ClickHouse (analytics storage)

Write Path — URL Creation

URL creation write path:
Client
  → Write API (validate + auth)
  → ID Generator (Snowflake + Base62)
  → Primary DB (PostgreSQL write)
  → Redis Cache (write-through)
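
To make the write path concrete, here is a minimal in-memory sketch of the steps: validate, generate a code, write to the primary store, then write through to the cache. It rests on several assumptions: plain dicts stand in for PostgreSQL and Redis, `next_id` is a hypothetical callable standing in for the ID generator, and `create_short_url` and `AliasTaken` are illustrative names.

```python
import itertools
from urllib.parse import urlparse

BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


class AliasTaken(Exception):
    """Would map to the 409 Conflict response of the create API."""


def to_base62(n: int) -> str:
    """Encode a non-negative integer using the 62-character alphabet."""
    digits = ""
    while True:
        n, rem = divmod(n, 62)
        digits = BASE62[rem] + digits
        if n == 0:
            return digits


def create_short_url(long_url, db, cache, next_id, custom_alias=None):
    # 1. Validate: accept only absolute http(s) URLs.
    parsed = urlparse(long_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("long_url must be an absolute http(s) URL")

    # 2. Use the custom alias if given, otherwise generate a code.
    code = custom_alias or to_base62(next_id())

    # 3. Primary store write. A real database would rely on the
    #    PRIMARY KEY constraint here; check-then-insert is racy.
    if code in db:
        raise AliasTaken(code)
    db[code] = long_url

    # 4. Write-through so the first redirect is already a cache hit.
    cache[f"url:{code}"] = long_url
    return code


if __name__ == "__main__":
    counter = itertools.count(1000)
    db, cache = {}, {}
    print(create_short_url("https://example.com/very/long/path",
                           db, cache, lambda: next(counter)))
```

Writing to the primary store before the cache means a failed database write never leaves an orphaned cache entry behind.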

Database Design

PostgreSQL vs Cassandra

Factor | PostgreSQL | Cassandra
Consistency | Strong (ACID) | Eventual
Scalability | Moderate (sharding needed) | Very high (built-in)
Transactions | Excellent | Limited
Operational Complexity | Lower | Higher
Best for | Start here; fits most scales | Billions of URLs, multi-region writes

Schema Design

-- Core mapping table
CREATE TABLE url_mappings (
    short_code   VARCHAR(12)  PRIMARY KEY,
    long_url     TEXT         NOT NULL,
    user_id      BIGINT,
    created_at   TIMESTAMPTZ  DEFAULT NOW(),
    expires_at   TIMESTAMPTZ,
    is_active    BOOLEAN      DEFAULT TRUE,
    click_count  BIGINT       DEFAULT 0
);

-- Analytics events — partitioned by month for fast range scans
CREATE TABLE click_events (
    id           BIGSERIAL,
    short_code   VARCHAR(12)  NOT NULL,
    clicked_at   TIMESTAMPTZ  DEFAULT NOW(),
    ip_hash      VARCHAR(64),
    country_code CHAR(2),
    device_type  VARCHAR(20),
    referrer     TEXT,
    PRIMARY KEY (id, clicked_at)
) PARTITION BY RANGE (clicked_at);

-- Index for user dashboard queries
CREATE INDEX idx_url_user ON url_mappings(user_id, created_at DESC);

Indexing Decision

The primary key on short_code gives O(log n) lookups. Since reads dominate massively, this index must live in memory — ensure your database has enough RAM to hold the hot index in the buffer pool.

API Design

Create Short URL

POST /api/v1/urls
Authorization: Bearer <jwt_token>

{
  "long_url": "https://example.com/very/long/path",
  "custom_alias": "my-blog",       // optional
  "expires_at": "2026-01-01T00:00:00Z"  // optional
}

// 201 Created
{
  "short_url": "https://tny.io/aB3xKz",
  "short_code": "aB3xKz",
  "created_at": "2025-06-15T10:23:45Z"
}

// 409 Conflict — alias taken
{ "error": "ALIAS_TAKEN", "message": "The alias 'my-blog' is already in use." }

Redirect

GET /{short_code}

Response:
HTTP/1.1 302 Found
Location: https://example.com/very/long/path
Cache-Control: max-age=3600
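
301 vs 302: Critical Trade-off

A 301 Permanent Redirect is cached by browsers indefinitely by default: great for bandwidth, fatal for analytics. A 302 Temporary Redirect is not cached unless the response explicitly allows it, so every click goes back through your infrastructure and can be tracked accurately. Note that the Cache-Control: max-age=3600 header shown above opts back into caching for an hour, so clicks served from a browser or CDN cache must be counted from edge logs or accepted as undercounted. Analytics-first shorteners use 302.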

Get Stats

GET /api/v1/urls/{short_code}/stats
Authorization: Bearer <jwt_token>

{
  "total_clicks": 14823,
  "unique_visitors": 11201,
  "top_countries": ["US", "IN", "GB", "DE"],
  "clicks_last_7_days": [...]
}

ID Generation: The Hardest Easy Problem

How do you generate a unique, collision-free short code at 115 writes/second across multiple API servers without coordination overhead? The answer is less obvious than it appears.

Option | Mechanism | Pro | Con
UUID + Base62 | Random UUID, take first 7 chars | Simple | Collision risk; uniqueness check needed on every write
Counter + Base62 | Global auto-increment | Guaranteed unique | Global counter = single point of failure + write bottleneck
Snowflake IDs ✓ | Timestamp + machine ID + sequence | Globally unique, no coordination, sorts chronologically | Clock skew edge cases need handling
Pre-generated Pool | Background job fills a pool table | Zero write-path latency for code generation | Pool depletion under extreme load

Snowflake ID Structure (64-bit)

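Sign: 1 bit · unused (always 0)
Timestamp: 41 bits · ms since a custom epoch (~69 years of range)
Machine ID: 10 bits · 1,024 nodes
Sequence: 12 bits · 4,096 IDs/ms/node
Max throughput: 1,024 nodes × 4,096 IDs/ms ≈ 4.2 million IDs/millisecond
No central coordination is required: each node generates IDs autonomously, provided machine IDs are assigned uniquely. Note that a full Snowflake ID needs up to 11 Base62 characters; the 7-character namespace from the scale estimate applies only to IDs below 62⁷, such as a sequential counter.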
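
The bit layout translates into a few shifts and a lock. This is a minimal sketch of a Snowflake-style generator plus a Base62 encoder, not Twitter's implementation: the custom epoch, the class name `SnowflakeGenerator`, and the choice to raise on a backwards clock are all assumptions made for illustration.

```python
import threading
import time

BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
CUSTOM_EPOCH_MS = 1_704_067_200_000  # 2024-01-01T00:00:00Z, an arbitrary choice


def base62_encode(n: int) -> str:
    """Encode a non-negative integer as a Base62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return BASE62[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(BASE62[rem])
    return "".join(reversed(digits))


class SnowflakeGenerator:
    """41-bit timestamp | 10-bit machine ID | 12-bit sequence."""

    def __init__(self, machine_id: int):
        if not 0 <= machine_id < 1024:
            raise ValueError("machine_id must fit in 10 bits")
        self.machine_id = machine_id
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = int(time.time() * 1000)
            if now < self._last_ms:
                # Clock skew: refuse rather than risk a duplicate ID.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & 0xFFF
                if self._sequence == 0:
                    # 4,096 IDs used this millisecond: spin to the next one.
                    while now <= self._last_ms:
                        now = int(time.time() * 1000)
            else:
                self._sequence = 0
            self._last_ms = now
            return ((now - CUSTOM_EPOCH_MS) << 22) | (self.machine_id << 12) | self._sequence


if __name__ == "__main__":
    gen = SnowflakeGenerator(machine_id=5)
    snowflake = gen.next_id()
    print(snowflake, base62_encode(snowflake))
```

Running it shows the length trade-off directly: the encoded ID is noticeably longer than seven characters, which is why the schema's `VARCHAR(12)` column leaves headroom.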

Caching Strategy

Caching is the single most important performance lever in this system. With a 100:1 read/write ratio, even a 95% cache hit rate means only 5,750 of 115,000 requests/second hit the database. Design the cache layers first.

Multi-Layer Cache Hierarchy

Cache layers (a miss propagates downward):
L1: CDN Edge · 1–5ms global · serves ~60% of traffic (requests served here never go deeper)
  ↓ miss (~40%)
L2: Local LRU · <0.1ms per instance · viral / hot links
  ↓ miss (~38%)
L3: Redis Cluster · 0.5–2ms · serves ~38% of traffic
  ↓ miss (<2%)
L4: DB Read Replica · 5–20ms · <2% of traffic
Goal: >98% cache hit rate, so the database handles fewer than 2% of all redirect requests
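
The layers behind the CDN can be sketched as a read-through lookup. In this minimal illustration a small `LocalLRU` class plays the in-process cache, a plain dict stands in for Redis, and `db_lookup` is a hypothetical callable for the read replica; none of these names come from a real client library.

```python
from collections import OrderedDict


class LocalLRU:
    """Tiny in-process LRU cache for the hottest links."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)          # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # evict least recently used


def resolve(short_code, local, redis_like, db_lookup):
    """Return the long URL, or None if the code is unknown."""
    key = f"url:{short_code}"

    url = local.get(key)                     # in-process LRU
    if url is not None:
        return url

    url = redis_like.get(key)                # distributed cache
    if url is None:
        url = db_lookup(short_code)          # read replica fallback
        if url is None:
            return None                      # caller answers 404
        redis_like[key] = url                # backfill on the way out

    local.put(key, url)
    return url
```

Each miss backfills the layer above it, so a link that suddenly goes viral reaches the database only once per cache expiry rather than once per click.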

Redis Key Structure

Key:   "url:aB3xKz"
Value: "https://example.com/very/long/path"
TTL:   86400 seconds  (24h default — tiered by activity)

TTL Strategy

URL Category | TTL Strategy | Reason
Brand-new URLs (<1 hour old) | 12 hours fixed | High probability of imminent clicks
Active URLs (recent clicks) | Sliding window: extend on every hit | Keep hot links warm
Dormant (no clicks 7+ days) | 1 hour | Low reuse probability; save memory
Enterprise / Pro tier | No TTL (permanent) | SLA guarantee for paying customers
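
The tiers can be expressed as one small function. This is a sketch under assumptions: the table does not say which rule wins when several apply, so the precedence here (paid tier, then brand-new, then dormant, then active) and the tier names are illustrative choices.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

HOUR = 3600


def ttl_seconds(created_at: datetime,
                last_click_at: Optional[datetime],
                tier: str,
                now: datetime) -> Optional[int]:
    """Return the cache TTL in seconds, or None for 'no expiry'."""
    if tier in ("enterprise", "pro"):
        return None                           # permanent for paying customers
    if now - created_at < timedelta(hours=1):
        return 12 * HOUR                      # brand-new: clicks are imminent
    last_activity = last_click_at or created_at
    if now - last_activity >= timedelta(days=7):
        return 1 * HOUR                       # dormant: free the memory soon
    # Active: the 24h default, re-applied by the caller on every hit,
    # which is what makes the window slide.
    return 24 * HOUR


if __name__ == "__main__":
    now = datetime(2025, 6, 15, 12, 0, tzinfo=timezone.utc)
    print(ttl_seconds(now - timedelta(minutes=5), None, "free", now))
    print(ttl_seconds(now - timedelta(days=30), now - timedelta(days=10), "free", now))
```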

Hot Key Problem

A URL shared by a celebrity can generate 100,000 RPS on a single Redis key. Solutions: replicate the key across multiple Redis slots (url:aB3xKz:1, :2…), keep top 1,000 URLs in an in-process LRU cache per instance, or push to CDN edge — bypassing Redis entirely for the most viral links.
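
Key replication is the simplest of these to show. The sketch below assumes a dict-like cache and a fixed replica count; the helper names are hypothetical. Writes fan out to every suffixed copy, and each read picks one copy at random so the load spreads across slots.

```python
import random


def replica_keys(short_code: str, replicas: int) -> list:
    """All suffixed copies of a hot key: url:<code>:1 .. url:<code>:N."""
    return [f"url:{short_code}:{i}" for i in range(1, replicas + 1)]


def write_hot_key(cache, short_code: str, long_url: str, replicas: int) -> None:
    # Fan the value out to every replica key.
    for key in replica_keys(short_code, replicas):
        cache[key] = long_url


def read_hot_key(cache, short_code: str, replicas: int, rng=random):
    # Any replica holds the same value, so choose one uniformly.
    key = f"url:{short_code}:{rng.randint(1, replicas)}"
    return cache.get(key)
```

The cost is that deactivating a link now means deleting every replica, so the replica count should stay small and be stored alongside the hot-link marker.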

Message Queues and Analytics

Analytics must never block redirect responses. A redirect waiting for click persistence adds 50–200ms of latency to every single user request. The solution is complete decoupling via Kafka.

Async analytics pipeline (the click event never blocks the redirect):
Redirect Service (non-blocking publish; redirect response already sent ✓)
  → Kafka Cluster (3 brokers, RF=3)
  ↓ consume
  → Analytics Worker (Flink / micro-batch)
  → ClickHouse (columnar analytics DB)
  ↓ on failure
  → Dead Letter Queue (at-least-once delivery)

The redirect service hands the click event to a non-blocking producer and returns the redirect without waiting for Kafka to acknowledge it. Why ClickHouse? Analytics queries like “clicks by country, last 30 days, grouped by hour” are columnar scan operations, a workload on which ClickHouse is commonly cited as 10–100× faster than row-oriented PostgreSQL on the same data volume.
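
The decoupling can be sketched without Kafka at all. In this illustration a bounded in-memory `queue.Queue` stands in for the producer buffer and a thread stands in for the analytics worker; both are assumptions, as are the function names. The point is only that the handler never waits on analytics.

```python
import queue
import threading
import time

# Bounded buffer standing in for a Kafka producer's send queue.
click_queue = queue.Queue(maxsize=10_000)
dropped_events = 0


def handle_redirect(short_code: str, long_url: str):
    """Enqueue the click and return immediately with the redirect."""
    global dropped_events
    event = {"short_code": short_code, "clicked_at": time.time()}
    try:
        click_queue.put_nowait(event)     # never blocks the request
    except queue.Full:
        dropped_events += 1               # shed analytics, not redirects
    return 302, {"Location": long_url}


def analytics_worker(sink: list) -> None:
    """Drain events until a None sentinel arrives."""
    while True:
        event = click_queue.get()
        if event is None:
            break
        sink.append(event)                # a real worker would batch-insert


if __name__ == "__main__":
    stored = []
    worker = threading.Thread(target=analytics_worker, args=(stored,))
    worker.start()
    print(handle_redirect("aB3xKz", "https://example.com/very/long/path"))
    click_queue.put(None)
    worker.join()
    print(len(stored), "event(s) stored")
```

Dropping events when the buffer is full is a deliberate choice here: losing a click count is preferable to slowing a redirect.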

Scalability Strategies

Microservices Decomposition

Service | Responsibility | Scaling Pattern
Redirect Service | 99% of all traffic: resolve short code to URL | Horizontal to 500+ instances; CPU-based autoscale
URL Service | CRUD on URL mappings | Moderate horizontal; write-limited
Analytics Service | Click ingestion and reporting | Kafka partition-based scale
User Service | Auth, accounts, billing tier | Independent; not in hot path
ID Generator | Snowflake code generation | Replicated sidecar per region

Database Sharding Strategy

When a single PostgreSQL instance can’t handle write volume (typically beyond ~10,000 writes/second), shard by consistent hashing on short_code. This produces even distribution and allows adding shards with minimal data migration — unlike prefix sharding which creates hot shards for popular prefixes.
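
A consistent hash ring is short enough to sketch in full. The version below uses virtual nodes and MD5 purely as a fast, well-distributed hash; the class name, the shard names, and the vnode count of 100 are illustrative assumptions rather than tuned values.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring with virtual nodes."""

    def __init__(self, shards, vnodes: int = 100):
        self.vnodes = vnodes
        self._ring = []                    # sorted list of (hash, shard)
        for shard in shards:
            self.add_shard(shard)

    @staticmethod
    def _hash(key: str) -> int:
        # First 8 bytes of MD5 as an integer; not used for security.
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def add_shard(self, shard: str) -> None:
        # Each shard owns many points on the ring to even out the load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{shard}#{i}"), shard))

    def shard_for(self, short_code: str) -> str:
        # Walk clockwise to the first point at or after the key's hash,
        # wrapping around to the start of the ring if necessary.
        h = self._hash(short_code)
        idx = bisect.bisect_right(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]


if __name__ == "__main__":
    ring = HashRing(["shard-0", "shard-1", "shard-2", "shard-3"])
    print(ring.shard_for("aB3xKz"))
```

Adding a fifth shard to a four-shard ring moves roughly one fifth of the keys, and every moved key lands on the new shard, which is the property that keeps migrations small.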

Multi-Region Architecture

Deploy database clusters in at least 3 regions (US-East, EU-West, AP-Southeast). Use Anycast DNS to route users to their nearest region. URL creation can tolerate 50ms cross-region replication lag. Redirects always read from the local region’s replica — ensuring low latency regardless of where the URL was created.

Security Considerations

Threat | Mitigation
Phishing / malware links | Google Safe Browsing API + PhishTank check at creation time
Abuse / spam URL creation | Rate limiting: 10 URLs/min anonymous, 1,000/min authenticated
DDoS on redirect path | Cloudflare WAF + Anycast absorption at edge
Custom alias race condition | Database unique constraint + Redlock for high concurrency
PII in click events | Hash all IP addresses with salted SHA-256 before storage
Man-in-the-middle | TLS 1.3 for all traffic; HSTS preload on domain
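
The IP hashing row is a one-liner in practice. A minimal sketch, assuming the salt is a secret byte string loaded from configuration (the value below is a placeholder) and that the function name is illustrative:

```python
import hashlib


def hash_ip(ip_address: str, salt: bytes) -> str:
    """Return a salted SHA-256 hex digest of an IP address."""
    return hashlib.sha256(salt + ip_address.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    SALT = b"replace-with-a-secret-from-config"   # placeholder, not a real secret
    digest = hash_ip("203.0.113.7", SALT)
    print(len(digest), digest)
```

The hex digest is always 64 characters, which matches the `ip_hash VARCHAR(64)` column in the schema. Because the IPv4 space is small, the salt must stay secret or the hashes can be reversed by brute force.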

Rate Limiting — Sliding Window Counter

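Key:   "ratelimit:write:{user_id}:{minute_bucket}"
Value: 47  (requests in this one-minute bucket)
TTL:   120 seconds  (the previous bucket must survive long enough to be weighted in)

Estimate = current bucket + previous bucket × (1 − fraction of the current minute elapsed)
If estimate ≥ limit  →  return 429 Too Many Requests + Retry-After header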
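
A sliding window counter differs from a plain per-minute counter in one step: it also reads the previous bucket and weights it by how much of that minute still overlaps the window. The sketch below shows the logic with an in-memory dict standing in for Redis; the class name is hypothetical, and bucket expiry (which a Redis TTL would provide) is omitted.

```python
import time


class SlidingWindowLimiter:
    """Sliding window counter over fixed-size buckets."""

    def __init__(self, limit: int, window_s: int = 60):
        self.limit = limit
        self.window_s = window_s
        self._counts = {}                  # (user_id, bucket) -> count

    def allow(self, user_id: str, now: float = None) -> bool:
        now = time.time() if now is None else now
        bucket = int(now // self.window_s)
        elapsed = (now % self.window_s) / self.window_s

        current = self._counts.get((user_id, bucket), 0)
        previous = self._counts.get((user_id, bucket - 1), 0)

        # Weight the previous bucket by the share of it still in the window.
        estimate = current + previous * (1 - elapsed)
        if estimate >= self.limit:
            return False                   # caller answers 429 + Retry-After

        self._counts[(user_id, bucket)] = current + 1
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(limit=10)
    print([limiter.allow("anon:203.0.113.7", now=0.0) for _ in range(11)])
```

This avoids the burst a fixed window allows at the bucket boundary, where a client could otherwise send a full quota at 0:59 and another at 1:00.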

Monitoring and Observability

Layer | Tool | Key Metrics / Purpose
Metrics | Prometheus + Grafana | Redirect P99 latency, cache hit rate, error rates by status code
Distributed Tracing | Jaeger / OpenTelemetry | Cross-service span latency, slow query attribution
Centralized Logging | ELK Stack | Structured JSON logs for every write, error, and audit event
Alerting | PagerDuty | P99 >100ms, cache hit <95%, error rate >1% → page on-call
Queue Lag | Kafka Manager | Consumer lag >100K events → notify analytics team

Bottlenecks and Trade-offs

Decision | Choice | What We Sacrifice
301 vs 302 redirect | 302 | Slightly higher bandwidth per redirect; worth it for analytics
Sync vs async analytics | Async (Kafka) | Up to ~30s analytics delay; completely acceptable
Cache TTL length | Sliding window, tiered | Deleted links may serve stale results for up to 1 hour
PostgreSQL vs Cassandra | PostgreSQL to start | Manual sharding at extreme scale
Read replicas | Eventual consistency | 50–100ms replication lag; fine for URL reads
Cache failure recovery | Request coalescing + jitter TTL | Extra implementation complexity
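
Request coalescing and TTL jitter, the last row above, are both small. The sketch below is one possible shape: `SingleFlight` lets the first caller for a key run the loader while concurrent callers wait for its result, and `jittered_ttl` spreads expiries so entries written together do not expire together. Both names and the 10% spread are illustrative assumptions.

```python
import random
import threading


def jittered_ttl(base_s: int, spread: float = 0.1) -> int:
    """Return base_s varied by up to +/- spread (10% by default)."""
    return int(base_s * random.uniform(1 - spread, 1 + spread))


class SingleFlight:
    """Collapse concurrent loads of the same key into one call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}                # key -> shared entry

    def do(self, key, loader):
        with self._lock:
            entry = self._inflight.get(key)
            leader = entry is None
            if leader:
                entry = {"done": threading.Event(), "value": None, "error": None}
                self._inflight[key] = entry

        if leader:
            try:
                entry["value"] = loader()  # only the leader hits the database
            except Exception as exc:
                entry["error"] = exc
            finally:
                with self._lock:
                    del self._inflight[key]
                entry["done"].set()        # wake every waiting follower
        else:
            entry["done"].wait()

        if entry["error"] is not None:
            raise entry["error"]
        return entry["value"]
```

On a cache miss for a viral link, the redirect service would call `do("url:aB3xKz", load_from_db)` so that a thousand simultaneous misses produce a single database query.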
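
CAP Theorem Stance

The two paths take different stances. The write path is CP: new mappings and alias uniqueness are committed on the primary with strong consistency, and a create request fails rather than risk a duplicate code. The redirect path leans AP: it serves from caches and read replicas, so a brand-new link can briefly miss and a deactivated link can keep redirecting until its cache entry expires. Because this design never edits a mapping’s destination after creation, a stale read can never send a user to a wrong URL. For analytics, we accept eventual consistency: a 30-second delay in click counts is completely acceptable.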

Real-World Technology Stack

Twitter / X: t.co, built on Java services, Manhattan (custom distributed KV store), and Snowflake IDs
Bitly: Cassandra for URL storage, Redis for caching, Go for the redirect service
LinkedIn: lnkd.in, built on Espresso (custom NoSQL), with Kafka for analytics event streaming
Facebook: fb.me, built on TAO (graph database) and Memcached at massive scale
Google: goo.gl (deprecated), built on Bigtable, Memcache, and GFE global load balancing

The pattern is consistent across all major implementations: a fast KV lookup store, a distributed cache layer, and an async event pipeline for analytics. The specific technology varies; the architectural pattern is universal.

Interview Perspective

The URL shortener is a favorite interview question precisely because it looks simple and reveals how deeply a candidate thinks about distributed systems. Interviewers aren’t evaluating whether you know the “right answer” — they’re watching your reasoning process.

How to Structure Your 30-Minute Answer

1. Clarify requirements (2 min): Ask about analytics, custom aliases, expiry, and scale expectations before drawing anything.
2. Estimate scale (3 min): Show your math: DAU → RPS → storage. Derive the read/write ratio explicitly.
3. High-level design (5 min): Draw the main components. Explain the read path vs write path separation clearly.
4. Deep dive (15 min): ID generation options, caching strategy, sharding decision, 301 vs 302 trade-off.
5. Bottlenecks (5 min): Hot key problem, thundering herd, CAP theorem stance, abuse prevention.

Common Mistakes

Mistake | Why It’s a Red Flag
Jumping to code before clarifying requirements | Shows inability to scope ambiguous problems
Ignoring the 100:1 read/write ratio | The most important insight; missing it means missing the architecture
Making analytics synchronous | Fundamental misunderstanding of system design priorities
Not knowing 301 vs 302 difference | Reveals lack of HTTP protocol depth
Single-server design with no scale path | Shows inability to think beyond toy systems

Frequently Asked Questions

What is the best algorithm for generating short codes in a URL shortener?
Snowflake IDs combined with Base62 encoding are a strong default. They generate in microseconds without coordination between servers, cannot collide across distributed nodes as long as machine IDs are unique, and sort chronologically. One caveat: a full 64-bit ID encodes to as many as 11 Base62 characters, so a strict 7-character code needs a smaller ID space such as a counter or a pre-generated pool.

Should a URL shortener use 301 or 302 redirects?
Use 302 (temporary redirect) if analytics matter. A 301 redirect lets browsers cache the destination indefinitely, meaning subsequent clicks bypass your servers entirely and cannot be tracked. Analytics-first shorteners use 302.

How does a URL shortener handle millions of requests per second?
Through a multi-layer caching strategy: CDN edge caching (~60% of traffic), in-process LRU cache per instance (viral links), and Redis distributed cache (~38% of traffic). Properly designed, over 98% of redirect requests are served entirely from cache. The database handles less than 2% of total redirect load.

How many unique short codes does a 7-character Base62 code provide?
62⁷ = approximately 3.5 trillion unique codes. At 10 million new URLs per day, this namespace would last over 950 years before exhaustion. You will never run out in practice.

What database is best for a URL shortener?
PostgreSQL is ideal for most scales: ACID compliance, mature tooling, strong consistency, and simple custom alias uniqueness enforcement via unique constraints. At extreme scale (billions of URLs, multi-region active-active writes), Cassandra or DynamoDB offer better horizontal write scalability. Start with PostgreSQL; migrate to Cassandra only if you actually need to.

How do you prevent a URL shortener from being used for phishing?
Integrate with Google Safe Browsing API and PhishTank at URL creation time. Implement rate limiting (10 URLs/min for anonymous users), require authentication for bulk creation, and offer preview pages (e.g., tny.io/aB3xKz+) that show the destination before redirecting.

Conclusion

Designing a URL shortener well is a masterclass in distributed systems fundamentals. The problem is simple enough to understand in minutes but deep enough to reveal every weakness in how you think about scalability, consistency, and real-world engineering trade-offs.

✅ Key Takeaways
Read/write asymmetry (100:1) drives everything — design the read path first, always
Snowflake IDs solve distributed ID generation without coordination overhead
Analytics must be asynchronous — never block redirects for analytics writes
Start with PostgreSQL; scale to Cassandra only when you genuinely need multi-region writes
Security is not optional — phishing detection and rate limiting are day-one requirements
CDN + Redis + local LRU = three caching layers that keep database load under 2%

When you’ve internalized this design, the same thinking applies directly to link-in-bio platforms, QR code generators, redirect managers for A/B testing, or any system with similar read-heavy, low-latency characteristics.
