How to Design a URL Shortener Like TinyURL: Complete System Design Guide
Learn how to design a scalable URL shortener like TinyURL or Bitly — covering distributed architecture, Redis caching, Snowflake ID generation, Kafka analytics pipelines, database sharding, and real-world engineering trade-offs.
Introduction: The Humble Link That Scaled to Billions
Picture this: it’s 2002. You want to share a research paper over email. The URL is 180 characters long, wraps across three lines, and breaks when clicked. A student named Kevin Gilbertson had the same frustration — and TinyURL was born.
Fast forward to today. Bitly handles 10 billion clicks per month. Twitter’s t.co redirects every shared link in real time. What started as a convenience tool became one of the most quietly critical pieces of internet infrastructure.
Designing a URL shortener looks deceptively simple — “just store a short code mapping to a long URL” — but it touches nearly every hard problem in distributed systems: high-throughput reads, collision-free ID generation, cache stampedes, global latency, and abuse prevention at massive scale.
URL shorteners are fundamentally read-heavy systems with a ~100:1 read/write ratio. This single characteristic shapes every architectural decision — from caching to database sharding to CDN strategy.
What Are We Building?
Functional Requirements
| Feature | Description |
|---|---|
| URL Shortening | Generate unique short codes (e.g., tny.io/aB3xKz) from long URLs |
| URL Redirection | Visiting a short URL redirects to the original destination |
| Custom Aliases | Users can optionally choose their own short codes |
| Expiry Support | Links expire after a given date or click threshold |
| Analytics | Track clicks, geographic origin, device type, referrer |
| User Accounts | Authenticated users can manage and view their links |
| Link Deletion | Users can deactivate a short URL |
Non-Functional Requirements
| Requirement | Target | Why It Matters |
|---|---|---|
| Availability | 99.99% uptime | At most ~52.6 minutes of downtime per year |
| Redirect Latency | <10ms at P99 | Users perceive slow redirects as broken links |
| Write Throughput | 10,000 new URLs/sec at peak | Viral campaigns create write bursts |
| Read/Write Ratio | ~100:1 | Drives cache-first architecture |
| Durability | Zero data loss | Every mapping must be permanent |
| Security | Phishing + DDoS prevention | URL shorteners are abuse vectors |
Scale Estimation
Every distributed system design starts here. Architecture decisions are meaningless without understanding expected traffic patterns. Let’s work through the numbers explicitly.
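Traffic estimation (10M new URLs/day, 10B redirects/day):
Writes/sec = 10M / 86,400 ≈ 115 writes/sec
Reads/sec = 10B / 86,400 ≈ 115,000 reads/sec

Storage per record:
Short code: 7 bytes
Long URL: 150 bytes
Metadata: 100 bytes
Total: ~260 bytes → 2.6 GB/day → 4.75 TB over 5 years

URL namespace (Base62, 7 chars):
62^7 ≈ 3.52 trillion codes → lasts ~965 years at 10M/day

Note that 10B reads against 10M writes per day is a 1,000:1 ratio. At the ~100:1 ratio quoted elsewhere in this guide, reads would be about 1B/day (≈ 11,600 reads/sec), so the 115,000 reads/sec figure is best read as a generous upper bound for capacity planning.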
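The arithmetic above is easy to re-run when the assumptions change. The following is a minimal sketch that recomputes the same figures; the variable names are illustrative, and the inputs are the ones stated in the estimate.

```python
SECONDS_PER_DAY = 86_400

# Inputs taken from the estimate above.
writes_per_day = 10_000_000         # 10M new URLs per day
reads_per_day = 10_000_000_000      # 10B redirects per day
bytes_per_record = 260              # short code + long URL + metadata, rounded

writes_per_sec = writes_per_day / SECONDS_PER_DAY
reads_per_sec = reads_per_day / SECONDS_PER_DAY

# Storage growth, using decimal units (1 GB = 1e9 bytes).
storage_per_day_gb = writes_per_day * bytes_per_record / 1e9
storage_5y_tb = storage_per_day_gb * 365 * 5 / 1e3

# Size of the 7-character Base62 namespace and how long it lasts.
namespace = 62 ** 7
years_to_exhaust = namespace / writes_per_day / 365

print(f"writes/sec: {writes_per_sec:.1f}")
print(f"reads/sec:  {reads_per_sec:.0f}")
print(f"storage:    {storage_per_day_gb:.1f} GB/day, {storage_5y_tb:.2f} TB over 5 years")
print(f"namespace:  {namespace:,} codes, ~{years_to_exhaust:.0f} years at this rate")
```

Changing `writes_per_day` or `bytes_per_record` shows quickly how sensitive the storage and namespace numbers are to the initial assumptions.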
High-Level Architecture
The system splits into two fundamentally different paths: the write path (creating short URLs) and the read/redirect path (resolving them). Because reads outnumber writes 100 to 1, the entire architecture is optimized around the redirect path.
Redirect Path — Critical Hot Path
Every millisecond counts. Most traffic should be served from CDN or Redis — never touching the database.
Write Path — URL Creation
Database Design
PostgreSQL vs Cassandra
| Factor | PostgreSQL | Cassandra |
|---|---|---|
| Consistency | Strong (ACID) | Eventual |
| Scalability | Moderate (sharding needed) | Very high (built-in) |
| Transactions | Excellent | Limited |
| Operational Complexity | Lower | Higher |
| Best for | Start here — most scales | Billions of URLs, multi-region writes |
Schema Design
-- Core mapping table
CREATE TABLE url_mappings (
    short_code   VARCHAR(12) PRIMARY KEY,
    long_url     TEXT NOT NULL,
    user_id      BIGINT,
    created_at   TIMESTAMPTZ DEFAULT NOW(),
    expires_at   TIMESTAMPTZ,
    is_active    BOOLEAN DEFAULT TRUE,
    click_count  BIGINT DEFAULT 0
);

-- Analytics events, partitioned by month for fast range scans
CREATE TABLE click_events (
    id            BIGSERIAL,
    short_code    VARCHAR(12) NOT NULL,
    clicked_at    TIMESTAMPTZ DEFAULT NOW(),
    ip_hash       VARCHAR(64),
    country_code  CHAR(2),
    device_type   VARCHAR(20),
    referrer      TEXT,
    PRIMARY KEY (id, clicked_at)
) PARTITION BY RANGE (clicked_at);

-- Index for user dashboard queries
CREATE INDEX idx_url_user ON url_mappings(user_id, created_at DESC);
The primary key on short_code gives O(log n) lookups. Since reads dominate massively, this index must live in memory — ensure your database has enough RAM to hold the hot index in the buffer pool.
API Design
Create Short URL
POST /api/v1/urls
Authorization: Bearer <jwt_token>

{
  "long_url": "https://example.com/very/long/path",
  "custom_alias": "my-blog",             // optional
  "expires_at": "2026-01-01T00:00:00Z"   // optional
}

// 201 Created
{
  "short_url": "https://tny.io/aB3xKz",
  "short_code": "aB3xKz",
  "created_at": "2025-06-15T10:23:45Z"
}

// 409 Conflict: alias taken
{
  "error": "ALIAS_TAKEN",
  "message": "The alias 'my-blog' is already in use."
}
Redirect
GET /{short_code}

Response:
HTTP/1.1 302 Found
Location: https://example.com/very/long/path
Cache-Control: max-age=3600
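A 301 Permanent Redirect is cached by browsers indefinitely by default, which is great for bandwidth but fatal for analytics because repeat clicks never reach your servers. A 302 Found (temporary) redirect is not cached unless the response explicitly allows it, so every click passes through your servers and can be counted. Note that an explicit Cache-Control: max-age=3600, as in the example above, lets browsers reuse the redirect for up to an hour, and clicks in that window go uncounted. Analytics-first shorteners typically pair a 302 with a short or zero max-age.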
Get Stats
GET /api/v1/urls/{short_code}/stats
Authorization: Bearer <jwt_token>

{
  "total_clicks": 14823,
  "unique_visitors": 11201,
  "top_countries": ["US", "IN", "GB", "DE"],
  "clicks_last_7_days": [...]
}
ID Generation: The Hardest Easy Problem
How do you generate a unique, collision-free short code at 115 writes/second across multiple API servers without coordination overhead? The answer is less obvious than it appears.
| Option | Mechanism | Pro | Con |
|---|---|---|---|
| UUID + Base62 | Random UUID, take first 7 chars | Simple | Collision risk; uniqueness check needed on every write |
| Counter + Base62 | Global auto-increment | Guaranteed unique | Global counter = single point of failure + write bottleneck |
| Snowflake IDs ✓ | Timestamp + machine ID + sequence | Globally unique, no per-ID coordination, sorts chronologically | Clock skew edge cases need handling; a full 64-bit ID encodes to up to 11 Base62 characters, longer than a 7-character code |
| Pre-generated Pool | Background job fills a pool table | Zero write-path latency for code generation | Pool depletion under extreme load |
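Several of the options above end with a Base62 step that turns a numeric ID into a short code. The following is a minimal sketch of that conversion; the alphabet ordering (digits, then uppercase, then lowercase) is an assumption, since any fixed ordering of the 62 symbols works as long as encoder and decoder agree.

```python
import string

# Assumed alphabet ordering: 0-9, A-Z, a-z (62 symbols in total).
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
BASE = len(ALPHABET)


def base62_encode(n: int) -> str:
    """Encode a non-negative integer as a Base62 string."""
    if n < 0:
        raise ValueError("only non-negative integers can be encoded")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, remainder = divmod(n, BASE)
        digits.append(ALPHABET[remainder])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(digits))


def base62_decode(code: str) -> int:
    """Decode a Base62 string back to the integer it represents."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

Any counter value below 62^7 fits in seven characters, while a full 64-bit value needs up to eleven, which is worth keeping in mind when picking an ID scheme and a column width.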
Snowflake ID Structure (64-bit)
Max throughput: 1,024 nodes × 4,096 IDs/ms = ~4 million IDs/millisecond
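The throughput figure follows from a layout with 10 bits of node ID (1,024 nodes) and 12 bits of sequence (4,096 IDs per millisecond). Here is a rough sketch of a generator with that layout, assuming the conventional 41-bit millisecond timestamp in the high bits; the custom epoch, the class name, and the choice to raise on a backwards clock are illustrative assumptions rather than a prescribed implementation.

```python
import threading
import time

EPOCH_MS = 1_577_836_800_000  # 2020-01-01T00:00:00Z, an arbitrary custom epoch (assumption)
NODE_BITS = 10                # 2**10 = 1,024 nodes
SEQ_BITS = 12                 # 2**12 = 4,096 IDs per node per millisecond
MAX_NODE = (1 << NODE_BITS) - 1
MAX_SEQ = (1 << SEQ_BITS) - 1


class SnowflakeGenerator:
    """Generates IDs laid out as: timestamp | node ID | sequence."""

    def __init__(self, node_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= node_id <= MAX_NODE:
            raise ValueError("node_id out of range")
        self.node_id = node_id
        self._clock = clock
        self._lock = threading.Lock()
        self._last_ms = -1
        self._seq = 0

    def next_id(self) -> int:
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Clock skew: refuse to issue IDs that could collide.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._seq = (self._seq + 1) & MAX_SEQ
                if self._seq == 0:
                    # Sequence exhausted for this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._seq = 0
            self._last_ms = now
            timestamp = now - EPOCH_MS
            return (timestamp << (NODE_BITS + SEQ_BITS)) | (self.node_id << SEQ_BITS) | self._seq
```

Each node only needs a unique `node_id`, assigned once at startup; after that, ID generation involves no network calls.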
No central coordination is required at generation time: each node generates IDs autonomously.
Caching Strategy
Caching is the single most important performance lever in this system. With a 100:1 read/write ratio, even a 95% cache hit rate means only 5,750 of 115,000 requests/second hit the database. Design the cache layers first.
Multi-Layer Cache Hierarchy
Redis Key Structure
Key: "url:aB3xKz"
Value: "https://example.com/very/long/path"
TTL: 86400 seconds (24h default, tiered by activity)
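The key layout above is typically used in a cache-aside read path: check the cache, fall back to the database on a miss, then populate the cache. Below is a minimal sketch of that flow. `FakeRedis` is a hypothetical in-memory stand-in that mimics only `get` and `setex`, and `db_lookup` is an assumed callable that returns the long URL or `None`.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class FakeRedis:
    """Tiny in-memory stand-in for a Redis client (hypothetical, for illustration)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._data: Dict[str, Tuple[str, float]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily expire the entry
            return None
        return value

    def setex(self, key: str, ttl_seconds: int, value: str) -> None:
        self._data[key] = (value, self._clock() + ttl_seconds)


def resolve(short_code: str, cache: FakeRedis,
            db_lookup: Callable[[str], Optional[str]],
            ttl_seconds: int = 86_400) -> Optional[str]:
    """Cache-aside lookup: cache first, database on a miss, then fill the cache."""
    key = f"url:{short_code}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    long_url = db_lookup(short_code)
    if long_url is not None:
        cache.setex(key, ttl_seconds, long_url)
    return long_url
```

This sketch does not cache misses; a production system might also store a short-lived "not found" marker so that repeated lookups of unknown codes do not hammer the database.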
TTL Strategy
| URL Category | TTL Strategy | Reason |
|---|---|---|
| Brand-new URLs (<1 hour old) | 12 hours fixed | High probability of imminent clicks |
| Active URLs (recent clicks) | Sliding window — extend on every hit | Keep hot links warm |
| Dormant (no clicks 7+ days) | 1 hour | Low reuse probability; save memory |
| Enterprise / Pro tier | No TTL — permanent | SLA guarantee for paying customers |
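The tiers in the table can be expressed as one small decision function. This is a sketch under assumptions: the function name and parameters are hypothetical, `None` stands for "no expiry", and active links get the 24-hour default re-applied on each hit to approximate the sliding window.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

HOUR = 3600


def choose_ttl_seconds(created_at: datetime,
                       last_click_at: Optional[datetime],
                       is_paid_tier: bool,
                       now: datetime) -> Optional[int]:
    """Return the cache TTL in seconds, or None for 'never expire'."""
    if is_paid_tier:
        return None                      # Enterprise / Pro: permanent entry
    if now - created_at < timedelta(hours=1):
        return 12 * HOUR                 # brand-new link: clicks are likely soon
    last_activity = last_click_at or created_at
    if now - last_activity >= timedelta(days=7):
        return 1 * HOUR                  # dormant link: free the memory quickly
    return 24 * HOUR                     # active link: re-applied on every hit (sliding)
```

Calling this on every cache write and every cache hit keeps hot links warm while letting dormant ones fall out of memory.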
A URL shared by a celebrity can generate 100,000 RPS on a single Redis key. Solutions: replicate the key across multiple Redis slots (url:aB3xKz:1, :2…), keep top 1,000 URLs in an in-process LRU cache per instance, or push to CDN edge — bypassing Redis entirely for the most viral links.
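The key-replication idea can be sketched in a few lines: write the same value under several suffixed keys, then have each reader pick one suffix at random so the load spreads across slots. The helper names and the `replicas` parameter are hypothetical.

```python
import random
from typing import List


def replica_keys(short_code: str, replicas: int) -> List[str]:
    """All keys a writer should populate for a hot short code."""
    return [f"url:{short_code}:{i}" for i in range(1, replicas + 1)]


def pick_read_key(short_code: str, replicas: int, rng=random) -> str:
    """The single key a reader should try, chosen uniformly at random."""
    return f"url:{short_code}:{rng.randint(1, replicas)}"
```

With `replicas=8`, a key receiving 100,000 RPS sees roughly 12,500 RPS per copy, at the cost of eight writes whenever the value is set or invalidated.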
Message Queues and Analytics
Analytics must never block redirect responses. A redirect waiting for click persistence adds 50–200ms of latency to every single user request. The solution is complete decoupling via Kafka.
The redirect response is sent before the Kafka publish even begins. Why ClickHouse? Analytics queries like “clicks by country, last 30 days, grouped by hour” are columnar scan operations — ClickHouse performs these 10–100× faster than PostgreSQL on the same data volume.
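The decoupling can be illustrated without a broker. In the sketch below, a bounded in-process `queue.Queue` stands in for the Kafka topic and a background thread stands in for the consumer; all names are hypothetical. The enqueue is non-blocking, so the redirect is never delayed by analytics.

```python
import queue
import threading

# Stand-in for a Kafka topic; bounded so a slow consumer cannot exhaust memory.
click_queue: "queue.Queue" = queue.Queue(maxsize=10_000)
stored_events = []  # stand-in for the analytics store


def handle_redirect(short_code: str, long_url: str, metadata: dict):
    """Return the redirect immediately; hand the click event off without blocking."""
    event = {"short_code": short_code, **metadata}
    try:
        click_queue.put_nowait(event)
    except queue.Full:
        pass  # drop the event rather than slow down the redirect
    return 302, {"Location": long_url}


def analytics_worker() -> None:
    """Consume click events until a None sentinel arrives."""
    while True:
        event = click_queue.get()
        try:
            if event is None:
                return
            stored_events.append(event)
        finally:
            click_queue.task_done()
```

Dropping events when the queue is full is one possible policy; a real pipeline might instead buffer to local disk or rely on the producer library's own retry settings.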
Scalability Strategies
Microservices Decomposition
| Service | Responsibility | Scaling Pattern |
|---|---|---|
| Redirect Service | 99% of all traffic — resolve short code to URL | Horizontal to 500+ instances; CPU-based autoscale |
| URL Service | CRUD on URL mappings | Moderate horizontal; write-limited |
| Analytics Service | Click ingestion and reporting | Kafka partition-based scale |
| User Service | Auth, accounts, billing tier | Independent; not in hot path |
| ID Generator | Snowflake code generation | Replicated sidecar per region |
Database Sharding Strategy
When a single PostgreSQL instance can’t handle write volume (typically beyond ~10,000 writes/second), shard by consistent hashing on short_code. This produces even distribution and allows adding shards with minimal data migration — unlike prefix sharding which creates hot shards for popular prefixes.
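A hash ring with virtual nodes is one common way to implement consistent hashing. Here is a minimal sketch; the shard names, the `vnodes` count, and the use of SHA-256 as the ring hash are illustrative assumptions.

```python
import bisect
import hashlib
from typing import Iterable


class HashRing:
    """Consistent-hash ring mapping short codes to shard names."""

    def __init__(self, shards: Iterable[str], vnodes: int = 100):
        ring = []
        for shard in shards:
            # Each shard owns many points on the ring to even out the distribution.
            for v in range(vnodes):
                ring.append((self._hash(f"{shard}#{v}"), shard))
        ring.sort()
        self._ring = ring
        self._points = [point for point, _ in ring]

    @staticmethod
    def _hash(key: str) -> int:
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def shard_for(self, short_code: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect_right(self._points, self._hash(short_code)) % len(self._ring)
        return self._ring[idx][1]
```

When a fourth shard joins a three-shard ring, only the keys that now fall on the new shard's points move, roughly a quarter of the data, while every other key stays where it was.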
Multi-Region Architecture
Deploy database clusters in at least 3 regions (US-East, EU-West, AP-Southeast). Use Anycast DNS to route users to their nearest region. URL creation can tolerate 50ms cross-region replication lag. Redirects always read from the local region’s replica — ensuring low latency regardless of where the URL was created.
Security Considerations
| Threat | Mitigation |
|---|---|
| Phishing / malware links | Google Safe Browsing API + PhishTank check at creation time |
| Abuse / spam URL creation | Rate limiting: 10 URLs/min anonymous, 1,000/min authenticated |
| DDoS on redirect path | Cloudflare WAF + Anycast absorption at edge |
| Custom alias race condition | Database unique constraint + Redlock for high concurrency |
| PII in click events | Hash all IP addresses with salted SHA-256 before storage |
| Man-in-the-middle | TLS 1.3 for all traffic; HSTS preload on domain |
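The IP-hashing row can be sketched as follows. This example uses HMAC-SHA256 with a secret key, which is a keyed variant of the salted SHA-256 named in the table rather than a plain concatenate-and-hash; the function name and the way the secret is supplied are assumptions.

```python
import hashlib
import hmac


def hash_ip(ip_address: str, secret_salt: bytes) -> str:
    """Return a 64-character hex digest that hides the raw IP address."""
    return hmac.new(secret_salt, ip_address.encode("utf-8"), hashlib.sha256).hexdigest()
```

The 64-character output fits the `ip_hash VARCHAR(64)` column in the schema. Because the IPv4 space is small enough to brute-force, the salt must stay secret, and rotating it periodically limits how long a visitor can be correlated across events.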
Rate Limiting — Sliding Window Counter
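Key: "ratelimit:write:{user_id}:{minute_bucket}"
Value: 47 (requests in this one-minute bucket)
TTL: 120 seconds (so the previous bucket is still readable during the current minute)
Estimate = previous_bucket_count × (1 − fraction of current minute elapsed) + current_bucket_count
If estimate ≥ limit → return 429 Too Many Requests with a Retry-After header

Counting only the current minute bucket would be a fixed-window counter, which allows a burst of up to twice the limit across a bucket boundary. Weighting in the previous bucket is what makes the window slide.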
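A sliding window counter is commonly approximated from two adjacent fixed buckets: the previous bucket is weighted by how much of it still overlaps the sliding window. The sketch below keeps the counters in a local dictionary instead of Redis, and the class and parameter names are hypothetical.

```python
import time
from collections import defaultdict


class SlidingWindowCounter:
    """Approximate sliding-window rate limiter built from fixed-size buckets."""

    def __init__(self, limit: int, window_seconds: int = 60, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self._clock = clock
        self._counts = defaultdict(int)  # (user_id, bucket index) -> request count

    def allow(self, user_id: str) -> bool:
        now = self._clock()
        bucket = int(now // self.window)
        elapsed_fraction = (now % self.window) / self.window
        previous = self._counts.get((user_id, bucket - 1), 0)
        current = self._counts.get((user_id, bucket), 0)
        # Weight the previous bucket by the share of it still inside the window.
        estimated = previous * (1 - elapsed_fraction) + current
        if estimated >= self.limit:
            return False  # caller should respond with 429 and a Retry-After header
        self._counts[(user_id, bucket)] += 1
        return True
```

This in-memory version never evicts old buckets; with Redis, a TTL of two windows on each bucket key would handle that cleanup.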
Monitoring and Observability
| Layer | Tool | Key Metrics / Purpose |
|---|---|---|
| Metrics | Prometheus + Grafana | Redirect P99 latency, cache hit rate, error rates by status code |
| Distributed Tracing | Jaeger / OpenTelemetry | Cross-service span latency, slow query attribution |
| Centralized Logging | ELK Stack | Structured JSON logs for every write, error, and audit event |
| Alerting | PagerDuty | P99 >100ms, cache hit <95%, error rate >1% → page on-call |
| Queue Lag | Kafka Manager | Consumer lag >100K events → notify analytics team |
Bottlenecks and Trade-offs
| Decision | Choice | What We Sacrifice |
|---|---|---|
| 301 vs 302 redirect | 302 | Slightly higher bandwidth per redirect — worth it for analytics |
| Sync vs async analytics | Async (Kafka) | Up to ~30s analytics delay — completely acceptable |
| Cache TTL length | Sliding window, tiered | Deleted links may serve stale results for up to 1 hour, assuming deletes evict the Redis entry and only edge or browser copies remain |
| PostgreSQL vs Cassandra | PostgreSQL to start | Manual sharding at extreme scale |
| Read replicas | Eventual consistency | 50–100ms replication lag — fine for URL reads |
| Cache failure recovery | Request coalescing + jitter TTL | Extra implementation complexity |
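The last row names two techniques that are easier to see in code. Below is a sketch of both, with hypothetical names: `jittered_ttl` spreads expirations so that entries written together do not all expire together, and `SingleFlight` coalesces concurrent loads of the same key so that only one caller hits the database.

```python
import random
import threading


def jittered_ttl(base_seconds: int, jitter_fraction: float = 0.1, rng=random) -> int:
    """Base TTL plus or minus a random share, to de-synchronize expirations."""
    spread = base_seconds * jitter_fraction
    return int(base_seconds + rng.uniform(-spread, spread))


class SingleFlight:
    """Lets one caller load a key while concurrent callers wait for its result."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> shared state for the load in progress

    def do(self, key, loader):
        with self._lock:
            entry = self._inflight.get(key)
            leader = entry is None
            if leader:
                entry = {"done": threading.Event(), "value": None, "error": None}
                self._inflight[key] = entry
        if leader:
            try:
                entry["value"] = loader()
            except Exception as exc:  # propagate the failure to waiting callers too
                entry["error"] = exc
            finally:
                with self._lock:
                    del self._inflight[key]
                entry["done"].set()
        else:
            entry["done"].wait()
        if entry["error"] is not None:
            raise entry["error"]
        return entry["value"]
```

On a cache miss for a hot key, wrapping the database lookup in `SingleFlight.do` turns a thundering herd of identical queries into a single one per process.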
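In CAP terms this design is mixed rather than purely CP. The write path favors consistency: a short code is only handed out once the primary has accepted it under the unique constraint, so one code can never map to two different URLs. The redirect path favors availability: it serves from caches and read replicas that may lag by 50–100ms, which is safe because an existing mapping is not rewritten. The worst outcomes are a brief 404 for a just-created link or a deleted link that keeps resolving until its cache entry expires, never a redirect to the wrong URL. For analytics, we accept eventual consistency; a 30-second delay in click counts is completely acceptable.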
Real-World Technology Stack
The pattern is consistent across all major implementations: a fast KV lookup store, a distributed cache layer, and an async event pipeline for analytics. The specific technology varies; the architectural pattern is universal.
Interview Perspective
The URL shortener is a favorite interview question precisely because it looks simple and reveals how deeply a candidate thinks about distributed systems. Interviewers aren’t evaluating whether you know the “right answer” — they’re watching your reasoning process.
How to Structure Your 30-Minute Answer
Common Mistakes
| Mistake | Why It’s a Red Flag |
|---|---|
| Jumping to code before clarifying requirements | Shows inability to scope ambiguous problems |
| Ignoring the 100:1 read/write ratio | The most important insight — missing it means missing the architecture |
| Making analytics synchronous | Fundamental misunderstanding of system design priorities |
| Not knowing 301 vs 302 difference | Reveals lack of HTTP protocol depth |
| Single-server design with no scale path | Shows inability to think beyond toy systems |
Frequently Asked Questions
Conclusion
Designing a URL shortener well is a masterclass in distributed systems fundamentals. The problem is simple enough to understand in minutes but deep enough to reveal every weakness in how you think about scalability, consistency, and real-world engineering trade-offs.
When you’ve internalized this design, the same thinking applies directly to link-in-bio platforms, QR code generators, redirect managers for A/B testing, or any system with similar read-heavy, low-latency characteristics.