Mamoor Ahmad

The Ultimate System Design Interview Cheatsheet (Visual Guide)

System Design Cheatsheet

System design interviews can feel overwhelming — there's a mountain of concepts, and you never know which ones will come up. I put together a visual cheatsheet that covers the most essential topics, organized so you can see the big picture at a glance. 👇

Here's a topic-by-topic breakdown of everything on it. 🚀


1️⃣ Non-Functional Characteristics

Before designing anything, clarify the non-functional requirements: availability, scalability, reliability, maintainability, latency, throughput, and consistency. These drive every architectural decision you'll make. 🎯

💡 Interview tip: Always ask about expected scale (QPS, data size, latency SLAs) before diving into a design.
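
A quick back-of-the-envelope estimate turns those answers into numbers you can design against. Below is a minimal sketch. Every input (daily active users, requests per user, peak factor, bytes written per request) is a hypothetical example value, and the model assumes every request writes the same amount of data.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def estimate(daily_active_users: int, requests_per_user: int,
             peak_factor: float, bytes_per_request: int) -> dict:
    """Rough capacity estimate from hypothetical traffic assumptions."""
    total_requests = daily_active_users * requests_per_user
    avg_qps = total_requests / SECONDS_PER_DAY
    peak_qps = avg_qps * peak_factor  # traffic is rarely flat across the day
    daily_bytes = total_requests * bytes_per_request
    return {
        "avg_qps": avg_qps,
        "peak_qps": peak_qps,
        "daily_storage_gb": daily_bytes / 1e9,
    }

# Hypothetical: 10M DAU, 20 requests each, 3x peak, 1 KB written per request
print(estimate(10_000_000, 20, 3.0, 1_000))
```

With these inputs you get roughly 2,300 QPS on average, about 6,900 QPS at peak, and 200 GB of new data per day. Numbers like these tell you whether one database node is enough or whether you need to shard.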


2️⃣ CAP Theorem

A distributed system can't guarantee all three of these at once:

  • 🔄 Consistency — every read gets the latest write
  • Availability — every request gets a response
  • 🌐 Partition Tolerance — the system works despite network splits

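In distributed systems, network partitions are unavoidable, so P is non-negotiable. The real choice is what to give up during a partition: CP systems (banking, inventory) refuse some requests to stay consistent, while AP systems (social feeds, DNS) keep answering, possibly with stale data.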


3️⃣ Horizontal vs. Vertical Scaling ⚖️

| | 📈 Vertical | 📊 Horizontal |
|---|---|---|
| How | Bigger machine | More machines |
| Limit | Hardware ceiling | Theoretically unlimited |
| Cost | Exponential | Linear-ish |
| Complexity | Low | High (needs load balancing, data partitioning) |

Most large-scale production systems scale horizontally, because any single machine eventually hits its hardware ceiling. 🏗️


4️⃣ DNS (Domain Name System) 🌍

DNS translates human-readable domains to IP addresses. Key concepts:

  • 🔍 Recursive resolvers do the heavy lifting
  • ⏱️ TTL controls caching duration
  • 🗺️ Geographic DNS routes users to the nearest data center

For system design, think about DNS as your first layer of traffic routing. 🛣️
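
To make "TTL controls caching duration" concrete, here is a minimal sketch of resolver-side caching. The `resolve_upstream` callback and the injectable clock are hypothetical stand-ins for a real upstream DNS query. This models only the expiry logic, not the DNS protocol.

```python
import time
from typing import Callable, Dict, List, Tuple

class TTLCache:
    """Caches name -> IP answers until their TTL expires (resolver-style)."""

    def __init__(self, resolve_upstream: Callable[[str], Tuple[str, int]],
                 clock: Callable[[], float] = time.monotonic):
        self._resolve = resolve_upstream  # hypothetical: returns (ip, ttl_seconds)
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (ip, expires_at)

    def lookup(self, name: str) -> str:
        now = self._clock()
        cached = self._entries.get(name)
        if cached and cached[1] > now:
            return cached[0]  # still fresh: answer from cache
        ip, ttl = self._resolve(name)  # missing or expired: ask upstream
        self._entries[name] = (ip, now + ttl)
        return ip

# Usage with a fake upstream and a fake clock
calls: List[str] = []

def fake_upstream(name: str) -> Tuple[str, int]:
    calls.append(name)
    return "203.0.113.10", 300  # documentation-range IP, 5-minute TTL

t = [0.0]
cache = TTLCache(fake_upstream, clock=lambda: t[0])
cache.lookup("example.com")  # upstream query
cache.lookup("example.com")  # served from cache
t[0] = 301.0
cache.lookup("example.com")  # TTL expired, so upstream again
```

The trade-off is the same one you make with any TTL: a long TTL cuts query load but delays failover when you change a record.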


5️⃣ Load Balancing ⚖️

Distributes traffic across multiple servers. Common algorithms:

  • 🔄 Round Robin — simple rotation
  • 📉 Least Connections — route to the least busy server
  • 🔗 IP Hash — sticky sessions by client IP
  • ⚖️ Weighted — more traffic to beefier servers

Works at Layer 4 (TCP) or Layer 7 (HTTP). Use health checks to automatically remove dead backends. 🏥
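
The four algorithms above fit in a few lines each. This is a simplified single-process sketch with hypothetical server names. Real load balancers also track health and handle concurrency, and their weighted algorithms interleave picks more smoothly than the naive expansion used here.

```python
import hashlib
import itertools
from typing import Dict, Iterator, List

class RoundRobin:
    def __init__(self, servers: List[str]):
        self._cycle = itertools.cycle(servers)

    def pick(self) -> str:
        return next(self._cycle)

class LeastConnections:
    def __init__(self, servers: List[str]):
        self.active: Dict[str, int] = {s: 0 for s in servers}

    def pick(self) -> str:
        server = min(self.active, key=self.active.get)  # fewest open connections
        self.active[server] += 1  # caller must call release() when the request ends
        return server

    def release(self, server: str) -> None:
        self.active[server] -= 1

def ip_hash(client_ip: str, servers: List[str]) -> str:
    # Same client IP -> same server, as long as the server list doesn't change
    digest = hashlib.md5(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]

def weighted_round_robin(weights: Dict[str, int]) -> Iterator[str]:
    # Naive expansion: a server with weight 3 appears 3 times per cycle
    return itertools.cycle([s for s, w in weights.items() for _ in range(w)])
```

Note that plain IP hashing remaps most clients when a server is added or removed; consistent hashing (section 11) addresses exactly that.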


6️⃣ API Gateway 🚪

A single entry point for all client requests. Handles:

  • 🔐 Authentication & authorization
  • 🚦 Rate limiting
  • 🛤️ Request routing & transformation
  • 🔒 SSL termination
  • 📝 Logging & analytics

Think of it as the front door to your microservices architecture. 🏠


7️⃣ Content Delivery Network (CDN) 🌐

Caches static assets (images, CSS, JS, video) at edge locations close to users.

  • ⬆️ Push CDN — you upload content proactively
  • ⬇️ Pull CDN — fetches from origin on first request

Reduces latency dramatically. Pair with proper cache-control headers for best results. ⚡


8️⃣ Caching 💾

The fastest database query is the one you never make. 🎯

  • 🌐 Browser cache → CDN cache → ⚡ Application cache → 💽 Database cache
  • 🛠️ Tools: Redis, Memcached
  • 📋 Strategies: Cache-aside, Write-through, Write-behind, Read-through

⚠️ Watch out for: cache invalidation (hard), thundering herd, and stale data.
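
Here is a minimal cache-aside sketch. An in-process LRU cache stands in for Redis or Memcached, and a plain dict stands in for a hypothetical database. Invalidating on write instead of updating the cache is one common choice. It narrows the stale-data window but does not fully eliminate it.

```python
from collections import OrderedDict
from typing import Dict, Optional

class LRUCache:
    """Fixed-size cache that evicts the least recently used key."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def set(self, key: str, value: str) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used key

    def delete(self, key: str) -> None:
        self._data.pop(key, None)

def get_user(user_id: str, cache: LRUCache, db: Dict[str, str]) -> str:
    """Cache-aside read: check the cache, fall back to the DB, then populate the cache."""
    cached = cache.get(user_id)
    if cached is not None:
        return cached
    value = db[user_id]  # stand-in for a real database query
    cache.set(user_id, value)
    return value

def update_user(user_id: str, value: str, cache: LRUCache, db: Dict[str, str]) -> None:
    """Cache-aside write: update the DB first, then invalidate the cached copy."""
    db[user_id] = value
    cache.delete(user_id)
```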


9️⃣ Polling vs. WebSockets 📡

| | 🔄 Polling | 🔌 WebSockets |
|---|---|---|
| Direction | Client → Server (client asks repeatedly) | Bidirectional |
| Latency | Depends on the polling interval | Near real-time |
| Overhead | A new HTTP request each time | Single persistent connection |
| Use case | Email checks, dashboards | Chat, live feeds, gaming |

Long polling is a middle ground — the server holds the connection open until data is available. 🔗


🔟 Forward & Reverse Proxy 🛡️

  • ➡️ Forward proxy — sits in front of clients (VPN, ad blockers, corporate firewalls)
  • ⬅️ Reverse proxy — sits in front of servers (load balancer, API gateway, Nginx)

A forward proxy hides clients from the servers they talk to; a reverse proxy hides backend servers from clients. Reverse proxies are a fundamental building block of scalable systems. 🧱


1️⃣1️⃣ Consistent Hashing 🔄

Solves the "what happens when we add/remove servers" problem.

  • 🗺️ Maps both servers and keys to a hash ring
  • 🔄 When a server is added or removed, only about K/N keys need to be remapped on average (K = number of keys, N = number of servers), instead of nearly all of them
  • 🛠️ Used in distributed caches, database sharding, CDNs

Virtual nodes improve even distribution across the ring. 💫
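
Here is a compact sketch of a hash ring with virtual nodes. The MD5-based hash, the `server#i` virtual-node naming, and the default of 100 virtual nodes per server are arbitrary illustrative choices, not taken from any particular system.

```python
import bisect
import hashlib
from typing import List, Tuple

def _hash(key: str) -> int:
    # MD5 is used only to spread keys evenly, not for security
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class ConsistentHashRing:
    def __init__(self, vnodes: int = 100):
        self.vnodes = vnodes
        self._ring: List[Tuple[int, str]] = []  # sorted (hash, server) pairs

    def add(self, server: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{server}#{i}"), server))

    def remove(self, server: str) -> None:
        self._ring = [(h, s) for h, s in self._ring if s != server]

    def get(self, key: str) -> str:
        # First virtual node clockwise from the key's hash, wrapping at the end
        idx = bisect.bisect(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]

ring = ConsistentHashRing()
for server in ["cache-a", "cache-b", "cache-c"]:
    ring.add(server)
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.get(k) for k in keys}
ring.add("cache-d")
moved = [k for k in keys if ring.get(k) != before[k]]
```

Adding a fourth server moves only a fraction of the keys (about a quarter on average), and every key that moves goes to the new server. With `hash(key) % N` instead, most keys would change servers.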


1️⃣2️⃣ Database Types 🗄️

A quick taxonomy:

  • 📊 Relational (SQL): MySQL, PostgreSQL — structured data, ACID transactions
  • 📄 Document: MongoDB — flexible schemas, JSON-like storage
  • 🔑 Key-Value: Redis, DynamoDB — blazing fast lookups
  • 📈 Column-Family: Cassandra, HBase — wide-column, high write throughput
  • 🔗 Graph: Neo4j — relationships are first-class citizens
  • ⏱️ Time-Series: InfluxDB — metrics, IoT data

💡 Pick the right tool for the job. There's no "best" database.


1️⃣3️⃣ SQL vs. NoSQL ⚔️

| | 📊 SQL | 🍃 NoSQL |
|---|---|---|
| Schema | Fixed | Flexible |
| Scaling | Vertical (mostly) | Horizontal |
| Transactions | Strong ACID | Eventual consistency (usually) |
| Joins | Native | Application-level |
| Best for | Complex queries, relationships | Scale, flexibility, speed |

Modern apps often use both — SQL for transactional data, NoSQL for caching/analytics. 🤝


1️⃣4️⃣ Database Scaling 📈

Two main strategies:

📖 Read Replicas

  • 📋 Copy data to multiple follower nodes
  • 🔄 Reads spread across replicas
  • ✍️ Writes go to the leader only

🔪 Sharding

  • ✂️ Split data across multiple databases
  • 📦 Each shard holds a subset of the data
  • 🧩 Hard problems: cross-shard queries, rebalancing

1️⃣5️⃣ Indexes 📇

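An index (usually a B-tree, sometimes a hash index) lets the database find rows without a full table scan: O(log n) lookups for a B-tree, O(1) on average for a hash index, instead of O(n). ⚡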

  • 📄 Single-column vs. 📑 composite indexes
  • 🎯 Covering index — query answered entirely from the index
  • ⚖️ Trade-off: faster reads, slower writes (index maintenance overhead)

💡 Rule of thumb: index columns used in WHERE, JOIN, and ORDER BY.
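
You can watch an index change the query plan using SQLite, which ships with Python's standard library. The table and data below are hypothetical, and the exact plan wording varies between SQLite versions (for example `SCAN users` vs. `SCAN TABLE users`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

def plan(sql: str) -> str:
    # The last column of each EXPLAIN QUERY PLAN row is a human-readable description
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("user42@example.com",)).fetchall()
    return " | ".join(row[-1] for row in rows)

query = "SELECT name FROM users WHERE email = ?"
before = plan(query)  # no index yet: full table scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(query)  # B-tree search on the new index
# id is the rowid, which SQLite stores in every index, so this query never touches the table
covering = plan("SELECT id FROM users WHERE email = ?")
print(before, after, covering, sep="\n")
```

The last query is the covering-index case: everything it needs lives in the index, so SQLite skips the table lookup entirely.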


1️⃣6️⃣ Leader Election 👑

In distributed systems, you often need a single coordinator:

  • 🚀 Raft — understandable consensus (etcd, Consul)
  • 📚 Paxos — the classic (harder to implement)
  • 🏗️ ZooKeeper — battle-tested coordination service

Used in database replication, distributed locks, and task schedulers. 🔐


1️⃣7️⃣ Message Queues 📬

Decouple producers from consumers:

  • 🚀 Kafka — high throughput, durable, great for event streaming
  • 🐰 RabbitMQ — traditional broker, flexible routing
  • ☁️ SQS — managed, serverless-friendly

Benefits: buffering, async processing, retry logic, fan-out. 🎯


1️⃣8️⃣ Event-Driven Architecture ⚡

Systems communicate through events rather than direct calls:

  • 📤 Event producer → 🚌 Event bus → 📥 Event consumer
  • 🔗 Enables loose coupling and independent scaling
  • 🧩 Patterns: Event sourcing, CQRS, Saga

Think: "When X happens, trigger Y" at scale. 💭
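
The producer → bus → consumer flow can be sketched in-process. This toy bus dispatches synchronously within one process. A real event bus (Kafka, a managed pub/sub service) adds durability, delivery across machines, and asynchronous consumers. The event name and payload fields are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Handler = Callable[[Dict[str, Any]], None]

class EventBus:
    """Toy in-process event bus: producers publish, subscribers react."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: Dict[str, Any]) -> None:
        # The producer doesn't know (or care) who is listening
        for handler in self._subscribers[event_type]:
            handler(payload)

bus = EventBus()
sent_emails: List[str] = []
reserved_skus: List[str] = []
# Two independent consumers react to the same event
bus.subscribe("order_placed", lambda e: sent_emails.append(e["email"]))
bus.subscribe("order_placed", lambda e: reserved_skus.append(e["sku"]))
bus.publish("order_placed", {"email": "ada@example.com", "sku": "BOOK-1"})
```

Adding a third consumer (say, analytics) means one more `subscribe` call. The code that publishes `order_placed` doesn't change, which is the loose coupling the pattern promises.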


1️⃣9️⃣ Microservices 🧱

Break a monolith into small, independently deployable services:

  • 📦 Each service owns its data and logic
  • 📡 Communicate via APIs or message queues
  • ⚖️ Trade simplicity for scalability and team autonomy

When to use: large teams, independent scaling needs, polyglot tech stacks.
When not to: small teams, early-stage products.


2️⃣0️⃣ Communication Patterns 📡

  • 🔄 Synchronous: REST, gRPC, GraphQL — request/response
  • Asynchronous: Message queues, event streams — fire and forget
  • 🚀 gRPC — binary, fast, great for inter-service communication
  • 🎯 GraphQL — client specifies exactly what data it needs

2️⃣1️⃣ Rate Limiting 🚦

Protect your system from abuse and overload:

  • 🪣 Token bucket — tokens refill at a fixed rate
  • 📊 Sliding window — counts requests in a rolling time window
  • 💧 Leaky bucket — processes at a constant rate

Enforce limits at the API gateway, and reject excess requests with 429 Too Many Requests plus a Retry-After header. 🛑
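
Here is a minimal single-process token bucket. It assumes one bucket per client key; how buckets are keyed and shared across gateway instances (for example in Redis) is left out. The injectable clock exists only to make the behavior easy to test.

```python
import time
from typing import Callable

class TokenBucket:
    """Allows bursts up to `capacity`, refilling `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self._clock = clock
        self._tokens = capacity  # start full, so an initial burst is allowed
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        # Refill based on elapsed time, never beyond capacity
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False  # the caller would respond 429 with a Retry-After header
```

With `capacity=3, rate=1.0`, a client can fire 3 requests at once. After that it gets one more request per second.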


2️⃣2️⃣ Idempotency 🔁

Applying the same request multiple times has the same effect as applying it once.

Why it matters: network retries, message queue redelivery, double-clicks. 🖱️

How: use idempotency keys — client sends a unique key, server deduplicates. 🔑

💰 Critical for payment systems and any write that isn't naturally idempotent (e.g. a POST that creates a resource).
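
Here is a minimal sketch of server-side deduplication. The `PaymentService` class and its in-memory dict are hypothetical. In production the key → response store would be durable and shared, and the check-and-insert would need to be atomic so that two concurrent retries can't both pass the check.

```python
import uuid
from typing import Dict

class PaymentService:
    """Hypothetical service that deduplicates requests by idempotency key."""

    def __init__(self) -> None:
        self._responses: Dict[str, dict] = {}  # idempotency key -> stored response
        self.charges = 0

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]  # replay: no second charge
        self.charges += 1  # the side effect happens once per key
        response = {"status": "charged", "amount_cents": amount_cents,
                    "charge_id": str(uuid.uuid4())}
        self._responses[idempotency_key] = response
        return response

service = PaymentService()
key = str(uuid.uuid4())  # generated once by the client, reused on every retry
first = service.charge(key, 1999)
retry = service.charge(key, 1999)  # e.g. resent after a network timeout
```

The retry gets back the exact same response, including the same `charge_id`, so the client can't tell whether its first attempt got through. That is the point: it doesn't need to.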


2️⃣3️⃣ Bloom & Cuckoo Filters 🌸

Probabilistic data structures for "is this element in the set?" 🤔

  • 🌸 Bloom filter — space-efficient, no false negatives, possible false positives
  • 🐦 Cuckoo filter — supports deletion, better false positive rates

Use cases: cache hit prediction, spam filtering, preventing duplicate writes. 🎯
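
Here is a minimal Bloom filter sketch (cuckoo filters are not shown). The bit-array size, the number of hash functions, and the trick of deriving all bit positions from one SHA-256 digest (double hashing) are illustrative choices. Real deployments size the filter from the expected item count and target false-positive rate.

```python
import hashlib
from typing import List

class BloomFilter:
    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str) -> List[int]:
        # Derive k bit positions from one digest: h1 + i * h2 (double hashing)
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        return [(h1 + i * h2) % self.size for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "probably present"
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))
```

Every added item always reports `True`, so there are no false negatives. An item that was never added reports `True` only if all of its bits happen to have been set by other items.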


2️⃣4️⃣ Single Point of Failure (SPOF) 💀

Any component whose failure brings down the entire system.

Eliminate SPOFs with:

  • 🔄 Redundancy (multiple instances)
  • 🔀 Failover mechanisms
  • 🏥 Health checks + automatic recovery
  • 🌍 Geographic distribution

🗣️ Interview mantra: "What happens when this component dies?" ☠️


2️⃣5️⃣ Heartbeat 💓

Periodic "I'm alive" signals between components.

  • 💓 Server sends heartbeat to a monitor at regular intervals
  • ⏰ If heartbeat is missed → mark as unhealthy → trigger failover
  • 🛠️ Used in: leader election, cluster management, load balancer health checks
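
Here is a minimal failure-detector sketch. It assumes nodes call `heartbeat()` periodically and that something polls `unhealthy()` to trigger failover. The timeout is a tunable assumption: too short and slow-but-alive nodes get flagged, too long and real failures are noticed late.

```python
import time
from typing import Callable, Dict, List

class HeartbeatMonitor:
    """Marks a node unhealthy if no heartbeat arrives within `timeout` seconds."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self._clock = clock
        self._last_seen: Dict[str, float] = {}  # node -> time of last heartbeat

    def heartbeat(self, node: str) -> None:
        self._last_seen[node] = self._clock()

    def unhealthy(self) -> List[str]:
        now = self._clock()
        return [n for n, seen in self._last_seen.items() if now - seen > self.timeout]
```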

2️⃣6️⃣ Checksum ✅

Detects data corruption during transfer or storage.

  • 🔓 MD5 — fast but not cryptographically secure
  • 🔐 SHA-256 — secure, widely used
  • CRC32 — fast, good for error detection

Applied at: file transfers, network packets, distributed storage verification. 📁
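
Python's standard library covers all three algorithms. Here is a minimal sketch of the sender/receiver pattern with a made-up payload: the sender ships a checksum alongside the data, and the receiver recomputes it and compares.

```python
import hashlib
import zlib

def checksums(data: bytes) -> dict:
    return {
        "crc32": format(zlib.crc32(data), "08x"),    # fast error detection
        "md5": hashlib.md5(data).hexdigest(),        # fast, not collision-resistant
        "sha256": hashlib.sha256(data).hexdigest(),  # cryptographic hash
    }

original = b"account=42;balance=100"
received = b"account=42;balance=900"  # one byte corrupted in transit
sent_sums = checksums(original)
print(sent_sums == checksums(received))  # False: the receiver detects the mismatch
```

CRC32 is enough to catch accidental corruption. Use SHA-256 when someone might tamper with the data on purpose.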


2️⃣7️⃣ Database Replication 🔁

Copy data across multiple nodes:

  • 🔄 Synchronous — writes confirmed after all replicas update (strong consistency, higher latency)
  • Asynchronous — writes confirmed immediately, replicas catch up (eventual consistency, lower latency)

Leader-follower is the most common pattern. Multi-leader and leaderless for advanced use cases. 🏗️


2️⃣8️⃣ Database Sharding & Partitioning 🔪

  • 🔪 Sharding — horizontal split across databases/servers
  • 📊 Partitioning — split within a single database

Sharding strategies:

  • 📏 Range-based — by date, ID range
  • 🔢 Hash-based — hash the shard key
  • 📖 Directory-based — lookup table

🧩 Hard parts: rebalancing, cross-shard joins, hotspot avoidance.
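
The three strategies differ mainly in how a shard key maps to a shard. Here is a sketch with hypothetical shard names, range boundaries, and tenant directory:

```python
import bisect
import hashlib
from typing import Dict, List

SHARDS: List[str] = ["shard-0", "shard-1", "shard-2"]  # hypothetical shard names

# Range-based: ids < 1M -> shard-0, 1M..2M-1 -> shard-1, everything else -> shard-2
RANGE_BOUNDS = [1_000_000, 2_000_000]

def range_shard(user_id: int) -> str:
    return SHARDS[bisect.bisect_right(RANGE_BOUNDS, user_id)]

def hash_shard(user_id: int) -> str:
    # Use a stable hash: Python's built-in hash() is randomized per process for str/bytes
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]

# Directory-based: an explicit lookup table, e.g. for tenants placed by hand
DIRECTORY: Dict[str, str] = {"acme": "shard-2", "globex": "shard-0"}

def directory_shard(tenant: str) -> str:
    return DIRECTORY[tenant]
```

Each one trades off differently. Range-based keeps related keys together but is prone to hotspots (all new IDs land on the last shard). Hash-based spreads load evenly, but changing `len(SHARDS)` remaps most keys, which is what consistent hashing (section 11) mitigates. Directory-based is the most flexible, but the directory itself must be highly available.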


🏁 Final Thoughts

This cheatsheet covers the 28 core concepts that come up again and again in system design interviews. You don't need to memorize everything — focus on understanding when and why to use each one. 🎯

The real skill in system design isn't knowing the tools. It's knowing which tools to reach for, and being able to explain your tradeoffs clearly. 💪

Good luck on your next interview. 🚀🔥


💬 What system design topic do you find trickiest? Drop a comment below! 👇
