
System Design Course


Table of Contents

  1. Computer Architecture
  2. Application Architecture
  3. Design Requirements
  4. Networking Basics
  5. TCP and UDP
  6. DNS, Dynamic DNS, and the DNS Lookup Process
  7. HTTP, RPC, Methods, Status Codes, SSL/TLS, HTTPS
  8. WebSocket and Polling
  9. API Paradigms
  10. API Design
  11. Caching
  12. Content Delivery Networks (CDNs) & Edge Computing
  13. Proxies and Load Balancers
  14. Consistent Hashing
  15. SQL
  16. NoSQL
  17. Replication and Sharding
  18. CAP Theorem
  19. Object Storage
  20. Message Queues

01 - Computer Architecture

Key Concepts

1. CPU (Central Processing Unit)

2. Memory Hierarchy

a. Registers

b. CPU Cache

c. RAM (Random Access Memory)

d. Storage (Disk)


3. Network Interface

4. I/O Devices

5. Bus

Why It Matters in System Design

Common Bottlenecks

Caching: The Data Journey from Storage to CPU

Process Flow:

Explanation:

  1. Storage → RAM:

    • When a program or data is needed, it is loaded from the disk into RAM.
    • This is relatively slow (milliseconds), especially from HDDs.
  2. RAM → CPU Cache:

    • The CPU requests specific data. If it's not in the cache (a cache miss), it fetches it from RAM.
    • Caches use algorithms (like LRU - Least Recently Used) to decide which data to keep.
  3. CPU Cache → CPU Registers:

    • The most frequently accessed values are moved to registers for immediate use by the CPU.
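The LRU policy mentioned above can be sketched in a few lines. A JavaScript Map preserves insertion order, so the first key is always the least recently used one (a teaching sketch, not a production cache):

```javascript
// Minimal LRU cache: a Map preserves insertion order, so the first
// key in the Map is always the least recently used entry.
class LRUCache {
  constructor(capacity) {
    this.capacity = capacity;
    this.map = new Map();
  }
  get(key) {
    if (!this.map.has(key)) return undefined; // cache miss
    const value = this.map.get(key);
    this.map.delete(key); // re-insert to mark as most recently used
    this.map.set(key, value);
    return value;
  }
  put(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      // evict the least recently used entry (first key in the Map)
      this.map.delete(this.map.keys().next().value);
    }
  }
}
```

Accessing an entry moves it to the back of the Map, so eviction always removes the entry that has gone longest without being touched.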

Cache Efficiency:




02 - Application Architecture

Key Architectures

1. Monolithic Architecture

2. Microservices Architecture

3. Client-Server Model

4. N-Tier Architecture

5. Service-Oriented Architecture (SOA)


Scaling Applications

Vertical Scaling (Scaling Up)

Horizontal Scaling (Scaling Out)


Load Balancer


Observability: Logging, Metrics, and Alerts

Logging

Metrics

Alerts




03 - Design Requirements

1. Core System Actions

Move Data

Store Data

Transform Data


2. Non-Functional Requirements

Availability

Reliability

Fault Tolerance

Redundancy


3. Performance Metrics

Throughput

Latency


4. Security and Scalability

DDoS (Distributed Denial of Service)

CDN (Content Delivery Network)




04 - Networking Basics

Key Concepts

1. IP Addressing

2. Public vs Private IP

3. Static vs Dynamic IP


4. Ports

5. DNS (Domain Name System)

6. NAT (Network Address Translation)

7. MAC Address

8. Firewalls


Communication Models

Client-Server Model

Peer-to-Peer Model


The Request-Response Lifecycle (Web Example)

1. User Action

2. DNS Resolution

3. TCP Connection

4. HTTP Request Sent

5. Server Processing

6. HTTP Response Sent

7. Rendering the Page




05 - TCP and UDP

TCP (Transmission Control Protocol)

Characteristics:


TCP 3-Way Handshake

  1. SYN (Synchronize): Client sends a TCP packet with SYN flag to server.
  2. SYN-ACK (Synchronize-Acknowledge): Server responds with SYN and ACK.
  3. ACK (Acknowledge): Client sends final ACK, and the connection is established.

Once the handshake completes, data transfer begins.
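The three steps can be sketched as a toy state machine (illustrative only — the real handshake is performed by the operating system's TCP stack, not application code; the state names follow the TCP specification):

```javascript
// Illustrative state machine for the TCP 3-way handshake.
function createEndpoint(role) {
  return { role, state: role === "client" ? "CLOSED" : "LISTEN" };
}

function handshake(client, server) {
  const trace = [];
  // 1. SYN: client sends a packet with the SYN flag set
  client.state = "SYN_SENT";
  trace.push("SYN");
  // 2. SYN-ACK: server acknowledges and sends its own SYN
  server.state = "SYN_RECEIVED";
  trace.push("SYN-ACK");
  // 3. ACK: client acknowledges; both sides are now connected
  client.state = "ESTABLISHED";
  server.state = "ESTABLISHED";
  trace.push("ACK");
  return trace;
}
```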


TCP Connection Termination


Use Cases for TCP


UDP (User Datagram Protocol)

Characteristics:


How UDP Works


Use Cases for UDP




06 - DNS, Dynamic DNS, and the DNS Lookup Process

What is DNS?

DNS (Domain Name System) is the internet’s phonebook. It translates human-readable domain names (like openai.com) into machine-readable IP addresses (like 104.18.12.123), enabling devices to locate and communicate with each other.


Key Components of DNS


What is Dynamic DNS (DDNS)?

Dynamic DNS automatically updates DNS records whenever the device's IP changes (especially useful for devices on dynamic IPs, like most home internet connections).

Use Cases:

How It Works:


DNS Lookup Process (Step-by-Step)

Let’s say a user visits www.example.com. Here’s what happens:

1. Browser Cache

2. Operating System Cache

3. DNS Resolver (Usually ISP)

4. Root Server Query

5. TLD Server Query

6. Authoritative Server Query

7. IP Address Returned

8. Client Connects to Server

9. Caching for Speed


Example Timeline:

User types www.example.com
  ↓ Local browser/OS cache → miss
  ↓ Query to ISP DNS resolver (or 8.8.8.8)
      → Root server ("where's .com?")
      → TLD server ("where's example.com?")
      → Authoritative server ("what's www.example.com?")
  ↓ IP returned (e.g., 93.184.216.34)
  ↓ Browser loads the website




07 - HTTP, RPC, Methods, Status Codes, SSL/TLS, HTTPS

1. What is HTTP?

HTTP (HyperText Transfer Protocol) is the foundational protocol for data communication on the web. It defines how clients (usually browsers) communicate with servers.


2. What is RPC?

RPC (Remote Procedure Call) is a protocol that allows a program to execute a function or procedure on a remote server as if it were local.

HTTP vs RPC:

Feature     | HTTP                           | RPC
Format      | Text-based (usually JSON/HTML) | Binary or JSON
Abstraction | Resource-based (/users)        | Function-based (getUser())
Use Case    | Web apps, REST APIs            | Backend microservice comms

3. Common HTTP Methods

Method  | Purpose
GET     | Retrieve data (no side effects)
POST    | Submit data (e.g., form submission)
PUT     | Replace existing resource
PATCH   | Partially update a resource
DELETE  | Remove a resource
HEAD    | Same as GET, but no response body
OPTIONS | Discover supported methods

4. Common HTTP Status Codes

1xx: Informational

2xx: Success

3xx: Redirection

4xx: Client Errors

5xx: Server Errors


5. SSL/TLS and HTTPS

What is SSL/TLS?

How SSL/TLS Works (Simplified):

  1. Handshake: Client and server agree on encryption methods and exchange keys.
  2. Authentication: Server provides a digital certificate (usually issued by a trusted CA).
  3. Session Keys: Both sides derive session keys for encryption.
  4. Encrypted Communication: All subsequent data is encrypted using these keys.

6. What is HTTPS?




08 - WebSocket and Polling

Overview

Real-time communication is a key requirement in many modern applications, such as chat apps, multiplayer games, live updates, and collaborative tools. Two primary techniques for enabling this are WebSockets and Polling.


1. Polling

What is Polling?

Polling is a technique where the client repeatedly asks the server for new data at regular intervals.

Example:

Client sends a request every 5 seconds:

Server responds with latest data (even if there's no update).

Types of Polling

a. Short Polling

setInterval(() => {
  fetch("/new-messages")
    .then((res) => res.json())
    .then((data) => console.log(data));
}, 3000); // every 3 seconds

b. Long Polling

Pros

Cons


2. WebSocket

What is WebSocket?

WebSocket is a protocol that enables full-duplex (two-way) communication between client and server over a single, long-lived connection.

WebSocket Lifecycle

  1. Handshake: Client sends HTTP request with Upgrade: websocket.
  2. Upgrade: Server responds with 101 Switching Protocols.
  3. Open Connection: Persistent TCP connection is established.
  4. Data Exchange: Both client and server can send data anytime.
  5. Close Connection: Either party can close the connection.

Example Use Cases

// Client
const socket = new WebSocket("wss://example.com/chat");
socket.onopen = () => socket.send("Hello!");
socket.onmessage = (event) => console.log("Server:", event.data);

// Server
const WebSocket = require("ws");
const wss = new WebSocket.Server({ port: 8080 });
wss.on("connection", (ws) => {
  ws.on("message", (message) => console.log("Client:", message));
  ws.send("Welcome!");
});

Pros

Cons


3. Polling vs WebSocket

Feature    | Polling                   | WebSocket
Protocol   | HTTP                      | TCP (after HTTP upgrade)
Direction  | Client → Server           | Bidirectional
Latency    | Higher                    | Very low
Overhead   | High (many HTTP requests) | Low (single persistent connection)
Complexity | Simple                    | Moderate to complex
Use Case   | Simple updates            | Real-time apps



09 - API Paradigms

Overview

APIs (Application Programming Interfaces) define how different software components communicate. Choosing the right API paradigm impacts performance, scalability, and developer experience. This lesson covers major API paradigms:


1. REST (Representational State Transfer)

Characteristics

Sessions & Cookies

Example

GET /users/123

Returns user data for user with ID 123.

Pros

Cons


2. GraphQL

Characteristics

Query and Mutation

Overfetching & Underfetching

Example Query

{
  user(id: "123") {
    name
    email
  }
}

Pros

Cons


3. gRPC

Characteristics

Protocol Buffers

Streaming Support

Pros

Cons


4. Other Paradigms

WebSockets

Traditional RPC (Remote Procedure Call)




10 - API Design

Introduction to API Design

API (Application Programming Interface) design is the process of developing APIs that effectively expose data and application functionality for consumption by developers and applications. Good API design is crucial for developer experience, system scalability, and long-term maintenance.

Core API Design Paradigms

REST (Representational State Transfer)

REST is an architectural style for distributed systems, particularly web services. RESTful APIs use HTTP methods explicitly and have the following characteristics:

REST Maturity Levels (Richardson Maturity Model)

  1. Level 0: Single URI, single HTTP method (typically POST)
  2. Level 1: Multiple URIs for different resources, but still a single HTTP method
  3. Level 2: HTTP verbs used as intended (GET, POST, PUT, DELETE)
  4. Level 3: HATEOAS (Hypermedia as the Engine of Application State) - APIs provide hyperlinks to guide clients

RPC (Remote Procedure Call)

RPC focuses on actions and functions rather than resources:

gRPC

GraphQL

A query language and runtime for APIs:

CRUD Operations

CRUD represents the four basic operations for persistent storage:

Operation | Description               | HTTP Method (REST) | SQL
Create    | Add new resources         | POST               | INSERT
Read      | Retrieve resources        | GET                | SELECT
Update    | Modify existing resources | PUT/PATCH          | UPDATE
Delete    | Remove resources          | DELETE             | DELETE

REST CRUD Mapping

POST /users             # Create a user
GET /users              # Read all users
GET /users/{id}         # Read specific user
PUT /users/{id}         # Update user (full replacement)
PATCH /users/{id}       # Update user (partial modification)
DELETE /users/{id}      # Delete user

URL Design Best Practices

Resource Naming

Hierarchy and Relationships

Query Parameters vs Path Parameters

Pagination Strategies

Limit-Offset Pagination

GET /products?limit=10&offset=20

Pros:

Cons:

Cursor-Based Pagination

Uses a pointer (cursor) to a specific item:

GET /products?limit=10&after=eyJpZCI6MTAwfQ==
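The cursor is opaque to clients, but a common implementation is base64-encoded JSON (the value above decodes to {"id":100}). A minimal sketch of encoding, decoding, and paging (helper names are illustrative):

```javascript
// A cursor is an opaque token encoding "where to resume".
// Here: base64-encoded JSON holding the last-seen id.
function encodeCursor(obj) {
  return Buffer.from(JSON.stringify(obj)).toString("base64");
}
function decodeCursor(cursor) {
  return JSON.parse(Buffer.from(cursor, "base64").toString("utf8"));
}

// Cursor-paginated query over an id-sorted list:
function pageAfter(items, cursor, limit) {
  const lastId = cursor ? decodeCursor(cursor).id : 0;
  const page = items.filter((it) => it.id > lastId).slice(0, limit);
  const next =
    page.length === limit ? encodeCursor({ id: page[page.length - 1].id }) : null;
  return { page, next };
}
```

Because the cursor pins the position to a specific item rather than an offset, results stay stable even when rows are inserted or deleted between requests.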

Pros:

Cons:

Page-Based Pagination

Uses page numbers:

GET /products?page=2&per_page=10

Pros:

Cons:

Response Formats

JSON Structure Best Practices

{
  "data": {
    "id": 123,
    "name": "Example Product",
    "price": 99.99
  },
  "meta": {
    "timestamp": "2023-04-10T15:00:00Z",
    "version": "1.0"
  }
}

Collection Responses with Pagination

{
  "data": [
    { "id": 1, "name": "Product A" },
    { "id": 2, "name": "Product B" }
  ],
  "pagination": {
    "total_items": 50,
    "total_pages": 5,
    "current_page": 1,
    "per_page": 10,
    "next": "/products?page=2&per_page=10",
    "prev": null
  }
}

Error Handling

HTTP Status Codes

Error Response Format

{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Invalid input parameters",
    "details": [
      {
        "field": "email",
        "message": "Must be a valid email address"
      }
    ]
  }
}

Versioning Strategies

URI Path Versioning

/api/v1/products
/api/v2/products

Query Parameter Versioning

/api/products?version=1

Header Versioning

Accept: application/vnd.myapi.v2+json

Content Negotiation

Accept: application/vnd.myapi+json; version=2.0

Authentication & Authorization

Authentication Methods

Authorization Patterns

API Documentation

Documentation Formats

Documentation Best Practices

Performance Considerations

Security Best Practices

Hypermedia and HATEOAS

{
  "id": 123,
  "name": "Example Product",
  "price": 99.99,
  "_links": {
    "self": { "href": "/products/123" },
    "manufacturer": { "href": "/manufacturers/456" },
    "related": { "href": "/products/123/related" }
  }
}

API Design Tools




11 - Caching

Introduction to Caching

Caching is a technique that stores copies of frequently accessed data in a high-speed storage layer (the cache), allowing future requests for that data to be served faster. Caching plays a crucial role in improving application performance, reducing latency, decreasing network traffic, and minimizing load on backend servers.

Why Use Caching?

Key Caching Metrics

Throughput

Throughput in caching refers to the number of requests a cache can process per unit of time (typically measured in requests per second).

Cache Hit Ratio

The percentage of requests that are served from the cache:

Hit Ratio = (Cache Hits / Total Requests) × 100%
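The formula translates directly into code:

```javascript
// Hit Ratio = (Cache Hits / Total Requests) × 100%
function hitRatio(hits, totalRequests) {
  if (totalRequests === 0) return 0; // avoid division by zero
  return (hits / totalRequests) * 100;
}
```

A higher hit ratio means more requests are absorbed by the cache instead of reaching the backend.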

Types of Caches

Client-Side Caching

Server-Side Caching

Network Caching

Caching Strategies

Cache-Aside (Lazy Loading)

  1. Application checks the cache for data
  2. If found (cache hit), data is returned
  3. If not found (cache miss), data is fetched from the database
  4. Application stores fetched data in cache for future use
function getData(key) {
    data = cache.get(key)
    if (data == null) {
        data = database.get(key)
        cache.put(key, data)
    }
    return data
}

Pros:

Cons:

Write-Through

  1. Application updates the database
  2. Immediately after, application updates the cache
function saveData(key, value) {
    database.put(key, value)
    cache.put(key, value)
    return success
}

Pros:

Cons:

Write-Behind (Write-Back)

  1. Application updates the cache
  2. Cache asynchronously updates the database
function saveData(key, value) {
    cache.put(key, value)
    cacheQueue.queueForPersistence(key, value)
    return success
}

// Separate process
function persistCacheUpdates() {
    while (item = cacheQueue.dequeue()) {
        database.put(item.key, item.value)
    }
}

Pros:

Cons:

Write-Around

  1. Application writes directly to the database, bypassing cache
  2. Cache is only populated when data is read
function saveData(key, value) {
    database.put(key, value)
    return success
}

function getData(key) {
    data = cache.get(key)
    if (data == null) {
        data = database.get(key)
        cache.put(key, data)
    }
    return data
}

Pros:

Cons:

Refresh-Ahead

Proactively refreshes frequently accessed items before they expire.

function setUpRefreshAhead(key, refreshThreshold, accessThreshold) {
    item = cache.get(key)
    if (item.accessCount > accessThreshold && item.expiryTime - now() < refreshThreshold) {
        asyncRefresh(key)
    }
}

function asyncRefresh(key) {
    data = database.get(key)
    cache.put(key, data)
}

Pros:

Cons:

Cache Eviction Policies

When a cache reaches its capacity limit, eviction policies determine which items to remove to make space for new data.

Least Recently Used (LRU)

Removes the items that haven't been accessed for the longest time.

Least Frequently Used (LFU)

Removes items that are accessed least often. Tracks access frequency (hit count) for each item.

Other Notable Eviction Policies

Redis as a Caching Solution

Redis (Remote Dictionary Server) is an open-source, in-memory data structure store that can be used as a database, cache, message broker, and streaming engine.

Key Redis Features

Redis provides several eviction policies configured via the maxmemory-policy setting:

Throughput Optimization in Caching Systems

Throughput is a critical metric measuring a cache's request processing capacity. Several factors influence throughput and can be optimized to improve it.

Hardware Considerations

Software Optimizations

Caching Architecture for Throughput

Measuring and Monitoring Throughput

Cache Consistency Patterns

TTL-Based Invalidation

Explicit Invalidation

function updateData(key, value) {
    database.put(key, value)
    cache.invalidate(key)  // OR cache.put(key, value)
}

Event-Based Invalidation

function updateData(key, value) {
    database.put(key, value)
    messageBus.publish("data-change", {key: key})
}

// Cache subscriber
messageBus.subscribe("data-change", function(message) {
    cache.invalidate(message.key)
})

Version Stamping / ETag

HTTP Caching

Cache-Control Headers

Cache-Control: max-age=3600, public

Caching Challenges

Cache Invalidation

"There are only two hard things in Computer Science: cache invalidation and naming things."
- Phil Karlton

Cache Stampede/Thundering Herd

When many requests simultaneously try to refresh an expired cache item.

Solutions:




12 - Content Delivery Networks (CDNs) & Edge Computing

Introduction to CDNs

A Content Delivery Network (CDN) is a distributed network of servers deployed in multiple data centers across different geographic locations. CDNs are designed to deliver content to end-users with high availability and performance by serving content from edge servers that are physically closer to users than origin servers.

Core CDN Concepts

Purpose and Benefits

Key CDN Components

How CDNs Work

  1. User requests content from a website/application
  2. DNS routes the request to the nearest CDN edge server
  3. The edge server checks its cache for requested content
  4. If content is in cache (cache hit), it's delivered directly to the user
  5. If content is not cached (cache miss), the edge server requests it from the origin
  6. The edge server caches the content and delivers it to the user
  7. Subsequent requests for the same content are served from the edge cache
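Steps 3-7 are essentially the cache-aside pattern applied at the edge. A minimal sketch with a stubbed origin fetch (function names are illustrative):

```javascript
// Edge server logic: serve from cache on a hit, fall back to the
// origin on a miss, then cache the result for subsequent requests.
function createEdgeServer(fetchFromOrigin) {
  const cache = new Map();
  return function handleRequest(url) {
    if (cache.has(url)) {
      return { body: cache.get(url), source: "edge-cache" }; // cache hit
    }
    const body = fetchFromOrigin(url); // cache miss: go to origin
    cache.set(url, body); // cache for subsequent requests
    return { body, source: "origin" };
  };
}
```

After the first request warms the cache, the origin is no longer contacted for that URL, which is exactly how a CDN shields origin servers from repeat traffic.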

CDN Architecture Types

Traditional Pull CDNs

In a pull CDN, content is "pulled" from the origin server when first requested by a user and not found in the edge cache.

Characteristics:

Examples: Cloudflare, Amazon CloudFront (in pull mode), Akamai

Push CDNs

In a push CDN, content is proactively "pushed" to the edge servers before users request it.

Characteristics:

Examples: Amazon CloudFront (in push mode), Azure CDN

CDN Content Types

Static Content Delivery

Static content doesn't change between user requests and is ideal for CDN caching:

Optimization techniques:

Dynamic Content Acceleration

While traditionally challenging to cache, modern CDNs offer ways to optimize dynamic content:

Edge Computing

Edge computing extends the CDN concept by not just caching content but also running code at edge locations closer to users.

Edge Computing vs. Traditional Cloud

Traditional Cloud                   | Edge Computing
Centralized processing              | Distributed processing
Higher latency                      | Lower latency
Potentially higher bandwidth costs  | Reduced bandwidth requirements
Consistent resources                | Variable resources by location
Easier management                   | More complex orchestration

Edge Computing Use Cases

Edge Functions/Workers

Many CDNs now offer serverless functions that run at the edge:

CDN Edge Computing Platforms

Proxy Servers and CDNs

CDNs function as a specialized form of reverse proxy, but understanding the broader proxy server concepts helps clarify their role.

Types of Proxies

Forward Proxy

Acts on behalf of clients, forwarding their requests to servers.

Use cases:

Reverse Proxy

Acts on behalf of servers, receiving client requests and forwarding them to appropriate backend servers.

Use cases:

CDN as a Specialized Reverse Proxy

CDNs extend the reverse proxy concept with:

Global CDN Providers




13 - Proxies and Load Balancers

Forward Proxy vs. Reverse Proxy

Forward Proxy:

Reverse Proxy:

Load Balancing Fundamentals

Load balancing distributes incoming traffic across multiple servers to ensure:

Essential Load Balancing Algorithms

  1. Round Robin

    • Routes requests sequentially across servers
    • Simple but doesn't account for server load
    • Example: Request 1 → Server A, Request 2 → Server B, etc.
  2. Least Connections

    • Routes to server with fewest active connections
    • Better for varying request durations
    • Prevents overloading busy servers
  3. IP Hash

    • Uses client IP to determine server
    • Ensures client always reaches same server
    • Critical for session persistence
  4. Weighted Algorithms

    • Assigns capacity values to servers
    • More powerful servers receive proportionally more traffic
    • Example: Server A (weight 3) gets 3x traffic of Server B (weight 1)
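Minimal sketches of three of these algorithms (illustrative, not production-ready):

```javascript
// Round robin: cycle through servers in order.
function roundRobin(servers) {
  let i = 0;
  return () => servers[i++ % servers.length];
}

// Weighted round robin: expand each server into `weight` slots,
// then round-robin over the slots.
function weightedRoundRobin(entries) {
  const slots = entries.flatMap(({ server, weight }) =>
    Array(weight).fill(server)
  );
  return roundRobin(slots);
}

// Least connections: pick the server with the fewest active connections.
function leastConnections(servers, activeConnections) {
  return servers.reduce((best, s) =>
    activeConnections[s] < activeConnections[best] ? s : best
  );
}
```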

Layer 4 vs. Layer 7 Load Balancing

Layer 4 (Transport):

Layer 7 (Application):

Health Checks

Active Monitoring:

Passive Monitoring:

Session Persistence

Problem: Stateful applications need consistent routing to the same server

Solutions:

Common Load Balancing Patterns

  1. High Availability Pairs

    • Two load balancers (active/passive)
    • Heartbeat connection between them
    • Failover if primary goes down
  2. Global Server Load Balancing (GSLB)

    • Distributes traffic across multiple data centers
    • Uses DNS or anycast routing
    • Provides geographic redundancy
  3. Direct Server Return (DSR)

    • Response traffic bypasses load balancer
    • Reduces load balancer bandwidth requirements
    • More complex to implement

Software:

Hardware:

Cloud Services:

SSL Termination

Common Challenges & Solutions

  1. Single Point of Failure

    • Solution: Load balancer redundancy (active/passive pair)
  2. Uneven Load Distribution

    • Solution: Dynamic weighting, least connections algorithm
  3. Session Management

    • Solution: Sticky sessions, shared session storage
  4. SSL Performance

    • Solution: SSL acceleration hardware, session caching

Monitoring Metrics That Matter

When to Scale




14 - Consistent Hashing

The Challenge of Distributed Data Placement

Imagine you have a large collection of data items that need to be distributed across multiple servers. How do you decide which server should handle which data item? This decision becomes particularly challenging when servers are added or removed from the system.

Traditional Approach and Its Limitations

Traditional Method: Using modulo arithmetic (hash(key) % number_of_servers)

Example:

Problem: When you add a 5th server, almost all keys need to be remapped:
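A quick demonstration of the problem (the hash function is a toy stand-in for a real one):

```javascript
// With hash(key) % n, adding one server remaps almost every key.
function server(key, numServers) {
  // Toy polynomial hash (illustration only).
  let h = 0;
  for (const ch of key) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % numServers;
}

const keys = Array.from({ length: 1000 }, (_, i) => "key-" + i);
// Count keys whose server changes when we go from 4 to 5 servers:
const moved = keys.filter((k) => server(k, 4) !== server(k, 5)).length;
// Typically around 80% of keys land on a different server after the change.
```

Consistent hashing, described next, is designed to shrink this remapping to roughly 1/n of the keys instead.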

Consistent Hashing

Conceptual Model: The Hash Ring

Imagine a circular ring numbered from 0 to 359 degrees (like a compass). Both servers and data keys are placed on this ring using a hash function.

How it works:

  1. Place each server at positions on the ring based on their hash values
  2. For each data key, find its position on the ring
  3. Move clockwise from the key's position and assign it to the first server encountered

Concrete Example:

If we have data keys that hash to positions 30, 110, 200, and 290:
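A minimal ring lookup can be sketched as follows, with servers A-D placed at assumed positions 0, 90, 180, and 270 (a real implementation would hash server IDs and keys onto the ring rather than supplying positions directly):

```javascript
// A minimal hash ring over positions 0-359.
function createRing(serverPositions) {
  // serverPositions: e.g. { A: 0, B: 90, C: 180, D: 270 }
  const ring = Object.entries(serverPositions)
    .map(([name, pos]) => ({ name, pos }))
    .sort((a, b) => a.pos - b.pos);
  return function lookup(keyPos) {
    // Move clockwise: first server at or after the key's position,
    // wrapping back to the first server if none is found.
    const owner = ring.find((s) => s.pos >= keyPos) || ring[0];
    return owner.name;
  };
}
```

With these positions, keys at 30, 110, and 200 map to the servers at 90, 180, and 270 respectively, and the key at 290 wraps around to the server at 0.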

Server Addition Example:

Server Removal Example:

Virtual Nodes

Problem: With few servers, distribution can be unbalanced

Solution: Each physical server is represented by multiple points on the ring

Example:

Real-World Analogy

Think of consistent hashing like postal delivery zones. Each server is responsible for a "zone" on the ring. When a new post office opens, it only takes over a portion of one existing zone, rather than changing all zone boundaries.

Rendezvous Hashing (Highest Random Weight)

Conceptual Model: Scoring Contest

Imagine that for each data item, all servers compete in a "contest" to determine who will store that item. The contest is deterministic but unique for each key-server pair.

How it works:

  1. For a given data key, each server generates a score based on hash(server_id + key)
  2. The server with the highest score "wins" the key

Concrete Example: With servers A, B, C, D and data key "user_123":

Server | Score for "user_123"
A      | 82
B      | 45
C      | 96
D      | 31

For a different key "product_456":

Server | Score for "product_456"
A      | 74
B      | 91
C      | 23
D      | 57
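The scoring contest can be sketched in a few lines (toy hash function; real implementations use a proper hash such as MurmurHash):

```javascript
// Rendezvous (highest random weight) hashing: every server scores
// each key, and the highest score wins.
function score(serverId, key) {
  let h = 0;
  for (const ch of serverId + ":" + key) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h;
}

function ownerOf(key, servers) {
  return servers.reduce((best, s) =>
    score(s, key) > score(best, key) ? s : best
  );
}
```

A useful property falls straight out of the definition: removing a server that did not win a key cannot change that key's owner, since the maximum among the remaining scores is unchanged.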

Server Addition Example:

Server Removal Example:

Real-World Analogy

Think of rendezvous hashing like a specialized job assignment. Each data item is a job with unique requirements, and each server has different skills for different jobs. The server that's the best match for a particular job gets assigned that job.

Practical Applications

Consistent Hashing Applications:

Rendezvous Hashing Applications:

When to Choose Which




15 - SQL

SQL (Structured Query Language)

SQL is a domain-specific language used for managing and manipulating relational databases. It serves as the standard language for relational database management systems (RDBMS).

Core Components of SQL

1. Data Definition Language (DDL)

Commands that define and modify database structure:

2. Data Manipulation Language (DML)

Commands that manipulate data within tables:

3. Data Control Language (DCL)

Commands that control access to data:

4. Transaction Control Language (TCL)

Commands that manage transactions:

Key SQL Concepts

Joins

Connect rows from multiple tables based on related columns:

Indexes

Special data structures that improve the speed of data retrieval operations:

Constraints

Rules enforced on data columns:

Aggregate Functions

Operations that perform calculations on sets of values:

B+ Trees

B+ Trees are self-balancing tree data structures that maintain sorted data and allow for efficient insertion, deletion, and search operations. They are the most common implementation for indexes in database systems.

Structure of B+ Trees

1. Nodes

2. Properties

B+ Tree Operations

Search Operation

  1. Start at the root node
  2. For each level, find the appropriate subtree based on key comparison
  3. Continue until reaching a leaf node
  4. Scan the leaf node for the target key

Range Queries

  1. Search for the lower bound key
  2. Once found in a leaf node, traverse the linked list of leaf nodes
  3. Continue until reaching the upper bound
  4. Time complexity: O(log n + k) where k is the number of elements in the range

Advantages in Database Systems

ACID Properties

ACID is an acronym that represents a set of properties ensuring reliable processing of database transactions.

The Four ACID Properties

1. Atomicity

Example: A bank transfer must either complete fully (debit one account and credit another) or not happen at all. Partial completion is not acceptable.

2. Consistency

Example: If a table has a constraint that account balances cannot be negative, any transaction resulting in a negative balance will be rejected.

3. Isolation

Example: When two users update the same data simultaneously, isolation ensures one user's changes don't overwrite or interfere with the other's.

4. Durability

Example: After confirming a payment, the data is permanently stored even if the database crashes immediately afterward.

ACID vs. BASE

Modern distributed systems sometimes use BASE (Basically Available, Soft state, Eventually consistent) as an alternative to ACID:

ACID                 | BASE
Strong consistency   | Eventual consistency
High isolation       | Lower isolation
Focus on reliability | Focus on availability
Traditional RDBMS    | Often used in NoSQL systems



16 - NoSQL

NoSQL ("Not Only SQL") databases are non-relational database systems designed to handle various data models, provide horizontal scalability, and deliver high performance for specific use cases.

Key Characteristics

NoSQL Data Models

1. Key-Value Stores

The simplest NoSQL model, storing data as key-value pairs:

2. Document Stores

Store semi-structured data in document format (typically JSON or BSON):

3. Column-Family Stores

Store data in column families - groups of related data:

4. Graph Databases

Specialized for highly connected data:

Key-Value Stores

Document Stores

Column-Family Stores

Graph Databases

Graph Databases in Depth

Graph databases excel at managing highly connected data by focusing on the relationships between entities.

Core Components

Advantages of Graph Databases

Common Use Cases

BASE Properties

BASE is an alternative to ACID for distributed systems, emphasizing availability over consistency:

ACID vs. BASE Comparison

ACID                            | BASE
Strong consistency              | Eventual consistency
Isolation                       | Availability first
Focus on reliability            | Focus on scalability
Pessimistic approach            | Optimistic approach
Difficulty scaling horizontally | Designed for horizontal scaling

Scaling Strategies for Databases

Vertical Scaling (Scale Up)

Horizontal Scaling (Scale Out)

Read Replicas

Sharding

Sharding divides data across multiple servers, with each server responsible for a subset of the data.

Sharding Strategies

  1. Range-Based Sharding

    • Divides data based on ranges of a key
    • Example: Users A-M on shard 1, N-Z on shard 2
    • Pros: Simple to implement, efficient range queries
    • Cons: Potential for hot spots, uneven distribution
  2. Hash-Based Sharding

    • Uses a hash function to distribute data evenly
    • Example: hash(user_id) % num_shards determines placement
    • Pros: Even distribution, prevents hot spots
    • Cons: Difficult to perform range queries, resharding complexity
  3. Directory-Based Sharding

    • Uses a lookup service to track data location
    • Example: Shard manager service that maps keys to shards
    • Pros: Flexible, allows for dynamic resharding
    • Cons: Lookup service becomes single point of failure
  4. Geographically-Based Sharding

    • Locates data near its users
    • Example: European users on EU servers, US users on US servers
    • Pros: Reduced latency for users, regulatory compliance
    • Cons: Cross-region operations can be slow
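Range-based sharding (strategy 1) can be sketched as a simple router; the bounds and shard names below are illustrative:

```javascript
// Range-based shard router: users A-M on shard 1, N-Z on shard 2,
// generalized to an ordered list of [upperBound, shard] pairs.
function createRangeRouter(ranges) {
  // ranges: e.g. [["m", "shard-1"], ["z", "shard-2"]], sorted by bound
  return function route(key) {
    const first = key[0].toLowerCase();
    for (const [upperBound, shard] of ranges) {
      if (first <= upperBound) return shard;
    }
    throw new Error("no shard covers key: " + key);
  };
}
```

The hot-spot risk mentioned above is visible here: if most keys start with the same letters, one range (and therefore one shard) absorbs most of the traffic.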

Sharding Challenges

NoSQL vs. SQL: When to Choose Each

Choose NoSQL When:

Choose SQL When:




17 - Replication and Sharding

Replication

Overview

Replication is the process of storing copies of the same data on multiple machines. This provides:

Leader-Follower Replication

Multi-Leader Replication

Leaderless Replication

Synchronous vs Asynchronous Replication

Synchronous Replication

Asynchronous Replication

Semi-Synchronous Replication

Sharding

Overview

Sharding (also called partitioning) splits a large dataset across multiple machines to distribute load and improve scalability.

Sharding Strategies

Range-Based Sharding

Hash-Based Sharding

Directory-Based Sharding

Rebalancing Strategies

Combining Replication and Sharding

Common Challenges

Consistency Issues

Rebalancing Overhead

Hot Spots

Database Systems

Caching Systems

Storage Systems




18 - CAP Theorem

Overview

The CAP theorem, formulated by Eric Brewer in 2000, states that a distributed data system can guarantee at most two of the following three properties simultaneously:

The Three Properties Explained

Consistency

Availability

Partition Tolerance

CAP Theorem in Practice

CA Systems (Consistency + Availability)

CP Systems (Consistency + Partition Tolerance)

AP Systems (Availability + Partition Tolerance)

Consistency Models

Practical Considerations

System Design Tradeoffs

Handling Partition Scenarios

Regional Considerations




19 - Object Storage

Overview

Object storage is a data storage architecture that manages data as discrete units called objects, rather than as files in a hierarchical file system or blocks in a block storage system. Objects are stored in a flat address space, making object storage highly scalable and well-suited for unstructured data.

Key Characteristics

Object Structure

Architecture

Advantages

Limitations

Use Cases

Storage Classes & Tiering

Public Cloud

Object Storage vs. Other Storage Types

Object vs. File Storage

Object vs. Block Storage

Blob Storage

Blob (Binary Large Object) storage is a subset of object storage optimized for storing large unstructured data.

Key Features

Storage Tiers




20 - Message Queues

Message Queues: Fundamental Concepts

Message queues are asynchronous communication mechanisms that enable services to communicate without being directly connected. They serve as intermediaries for passing messages between different parts of a system.

Core Components

Key Benefits

Messaging Patterns

Publish-Subscribe (Pub/Sub)

In this pattern, publishers send messages to topics, and subscribers receive all messages from the topics they subscribe to.
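A minimal in-memory sketch of the pattern (real brokers add persistence, acknowledgments, and network transport):

```javascript
// Minimal in-memory pub/sub broker: publishers send to a topic and
// every subscriber of that topic receives the message.
function createBroker() {
  const topics = new Map(); // topic -> array of subscriber callbacks
  return {
    subscribe(topic, handler) {
      if (!topics.has(topic)) topics.set(topic, []);
      topics.get(topic).push(handler);
    },
    publish(topic, message) {
      // Deliver to every subscriber; no subscribers means the
      // message is simply dropped in this sketch.
      for (const handler of topics.get(topic) || []) handler(message);
    },
  };
}
```

Note the decoupling: the publisher never learns who, or how many, subscribers received the message.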

Characteristics

Use Cases

Message Delivery Models

Push Model

In the push model, the broker actively sends messages to consumers as they arrive.

Characteristics

Advantages

Challenges

Pull Model

In the pull model, consumers request messages from the broker when they're ready to process them.

Characteristics

Advantages

Challenges

Fan-Out Pattern

Fan-out distributes a single message to multiple destinations simultaneously.

Implementation Approaches

Use Cases

Apache Kafka

Kafka is a distributed streaming platform designed for high-throughput, fault-tolerant, publish-subscribe messaging.

Key Concepts

Architecture Highlights

Strengths

Best Use Cases

RabbitMQ

RabbitMQ is a message broker implementing the Advanced Message Queuing Protocol (AMQP) with rich routing capabilities.

Key Concepts

Exchange Types

Strengths

Best Use Cases

Comparing Kafka and RabbitMQ

Feature            | Kafka                          | RabbitMQ
Primary Pattern    | Pub/Sub with consumer groups   | Flexible (direct, fanout, topic)
Storage Model      | Persistent log                 | Memory with optional persistence
Message Retention  | Configurable (time/size based) | Until consumed (typically)
Throughput         | Very high (millions/sec)       | Moderate to high (tens of thousands/sec)
Delivery Model     | Pull-based                     | Push-based by default
Ordering           | Guaranteed within partitions   | FIFO per queue
Redelivery         | Consumer managed               | Automatic with acknowledgments
Routing Complexity | Simple (topic/partition)       | Rich (exchanges/bindings)