Latency vs Throughput: Speed vs Volume
Imagine a garden hose. Latency is the time it takes for the first drop of water to travel from the tap to the nozzle. Throughput is the total amount of water coming out of the hose per second.
In System Design, it's the same! Understanding this difference is crucial to know if your system is "slow" or just "overloaded".
The Duel: Travel Time vs Traffic

Latency (Travel Time)
- Definition: Time to perform a single action from start to finish.
- Unit: Milliseconds (ms).
- Goal: As low as possible (fastest response).

Throughput (Traffic)
- Definition: Number of successful actions completed per unit of time.
- Unit: Requests per second (RPS), or GB/s for data transfer.
- Goal: As high as possible (maximum volume).
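To make the two definitions concrete, here is a minimal sketch that measures both for a simulated request (the `fake_request` function and its ~5 ms cost are assumptions for illustration, not a real network call):

```python
import time

def fake_request():
    """Stand-in for a real network call (assumption: ~5 ms of work)."""
    time.sleep(0.005)

# Latency: how long does ONE action take, start to finish?
start = time.perf_counter()
fake_request()
latency_ms = (time.perf_counter() - start) * 1000

# Throughput: how many actions complete per unit of time?
n = 100
start = time.perf_counter()
for _ in range(n):
    fake_request()
throughput_rps = n / (time.perf_counter() - start)

print(f"latency ~ {latency_ms:.1f} ms, throughput ~ {throughput_rps:.0f} req/s")
```

Note that these are two different measurements: timing one call tells you nothing about how many calls per second the system can sustain, and vice versa.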
Comparison Table
| Criteria | Latency (Speed) | Throughput (Volume) |
|---|---|---|
| Key Question | "How long does it take for ONE user?" | "How many TOTAL users can I handle?" |
| Context | User experience (clicks, page load time) | Scalability (handling millions of users) |
| How to improve? | Reduce distance (CDN), optimize code, add a cache (e.g. Redis) | Add servers (scale out), do work in parallel (async, queues) |
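The "use a cache" row deserves a quick illustration. Here is a minimal read-through cache sketch, using a plain dict as a stand-in for Redis (the ~100 ms DB cost and the `read_user` helper are assumptions for the demo):

```python
import time

DB_LATENCY_S = 0.1   # assumption: each database read costs ~100 ms
cache = {}           # stand-in for Redis: a plain in-memory dict

def read_user(user_id):
    """Read-through cache: hit -> answer instantly, miss -> slow DB call, then store."""
    if user_id in cache:
        return cache[user_id]          # cache hit: microseconds
    time.sleep(DB_LATENCY_S)           # cache miss: simulated DB round trip
    value = {"id": user_id}
    cache[user_id] = value
    return value

t0 = time.perf_counter(); read_user(42); miss = time.perf_counter() - t0
t0 = time.perf_counter(); read_user(42); hit = time.perf_counter() - t0
print(f"cold (DB): {miss*1000:.0f} ms, warm (cache): {hit*1000:.2f} ms")
```

The first read pays the full database price; every subsequent read of the same key is answered from memory, which is exactly how a cache slashes latency for repeated reads.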
The Trade-off
The Paradox: Improving one can degrade the other
It's the classic dilemma.
Take the example of "Batching": grouping several small requests and sending them to the database as one big chunk.
- ✅ Throughput: great! One round trip carries many requests, so the database processes more total data per second.
- ❌ Latency: bad for the first request in the batch, which has to wait for the others to arrive before anything is sent. Its individual response time increases.
Classic Pitfalls
Never say just "my average latency is 100 ms". If 1% of your users wait 10 seconds, your average still looks good, but those users are furious.
Solution: speak in percentiles (p95, p99). "99% of my users get a response in under 200 ms" is much more honest.
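Here is the trap in numbers: 99 fast responses plus 1 terrible one. The `percentile` helper below is a simple nearest-rank sketch (an assumption; real libraries often interpolate, so their values can differ slightly):

```python
# 99 fast responses and 1 very slow one: the average hides the outlier.
latencies_ms = [100] * 99 + [10_000]

avg = sum(latencies_ms) / len(latencies_ms)

def percentile(values, p):
    """Nearest-rank percentile (simple convention for illustration)."""
    ordered = sorted(values)
    k = max(0, round(p / 100 * len(ordered)) - 1)
    return ordered[k]

print(f"average: {avg:.0f} ms")                     # 199 ms -- looks fine!
print(f"p99:  {percentile(latencies_ms, 99)} ms")   # 100 ms
print(f"p100: {percentile(latencies_ms, 100)} ms")  # 10000 ms -- the furious user
```

The average (199 ms) looks perfectly healthy, yet the worst percentile exposes a 10-second experience the average completely buries.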
"Ping" is just the network travel time. Total latency also includes the time the server spends working (CPU) and the time the request spends waiting in a queue (queueing).
Reminder: Total Latency = Network Trip + Processing Time + Queueing Time.
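A quick worked example of that reminder (all three component values are assumptions chosen for illustration):

```python
network_ms = 50      # round trip client <-> server (what "ping" measures)
queueing_ms = 30     # time the request waits in the server's queue
processing_ms = 120  # CPU and database work on the server

total_latency_ms = network_ms + queueing_ms + processing_ms
print(total_latency_ms)  # 200 -- ping alone (50 ms) badly underestimates it
```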
In Interviews (The System Design Interview)
Before drawing anything, ask the crucial question: "Are we optimizing for low latency or for high throughput?" The answer shapes the whole architecture.
Your Turn
Mission: Identify the Bottleneck
Draw a simple flow: Client ➔ Server ➔ Database.
The timings:
1. Client ➔ Server: 50ms
2. Server ➔ Database (Read): 800ms (Ouch!)
Your mission: Where is the latency problem? Draw a solution (hint: a magic box between Server and DB) to drastically reduce this time.
For the Curious (Bonus)
L = λ × W

In plain English: the number of customers in your store (L) equals the number of customers entering per minute (λ, throughput) multiplied by the average time they spend inside (W, latency). If your cashier is slow (high latency), your store fills up!
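The store analogy can be checked with two lines of arithmetic (the rates below are made-up numbers for the example):

```python
# Little's Law: L = lambda * W  (items in the system = arrival rate * time in system)
arrival_rate = 2.0   # lambda: customers entering per minute (assumption)
time_inside = 5.0    # W: average minutes each customer spends inside (assumption)

customers_in_store = arrival_rate * time_inside
print(customers_in_store)  # 10.0

# A slower cashier doubles W, so the store holds twice as many people:
print(arrival_rate * (time_inside * 2))  # 20.0
```

The same algebra applies to servers: requests in flight = request rate × average latency, which is why high latency under constant load means queues (and memory) grow.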