What is the formula for RPS in a threaded model?

RPS = (cores × threads_per_core) / response_time_in_seconds. Each worker thread handles one request at a time, so the total worker count divided by the response time gives the peak throughput. Planning for roughly 70–75% of that peak keeps the queue from backing up.
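As a rough illustration, the formula can be written as a small Python function. The function name and the sample server (4 cores, 8 threads per core, 200 ms response time) are hypothetical values chosen for the example, not measurements.

```python
def threaded_peak_rps(cores: int, threads_per_core: int, response_time_s: float) -> float:
    """Ceiling RPS for a thread-per-request server (hypothetical helper)."""
    if response_time_s <= 0:
        raise ValueError("response_time_s must be positive")
    workers = cores * threads_per_core  # each worker handles one request at a time
    return workers / response_time_s


# Assumed example server: 4 cores, 8 threads per core, 200 ms average response time.
peak = threaded_peak_rps(cores=4, threads_per_core=8, response_time_s=0.200)
print(round(peak))  # 160
```

Note that the response time is passed in seconds, so a 200 ms endpoint is entered as 0.200.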

Why does the async model produce higher RPS for I/O-bound work?

Async/event-loop servers do not block a thread while waiting on I/O (database, network, disk). A single thread can keep hundreds of requests in flight at once. The bottleneck shifts from thread count to the CPU cycles spent on business logic.
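A minimal sketch of this effect using Python's asyncio, where asyncio.sleep stands in for a database or network call. The request count and wait time are made-up values, and the handler does no real work.

```python
import asyncio
import time


async def handle_request(io_wait_s: float) -> None:
    # asyncio.sleep is a stand-in for waiting on a database or network call;
    # the event loop is free to run other requests during the wait.
    await asyncio.sleep(io_wait_s)


async def serve_batch(n_requests: int, io_wait_s: float) -> float:
    """Run n_requests concurrently on one thread and return elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(handle_request(io_wait_s) for _ in range(n_requests)))
    return time.perf_counter() - start


# 200 simulated requests, each waiting 50 ms on I/O (assumed values).
elapsed = asyncio.run(serve_batch(n_requests=200, io_wait_s=0.05))
print(f"200 simulated requests finished in {elapsed:.3f} s on one thread")
```

Handled one at a time by a single blocking thread, the same 200 waits would take about 10 seconds. Here they overlap, so the total stays close to a single wait.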

What is a realistic threads-per-core value?

For CPU-bound work, 1–2 threads per core avoids context-switch overhead. For I/O-bound work (typical web APIs), 4–16 threads per core is common because threads spend most of their time waiting, not computing.

Does this include database or downstream service latency?

Yes. Your average response time should include all downstream latency (DB queries, external API calls, cache misses). If an endpoint takes 200 ms end-to-end, use 200 ms, even if only 10 ms of that is CPU work.
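To see how much the choice of latency matters, here is the threaded formula applied to both figures. The worker count (4 cores × 8 threads per core) is an assumed example.

```python
workers = 4 * 8  # assumed example: 4 cores, 8 threads per core

end_to_end_s = 0.200  # full latency, including DB and downstream calls
cpu_only_s = 0.010    # CPU work alone

realistic = workers / end_to_end_s    # estimate using end-to-end latency
overestimate = workers / cpu_only_s   # estimate if only CPU time were counted

print(round(realistic), round(overestimate))  # 160 3200
```

Plugging in only the CPU time inflates the threaded estimate by a factor of 20 in this example, because each worker stays occupied for the full 200 ms.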

How accurate are these estimates?

They are ceiling estimates under ideal conditions. Real-world RPS is lower due to garbage collection pauses, memory pressure, kernel scheduling, and uneven load distribution. Use the result as a planning benchmark, then validate with a load test (k6, Locust, wrk).

What utilization target should I plan for?

Keep steady-state utilization below 70–75% of the estimated ceiling. That headroom absorbs traffic spikes before autoscaling kicks in and prevents queue buildup from compounding latency.
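A short sketch of turning a peak estimate into a safe operating target. The function name, the 0.70 default, and the peak of 160 RPS are illustrative assumptions.

```python
def safe_operating_rps(peak_rps: float, target_utilization: float = 0.70) -> float:
    """Scale the estimated ceiling down to a steady-state target (hypothetical helper)."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    return peak_rps * target_utilization


# Assumed peak of 160 RPS, planned at 70% utilization.
print(round(safe_operating_rps(160.0, 0.70)))  # 112
```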

Requests Per Second Capacity

Author: Nham Vu

Enter your CPU core count, average response time, and concurrency model to estimate how many requests per second your server can handle.

Server Parameters

CPU Cores

Logical cores available to the server process (vCPU count).

Avg Response Time (ms)

End-to-end latency per request including DB and downstream calls.

Concurrency Model

Thread-per-request (blocking)

Apache, gunicorn/Django, Tomcat, Rails Puma

Async / event-loop (non-blocking)

Node.js, Nginx, FastAPI async, Go net/http

Advanced

Threads per core

For I/O-bound work: 4–16. For CPU-bound: 1–2.

Target utilization (%)

Headroom before autoscaling. Recommended: 70–75%.

Estimated Capacity

Peak RPS (ceiling)

—

requests / second

Safe Operating RPS

—

at 70% utilization

Scale Reference

Servers	Peak RPS (total)	Safe RPS (total)	Req/min
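One way such a table could be filled in is sketched below. This is not the calculator's own code. It assumes identical servers with perfectly even load distribution, and it assumes that Req/min is the safe total multiplied by 60. The per-server peak of 160 RPS and the server counts are example values.

```python
def scale_reference(peak_rps: float, utilization: float = 0.70,
                    server_counts=(1, 2, 5, 10)):
    """Return rows of (servers, peak total, safe total, req/min).

    Assumptions: identical servers, even load distribution,
    and req/min derived from the safe total (not the peak).
    """
    rows = []
    for n in server_counts:
        peak_total = peak_rps * n
        safe_total = peak_total * utilization
        rows.append((n, round(peak_total), round(safe_total), round(safe_total * 60)))
    return rows


# Assumed per-server peak of 160 RPS.
rows = scale_reference(160.0)
for row in rows:
    print("\t".join(str(value) for value in row))
```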


Summary