What is replication lag?

Replication lag is the delay between a write being committed on the primary database and that change being applied on a replica. During this window the replica holds stale data and, if the primary fails, committed changes that have not yet reached the replica can be lost.

Recovery Point Objective (RPO) is the maximum amount of data loss — measured in time — that a business can tolerate after a failure. An RPO of 60 seconds means no more than 60 seconds of committed writes may be lost.

How is "bytes at risk" calculated?

Bytes at risk = write rate (bytes/sec) × lag (seconds). It estimates the volume of data committed on the primary that has not yet been applied on the replica at the moment of failure.
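As a minimal sketch of the formula above, the calculation is a single multiplication. The function name `bytes_at_risk` and the sample figures are illustrative assumptions, not taken from the tool itself.

```python
def bytes_at_risk(write_rate_bytes_per_sec: float, lag_seconds: float) -> float:
    """Estimate data written on the primary but not yet applied on the replica."""
    if write_rate_bytes_per_sec < 0 or lag_seconds < 0:
        raise ValueError("write rate and lag must be non-negative")
    return write_rate_bytes_per_sec * lag_seconds


# Hypothetical figures: 2,000,000 bytes/sec of writes, replica 30 seconds behind.
at_risk = bytes_at_risk(2_000_000, 30)
print(f"{at_risk / 1_000_000:.1f} MB at risk")  # prints "60.0 MB at risk"
```

The result is an estimate: it assumes the write rate stays roughly constant over the lag window.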

What alert threshold percentage should I use?

A common rule of thumb is 50–70% of RPO. This gives the on-call engineer time to investigate before the replica breaches the SLA limit. For critical systems with slow human response, 30–40% is safer.
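The rule of thumb above can be sketched as a percentage of the RPO. The helper name `alert_trigger_seconds` and the 60-second RPO are assumptions chosen for illustration.

```python
def alert_trigger_seconds(rpo_seconds: float, threshold_pct: float) -> float:
    """Return the lag, in seconds, at which an alert should fire."""
    if rpo_seconds <= 0:
        raise ValueError("RPO must be positive")
    if not 0 < threshold_pct <= 100:
        raise ValueError("threshold must be in the range (0, 100]")
    return rpo_seconds * threshold_pct / 100


# Hypothetical RPO of 60 seconds at three different thresholds.
for pct in (30, 50, 70):
    print(f"{pct}% of RPO -> alert at {alert_trigger_seconds(60, pct):.0f}s")
```

With a 60-second RPO, a 50% threshold fires at 30 seconds of lag, leaving another 30 seconds of headroom before the RPO is breached.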

Does this tool apply to MySQL, PostgreSQL, and other databases?

Yes. The math is database-agnostic — it depends only on write rate and acceptable lag duration, which apply equally to MySQL async replication, PostgreSQL streaming replication, MongoDB replica sets, and similar systems.

What write rate unit should I use?

Use whichever is easiest to measure. If you track row writes per second (or transactions per second that you can convert to rows written), select rows/sec and also supply an average row size so the bytes-at-risk figure can be computed. If you monitor WAL or binlog throughput in MB/sec, switch the unit selector to MB/sec.
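The unit choices described above can be normalised to bytes per second before any other calculation. This is a sketch under stated assumptions: the function name `to_bytes_per_sec` is hypothetical, and 1 MB is taken as 1,000,000 bytes, which may differ from the convention the tool uses.

```python
from typing import Optional

BYTES_PER_MB = 1_000_000  # assumption: decimal megabytes, not 1,048,576


def to_bytes_per_sec(value: float, unit: str,
                     avg_row_bytes: Optional[float] = None) -> float:
    """Convert a write rate in rows/sec, MB/sec, or bytes/sec to bytes/sec."""
    if unit == "bytes/sec":
        return float(value)
    if unit == "MB/sec":
        return float(value * BYTES_PER_MB)
    if unit == "rows/sec":
        if avg_row_bytes is None:
            raise ValueError("rows/sec needs an average row size in bytes")
        return float(value * avg_row_bytes)
    raise ValueError(f"unknown unit: {unit}")


# Hypothetical workload: 5,000 rows/sec at an average of 200 bytes per row.
print(to_bytes_per_sec(5_000, "rows/sec", avg_row_bytes=200))  # prints 1000000.0
```

Measuring WAL or binlog throughput directly avoids the row-size estimate, since it reflects what is actually shipped to the replica.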

Database Replication Lag Helper

Author: Nham Vu

Enter your write rate and SLA to get the max tolerable replication lag, estimated data-loss window, and bytes at risk.

Configuration

Write throughput on primary

Average write load on the primary database.

Recovery Point Objective (RPO)

Maximum data loss your SLA allows after a primary failure.

Alert threshold — % of RPO (50%)

Range: 10% (conservative) to 90% (aggressive)

Trigger an alert before lag reaches the full RPO limit.

Replication mode

Options: Asynchronous, Semi-synchronous, Synchronous

Max Tolerable Lag (seconds)

Alert Trigger At (seconds)

Data at Risk (at max lag)

Lag Safety Zones

Safe, Warning, Breach
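One plausible way to map a measured lag onto the Safe, Warning, and Breach zones is sketched below. The boundaries are assumptions, since the page does not define them: lag below the alert trigger is treated as Safe, lag from the alert trigger up to and including the RPO as Warning, and anything beyond the RPO as Breach. The function name `classify_lag` is hypothetical.

```python
def classify_lag(lag_seconds: float, rpo_seconds: float, threshold_pct: float) -> str:
    """Classify replication lag into a zone (boundaries are assumed, see lead-in)."""
    alert_at = rpo_seconds * threshold_pct / 100
    if lag_seconds < alert_at:
        return "Safe"
    if lag_seconds <= rpo_seconds:
        return "Warning"
    return "Breach"


# Hypothetical setup: RPO of 60 seconds, alert threshold at 50% of RPO.
for lag in (10, 30, 60, 61):
    print(f"{lag}s -> {classify_lag(lag, 60, 50)}")
```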
