Linux 7.0 Cuts PostgreSQL Performance in Half


TL;DR

  • Performance Drop: An AWS engineer reported that PostgreSQL throughput on Linux 7.0 falls to 0.51x of its level on prior kernels, following the removal of the PREEMPT_NONE preemption option.
  • Root Cause: Kernel commit 7dadeaa6e851 by Intel developer Peter Zijlstra restricts modern CPU architectures to Full and Lazy preemption only, disrupting PostgreSQL’s spinlock-heavy buffer management.
  • Fix Disputed: Kernel developers are directing PostgreSQL to adopt the rseq time slice extension rather than reverting the change, and no resolution is guaranteed before the stable release.
  • Upgrade Risk: Database operators on Ubuntu 26.04 LTS, which ships with Linux 7.0, must decide now whether to delay upgrades or pin an older kernel.

An AWS engineer reported on April 3 that PostgreSQL throughput dropped to roughly half its previous level on Linux 7.0, with benchmark data tracing the cause to a deliberate kernel change that removed the PREEMPT_NONE preemption option. On a 96-vCPU Graviton4 instance, throughput fell to 0.51x that of prior kernel versions. With the stable release approximately two weeks away and set to power Ubuntu 26.04 LTS, kernel maintainers are pushing the burden of a fix onto PostgreSQL rather than reverting the change.

How the Regression Was Found

Salvatore Dipietro of Amazon/AWS reported a throughput regression after benchmarking PostgreSQL 17 on a 96-vCPU Graviton4 instance (EC2 m8g.24xlarge) using pgbench with 1,024 clients, 96 threads, and a 1,200-second duration. The test ran a simple-update workload with a scale factor of 8,470 and a fillfactor of 90 on Amazon Linux 2023 (AL2023), with storage on 12 IO2 volumes in RAID0 formatted as XFS. In that configuration, Linux 7.0 delivered just 0.51x the throughput of prior kernel versions.
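
For readers who want to approximate the setup, the reported parameters map onto standard pgbench options. The sketch below only assembles the two command lines; it is a reconstruction from the figures above, not the exact invocation from the report, and the database name "bench" is a hypothetical placeholder.

```python
import shlex

# Parameters taken from the benchmark description.
SCALE_FACTOR = 8470
FILLFACTOR = 90
CLIENTS = 1024
THREADS = 96
DURATION_S = 1200

# Hypothetical database name; the report does not give one.
DBNAME = "bench"


def init_command(dbname=DBNAME):
    """Build the pgbench initialization command (creates and fills the tables)."""
    return ["pgbench", "-i", "-s", str(SCALE_FACTOR), "-F", str(FILLFACTOR), dbname]


def run_command(dbname=DBNAME):
    """Build the timed run using pgbench's built-in simple-update script."""
    return [
        "pgbench",
        "-b", "simple-update",
        "-c", str(CLIENTS),
        "-j", str(THREADS),
        "-T", str(DURATION_S),
        dbname,
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; a real run needs a
    # configured PostgreSQL server and considerable disk space.
    print(shlex.join(init_command()))
    print(shlex.join(run_command()))
```

Server settings such as shared_buffers are not stated in the summary above, so a local run built from these commands should not be expected to reproduce the 0.51x figure exactly.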

Dipietro identified the root cause through bisection. His kernel patch submission traces the regression to commit 7dadeaa6e851, introduced in v7.0-rc1 and authored by Intel kernel developer Peter Zijlstra. Titled “sched: Further restrict the preemption modes,” the commit removes PREEMPT_NONE as the default and limits modern CPU architectures to Full and Lazy preemption only. Because the kernel is now forced onto PREEMPT_LAZY, the scheduler can preempt threads more aggressively than PREEMPT_NONE allowed.
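
Operators who want to see which preemption model a running kernel uses can often read it from debugfs. The sketch below rests on assumptions not taken from the report: that the kernel was built with dynamic preemption support, that debugfs is mounted at /sys/kernel/debug, and that the file lists the available modes with the active one in parentheses (for example "full (lazy)"). Reading the file typically requires root.

```python
import re
from pathlib import Path
from typing import List, Optional

# Assumed location on kernels with dynamic preemption; may be absent elsewhere.
PREEMPT_FILE = Path("/sys/kernel/debug/sched/preempt")


def current_mode(text: str) -> Optional[str]:
    """Return the mode shown in parentheses, or None if none is marked."""
    match = re.search(r"\((\w+)\)", text)
    return match.group(1) if match else None


def available_modes(text: str) -> List[str]:
    """Return every listed mode, with the parentheses stripped."""
    return [token.strip("()") for token in text.split()]


if __name__ == "__main__":
    try:
        contents = PREEMPT_FILE.read_text()
    except OSError as exc:
        # Missing file, unmounted debugfs, or insufficient privileges.
        print(f"cannot read {PREEMPT_FILE}: {exc}")
    else:
        print("available:", ", ".join(available_modes(contents)))
        print("current:", current_mode(contents))
```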

Under PREEMPT_NONE, a thread holding a spinlock could complete its operation without being preempted. Under PREEMPT_LAZY, the scheduler can interrupt the lock holder, forcing other threads to spin longer waiting for the lock. PostgreSQL relies heavily on short-held spinlocks for buffer pool management, and PREEMPT_LAZY disrupts this pattern by allowing preemption during what were previously uninterruptible sequences.
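
The mechanism can be illustrated with a toy model. The sketch below is not PostgreSQL code and does not model the kernel scheduler: it simulates a handful of threads that alternate between private work and a short critical section guarded by a single lock, and optionally stalls the lock holder on every Nth acquisition to mimic preemption. All names and parameter values are illustrative assumptions.

```python
from collections import deque


def simulate(threads, work_ticks, hold_ticks, total_ticks,
             preempt_every=0, preempt_ticks=0):
    """Return (completed critical sections, fraction of CPU ticks spent spinning)."""
    state = ["work"] * threads          # "work", "spin", or "hold"
    remaining = [work_ticks] * threads  # ticks left in the current phase
    waiters = deque()                   # FIFO queue of spinning threads
    holder = None
    acquisitions = 0
    completed = 0
    spin = 0

    for _ in range(total_ticks):
        # Hand a free lock to the thread that has waited longest.
        if holder is None and waiters:
            holder = waiters.popleft()
            state[holder] = "hold"
            remaining[holder] = hold_ticks
            acquisitions += 1
            if preempt_every and acquisitions % preempt_every == 0:
                # The holder is descheduled while still holding the lock.
                remaining[holder] += preempt_ticks

        for i in range(threads):
            if state[i] == "spin":
                spin += 1               # burns a tick without progress
                continue
            remaining[i] -= 1
            if remaining[i] == 0:
                if state[i] == "work":
                    state[i] = "spin"   # now needs the lock
                    waiters.append(i)
                else:                   # critical section finished
                    state[i] = "work"
                    remaining[i] = work_ticks
                    holder = None
                    completed += 1

    return completed, spin / (threads * total_ticks)


if __name__ == "__main__":
    # Illustrative numbers only; none of them come from the benchmark.
    base = simulate(threads=8, work_ticks=20, hold_ticks=2, total_ticks=10_000)
    slow = simulate(threads=8, work_ticks=20, hold_ticks=2, total_ticks=10_000,
                    preempt_every=10, preempt_ticks=100)
    print("no holder preemption:   ", base)
    print("with holder preemption: ", slow)
```

Even in this simplified model, stalling the holder now and then lowers the number of completed critical sections and raises the share of time spent spinning, because every waiter keeps burning ticks until the stalled holder runs again.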

What makes the regression so severe at scale is the compounding nature of spinlock contention. Perf profiling shows that, under the new preemption model, 55% of CPU time is spent spinning in PostgreSQL’s spinlock routine (s_lock()), specifically in the StrategyGetBuffer/GetVictimBuffer buffer management call path. Each preempted lock holder causes dozens of waiting threads to burn CPU cycles spinning rather than doing useful work. On a 96-vCPU system running 1,024 concurrent clients, that waste multiplies across every core simultaneously, leaving the database spending more than half its CPU budget on lock contention rather than query execution.
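
To put the profile figure in machine terms, the short calculation below converts the 55% share into vCPU-equivalents on the 96-vCPU instance. It is plain arithmetic on the two numbers reported above; treating the share as evenly spread across all vCPUs is a simplifying assumption.

```python
VCPUS = 96          # instance size from the benchmark description
SPIN_PERCENT = 55   # share of CPU time in s_lock() per the perf profile


def vcpu_equivalents(vcpus, percent):
    """Convert a percentage of total CPU time into whole-vCPU equivalents."""
    return vcpus * percent / 100


spinning = vcpu_equivalents(VCPUS, SPIN_PERCENT)
remaining = VCPUS - spinning
print(f"{spinning:.1f} vCPU-equivalents spinning, {remaining:.1f} left for everything else")
# prints: 52.8 vCPU-equivalents spinning, 43.2 left for everything else
```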