Linux 7.0 Cuts PostgreSQL Performance in Half

TL;DR

Performance Drop: An AWS engineer confirmed that PostgreSQL throughput falls to 0.51x on Linux 7.0 due to the removal of the PREEMPT_NONE scheduling option.
Root Cause: Kernel commit 7dadeaa6e851 by Intel developer Peter Zijlstra restricts modern CPU architectures to Full and Lazy preemption only, disrupting PostgreSQL’s spinlock-heavy buffer management.
Fix Disputed: Kernel developers are directing PostgreSQL to adopt the rseq time slice extension rather than reverting the change, with no resolution guaranteed before the stable release.
Upgrade Risk: Database operators on Ubuntu 26.04 LTS, which ships with Linux 7.0, must decide now whether to delay upgrades or pin an older kernel.

An AWS engineer reported on April 3 that PostgreSQL throughput dropped to roughly half on Linux 7.0, with benchmark data tracing the cause to a deliberate kernel change that removed the PREEMPT_NONE scheduling option. On a 96-vCPU Graviton4 instance, throughput fell to 0.51x compared to prior kernel versions. With the stable release approximately two weeks away and set to power Ubuntu 26.04 LTS, kernel maintainers are pushing the fix burden onto PostgreSQL rather than reverting the change.

How the Regression Was Found

Salvatore Dipietro of Amazon/AWS reported a throughput regression after benchmarking PostgreSQL 17 on a 96-vCPU Graviton4 instance (EC2 m8g.24xlarge) using pgbench with 1,024 clients, 96 threads, and a 1,200-second duration. The test ran a simple-update workload with a scale factor of 8,470 and a fillfactor of 90 on Amazon Linux 2023 (AL2023), with 12 IO2 volumes in RAID0 on XFS. In that setup, Linux 7.0 delivered just 0.51x the throughput of prior kernel versions.
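For readers who want to approximate the workload, the reported parameters map onto standard pgbench options roughly as follows. This is a sketch, not Dipietro's actual command line: the database name "bench" is hypothetical, and setting the fillfactor through pgbench's -F initialization option is an assumption.

```python
import shlex


def pgbench_commands(db: str = "bench") -> dict:
    """Build pgbench argv lists from the parameters reported in the article.

    `db` is a hypothetical database name; the original command lines
    were not published in the article.
    """
    # Initialization: scale factor 8,470 and fillfactor 90.
    init = ["pgbench", "-i", "-s", "8470", "-F", "90", db]
    # Run: built-in simple-update script, 1,024 clients,
    # 96 worker threads, 1,200 seconds.
    run = [
        "pgbench", "-b", "simple-update",
        "-c", "1024", "-j", "96", "-T", "1200", db,
    ]
    return {"init": init, "run": run}


if __name__ == "__main__":
    for name, argv in pgbench_commands().items():
        print(f"{name}: {shlex.join(argv)}")
```

Hardware, storage layout, and PostgreSQL configuration all affect the result, so figures from a smaller machine are not directly comparable to the Graviton4 numbers.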

In his kernel patch submission, Dipietro traced the root cause through bisection to commit 7dadeaa6e851, introduced in v7.0-rc1 and authored by Intel kernel developer Peter Zijlstra. Titled “sched: Further restrict the preemption modes,” the commit removes PREEMPT_NONE as the default and limits modern CPU architectures to Full and Lazy preemption only. With the kernel forced onto PREEMPT_LAZY, the scheduler can preempt threads more aggressively than PREEMPT_NONE allowed.

Under PREEMPT_NONE, a thread holding a spinlock could complete its operation without being preempted. Under PREEMPT_LAZY, the scheduler can interrupt the lock holder, forcing other threads to spin longer waiting for the lock. PostgreSQL relies heavily on short-held spinlocks for buffer pool management, and PREEMPT_LAZY disrupts this pattern by allowing preemption during what were previously uninterruptible sequences.

What makes the regression so severe at scale is the compounding nature of spinlock contention. Perf profiling reveals that 55% of CPU time goes to spinning in PostgreSQL’s spinlock (s_lock()) under the new preemption model, specifically in the StrategyGetBuffer/GetVictimBuffer buffer management call path. Each preempted lock holder causes dozens of waiting threads to consume CPU cycles spinning rather than doing useful work. On a 96-vCPU system running 1,024 concurrent clients, that waste multiplies across every core simultaneously, leaving the database burning more than half its CPU budget on lock contention rather than query execution.
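The compounding effect can be illustrated with a deliberately simple model. The sketch below assumes every waiter busy-waits for the entire time the lock is held, including any time the holder spends descheduled. All numbers are hypothetical and chosen only for illustration; none come from the benchmark. The model also ignores the backoff and sleep behaviour a real spinlock implementation may apply.

```python
def spin_cpu_time_us(waiters: int, hold_us: float,
                     holder_off_cpu_us: float = 0.0) -> float:
    """CPU time (in microseconds) burned by waiters during one lock hold.

    Toy model: each waiter spins for the full hold time plus any time
    the lock holder is off-CPU because it was preempted.
    """
    return waiters * (hold_us + holder_off_cpu_us)


if __name__ == "__main__":
    waiters = 24           # hypothetical: "dozens" of spinning threads
    hold_us = 1.0          # hypothetical short critical section
    off_cpu_us = 1_000.0   # hypothetical time the holder is descheduled

    normal = spin_cpu_time_us(waiters, hold_us)
    preempted = spin_cpu_time_us(waiters, hold_us, off_cpu_us)
    print(f"uninterrupted hold: {normal:.0f} us of spinning")
    print(f"preempted hold:     {preempted:.0f} us of spinning")
```

In this model, wasted time scales with the number of waiters multiplied by how long the holder stays off-CPU, which is why a short critical section can become expensive once its holder is preempted.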

With a revert patch applied, throughput recovered to 1.94x the regressed figure, averaging 98,565 tps versus 50,751 tps on unpatched Linux 7.0 across three runs. Recovery to near-original performance confirms the preemption change as the cause rather than a coincidental PostgreSQL or hardware issue.
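The two headline ratios are the same measurement viewed from opposite directions, which a quick check makes explicit. This sketch assumes the reverted kernel's throughput stands in for the pre-7.0 baseline.

```python
REVERTED_TPS = 98_565   # average with the revert patch applied
LINUX_7_TPS = 50_751    # average on unpatched Linux 7.0

recovery = REVERTED_TPS / LINUX_7_TPS     # speedup from the revert
regression = LINUX_7_TPS / REVERTED_TPS   # slowdown from the change

print(f"recovery:   {recovery:.2f}x")     # 1.94x
print(f"regression: {regression:.2f}x")   # 0.51x
```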

Why the Kernel Changed

Far from accidental, the preemption mode restriction reflects a deliberate design decision. Linux 7.0 restricts available preemption modes to Full and Lazy only for modern CPU architectures, including arm64, x86, powerpc, riscv, s390, and loongarch. Removing PREEMPT_NONE and PREEMPT_VOLUNTARY on these platforms means any application relying on the old non-preemptive behavior now faces the same class of regression PostgreSQL is experiencing.
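One way to see which preemption model a given kernel was built with is to inspect its build configuration. The sketch below assumes the distribution installs the config at /boot/config-<release>, which is a common convention but not universal. The option names are the kernel's own Kconfig symbols. On kernels built with CONFIG_PREEMPT_DYNAMIC, the mode in effect at runtime may differ from the build-time default.

```python
import platform
import re
from pathlib import Path

# Kconfig symbols that select or describe the preemption model.
PREEMPT_OPTIONS = (
    "CONFIG_PREEMPT_NONE",
    "CONFIG_PREEMPT_VOLUNTARY",
    "CONFIG_PREEMPT",
    "CONFIG_PREEMPT_LAZY",
    "CONFIG_PREEMPT_RT",
    "CONFIG_PREEMPT_DYNAMIC",
)


def enabled_preempt_options(config_text: str) -> list:
    """Return the preemption options set to 'y' in a kernel config."""
    enabled = []
    for line in config_text.splitlines():
        match = re.fullmatch(r"(CONFIG_PREEMPT\w*)=y", line.strip())
        # Skip helper symbols such as CONFIG_PREEMPT_COUNT.
        if match and match.group(1) in PREEMPT_OPTIONS:
            enabled.append(match.group(1))
    return enabled


if __name__ == "__main__":
    # Assumed location; adjust for distributions that store it elsewhere.
    path = Path("/boot") / f"config-{platform.release()}"
    try:
        print(enabled_preempt_options(path.read_text()))
    except OSError:
        print(f"could not read {path}")
```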

Zijlstra designed the change to address longstanding problems with the kernel’s scheduling model, particularly the proliferation of cond_resched() calls scattered throughout the kernel codebase to provide voluntary preemption points. PREEMPT_RT, the real-time kernel variant, had long suffered from over-scheduling that hurt performance compared to non-RT kernels. PREEMPT_LAZY was designed to provide a middle ground: preemption happens, but lazily, reducing scheduling overhead while still allowing the kernel to interrupt long-running tasks. Eliminating PREEMPT_NONE from modern architectures was the final step in making this model universal.

In the commit message, Zijlstra explained his rationale:

“The introduction of PREEMPT_LAZY was for multiple reasons: PREEMPT_RT suffered from over-scheduling, hurting performance compared to !PREEMPT_RT… By moving to a model that is fundamentally preemptable these things become manageable and avoid needing to introduce more horrible hacks.”

Peter Zijlstra, kernel developer at Intel (via git commit 7dadeaa6e851)

Zijlstra also designed the change with fallback in mind, writing to “keep the patch minimal in case of hard to address regressions that might pop up.” PostgreSQL’s throughput dropping to 0.51x on Graviton4 hardware now tests whether the kernel team considers this regression severe enough to justify a revert.

Separately, Linux 7.0 merged rseq time slice extension support, a mechanism that had been in development for about a decade. Built on restartable sequences (rseq), the extension lets a user-space thread request a temporary extension of its CPU time slice, so that a thread entering a locked section is less likely to be preempted while it holds the lock. The slice extension directly addresses the class of regression PostgreSQL is experiencing, which is why kernel developers view it as the proper long-term fix rather than preserving legacy preemption modes. Its arrival in the same release that removed PREEMPT_NONE gives kernel developers a ready-made argument that the tooling for applications to adapt already exists. For PostgreSQL, adopting a brand-new kernel interface under release pressure is a different proposition from integrating a mature, well-tested mechanism.

The Fix Standoff

Two paths forward exist, and neither is guaranteed before the stable release.

Dipietro submitted a patch restoring PREEMPT_NONE as the default on April 3, CC’ing Zijlstra, Thomas Gleixner, Valentin Schneider of Red Hat, and Sebastian Andrzej Siewior of Linutronix. Rather than accepting the revert, a kernel developer on the mailing list pushed back:

“The fix here is to make PostgreSQL make use of rseq slice extension: [lkml link]. That should limit the exposure to lock holder preemption (unless PostgreSQL is doing seriously egregious things).”

Linux kernel mailing list respondent (via Dipietro’s kernel mailing list thread)

Adopting the rseq slice extension would require PostgreSQL to integrate a kernel mechanism it has not previously used, a nontrivial change for a database server with decades of architecture built around traditional spinlock assumptions. PostgreSQL’s buffer manager, which accounts for the bulk of the regression, would need modification to request time slice extensions before entering spinlock-protected regions. Because rseq is a Linux-specific mechanism, such a change would also introduce platform-specific code paths into a database that prides itself on broad OS portability.

Reverting the kernel change would be simpler but would undo Zijlstra’s design work aimed at eliminating legacy preemption models across all six affected architectures. It would also leave the underlying tension unresolved: PostgreSQL’s spinlock-heavy buffer management was designed for a non-preemptive kernel, and future scheduler changes could expose the same vulnerability again.

Characterizing a spinlock pattern used by many of the world’s widely deployed databases as potentially “seriously egregious things” signals that the kernel community views the burden of adaptation as belonging to applications, not the scheduler. Whether other spinlock-heavy applications will encounter similar regressions on Linux 7.0 remains an open question, though any software using user-space spinlocks without rseq integration is a candidate. PostgreSQL’s case indicates that decades-old locking assumptions built around non-preemptive kernels are no longer safe defaults on modern Linux.

For database operators running PostgreSQL on Linux, the stakes are immediate. Any deployment upgrading to Linux 7.0 without a resolution in place could see throughput halved, with no configuration change on the database side. Organizations planning to adopt Ubuntu 26.04 LTS for production database servers must decide now: delay the upgrade, pin an older kernel, or accept degraded performance while awaiting either a kernel revert or a PostgreSQL update. With Linux 7.0 stable approximately two weeks out and Ubuntu 26.04 LTS following shortly after, that decision window is closing. Amazon/AWS has published benchmark data and a revert patch so teams can verify exposure on their own hardware before the upgrade deadline arrives.
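As a first triage step, a host can be checked for the combination the article describes: a 7.0 or later kernel on one of the six affected architectures. This is only a heuristic sketch. The machine-name strings are assumptions about what platform.machine() typically reports for those architectures. A match indicates exposure to the configuration change, not a confirmed regression, since the published measurements cover arm64 (Graviton4) only.

```python
import platform
import re

# Assumed platform.machine() values for the architectures named above.
AFFECTED_MACHINES = {
    "aarch64", "arm64",          # arm64
    "x86_64",                    # x86
    "ppc64", "ppc64le",          # powerpc
    "riscv64",                   # riscv
    "s390x",                     # s390
    "loongarch64",               # loongarch
}


def kernel_major_minor(release: str) -> tuple:
    """Parse '7.0.0-12-generic' into (7, 0)."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def may_be_exposed(release: str, machine: str) -> bool:
    """Heuristic: kernel 7.0 or later on an affected architecture."""
    return kernel_major_minor(release) >= (7, 0) and machine in AFFECTED_MACHINES


if __name__ == "__main__":
    release, machine = platform.release(), platform.machine()
    print(f"{release} on {machine}: exposed={may_be_exposed(release, machine)}")
```

A positive result is a prompt to run a representative benchmark on the target kernel before upgrading, not a substitute for one.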
