LoongArch KVM CVE-2024-53089 Fix: Timers Expire in Hardirq Context on PREEMPT_RT

A Linux kernel fix for LoongArch KVM addresses a scheduling-while-atomic bug that can crash a host or render it unavailable. CVE-2024-53089 patches KVM timer handling so that the high-resolution timers (hrtimers) used by the LoongArch KVM backend expire in hard interrupt (hardirq) context on PREEMPT_RT-enabled kernels, preventing illegal sleeps in atomic context and removing a reliable path to host denial-of-service.

Background

What the CVE is, in plain terms

CVE-2024-53089 affects KVM on the LoongArch architecture. On PREEMPT_RT (real-time) kernels, hrtimers that are not explicitly marked for hard-interrupt expiry are moved into soft-interrupt expiry mode by default. The LoongArch KVM timer was one of these unmarked timers, yet it is cancelled from a preempt notifier, which runs with preemption disabled. On PREEMPT_RT, cancelling a soft-expiry timer can require waiting on a sleeping lock, which is forbidden in that context. The result is a scheduling-while-atomic bug: the kernel tries to sleep in an atomic (no-sleep) context, producing BUG traces and instability. Because the timer callback is short, it can safely run in hard-IRQ context, which is what the fix arranges.

Why this matters operationally

This is not a remote code-execution vulnerability. Its primary impact is availability: hosts running vulnerable kernels can hit kernel BUGs, vCPU failures, or even panic, which in multi-tenant or cloud environments translates into denial-of-service (DoS) of the hypervisor host and disruption for all co-located guests. Attackers with the ability to run a guest VM on an affected host — a plausible scenario in shared clouds or poorly segregated infrastructures — could craft workloads or instruction sequences to reproduce the condition and force host instability. Multiple vendor trackers and the NVD classify the issue with a medium CVSS score but emphasize the availability consequences.

Technical analysis

The root cause: timers, contexts, and PREEMPT_RT

Linux timers have subtle lifetime and context semantics. On PREEMPT_RT kernels, where real-time semantics change how preemption and softirqs behave, hrtimers that are not marked for hard expiry are moved into a soft-expiry path. Here, the timer cancellation logic was invoked from a preempt notifier, which runs with preemption disabled and therefore must not sleep. On RT kernels, however, cancelling a soft-expiry timer can block: the published call traces show hrtimer_cancel_wait_running taking an rt_spin_lock, which is a sleeping lock. Sleeping with preemption disabled produces the classic kernel message “scheduling while atomic”. The fix is to let the timer expire in hardirq context instead of being moved into soft-expiry mode. The upstream change does precisely that: it marks the timer to expire in hardirq (hard interrupt) context on PREEMPT_RT, which is acceptable because the callback is short.

What the patch changes (high-level)

  • The patch aligns timer expiry context with the actual callback behavior: instead of leaving those hrtimers unmarked, which moves them into softirq expiry on PREEMPT_RT, the code marks them to expire in hardirq context.
  • This keeps the cancellation path, which still runs from a preempt notifier with preemption disabled, from having to wait on a sleeping lock.
  • The change is deliberately narrow to minimize regression risk while restoring correct kernel invariants under PREEMPT_RT.
This surgical approach mirrors prior KVM fixes for similar context correctness problems (for example, earlier fixes that adjusted LAPIC/ARM timer expiry behavior), making the change conservative and targeted.
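
The context mismatch can be illustrated with a toy model. The Python sketch below is not kernel code: `Timer`, `cancel`, and `AtomicContextError` are hypothetical names, and the single rule it encodes (on PREEMPT_RT, cancelling a soft-expiry timer may sleep, while cancelling a hard-expiry timer does not) is a simplification of the behavior described above.

```python
from dataclasses import dataclass


class AtomicContextError(RuntimeError):
    """Stands in for the kernel's 'scheduling while atomic' BUG."""


@dataclass
class Timer:
    # True models a timer marked to expire in hardirq context.
    # False models an unmarked timer, which PREEMPT_RT moves to soft expiry.
    hard_expiry: bool


def cancel(timer: Timer, preempt_rt: bool, preemption_disabled: bool) -> str:
    """Model cancelling a timer whose callback may currently be running."""
    # Simplified rule: only soft-expiry timers on PREEMPT_RT may need to sleep.
    may_sleep = preempt_rt and not timer.hard_expiry
    if may_sleep and preemption_disabled:
        raise AtomicContextError("scheduling while atomic")
    return "cancelled"
```

In this model, `cancel(Timer(hard_expiry=False), preempt_rt=True, preemption_disabled=True)` raises, which corresponds to the pre-patch behavior, while the same call with `hard_expiry=True` returns normally, which corresponds to the patched behavior.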

Symptom reproducibility and evidence

Public kernel traces in vendor advisories and the NVD show typical call stacks and messages tied to this bug class: kernel call traces containing __schedule_bug, schedule_rtlock, rt_spin_lock, hrtimer_cancel_wait_running, kvm_restore_timer, and related KVM frames. The task named in those traces, qemu-system-loo, is the QEMU process running the guest. These traces were used to confirm the problem and the rationale for the patch. Tracking pages and distro advisories include example stack traces and the upstream patch references used by maintainers.

Exploitability and risk model

Attack surface

  • Who can trigger it: A tenant/attacker able to run a guest VM on the host (local, host-adjacent vector) can craft workloads that exercise KVM timer flows and preemption interactions. The vulnerability is not exploitable remotely without the ability to run a guest.
  • Complexity: Low-to-moderate. The conditions require specific KVM paths (LoongArch KVM code) on PREEMPT_RT-enabled kernels, but they are deterministic: a workload that drives a vCPU through the preempt-notifier path that cancels the soft-expiry timer can produce the BUG.
  • Impact: High for availability. A host-level kernel BUG or oops may terminate guests, cause QEMU/KVM threads to abort, and in severe cases require host reboot.

CVSS and categorization

Most trackers (NVD, OSV, cloud vendor trackers) set the CVSS v3 base score around 5.5 (Medium), with vector AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H — reflecting a local attack vector that yields complete availability loss for the affected component. A medium score is appropriate because the bug doesn’t provide code execution or data leakage, but it can reliably deny availability to multi-tenant hosts.
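
As a check on the 5.5 figure, the base score can be recomputed from the vector. The sketch below follows the CVSS v3.1 base-score equations for scope-unchanged vectors only; the function names `roundup` and `base_score` are illustrative and not taken from any tracker's tooling.

```python
import math

# Metric weights from the CVSS v3.1 specification (scope unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value: float) -> float:
    """Round up to one decimal place, as the v3.1 specification defines it."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Base score for a scope-unchanged (S:U) CVSS v3.1 vector string."""
    metrics = dict(part.split(":") for part in vector.split("/"))
    if metrics.get("S") != "U":
        raise ValueError("this sketch only handles S:U vectors")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))  # prints 5.5
```

The score is driven almost entirely by the A:H term: with confidentiality and integrity both at N, availability is the only contributor to the impact sub-score.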

Is there evidence of active exploitation?

There are no widely published proofs-of-concept or exploit campaigns tied to CVE-2024-53089 in the public trackers at the time of writing. Vendor trackers and distribution advisories describe the bug and the upstream fix but do not report confirmed active exploitation in the wild. That said, the deterministic nature of the failure (host instability) makes it an attractive vector for attackers seeking denial-of-service in multi-tenant environments; lack of public exploitation does not equal absence of risk. Treat the absence of public PoCs as non-conclusive and prioritize patching where exposure exists.

Who and what is affected

Architectures and kernels

  • The bug is specific to the LoongArch KVM backend as described in the CVE entry and upstream description; it affects PREEMPT_RT-enabled builds where the timer expiry/softirq handling differs from standard kernels.

Distribution and vendor status

  • Multiple public vulnerability trackers and distribution security pages (Debian, SUSE, Amazon Linux trackers) have indexed the CVE and note that upstream patches are available and shipped via stable kernel trees or distribution backports. Operators must consult their distribution’s security tracker and kernel changelogs to confirm whether their installed kernel packages include the fix. Different vendors may label packages as “Not Affected”, “Pending Fix”, or “Fixed” depending on their kernel configuration and shipping policy.

Typical impacted deployments

  • High-priority: multi-tenant hosts, public cloud hypervisors, hosting providers, or any environment where untrusted guests can be run.
  • Moderate: internal virtualization clusters that accept third-party or less-trusted guest images (CI build farms, shared development clusters).
  • Lower priority: single-user desktops and tightly-controlled private hosts running trusted VMs — still recommended to patch, but operational impact and exploitation likelihood are lower.

Remediation and operational guidance

Definitive remediation

  • Apply the vendor/distribution kernel update that includes the upstream patch/backport and reboot the host into the patched kernel. This is the only reliable long-term fix.
  • Confirm the kernel package changelog or vendor advisory explicitly references CVE-2024-53089 or the upstream commit. If the changelog references the patch commit IDs (stable-tree commit references), that confirms the fix is present.
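
The changelog check can be scripted. The following is a minimal sketch, assuming the changelog text has already been obtained from the package manager (for example with `rpm -q --changelog` or `apt changelog`); the function name `changelog_mentions_fix` is hypothetical, and any commit IDs must come from the vendor advisory rather than from this example.

```python
import re

CVE_ID = "CVE-2024-53089"


def changelog_mentions_fix(changelog_text: str, commit_ids=()) -> bool:
    """Return True if the changelog names the CVE or one of the given commit IDs.

    commit_ids is an optional iterable of upstream or stable-tree commit
    hashes copied from the vendor advisory; none are hard-coded here.
    """
    if re.search(re.escape(CVE_ID), changelog_text, re.IGNORECASE):
        return True
    lowered = changelog_text.lower()
    return any(cid.lower() in lowered for cid in commit_ids if cid)
```

A match shows only that the changelog mentions the fix. Confirming that the host has actually booted the patched kernel is a separate step.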

Short-term mitigations (if you cannot immediately patch)

  • Avoid running untrusted guests on vulnerable hosts. Move untrusted tenants to patched hosts where possible.
  • Limit exposure by reviewing and restricting who can create and run KVM guests on affected hosts. Be cautious: low-level KVM or kernel configuration changes can have side effects and should be validated in test environments before production.
  • If feasible for lab or non-production workloads, run guests under pure software emulation instead of KVM acceleration; this keeps them off the affected KVM timer path but degrades performance significantly. Use this only as a temporary measure.

Detection and post-patch validation

  • Search kernel logs for the characteristic symptoms: messages such as “scheduling while atomic” and call traces including __schedule_bug, schedule_rtlock, rt_spin_lock, hrtimer_cancel_wait_running, kvm_restore_timer, and similar KVM frames. Those traces are strong indicators of the pre-patch failure mode.
  • After applying updates, reproduce representative workloads in a staging environment to confirm the bug no longer triggers the prior stack traces. Monitor host logs for 7–14 days post-update for regression signals.
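
The log search described above can be automated. This is a minimal sketch, assuming the kernel log has been captured as text (for example from `dmesg` or `journalctl -k`); `find_signature`, the choice of two signature frames, and the 40-line window are illustrative assumptions, not a vendor-provided detection rule.

```python
BUG_MARKER = "scheduling while atomic"
# Frames reported in the public traces for this CVE.
SIGNATURE_FRAMES = ("hrtimer_cancel_wait_running", "kvm_restore_timer")


def find_signature(log_text: str, window: int = 40) -> list:
    """Return indexes of BUG lines followed by all signature frames.

    A hit is a line containing BUG_MARKER where every frame in
    SIGNATURE_FRAMES appears within the next `window` lines.
    """
    lines = log_text.splitlines()
    hits = []
    for index, line in enumerate(lines):
        if BUG_MARKER not in line:
            continue
        following = "\n".join(lines[index:index + window])
        if all(frame in following for frame in SIGNATURE_FRAMES):
            hits.append(index)
    return hits
```

Requiring both frames keeps unrelated scheduling-while-atomic reports from being counted. An empty result after patching is consistent with the fix being effective but does not prove it.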

Recommended rollout strategy

  • Inventory: identify all KVM hosts and confirm kernel versions (uname -r) and whether PREEMPT_RT is enabled.
  • Pilot: apply patches to a small pilot group, verify KVM functionality (guest boot, live migration, snapshots, IO), and monitor for regressions.
  • Staged rollout: deploy to production in waves with monitoring windows and rollback plans.
  • Post-deploy monitoring: watch kernel logs and guest stability closely after each wave.
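
The inventory step can be sketched as a small classifier. The example below assumes the inputs are collected per host, for instance from `platform.uname()` (its `machine` and `version` fields) and, where the distribution ships one, a kernel config file; `is_preempt_rt` and `needs_review` are hypothetical helper names, and the string checks are heuristics rather than an authoritative test.

```python
def is_preempt_rt(kernel_version_string: str = "", kernel_config_text: str = "") -> bool:
    """Heuristically decide whether a kernel was built with PREEMPT_RT."""
    # Preferred evidence: the kernel build configuration.
    for line in kernel_config_text.splitlines():
        if line.strip() == "CONFIG_PREEMPT_RT=y":
            return True
    # Fallback: the version banner, written "PREEMPT_RT" or "PREEMPT RT".
    normalized = kernel_version_string.upper().replace(" ", "_")
    return "PREEMPT_RT" in normalized


def needs_review(machine: str, kernel_version_string: str = "",
                 kernel_config_text: str = "") -> bool:
    """Flag LoongArch hosts running a PREEMPT_RT kernel for patch verification."""
    on_loongarch = machine.lower().startswith("loongarch")
    return on_loongarch and is_preempt_rt(kernel_version_string, kernel_config_text)
```

This sketch does not check whether KVM is in use on the host, and a flagged host is not necessarily unpatched. It only narrows the inventory to machines whose changelogs need checking.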

Critical perspective: strengths of the fix and remaining caveats

Strengths

  • The upstream change is narrow and targeted: it adjusts timer expiry context behavior rather than redesigning KVM timer subsystems, which minimizes regression risk.
  • The fix was upstreamed into stable kernel branches and distributed via vendor backports, enabling standard patching workflows for operators.
  • The change removes a deterministic crash vector and restores kernel invariants, protecting host availability for multi-tenant workloads.

Caveats and residual risks

  • Distribution timelines vary: embedded, vendor-custom, or long-term-support kernels may take longer to receive backports, creating a patching gap for some environments. Operators must verify vendor advisories for their specific platform.
  • The fix addresses this specific context/timer correctness issue but does not eliminate other classes of hypervisor vulnerabilities (e.g., memory corruption, speculative side-channels). Continue layered mitigation and monitoring.
  • No public PoC does not equal no real-world exploitation. The deterministic availability impact makes the bug attractive in hostile multi-tenant environments, so prioritize patching where exposure exists.

Practical checklist for administrators (concise)

  • Inventory hosts that run KVM and determine whether they are PREEMPT_RT-enabled.
  • Check vendor advisories and kernel package changelogs for CVE-2024-53089 or the upstream stable commit references.
  • Apply vendor-supplied kernel packages that include the fix, and reboot hosts during scheduled maintenance windows.
  • If immediate patching is impossible, restrict untrusted guests and consider temporary configuration mitigations (validate in test first).
  • Monitor kernel logs for the signature traces and validate the fix in staging prior to wide deployment.

Conclusion

CVE-2024-53089 is a correctness bug in LoongArch’s KVM timer handling on PREEMPT_RT kernels that can produce scheduling-while-atomic failures and host-level instability. Upstream maintainers fixed the issue by ensuring the hrtimer expires in a context consistent with its callback semantics (hardirq on RT builds), and vendors have been distributing backports through normal kernel update channels. The vulnerability’s operational impact is clear: availability and reliability of hypervisor hosts are at risk when unpatched, particularly in multi-tenant environments. Operators should prioritize standard kernel updates, validate vendor changelogs for the CVE or commit IDs, and use staged rollouts and close monitoring to ensure both remediation and operational stability.

Note: public trackers do not show confirmed active exploitation linked to this CVE at this time, but the deterministic DoS potential warrants prompt remediation for exposed hosts. Treat the absence of public exploit reports as provisional and prioritize patching for shared or cloud infrastructure.
Source: MSRC Security Update Guide - Microsoft Security Response Center
 
