PatchGuard’s Detection of Hypervisor-Based Introspection: KiErrata704Present, Skx55, and 361 [P1]

Original text: “Patchguard: Detection of Hypervisor Based Introspection [P1]” by Nick Peterson, Reverse Engineering (revers.engineering) (April 26, 2020). The original is licensed Read-Only; the prose below is a paraphrase. Disassembly screenshots and short code excerpts are reproduced under fair-use commentary with attribution captions.

Executive Summary

Nick Peterson’s post on revers.engineering walks through three Windows kernel patch-protection (PatchGuard) routines that exist specifically to detect a hostile or non-transparent hypervisor sitting underneath the kernel. The targets aren’t obvious paravirtualization seams (Windows runs cleanly inside Hyper-V and exposes a paravirt interface for that purpose); they are side-channels in the x86 architecture itself that an introspection VMM has to handle perfectly or get caught. Each routine is built around a single architectural quirk: KiErrata704Present uses FMASK, TF and SYSCALL to indirectly read the real LSTAR MSR via the RIP saved in the resulting #DB exception frame; KiErrataSkx55Present revives the POP SS / MOV SS single-step trick from CVE-2018-8897; and KiErrata361Present exploits the fact that a #DB vmexit caused by ICEBP (the privileged software exception) leaves the BS bit of the VMCS Pending Debug Exceptions field clear even though a TF single-step was pending, producing a guest state that is not architecturally resumable and crashes naive VMMs at VMRESUME.

The takeaway for hypervisor authors, especially anti-cheat, EDR and security VMMs that introspect a Windows guest, is that PatchGuard now actively probes for exactly the kinds of state-faking that a transparent VMM cannot afford to do sloppily. Hiding the real LSTAR behind intercepted RDMSR/WRMSR, injecting #DB without checking the exit qualification, or failing to restore BS in the VMCS pending-debug field after an ICEBP-induced exit are all detectable, and the post recounts at least one case in which sloppy #DB handling reintroduced CVE-2018-8897 through a third-party hypervisor. The post is meant as a reference for interoperability between EDR/AV/introspection tooling and KPP.

Errata or nah?

Over the last few years, Microsoft has steadily wired hypervisor-introspection detection into PatchGuard. The motivation is mechanical: a VMM running below the kernel is strictly more privileged than the protection itself, which makes “subverting KPP” trivial from that vantage point. Windows runs fine inside a hypervisor (Hyper-V is one), and the paravirtualization interface is documented, so the kernel doesn’t care that a hypervisor is there. What it does care about is whether the VMM is tampering with state that wouldn’t need to be tampered with to run a VM: hiding the real LSTAR from RDMSR to keep a syscall hook, exploiting nested paging to gain execution at sensitive callsites, and the like.

The author highlights three favourite checks. There are more inside PatchGuard than these — finding the rest is an exercise for the reader — but the trio below are the most architecturally elegant. The point of writing them up is to help security, anti-virus and introspection tools interoperate with KPP rather than blunder into it.

KiErrata704Present — reading LSTAR through a trap-flag side channel

The name is camouflage. KiErrata704Present reads, to an untrained eye, like a legitimate CPU-errata feature probe. It is not.

Disassembly of KiErrata704Present from ntoskrnl showing FMASK MSR save and SYSCALL single-step check
KiErrata704Present: save FMASK, mask TF out of the SYSCALL RFLAGS scrub, set TF in RFLAGS, execute SYSCALL, watch where the resulting #DB lands. Source: original article.

Background: why SYSCALL single-stepping is special

Older privilege-transition mechanisms — SYSENTER, call gates — let the caller single-step the transition by setting TF. The resulting #DB was delivered after the branch completed, which meant the kernel had to remember “we were mid-step” so it could IRET back into the user’s single-stepping loop after handling the syscall. Awkward.

SYSCALL/SYSRET fixed this with the FMASK MSR. FMASK tells the CPU which bits of RFLAGS to clear at SYSCALL entry; any sane OS masks off IF and TF there. The matching SYSRET was built so that if it restores a user-mode RFLAGS image with TF set, the resulting #DB fires on the instruction boundary immediately after SYSRET, before the instruction at the return target runs; IRET, by contrast, delivers it on the boundary after the instruction at its branch target. Net effect: clean single-stepping over a syscall from a user-mode debugger.

Now look at the disassembly again. The first thing KiErrata704Present does is save the current FMASK and rewrite it so that TF is not cleared by SYSCALL. TF will survive the transition.
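As a rough illustration of the masking rule, the C sketch below models how SYSCALL derives the new RFLAGS from FMASK, and what dropping TF from the mask changes. It is a user-space model rather than kernel code, and the helper names are invented for this example; only the bit positions (TF is bit 8, IF is bit 9) and the "clear every bit that is set in FMASK" rule come from the architecture.

```c
#include <assert.h>
#include <stdint.h>

#define RFLAGS_TF (1ULL << 8)  /* trap flag: single-step */
#define RFLAGS_IF (1ULL << 9)  /* interrupt enable flag */

/* Model of the RFLAGS update at SYSCALL entry: every bit that is set
 * in FMASK is cleared in RFLAGS. (Hypothetical helper name.) */
static uint64_t rflags_after_syscall(uint64_t rflags, uint64_t fmask)
{
    return rflags & ~fmask;
}

/* What the routine is described as doing to the saved FMASK: remove TF
 * from the mask so that SYSCALL no longer scrubs the trap flag.
 * (Hypothetical helper name.) */
static uint64_t fmask_without_tf(uint64_t saved_fmask)
{
    return saved_fmask & ~RFLAGS_TF;
}
```

With a typical mask of TF | IF, `rflags_after_syscall` strips both flags; after `fmask_without_tf`, IF is still cleared but TF carries over into the kernel-side RFLAGS, which is the precondition for the rest of the check.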

The trap: PUSHFQ / POPFQ with TF set, then SYSCALL

The routine then primes RFLAGS.TF via a PUSHFQ/POPFQ sequence. That makes TF active during the next instruction, which fires a #DB on the boundary after it — unless that next instruction is a software exception, software interrupt, privileged software exception, or generates a hardware exception, in which case the pending single-step is suppressed.

The instruction PatchGuard chooses is SYSCALL. Because FMASK no longer wipes TF at entry, the single-step survives across the privilege transition. Picture a hypothetical LSTAR-target kernel handler:

0x40000: SWAPGS
0x40003: MOV GS:[0x8], RSP
0x4000C: MOV RSP, GS:[0x10]

When the #DB fires, the interrupt frame’s saved RIP will be 0x40000 — the SWAPGS at the entry of the real syscall handler — because that instruction is the branch target that hasn’t executed yet.

Which means PatchGuard now indirectly knows the true LSTAR — it doesn’t need to RDMSR it. A VMM that intercepted RDMSR / WRMSR on LSTAR to return fake values (in order to keep a syscall hook hidden) is detectable: the #DB IP in the handler frame will not match what RDMSR(LSTAR) returned.
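The comparison can be sketched as a small C model. Everything here is illustrative: `toy_cpu`, `syscall_single_step` and `lstar_looks_forged` are hypothetical names, and the model assumes (as the text describes) that a #DB is delivered only when TF survives the FMASK scrub, with the saved RIP equal to the real SYSCALL target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RFLAGS_TF (1ULL << 8)

/* Toy CPU state: just enough to model a TF-armed SYSCALL. */
struct toy_cpu {
    uint64_t lstar;   /* real SYSCALL target the hardware branches to */
    uint64_t fmask;   /* RFLAGS bits cleared at SYSCALL entry */
    uint64_t rflags;  /* RFLAGS just before SYSCALL executes */
};

/* Model a SYSCALL. Returns true and stores the RIP that would be saved
 * in the #DB interrupt frame if the single-step survives the transition;
 * returns false if TF was clear or FMASK scrubbed it. */
static bool syscall_single_step(const struct toy_cpu *cpu, uint64_t *db_frame_rip)
{
    assert(cpu != NULL && db_frame_rip != NULL);
    uint64_t rflags_after = cpu->rflags & ~cpu->fmask;
    if (!(rflags_after & RFLAGS_TF))
        return false;
    *db_frame_rip = cpu->lstar;  /* branch target has not executed yet */
    return true;
}

/* The comparison described above: the RIP from the #DB frame versus
 * whatever RDMSR(LSTAR) reported to the guest. */
static bool lstar_looks_forged(uint64_t rdmsr_lstar, uint64_t db_frame_rip)
{
    return rdmsr_lstar != db_frame_rip;
}
```

In the toy handler's terms: if the hardware really branches to 0x40000 but an intercepted RDMSR reports some other address, `lstar_looks_forged` returns true.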

KiErrataSkx55Present — a CVE-2018-8897 throwback

The second check is the author’s personal favourite, because it is a clean callback to the POP SS / MOV SS debugging hole disclosed as CVE-2018-8897. PatchGuard added it not long after the original mitigation landed.

Disassembly of KiErrataSkx55Present from ntoskrnl, a CVE-2018-8897 inspired single-step check
KiErrataSkx55Present: a CVE-2018-8897-flavoured probe that uses MOV SS / POP SS single-step semantics to reveal the same branch target. Source: original article.

If you have read the whitepaper the disassembly is self-explanatory. Reusing the toy SYSCALL handler from the previous section, the #DB that fires here also lands with 0x40000 on the interrupt-stack RIP — same indirect LSTAR read, different architectural quirk.

What a hypervisor must do here

A VMM that is intercepting these games — whether to hide a hook or to virtualize debugging — must:

  • Set the exception bitmap to vmexit on #DB.
  • On exit, look at guest-state RIP and decide which of the two architecturally-possible #DB boundary positions the exit corresponds to.
  • If it’s legitimately the guest’s own debug, reflect via vectored event injection.
  • Critically, inspect exit qualification rather than blindly trusting guest RFLAGS.TF.
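A minimal sketch of the last point, in C. It assumes the exit-qualification layout for debug-exception exits mirrors DR6 (B0 to B3 in bits 0 to 3, BD in bit 13, BS in bit 14); the function names are hypothetical, and a real handler would read these values from the VMCS rather than take them as arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RFLAGS_TF      (1ULL << 8)
#define DB_QUAL_B0_B3  0xFULL        /* breakpoint conditions 0..3 met */
#define DB_QUAL_BD     (1ULL << 13)  /* debug-register access detected */
#define DB_QUAL_BS     (1ULL << 14)  /* single-step trap */

/* The shortcut the text warns against: guessing the cause of the #DB
 * from guest RFLAGS.TF alone. */
static bool naive_is_single_step(uint64_t guest_rflags)
{
    return (guest_rflags & RFLAGS_TF) != 0;
}

/* Preferred: ask the exit qualification which condition actually fired. */
static bool qual_is_single_step(uint64_t exit_qualification)
{
    return (exit_qualification & DB_QUAL_BS) != 0;
}

/* DR6 bits to merge into the guest's view before reflecting the #DB;
 * anything outside the known debug-status bits is dropped. */
static uint64_t dr6_bits_from_qualification(uint64_t exit_qualification)
{
    return exit_qualification & (DB_QUAL_B0_B3 | DB_QUAL_BD | DB_QUAL_BS);
}
```

The two predicates disagree exactly in the interesting case: a breakpoint-induced #DB that arrives while TF happens to be set looks like a single-step to the naive check but not to the exit qualification.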

The author recounts a real incident: a well-known anti-virus hypervisor failed to perform that exit-qualification check. When it injected the #DB back into the guest, it re-aimed the exception at RIP = its own secret syscall handler. The kernel’s KiDebugTraps mitigation didn’t notice anything wrong — and the host machine was suddenly vulnerable to CVE-2018-8897 all over again, courtesy of its “protective” hypervisor.

KiErrata361Present — the ICEBP / pending-BS trap

The icing on the cake is a check that only burns a hypervisor if it is exiting on #DB — which, given the previous two checks, it pretty much has to be.

Disassembly of KiErrata361Present from ntoskrnl, an ICEBP/privileged-software-exception based check
KiErrata361Present: exploits an architectural inconsistency between an ICEBP-induced #DB and the VMCS pending-debug-exception field. Source: original article.

The architectural setup

Normally, loading RFLAGS.TF via a POPF variant and then doing an SS load (which establishes the “MOV SS / POP SS interrupt shadow”) defers any single-step #DB until the boundary after the instruction following the SS load. Same goes for armed-debug-register #DBs that get pushed out by an SS load.

Several instruction classes break this: a software interrupt (INT n), the dedicated INT3 software exception, and the long-undocumented privileged software exception ICEBP all discard the pending TF-based single-step entirely, so the #DB never fires from that pending TF.

The VMCS trap

ICEBP carries a VT-x quirk. When it triggers a #DB vmexit, the “true” architectural state is “BS was pending” (the TF-induced single-step) — so the BS bit in the VMCS Pending Debug Exceptions field ought to be set. But because the exit was induced by a privileged software exception, the CPU clears that BS bit instead.

That clearing is the trap. The architecture also requires that whenever the guest is in an MOV SS/POP SS interrupt shadow AND RFLAGS.TF is set, the VMCS Pending Debug Exceptions field must have BS set — there is no other legal way to reach that machine state. So the VMCS that ICEBP leaves behind is not architecturally resumable: VMRESUME will fail, and a hypervisor that wasn’t expecting it tends to crash on the spot.

The mitigation, once you know to look for it, is simple: on a qualifying exit that’s flagged as a privileged software exception, if guest interruptibility shows MOV SS blocking and RFLAGS.TF is 1, manually set BS in the VMCS Pending Debug Exceptions field before VMRESUME.
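The fix-up can be sketched in C against a plain struct standing in for the relevant VMCS fields. The struct, the macro names and both functions are hypothetical; the bit positions used (interruption type 5 for a privileged software exception, vector 1 for #DB, bit 1 of the interruptibility state for MOV SS blocking, bit 14 for BS) are assumed from the Intel SDM layout. The SDM's full VM-entry check has further conditions that this sketch deliberately ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RFLAGS_TF               (1ULL << 8)
#define INTR_STATE_MOV_SS       (1u << 1)    /* blocking by MOV SS / POP SS */
#define PENDING_DBG_BS          (1ULL << 14)
#define INTR_INFO_VALID         (1u << 31)
#define INTR_INFO_VECTOR(i)     ((i) & 0xFFu)
#define INTR_INFO_TYPE(i)       (((i) >> 8) & 0x7u)
#define INTR_TYPE_PRIV_SW_EXC   5u           /* privileged software exception */
#define VECTOR_DB               1u

/* Stand-in for the VMCS fields a real handler would VMREAD/VMWRITE. */
struct guest_snapshot {
    uint32_t exit_intr_info;    /* VM-exit interruption information */
    uint32_t interruptibility;  /* guest interruptibility state */
    uint64_t rflags;            /* guest RFLAGS */
    uint64_t pending_debug;     /* guest pending debug exceptions */
};

/* The invariant from the text: MOV SS shadow plus TF requires BS. */
static bool pending_bs_consistent(const struct guest_snapshot *g)
{
    assert(g != NULL);
    bool shadow_and_tf = (g->interruptibility & INTR_STATE_MOV_SS) != 0 &&
                         (g->rflags & RFLAGS_TF) != 0;
    return !shadow_and_tf || (g->pending_debug & PENDING_DBG_BS) != 0;
}

/* Re-set BS after an ICEBP-induced #DB exit. Returns true if patched. */
static bool fixup_pending_bs(struct guest_snapshot *g)
{
    assert(g != NULL);
    bool icebp_db = (g->exit_intr_info & INTR_INFO_VALID) != 0 &&
                    INTR_INFO_VECTOR(g->exit_intr_info) == VECTOR_DB &&
                    INTR_INFO_TYPE(g->exit_intr_info) == INTR_TYPE_PRIV_SW_EXC;
    if (!icebp_db || pending_bs_consistent(g))
        return false;
    g->pending_debug |= PENDING_DBG_BS;
    return true;
}
```

A handler built this way would call `fixup_pending_bs` on the #DB exit path before resuming the guest; exits that are not ICEBP-induced, or whose state is already consistent, are left untouched.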

The idea for KiErrata361Present was lifted from CVE-2018-1087 in KVM — back before it was public knowledge that “privileged software exception” in the Intel SDM was actually ICEBP. The SDM has since been clarified about the opcode but still understates the pending-BS edge case.

Key Takeaways

  • PatchGuard now actively probes the hypervisor. The named-as-errata checks are not CPU bug probes; they are anti-introspection checks specifically designed to catch a VMM that is forging architectural state.
  • Side-channels, not paravirt. All three checks read what should be invisible state (LSTAR, the BS bit, instruction-boundary RIP) through architectural quirks of SYSCALL, FMASK, TF, ICEBP and the MOV SS shadow. Intercepting RDMSR/WRMSR isn’t enough — you have to keep the side-channels consistent too.
  • KiErrata704Present indirectly reads the real LSTAR via the #DB interrupt-frame RIP after a TF-armed SYSCALL with FMASK tweaked.
  • KiErrataSkx55Present rides the same idea on MOV SS / POP SS shadow semantics — the CVE-2018-8897 trick. Sloppy #DB handling inside a VMM reintroduces the original vulnerability.
  • KiErrata361Present exploits a hard architectural invariant: MOV SS-shadow + TF==1 requires BS in VMCS Pending Debug Exceptions. An ICEBP-induced #DB exit leaves the VMCS violating it. Naive hypervisors die at VMRESUME.
  • The fixes are local. Save and restore FMASK sanely; inspect exit qualification rather than guest RFLAGS.TF before injecting #DB; on privileged-software-exception exits, re-set BS in the pending-debug field when guest interruptibility shows MOV SS blocking and TF.
  • These are just three of the checks. The author hints there are more; Part 2 introduces another and walks through how to derive your own.

Hardening Checklist for Hypervisor Authors

For EDR, anti-cheat, security or introspection VMMs that want to coexist with Windows kernel patch protection, treat the following as non-optional:

  1. Don’t lie about LSTAR. If you must hook syscalls, do it without falsifying the MSR — or accept that PatchGuard will eventually catch you through the TF/SYSCALL/#DB side-channel.
  2. Inspect VMCS exit qualification on every #DB exit, never just guest RFLAGS.TF. Misinjecting #DB back to the guest with a wrong RIP resurrects CVE-2018-8897.
  3. Handle the ICEBP pending-BS edge case. On qualifying exits flagged as privileged-software-exception, if guest interruptibility shows MOV SS / POP SS blocking and TF==1, set BS in the VMCS Pending Debug Exceptions before VMRESUME.
  4. Preserve FMASK semantics. Don’t silently rewrite the MSR for introspection bookkeeping — if it’s changed under the guest’s feet, KiErrata704Present notices.
  5. Audit any place you inject #DB. Vectored event injection with a wrong target RIP is the canonical way to fail this whole family of checks.
  6. Don’t hide nested-paging tricks at kernel call sites. If you remap or split-view code pages around nt!Ki* entry points, every page-table walk PatchGuard does becomes a probe.
  7. Read the source whitepapers. The POP SS / MOV SS paper for CVE-2018-8897 and the CVE-2018-1087 KVM advisory are required reading — the entire KiErrata* family generalises from them.
  8. Treat “errata”-named routines in nt!Ki* as anti-introspection probes. The naming is camouflage; assume any new one is a new check until proven otherwise.

Conclusion

The PatchGuard KiErrata* routines documented in Part 1 are a clean illustration of how Windows defends itself against the only adversary that genuinely outranks it: code running below ring 0 in a hypervisor. None of the three checks relies on a direct “is there a VMM?” test, since Windows knows perfectly well it can be virtualised. They rely on the much harder property of architectural transparency: a friendly VMM has to keep every observable side-channel exactly as bare metal would, including the ones that no documented interface exposes. For hypervisor authors the cost of skimping on any of these is steep, ranging from a kernel bugcheck to silently re-opening CVE-2018-8897 on every machine the product ships on. The author promises a Part 2 with another check and some derive-it-yourself reasoning, which is recommended reading for anyone building a security VMM that needs to coexist with KPP.

