CVE-2018-8611 — Exploiting the Windows Kernel Transaction Manager (Part 1/5: Introduction)

Original text: “CVE-2018-8611 Exploiting Windows KTM Part 1/5 — Introduction” by Aaron Adams and Cedric Halbronn, NCC Group (27 April 2020). Code snippets, kernel structure definitions, Win32 API signatures, undocumented flag enums and the cookie / pool tag tables are reproduced verbatim with attribution; commentary and analysis are rewritten.

Executive Summary

This is part one of NCC Group’s five-part write-up on the exploitation of CVE-2018-8611, a local privilege-escalation race condition in the Windows kernel’s Kernel Transaction Manager (KTM). The bug surfaced in October 2018 when Kaspersky’s Automatic Exploit Prevention (AEP) caught a working 0day in the wild; the vendor disclosed it to Microsoft, who shipped a patch in the December 2018 cumulative update. To date, public technical material on the underlying flaw and on the precise exploitation primitives has been very thin — Kaspersky published a brief writeup that hinted at the trigger but never released a sample, hashes, or a full breakdown. Adams and Halbronn at NCC Group’s Exploit Development Group built their own exploit from scratch, working across Windows Vista through Windows 10 1809 on both x86 and x64, and this series is the deep-dive.

Two things make this issue more interesting than the typical kernel LPE. First, KTM is a comparatively under-explored kernel subsystem — it is not the win32k surface that everyone audits and that the major browser sandboxes block. Calls into KTM are not filtered by the win32k syscall filter used by Chrome and other sandboxes, which means the bug works as a sandbox escape in client-side chains where win32k is locked down. Second, race conditions in the Windows kernel are a less-discussed bug class outside of bochspwn-style fuzzing work. Part 1 of the series — reproduced and reframed below — lays down the foundation: what KTM is, what its four core object types look like in memory, how to drive them from userland, and what the Kaspersky disclosure actually told us about the trigger before NCC Group worked it out themselves.

TL;DR

CVE-2018-8611 is a Windows-kernel local privilege escalation living inside KTM — the subsystem that backs Windows’s notion of “transactional” registry and filesystem operations. Kaspersky’s December 2018 disclosure revealed the existence of an in-the-wild 0day, but the substantive technical detail (the actual flaw, the exploitation primitive, hashes, samples) was never made public. NCC Group’s EDG team reversed it, built a reliable end-to-end exploit, and presented the work at POC 2019. The series walks through the entire process: KTM internals, patch diffing, basic vulnerability triggering, race-condition timing, debugging tricks, and the path from race win to a kernel read/write primitive.

The most important operational note: KTM is reachable through standard syscalls that are not blocked by modern win32k-style sandbox filters. Where today’s renderer / content-process sandboxes (Chromium and friends) take the position that win32k is the dominant kernel attack surface and block it with the win32k syscall filter, KTM remained accessible — making this exact bug a useful sandbox-escape primitive in 2018–2020 client-side chains.

The exploit was developed and verified across Windows Vista, Windows 7, Windows 8, Windows 8.1, and Windows 10 up to 1809, on both x86 and x64. The work was done by Aaron Adams and Cedric Halbronn at NCC Group EDG and first presented at POC 2019. In the spirit of NCC’s earlier writeups, the series is candid about wrong turns, mistakes, and dead ends, not just the final exploit.

Getting Started

What was publicly known at the time

The starting point for the research was Kaspersky’s December 2018 Securelist post. That post is the entire public technical breadcrumb trail: AEP picked up the in-the-wild exploit in October 2018, Kaspersky reported it to MSRC, and Microsoft shipped a fix in the December 2018 patch cycle. The published description was light, and on a first read the details did not line up neatly with what a careful KTM reverser would later identify; only after Adams and Halbronn had built up an independent model of KTM did the hints click into place.

The relevant operational summary from Kaspersky — paraphrased rather than reproduced — is that the in-the-wild exploit (a) created a named pipe and opened it for read/write, (b) instantiated a pair of transaction managers, resource managers and transactions, then created a large number of enlistments against the second transaction (referred to as “Transaction #2”), and (c) created exactly one more enlistment against “Transaction #1” and committed it. From there the exploit moved into the race-condition phase: multiple threads pinned to a single CPU core, one looping on NtQueryInformationResourceManager, a second issuing a single NtRecoverResourceManager, and a third thread using NtQueryInformationThread to read the most recent syscall on the second thread — effectively a side-channel signal that the recover call had landed. Once that race fired, a subsequent WriteFile against the named pipe triggered the corruption.

Read for the first time, almost none of that makes operational sense; the named pipe in particular feels random until you understand which kernel object is being corrupted. By the end of part 5, the hints turn out to be exactly what they appear to be: a sketch of one specific exploitation strategy out of several possible ones. In October 2019, with NCC Group’s own work already done and the blog series in preparation, it emerged that Kaspersky had presented additional detail at BlueHat Shanghai in May 2019. Part 5 of the series compares techniques between the in-the-wild exploit and NCC Group’s approach.

Test environment

For anyone who wants to reproduce the work, NCC Group’s tooling stack is the standard Windows kernel research kit. A summary of what they used:

Three online references also saved a lot of time:

Why start on Windows 7

The starting platform was Windows 7 x64, specifically ntoskrnl.exe version 6.1.7601.24291. Two reasons drove that choice. First, prior experience reversing win32k bugs suggested there might be more symbols available on Windows 7 — that turned out not to actually matter for KTM, but it was a reasonable guess. Second, IDA Pro is awkward when a project spans multiple binaries: from Windows 8 onward KTM lives in its own driver (tm.sys) rather than inside ntoskrnl.exe. Attacking tm.sys in isolation looks attractive (only the KTM code, no noise) but it forces constant context switching between two IDB databases because KTM calls a lot of ntoskrnl.exe routines under the hood.

The downside of starting on Windows 7 turned up later: the first write primitive Adams and Halbronn built worked fine on Vista and 7 but broke on Windows 8 and above due to safe-unlinking and refcount-hardening mitigations that Microsoft introduced. There is always a trade-off — better to start, see what breaks, and iterate than to over-analyse the platform choice in advance.

One point in favour of the newer platforms is worth noting: patch diffing KTM on Windows 8+ is easier because tm.sys is isolated, so no unrelated ntoskrnl.exe changes pollute the diff output. The code snippets that follow throughout the series are decompiled from Windows 7 ntoskrnl.exe unless explicitly noted otherwise. Differences across versions are called out as they come up, but for a complete cross-platform port you should expect a long tail of small offset / struct-layout / behaviour quirks that need version-specific handling.

Understanding the Windows Kernel Transaction Manager (KTM)

Where to find documentation (and what is missing)

Once the patch was in hand and clearly incomprehensible without context, the obvious next step was learning KTM end-to-end. The method was: reverse the major kernel APIs and the relevant system calls, build small userland clients exercising each API, and accumulate working samples that could be folded back into the eventual exploit. Two MSDN portals are the main public references — the user-mode KTM portal and the kernel-mode KTM portal. The user-mode portal has fewer working samples than most MSDN topics, and the kernel-mode portal is thorough but easy to drown in without context.

The practical answer was trial-and-error driven by reading the kernel implementations. Microsoft also produced three useful overview videos in the “Going Deep” series around the Vista launch, which give the best high-level mental model of why KTM exists:

One real-world malware appearance worth noting: the Proton Bot loader was observed using KTM APIs in 2019 to dodge API monitoring and hooking, though without actually leveraging KTM’s transactional semantics — the calls were cover, not function.

What KTM actually is

KTM showed up in Windows Vista to give the operating system a primitive for “transactional” operations — the same idea databases have had for decades, now exposed to NTFS (TxF) and the registry (TxR). The motivating use case is straightforward: some logical piece of work touches several resources, and the system needs that combined work to be atomic. Either everything happens or nothing does. The canonical examples in the docs are ATM withdrawals reconciling cash drawers with account balances, and software installs that need to roll back cleanly when a user cancels mid-installation.

The mechanism is straightforward in principle. A multi-step operation registers itself as a transaction. The transaction completes successfully only if all of its constituent steps succeed. If any step fails, the system rolls everything back to the pre-transaction state. To make this work across multiple resources (filesystem, registry, application data) KTM also has the concept of recovering from partial failure: when one piece of work fails, the system notifies the other participants so they know to re-synchronise to a common state.
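The all-or-nothing behaviour described above can be sketched in a few lines of portable C. This is a toy model only, not KTM code: the names tx_step, run_transaction, atm_ctx and atm_withdraw are hypothetical, and the ATM scenario simply mirrors the example from the documentation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical step: 'apply' does the work, 'undo' reverts it. */
typedef struct {
    bool (*apply)(void *ctx);
    void (*undo)(void *ctx);
} tx_step;

/* Runs every step in order. On the first failure, undoes the steps that
   already succeeded (in reverse order) and reports failure. */
bool run_transaction(const tx_step *steps, size_t count, void *ctx)
{
    for (size_t i = 0; i < count; i++) {
        if (!steps[i].apply(ctx)) {
            while (i > 0) {
                i--;
                steps[i].undo(ctx);
            }
            return false;   /* rolled back to the starting state */
        }
    }
    return true;            /* every step succeeded */
}

/* Toy ATM state: account balance, cash in the drawer, requested amount. */
typedef struct { int balance; int drawer; int amount; } atm_ctx;

bool debit_account(void *p)
{
    atm_ctx *c = p;
    if (c->balance < c->amount) return false;
    c->balance -= c->amount;
    return true;
}

void credit_account(void *p) { atm_ctx *c = p; c->balance += c->amount; }

bool dispense_cash(void *p)
{
    atm_ctx *c = p;
    if (c->drawer < c->amount) return false;
    c->drawer -= c->amount;
    return true;
}

void restock_cash(void *p) { atm_ctx *c = p; c->drawer += c->amount; }

/* Debit and dispense as one unit of work. */
bool atm_withdraw(atm_ctx *c)
{
    const tx_step steps[] = {
        { debit_account, credit_account },
        { dispense_cash, restock_cash },
    };
    return run_transaction(steps, sizeof steps / sizeof steps[0], c);
}
```

If the drawer cannot cover the amount, the debit that already happened is reversed, so the account and the drawer end up exactly where they started. KTM applies the same idea across independent participants rather than across function pointers in one process.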

From an exploitation standpoint, four kernel object types dominate the discussion and we’ll abbreviate them throughout:

  • Resource Manager (RM)
  • Transaction Manager (TM)
  • Transaction (Tx)
  • Enlistment (En)

What is a transaction (Tx)?

Microsoft’s documentation covers transactions both in a high-level overview and in a lower-level walk-through. Operationally, a transaction is a _KTRANSACTION kernel object associated with one transaction manager and with one or more enlistments. It represents an in-flight unit of work and has three primary state transitions: create, commit, and rollback. Committing converts in-progress changes into permanent ones; rolling back reverts the partial work. A transaction that has been rolled back cannot be committed and vice versa.
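The rule that commit and rollback are mutually exclusive end states can be modelled as a tiny state machine. The sketch below is an assumption-laden simplification: toy_tx and its three states are hypothetical, and the kernel’s real _KTRANSACTION_STATE enum has more values than shown here.

```c
#include <stdbool.h>

/* Hypothetical three-state model of a transaction's lifetime. */
typedef enum {
    TOY_TX_ACTIVE,
    TOY_TX_COMMITTED,
    TOY_TX_ROLLED_BACK
} toy_tx_state;

typedef struct { toy_tx_state state; } toy_tx;

/* A freshly created transaction is in flight. */
void toy_tx_create(toy_tx *tx) { tx->state = TOY_TX_ACTIVE; }

/* Commit is only legal while the transaction is still in flight. */
bool toy_tx_commit(toy_tx *tx)
{
    if (tx->state != TOY_TX_ACTIVE) return false;
    tx->state = TOY_TX_COMMITTED;
    return true;
}

/* Rollback is only legal while the transaction is still in flight. */
bool toy_tx_rollback(toy_tx *tx)
{
    if (tx->state != TOY_TX_ACTIVE) return false;
    tx->state = TOY_TX_ROLLED_BACK;
    return true;
}
```

Once either terminal state is reached, both operations refuse to run, which is the "cannot be committed and vice versa" property in miniature.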

The relevant fields of the Windows 7 / 2008R2 SP1 x64 _KTRANSACTION structure:

//0x2d8 bytes (sizeof)
struct _KTRANSACTION
{
    struct _KEVENT OutcomeEvent;                                            //0x0
    ULONG cookie;                                                           //0x18
    struct _KMUTANT Mutex;                                                  //0x20
    [....]
    struct _GUID UOW;                                                       //0xb0
    enum _KTRANSACTION_STATE State;                                         //0xc0
    ULONG Flags;                                                            //0xc4
    struct _LIST_ENTRY EnlistmentHead;                                      //0xc8
    ULONG EnlistmentCount;                                                  //0xd8
    [....]
    union _LARGE_INTEGER Timeout;                                           //0x128
    struct _UNICODE_STRING Description;                                     //0x130
    [....]
    struct _KTM* Tm;                                                        //0x200
    [....]
};

Practically, you do not poke at this structure much during exploitation, but the three fields worth keeping in mind are EnlistmentCount (a count of associated enlistments), EnlistmentHead (the head of the linked list of those enlistments), and Tm (a back-pointer to the owning transaction manager).
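To make the EnlistmentHead / EnlistmentCount relationship concrete, here is a minimal userland sketch of the circular doubly linked list pattern that LIST_ENTRY implements. Everything in it is a stand-in: toy_transaction, toy_enlistment and the TxLink member are hypothetical and far smaller than the real kernel layouts; only the Flink/Blink shape and the container-of arithmetic match how Windows lists work in general.

```c
#include <stddef.h>

/* Same shape as the Windows LIST_ENTRY: circular and doubly linked. */
typedef struct list_entry {
    struct list_entry *Flink;
    struct list_entry *Blink;
} list_entry;

/* Hypothetical, heavily simplified stand-ins for the kernel objects. */
typedef struct {
    list_entry   EnlistmentHead;
    unsigned int EnlistmentCount;
} toy_transaction;

typedef struct {
    unsigned int id;
    list_entry   TxLink;    /* hypothetical link into the transaction's list */
} toy_enlistment;

/* Recovers the enclosing structure from a pointer to its embedded link. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* An empty list is a head that points at itself in both directions. */
void tx_init(toy_transaction *tx)
{
    tx->EnlistmentHead.Flink = &tx->EnlistmentHead;
    tx->EnlistmentHead.Blink = &tx->EnlistmentHead;
    tx->EnlistmentCount = 0;
}

/* Appends an enlistment at the tail and bumps the counter. */
void tx_add_enlistment(toy_transaction *tx, toy_enlistment *en)
{
    list_entry *head = &tx->EnlistmentHead;
    en->TxLink.Flink = head;
    en->TxLink.Blink = head->Blink;
    head->Blink->Flink = &en->TxLink;
    head->Blink = &en->TxLink;
    tx->EnlistmentCount++;
}

/* Walks the list until it wraps back to the head. Returns the number of
   entries visited and adds each enlistment id to *id_sum. */
unsigned int tx_walk(toy_transaction *tx, unsigned int *id_sum)
{
    unsigned int seen = 0;
    for (list_entry *e = tx->EnlistmentHead.Flink;
         e != &tx->EnlistmentHead;
         e = e->Flink) {
        toy_enlistment *en = CONTAINER_OF(e, toy_enlistment, TxLink);
        *id_sum += en->id;
        seen++;
    }
    return seen;
}
```

The walk terminates when it arrives back at the head rather than at a NULL pointer, and the counter is maintained separately from the links, so the two are expected to agree at all times.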

From userland a transaction is created with the CreateTransaction() API:

HANDLE hTx = CreateTransaction(
    NULL, // lpTransactionAttributes
    0,  // UOW
    0,  // CreateOptions
    0,  // IsolationLevel
    0,  // IsolationFlags
    0,  // infinite timeout
    L"ExampleTx" // Description
);

Almost every KTM kernel object carries a cookie field. The cookies are unique per type, which makes them a quick sanity check when staring at raw memory in the debugger: if the four bytes at the expected cookie offset don’t match the expected magic, you are not looking at the structure you think you are. The cookies for the relevant KTM object types are reproduced verbatim below.

Cookie        Object type
0xb00b0001    _KTRANSACTION
0xb00b0002    _KRESOURCEMANAGER
0xb00b0003    _KENLISTMENT
0xb00b0004    _KTM
0xb00b0005    Protocol Address Info?
0xb00b0006    Propagate Request?

Cookie values for KTM kernel objects. Source: original article.
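The cookie sanity check is easy to automate when working from a raw memory dump. The helper below is a hypothetical sketch (ktm_cookie, KTM_COOKIES and ktm_identify are not from the original article): it reads four little-endian bytes at a caller-supplied offset and looks them up among the four confirmed cookies from the table. The only offset given in this part is 0x18 for _KTRANSACTION on Windows 7 x64; offsets for the other object types have to be supplied by the caller.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t    cookie;
    const char *type_name;
} ktm_cookie;

/* The four confirmed cookies from the table; the two entries marked
   with a question mark are deliberately left out. */
static const ktm_cookie KTM_COOKIES[] = {
    { 0xb00b0001u, "_KTRANSACTION" },
    { 0xb00b0002u, "_KRESOURCEMANAGER" },
    { 0xb00b0003u, "_KENLISTMENT" },
    { 0xb00b0004u, "_KTM" },
};

/* Decodes a little-endian 32-bit value at cookie_offset inside 'dump'
   and returns the matching table entry, or NULL when the offset is out
   of range or the value is not a known cookie. */
const ktm_cookie *ktm_identify(const uint8_t *dump, size_t len,
                               size_t cookie_offset)
{
    if (len < 4 || cookie_offset > len - 4)
        return NULL;

    const uint8_t *p = dump + cookie_offset;
    uint32_t value = (uint32_t)p[0]
                   | (uint32_t)p[1] << 8
                   | (uint32_t)p[2] << 16
                   | (uint32_t)p[3] << 24;

    for (size_t i = 0; i < sizeof KTM_COOKIES / sizeof KTM_COOKIES[0]; i++) {
        if (KTM_COOKIES[i].cookie == value)
            return &KTM_COOKIES[i];
    }
    return NULL;
}
```

A NULL result at the expected offset is the signal described above: the memory under inspection is not the structure you think it is.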

The KTM objects also use pool tags, which is useful when tracking allocations with PoolMon or filtering pool spray. The relevant ones for this vulnerability:

Pool tag    Object type
TmTx        _KTRANSACTION
TmRm        _KRESOURCEMANAGER
TmEn        _KENLISTMENT
TmTm        _KTM

Pool tags for KTM kernel objects. Source: original article.
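When filtering allocations programmatically it helps to have the tags as 32-bit values. The sketch below assumes the usual Windows convention that a pool tag is a 32-bit value whose bytes read in order in little-endian memory; pool_tag_value, ktm_pool_tag and ktm_lookup_tag are hypothetical helper names, not part of any Windows API.

```c
#include <stddef.h>
#include <stdint.h>

/* Packs a 4-character tag so that its bytes appear in reading order
   when the 32-bit value is stored in little-endian memory. */
uint32_t pool_tag_value(const char *tag)
{
    return (uint32_t)(uint8_t)tag[0]
         | (uint32_t)(uint8_t)tag[1] << 8
         | (uint32_t)(uint8_t)tag[2] << 16
         | (uint32_t)(uint8_t)tag[3] << 24;
}

typedef struct {
    const char *tag;
    const char *type_name;
} ktm_pool_tag;

/* Tag-to-type mapping taken from the table above. */
static const ktm_pool_tag KTM_POOL_TAGS[] = {
    { "TmTx", "_KTRANSACTION" },
    { "TmRm", "_KRESOURCEMANAGER" },
    { "TmEn", "_KENLISTMENT" },
    { "TmTm", "_KTM" },
};

/* Returns the table entry for a packed tag value, or NULL if the value
   is not one of the KTM tags listed. */
const ktm_pool_tag *ktm_lookup_tag(uint32_t value)
{
    for (size_t i = 0; i < sizeof KTM_POOL_TAGS / sizeof KTM_POOL_TAGS[0]; i++) {
        if (pool_tag_value(KTM_POOL_TAGS[i].tag) == value)
            return &KTM_POOL_TAGS[i];
    }
    return NULL;
}
```

For example, "TmTx" packs to 0x78546D54, which is the value to compare against when a tool reports tags as integers rather than as text.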

What is a transaction manager (TM)?

A transaction manager is the top of the KTM hierarchy — the entity that owns transactions and the resource managers attached to them. Before any transactional work can occur, a TM has to exist.

The distinction that matters most for exploitation is durable vs. volatile. MSDN describes both:

  • Durable — the default kind. The TM is backed by a log file on disk and can recover state across reboots.
  • Volatile — no log, no state recovery.

Why does this matter? If you are spraying thousands of enlistments per second to win a race, a durable TM will write log records for every operation — and you will quickly fill the log, which produces errors that break the spray loop. Switching to a volatile TM removes that whole class of failures. Empirically, volatile was the right call for this exploit.
