Reverse-engineering Valorant's Vanguard Guarded Regions: PML4 Cloning, CR3 Swaps, and the SwapContext Hook PoC (Walk-through of Xyrem's Post)


Original text: “In-depth analysis on Valorant’s Guarded Regions” by Xyrem, reversing.info (2023). Code blocks and figures below are reproduced verbatim with attribution captions.

Executive Summary

Riot’s Vanguard anti-cheat keeps a slice of Valorant’s game state in memory that, from the point of view of any other process or any non-whitelisted thread, is simply not mapped. Xyrem’s post walks the reader through how that is actually done. It isn’t a hypervisor and it isn’t a clever marshalling trick; it is a small, carefully placed manipulation of the x86-64 four-level paging hierarchy. Vanguard reserves an unused PML4 index inside the game process’s page-table tree and builds a parallel “shadow” PML4 that is identical to the game’s own except that it also maps the protected memory through that one index. From a hook on the kernel’s SwapContext, it then loads the shadow PML4 into CR3 on each context switch into a game thread, but only if that thread is whitelisted. The shadow region is therefore visible to the game threads Vanguard has whitelisted and invisible to everything else, including non-whitelisted threads of the game process itself.

The post builds the technique up from first principles (the IOCTL that returns the shadow base, an x86-64 paging recap, the SwapContext reverse) and then turns it around with a working PoC, “Yumekage”, that reimplements the same primitive from a custom kernel driver: clone the client PML4, plant a shadow PML4 entry at a free index, hook SwapContext, whitelist threads through an IOCTL, and watch the same address go from “Failed” before whitelisting to “Success” after. The closing section is the most interesting bit for defenders: it covers the obvious side-channels (a hard-coded shadow base, a recognisable 'TnoC' pool tag) and how Vanguard’s production build appears to account for them.

Disclaimer (Original)

Xyrem prefaces the post with the obligatory note that it’s an educational analysis of Riot Vanguard and Windows internals — nothing in the post is meant to be production cheat code, and nothing on this republication is either. The Windows-internals content is general; the Vanguard-specific bits are anchored in a build from 2023.

Introduction

The framing is the standard online-FPS framing: cheating in competitive games has pushed anti-cheats from user-mode handle scanners, to kernel-mode signature scanners, to the current generation, where the anti-cheat ships its own driver and lives at ring 0 alongside the game. Vanguard is the canonical example. What sets the Guarded Regions feature apart is that it does not hide data in some other process’s address space; it rewires the page tables of the game process itself so that some addresses resolve only on specific, pre-blessed threads. The rest of the post is the answer to “how, exactly”.

The Problem

The problem for an anti-cheat is that any external reader — a cheat in another process, a debugger, a hardware-DMA card — can in principle read any memory the OS thinks the target process owns. The conventional defences (handle stripping, callback-based read denial, signature scanning for known cheats) are bypassed routinely. Vanguard’s answer with Guarded Regions is structural: don’t rely on the OS to refuse reads; arrange it so that, at the level of the CPU’s own paging hardware, the addresses to read just aren’t there for the wrong context. The trade-off is that this requires constant page-table maintenance and very careful interaction with the Windows scheduler, which is what the rest of the post unpacks.

Reversing the Logic

Xyrem starts where any reverser would: pulling apart the game’s stub.dll packer to find the calls that talk to Vanguard’s kernel device, and watching the IOCTL that returns the shadow base. The interesting pattern is the use of __rdtsc() as a request nonce and an encrypted input/output buffer round-trip. The control-flow scaffolding around the call — reproduced verbatim from the source, with the encryption blocks stripped for readability — looks like this:

NOTE (original article): This code is heavily edited and stripped to be readable.
uint64_t* InputBuffer = (uint64_t*)malloc(8);
*InputBuffer = __rdtsc();                  // request nonce, echoed back by the driver

// Input Buffer encryption block removed.

uint64_t* OutputBuffer = (uint64_t*)malloc(16);
memset(OutputBuffer, 0, 16);
DWORD BytesReturned = 0;

if ( !DeviceIoControl(Data::VgkHandle, [REDACTED], InputBuffer, 8, OutputBuffer, 16, &BytesReturned, 0)
  || BytesReturned != 16 )
{
  free(InputBuffer);
  free(OutputBuffer);

  return EPackmanStatus::VanguardFailure;
}

// Output Buffer decryption block removed.

*Arg2_pShadowBase = OutputBuffer[1];       // [0] = echoed TSC, [1] = shadow base

free(InputBuffer);
free(OutputBuffer);

return EPackmanStatus::Success;

Source: original article.

After decryption, the output buffer holds two 64-bit values — the echoed TSC timestamp and the shadow base address itself:

OutputBuffer[0] = 0x293E76759617 // TSC timestamp
OutputBuffer[1] = 0x008000000000 // Shadow base

Source: original article.

The shadow base is the upshot. 0x008000000000 is a canonical virtual address whose only set bit is bit 39, i.e. it is the first address translated through PML4 index 1. The Vanguard kernel driver derives that base from the PML4 index it has chosen for the game process, one of the unused entries in the lower (user-mode) half of the PML4 (entries 0 to 255), via a small function whose post-stripping skeleton is reproduced below:

NOTE (original article): This code has been heavily stripped and modified for your ease.
if ( IoGetCurrentProcess() != Data::GameProcess )   // only the game process gets a shadow base
  return 0;

return (uint64_t)Data::FreePML4Index << 39;         // first VA translated through that PML4 entry

Source: original article.
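The arithmetic in that skeleton is easy to check in isolation. The portable C++ sketch below is an illustration, not Vanguard’s code; the helper names are hypothetical, and it assumes 4-level paging, where bits 47..39 of a virtual address select the PML4 entry. It also handles indices 256 to 511, which need sign extension from bit 47 to stay canonical; the skeleton can skip that step because the reserved index lives in the user half.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers illustrating the arithmetic only (4-level paging).
constexpr unsigned kPml4Shift = 39;  // bits 47..39 select the PML4 entry

// Lowest virtual address covered by a PML4 entry (each entry spans 512 GiB).
uint64_t ShadowBaseFromIndex(unsigned pml4Index) {
    assert(pml4Index < 512);
    uint64_t base = static_cast<uint64_t>(pml4Index) << kPml4Shift;
    // Indices 256..511 set bit 47, so bits 63..48 must copy it (sign
    // extension) for the address to be canonical.
    if (base & (1ULL << 47))
        base |= 0xFFFF000000000000ULL;
    return base;
}

// Inverse direction: which PML4 entry does a virtual address go through?
unsigned Pml4IndexFromAddress(uint64_t va) {
    return static_cast<unsigned>((va >> kPml4Shift) & 0x1FF);
}
```

With index 1 this reproduces the captured 0x008000000000; with index 256 it yields 0xFFFF800000000000, the first kernel-half address.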

The really load-bearing piece, though, is the routine that performs the page-table swap when a game thread is scheduled. The reverse, again from the original source, clones the game process’s PML4, plants the shadow PML4 entry at FreePML4Index, writes CR3 to point at the clone only if the current thread is on the whitelist, and finally flushes the TLB by toggling the global-pages bit in CR4 (CR4.PGE, bit 7, hence the 0x80):

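// CurrentThread is the thread being switched in; its declaration was stripped.
if ( PsGetThreadProcess(CurrentThread) != Data::GameProcess
  || __readcr3() != Data::GameCR3 )        // only game threads on the public PML4
  return;

bool WriteToCR3 = true;

_disable();                                // no interrupts while the clone is rebuilt

// Refresh the clone from the live PML4, then add the shadow entry.
memmove(Data::CloneVirtCR3, Data::VirtGameCR3, 0x1000);

Data::CloneVirtCR3[Data::FreePML4Index] = Data::ShadowPML4Value;

// Linear whitelist lookup: a match sets WriteToCR3 and stops the scan.
for ( int ThreadIdx = 0; ThreadIdx < Data::ThreadCount; ThreadIdx++ )
{
    WriteToCR3 = Data::ThreadArray[ThreadIdx] == CurrentThread;

    if ( WriteToCR3 )
      break;
}

DoTask:                                    // label kept from the original control flow
if ( WriteToCR3 )
  __writecr3(Data::CloneCR3);              // physical address of the clone PML4

if ( CanFlushTLB )                         // condition stripped in the original
{
  uint64_t OriginalCR4 = __readcr4();
  __writecr4(OriginalCR4 ^ 0x80);          // flip CR4.PGE: flushes global TLB entries too
  __writecr4(OriginalCR4);                 // restore the original CR4
}

_enable();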

Source: original article.
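The decision that routine makes on each call can be isolated into a pure function. The portable C++ sketch below is a model, not the reversed code: threads, processes and CR3 values are plain integers, and `GuardState` and `ChooseCr3` are hypothetical names (the comments map each field to the `Data::` globals in the listing).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model of the per-switch decision; IDs stand in for
// PETHREAD/PEPROCESS pointers, CR3 values for physical PML4 addresses.
struct GuardState {
    uint64_t gameProcess;             // Data::GameProcess
    uint64_t gameCr3;                 // Data::GameCR3 (public PML4)
    uint64_t cloneCr3;                // Data::CloneCR3 (PML4 + shadow entry)
    std::vector<uint64_t> whitelist;  // Data::ThreadArray
};

// Returns the CR3 to load, or nullopt to leave the current CR3 untouched.
std::optional<uint64_t> ChooseCr3(const GuardState& s, uint64_t thread,
                                  uint64_t threadProcess, uint64_t currentCr3) {
    // Only game-process threads still running on the public PML4 qualify.
    if (threadProcess != s.gameProcess || currentCr3 != s.gameCr3)
        return std::nullopt;
    // A thread absent from the list (including an empty list) keeps its CR3.
    bool whitelisted = std::find(s.whitelist.begin(), s.whitelist.end(),
                                 thread) != s.whitelist.end();
    if (!whitelisted)
        return std::nullopt;
    return s.cloneCr3;
}
```

Keeping the decision separate from the privileged CR3 write makes the guard conditions testable without a kernel.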

Paging Tables

NOTE (original article): The following explanation of paging tables is severely dumbed down, for new inexperienced readers. It is recommended to refer to the Intel manual for a better explanation.
NOTE (original article): This section of the post only talks about the 4-level paging table hierarchy, and ignores the 5-level one.

x86-64 with 4-level paging splits a 48-bit virtual address into four 9-bit indices plus a 12-bit page offset. Each index selects an entry at one level of a four-deep table tree rooted at CR3: PML4 → PDPT → PD → PT, where the leaf PT entry holds the page-frame number (up to 40 bits) of a 4 KiB physical page. Two of those levels (PDPT and PD) can also end the walk early by setting the PS bit, mapping a 1 GiB or 2 MiB physical page respectively; that is useful for large kernel mappings but irrelevant to this trick. The whole tree lives in physical memory, and the CPU walks it on every TLB miss.

x86-64 4-level paging for 4 KiB pages: PML4 → PDPT → PD → PT, rooted at CR3. Source: original article.
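As a worked example of the split just described, this portable C++ sketch (hypothetical names, 4 KiB pages only, no large-page handling) extracts the four table indices and the page offset from a virtual address. Applied to the captured shadow base 0x008000000000 it yields PML4 index 1 and zero for every other field, which is why one extra PML4 entry is all the shadow mapping needs at the top level.

```cpp
#include <cassert>
#include <cstdint>

// Indices used at each level of a 4-level walk for one virtual address.
struct WalkIndices {
    unsigned pml4, pdpt, pd, pt;  // 9 bits each
    unsigned offset;              // 12 bits inside a 4 KiB page
};

WalkIndices Decompose(uint64_t va) {
    WalkIndices w;
    w.pml4   = static_cast<unsigned>((va >> 39) & 0x1FF);  // bits 47..39
    w.pdpt   = static_cast<unsigned>((va >> 30) & 0x1FF);  // bits 38..30
    w.pd     = static_cast<unsigned>((va >> 21) & 0x1FF);  // bits 29..21
    w.pt     = static_cast<unsigned>((va >> 12) & 0x1FF);  // bits 20..12
    w.offset = static_cast<unsigned>(va & 0xFFF);          // bits 11..0
    return w;
}
```

Four 9-bit fields plus the 12-bit offset account for exactly the 48 translated bits.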

Process Isolation

Each Windows process owns its own page-table tree. The user half (lower canonical addresses) is per-process, while the kernel half (upper canonical addresses) is shared, because the upper PML4 entries of every process point at the same kernel paging structures. Isolation comes from the kernel loading a different CR3 when the scheduler switches to a thread whose owning process differs: the very next memory access walks a different tree, and the previous process’s user mappings are no longer reachable.

That “change CR3 on context switch” behaviour lives inside the Windows kernel function SwapContext. Hook SwapContext (which requires running in ring 0) and you can change which page-table tree the freshly scheduled thread will see, while leaving the OS’s bookkeeping intact. That is precisely what Vanguard does.

Basic Implementation of the Idea


To make the swap concrete, Xyrem walks through a reverse of SwapContext, the kernel routine that runs every time the scheduler picks a new thread to run on the current logical CPU. The relevant skeleton is below. The interesting line for our purposes is __writecr3(NewDirectoryBase): that write is the single hardware-visible thing that defines “the current process’s address space” on this CPU.

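// Decompiler-style pseudocode: the new/old threads arrive in RSI/RDI,
// and Prcb is the current processor's KPRCB.
bool SwapContext()
{
  PKTHREAD NewThread = RSI;
  PKTHREAD OldThread = RDI;

  // Wait until the incoming thread has fully left any other CPU.
  while ( NewThread->Running )
    _mm_pause();

  NewThread->Running = 1;

  PKPROCESS NewProcess = NewThread->ApcState.Process;
  if ( NewProcess != OldThread->ApcState.Process )   // cross-process switch only
  {
    ULONG64 NewDirectoryBase = NewProcess->DirectoryTableBase;
    if ( KiKvaShadow & 1 )
      // Bit 63 of a value written to CR3 (with CR4.PCIDE = 1) means "keep this PCID's TLB entries".
      Prcb->KernelDirectoryTableBase = (NewDirectoryBase & 2) ? NewDirectoryBase | (1ULL << 63) : NewDirectoryBase;

    __writecr3(NewDirectoryBase);                     // the address-space switch
    if ( KiKvaShadow & 1 && (NewDirectoryBase & 2) == 0 )
    {
      // Flip CR4.PGE and restore it: flushes global TLB entries as well.
      ULONG64 CR4 = __readcr4();
      __writecr4(CR4 ^ 0x80);
      __writecr4(CR4);
    }
  }

  OldThread->Running = 0;

  return NewThread->ApcState.KernelApcPending ? (NewThread->SpecialApcDisable | NewThread->WaitIrql) : false;
}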

Source: original article.

The structural plan, then: build a clone of the game process’s PML4, modify only one entry (the chosen free PML4 index, so that it now points at the shadow region’s PDPT), keep both copies around, and on every SwapContext for a thread belonging to the game process, write the clone’s physical address into CR3 only if the thread is on the whitelist. The result: whitelisted threads see the shadow mapping, while every other thread, including non-whitelisted threads in the very same process, sees 0x008000000000 as unmapped.
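The plan can be modelled in a few lines of portable C++. This is a toy under stated assumptions: a PML4 is an array of 512 64-bit entries, only the present bit (bit 0) is inspected, the lower levels of the walk are not modelled, and `BuildClone`, `ActivePml4` and `Pml4StepSucceeds` are hypothetical names. It shows the key property: the public and clone tables differ in exactly one entry, and which of the two a thread walks decides whether the shadow address can be translated at all.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <set>

using Pml4 = std::array<uint64_t, 512>;
constexpr uint64_t kPresent = 1;  // bit 0 of a paging-structure entry

// Copy the public PML4 and plant the shadow entry at the reserved index
// (the memmove + CloneVirtCR3[FreePML4Index] = ShadowPML4Value step).
Pml4 BuildClone(const Pml4& publicPml4, unsigned freeIndex, uint64_t shadowEntry) {
    Pml4 clone = publicPml4;
    clone[freeIndex] = shadowEntry;
    return clone;
}

// The per-switch choice: whitelisted threads walk the clone.
const Pml4& ActivePml4(uint64_t thread, const std::set<uint64_t>& whitelist,
                       const Pml4& publicPml4, const Pml4& clone) {
    return whitelist.count(thread) ? clone : publicPml4;
}

// First step of the hardware walk: a non-present PML4E means a page fault.
bool Pml4StepSucceeds(const Pml4& pml4, uint64_t va) {
    return (pml4[(va >> 39) & 0x1FF] & kPresent) != 0;
}
```

Ordinary mappings (PML4 index 0 in this toy) resolve for every thread, because the clone starts as a full copy of the public table.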

The Exploit

Xyrem’s “exploit” in the post is a reimplementation of Vanguard’s technique: a small custom kernel driver, Yumekage, that performs the same PML4 clone and SwapContext hook but exposes the primitive through IOCTLs so a usermode demo can drive it. The hook itself is the minimum needed: check the whitelist, rebuild the clone PML4, and rewrite CR3 to point at the clone.

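void SwapContextHook( )
{
	// Non-whitelisted threads keep the CR3 the kernel just loaded.
	if ( !WhitelistedThreads.Contains( KeGetCurrentThread( ) ) )
		return;

	// Rebuild the clone from the live client PML4 and re-plant the shadow entry.
	memcpy( Paging::ClonePML4Virt, Paging::ClientPML4Virt, 0x1000 );
	Paging::ClonePML4Virt[ Paging::FreePML4Index ] = Paging::ShadowPML4;

	// Keep the current CR3 flags; swap only the PML4 frame number.
	cr3 CR3 = cr3{ .flags = __readcr3( ) };
	CR3.address_of_page_directory = Paging::CloneCR3Phys >> 12;
	SetCR3( CR3 );
}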

Source: original article.
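The two lines that build the new CR3 keep every existing flag and replace only the PML4 frame field. A portable sketch of that bit manipulation follows, assuming the Intel layout in which bits 51..12 hold the PML4’s physical frame and bits 11..0 hold the PCID when CR4.PCIDE is set (otherwise the PWT/PCD flags). `ReplacePml4Base` is a hypothetical stand-in for the `cr3` bitfield struct the PoC uses; some headers declare that field narrower than 40 bits.

```cpp
#include <cassert>
#include <cstdint>

// Bits 51..12 of CR3: physical address of the PML4 (4 KiB aligned).
constexpr uint64_t kCr3FrameMask = 0x000FFFFFFFFFF000ULL;

// Equivalent of CR3.address_of_page_directory = ClonePhys >> 12,
// keeping the low flag/PCID bits of the CR3 value that was just read.
uint64_t ReplacePml4Base(uint64_t currentCr3, uint64_t clonePml4Phys) {
    assert((clonePml4Phys & 0xFFF) == 0);  // a PML4 is page aligned
    return (currentCr3 & ~kCr3FrameMask) | (clonePml4Phys & kCr3FrameMask);
}
```

For example, a current CR3 of 0x1234A5005 (PCID 5) combined with a clone at physical 0xABCDE000 gives 0xABCDE005: same PCID, different tree.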

The IOCTL handler that adds the current thread to the whitelist does the same swap immediately, so that the thread running the IOCTL itself gets the shadow mapping straight away — otherwise it would have to wait for the next scheduling point to see the new world:

case WhitelistThreadCTL:
{
  WhitelistedThreads.Insert( KeGetCurrentThread( ) );

  // Swap right away so the calling thread sees the shadow mapping on return.
  memcpy( Paging::ClonePML4Virt, Paging::ClientPML4Virt, 0x1000 );
  Paging::ClonePML4Virt[ Paging::FreePML4Index ] = Paging::ShadowPML4;

  cr3 CR3 = cr3{ .flags = __readcr3( ) };
  CR3.address_of_page_directory = Paging::CloneCR3Phys >> 12;
  SetCR3( CR3 );

  DBG( "Whitelisted thread %llu\n", (ULONG64)(ULONG_PTR)PsGetCurrentThreadId( ) );

  *SystemBuffer = 0x1BADD00D;   // magic value returned to the caller
  break;
}

Source: original article.

The setup IOCTL is the one that picks the free PML4 index. It takes the physical address of the current client’s PML4 from CR3, translates it to its kernel virtual address with MmGetVirtualForPhysical, scans the lower 256 entries (the user-half PML4 entries) for a zero-valued one, and reports the index back so the usermode side can compute the shadow base. In production cheats you’d randomise the choice (more on that later), but here it’s the first free entry:

case InitializeCTL:
{
  WhitelistedThreads.Clear( );

  // CR3 holds the physical PML4 of the calling (client) process; find its kernel VA.
  cr3 CR3 = { .flags = __readcr3( ) };
  Paging::ClientPML4Virt = (pml4e_64*)MmGetVirtualForPhysical( PHYSICAL_ADDRESS{ .QuadPart = LONGLONG( CR3.address_of_page_directory << 12 ) } );

  // First unused entry in the user half (0..255) becomes the shadow slot.
  for ( int i = 0; i < 256; i++ )
  {
    if ( !Paging::ClientPML4Virt[ i ].flags )
    {
      Paging::FreePML4Index = i;
      break;
    }
  }

  *SystemBuffer = Paging::FreePML4Index;

  DBG( "Initialized paging for process %llu\n", (ULONG64)(ULONG_PTR)PsGetCurrentProcessId( ) );
  break;
}

Source: original article.
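The scan itself is simple enough to check outside the kernel. This portable sketch takes the PML4 as a plain 512-entry array (in the driver it is the page returned by MmGetVirtualForPhysical) and, unlike the handler above, reports “none found” explicitly instead of leaving a stale index in place. `FirstFreeUserIndex` and `UserShadowBase` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

// Entries 0..255 translate user-mode (lower-half) addresses.
constexpr unsigned kUserHalfEntries = 256;

// Mirrors the InitializeCTL loop: first entry whose whole value is zero.
std::optional<unsigned> FirstFreeUserIndex(const std::array<uint64_t, 512>& pml4) {
    for (unsigned i = 0; i < kUserHalfEntries; ++i)
        if (pml4[i] == 0)
            return i;
    return std::nullopt;  // every user-half entry is already in use
}

// What the usermode side can derive from the returned index.
uint64_t UserShadowBase(unsigned index) {
    return static_cast<uint64_t>(index) << 39;  // index < 256: no sign extension
}
```

If entry 0 is in use and entry 1 is free, the scan returns 1 and the derived base is the 0x008000000000 seen earlier.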

The usermode demo glues those IOCTLs into a clean before/after test: try to read the shadow address through IsBadReadPtr before whitelisting (must fail), whitelist the current thread, try the same read again (must succeed), then exercise the page with a simple read/write loop:

#include <windows.h>
#include <cstdio>
#include <cstdint>
// Comm:: wraps the PoC's IOCTLs; its implementation is not part of the post.

int main( )
{
	printf( "[*] Yumekage Usermode Demo\n" );

	if ( !Comm::Initialize( ) )
	{
		printf( "[-] Failed to initialize comms.\n" );
		Sleep( 5000 );
		return 0;
	}

	printf( "[+] Initialized comms.\n" );

	// Sets up the shadow pages and returns the shadow base address.
	uint64_t Address = Comm::InitializeHiddenPages( );
	if ( !Address )
	{
		printf( "[-] Failed to initialize hidden pages.\n" );
		Sleep( 5000 );
		return 0;
	}

	printf( "[+] Hidden pages created at 0x%llX\n", Address );

	// Not whitelisted yet: this thread still walks the public PML4.
	printf( "[*] Trying to access page before whitelisting: %s\n", IsBadReadPtr( (void*)Address, 1 ) ? "Failed" : "Success" );

	if ( !Comm::WhitelistCurrentThread( ) )
	{
		printf( "[-] Failed to whitelist thread.\n" );
		Sleep( 5000 );
		return 0;
	}

	printf( "[*] Trying to access page after whitelisting: %s\n", IsBadReadPtr( (void*)Address, 1 ) ? "Failed" : "Success" );

	// Sleep forces rescheduling, so each access relies on the SwapContext hook.
	for ( int i = 0; i <= 5; i++ )
	{
		Sleep( 50 );

		*(volatile int*)Address = i;
		printf( "[*] Read and written index %d\n", *(volatile int*)Address );
	}

	printf( "[*] Done, exiting...\n" );

	Comm::Destroy( );
	Sleep( INFINITE );
	return 0;
}

Source: original article.

Yumekage usermode demo: IsBadReadPtr goes from “Failed” to “Success” the moment the thread is whitelisted. Source: original article.
WinDbg log of the SwapContext hook firing on every schedule of the whitelisted thread. Source: original article.

Back to Vanguard and Valorant

The last section of the post is the most interesting bit for anyone thinking about how Vanguard actually hardens this in production. Two observations stand out.

First, a naive cheat (or a naive reverser) will hard-code the shadow base. 0x008000000000 is what came back from the IOCTL on Xyrem’s machine, so the obvious shortcut is to bake that constant into the cheat and call it a day. Vanguard’s production build randomises which free PML4 index gets reserved, so that constant cannot be relied on, and any code that needs the shadow base has to discover the index at run time. The post’s skeleton for a randomised pick among the free user-half entries:

bool FindFreeIndex( _In_ pml4e_64* PML4, _Out_ int* FreeIndexOut )
{
	if ( !PML4 || !FreeIndexOut )
		return false;

	// Collect every unused entry in the user half (indices 0..255).
	int FreeIndexes[ 256 ];
	int NumOfFreeIndexes = 0;

	for ( int i = 0; i < 256; i++ )
	{
		if ( PML4[i].flags )
			continue;

		FreeIndexes[ NumOfFreeIndexes++ ] = i;
	}

	if ( !NumOfFreeIndexes )
		return false;

	// TSC as a cheap random source; modulo the count keeps the pick in range.
	*FreeIndexOut = FreeIndexes[ __rdtsc( ) % NumOfFreeIndexes ];
	return true;
}

Source: original article.
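The pick can be tested in isolation. In this portable sketch the TSC read is replaced by a caller-supplied seed so the behaviour is deterministic, and `PickRandomFreeIndex` is a hypothetical name. The position in the candidate list is taken modulo the candidate count; a modulo of `NumOfFreeIndexes + 1` would occasionally select the slot just past the last candidate, which is uninitialised (or, with all 256 entries free, outside the array).

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Picks one free user-half PML4 index; `seed` stands in for __rdtsc().
std::optional<unsigned> PickRandomFreeIndex(const std::array<uint64_t, 512>& pml4,
                                            uint64_t seed) {
    std::vector<unsigned> freeIndexes;
    for (unsigned i = 0; i < 256; ++i)
        if (pml4[i] == 0)
            freeIndexes.push_back(i);
    if (freeIndexes.empty())
        return std::nullopt;
    // Modulo the count (not count + 1): always a filled position.
    return freeIndexes[seed % freeIndexes.size()];
}
```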

Second, large kernel allocations (roughly a page or more) are tracked in the Big Pool Table, each with a four-byte pool tag. A cheat driver that allocates its shadow pages naively will leave a known tag behind (Xyrem flags 'TnoC', the tag of the simple “Cont”iguous allocation pattern, as the obvious one) that a Vanguard self-scan can grep for. The cheat needs to remove its own pool entries from the table after allocation. The post offers a small helper that does exactly that, locating a routine it calls ExRemovePoolTag through a hard-coded byte signature:

bool RemovePoolEntry( _In_ void* Allocation, _In_ POOL_TYPE Type )
{
	// Resolved once; the routine is located by signature rather than by export.
	static void(*ExRemovePoolTag)(_In_ void* Alloc, _Out_ uint32_t* PoolTag, _Out_ uint64_t* Size, _In_ POOL_TYPE Type) = 0;
	if ( !ExRemovePoolTag )
	{
		// Matches an E8 rel32 call site; 0xCC bytes act as wildcards.
		uint64_t Addr = Utils::FindPattern( KernelBase, "\xE8\xCC\xCC\xCC\xCC\x4C\x8B\x4D\xCC\x49\x81\xF9\x00\x10\x00\x00" );

		if ( !Addr )
			return false;

		// Call target = address of the E8 + 5 (instruction length) + rel32.
		ExRemovePoolTag = decltype(ExRemovePoolTag)(Addr + *(int*)(Addr + 1) + 5);
	}

	uint32_t PoolTag = 0;
	uint64_t Size = 0;

	// Drops the allocation's entry from the big pool table.
	ExRemovePoolTag( Allocation, &PoolTag, &Size, Type );
	return true;
}

Source: original article.

Disclaimer (original article): Very risky and unsafe; if something goes wrong, you will bugcheck!
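Two details of that helper are worth isolating. An E8 (near call) instruction stores a signed 32-bit displacement relative to the end of the 5-byte instruction, so the target is the instruction address plus 5 plus the displacement, which is what the decltype cast computes. And the signature contains 0x00 bytes, so a matcher that measures the pattern with strlen would silently stop at the first one; the post does not show Utils::FindPattern, so whether it is affected is unknown. The portable sketch below takes the pattern length explicitly and treats 0xCC as a wildcard, the convention the signature appears to use. `FindPattern` and `ResolveRel32Call` here are hypothetical and operate on a plain byte buffer rather than on the mapped kernel image.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Offset of the first match; a 0xCC pattern byte matches any data byte.
// patternLen is explicit, so 0x00 bytes in the pattern are compared too.
std::optional<std::size_t> FindPattern(const uint8_t* data, std::size_t dataLen,
                                       const uint8_t* pattern, std::size_t patternLen) {
    if (patternLen == 0 || patternLen > dataLen)
        return std::nullopt;
    for (std::size_t i = 0; i + patternLen <= dataLen; ++i) {
        std::size_t j = 0;
        while (j < patternLen && (pattern[j] == 0xCC || data[i + j] == pattern[j]))
            ++j;
        if (j == patternLen)
            return i;
    }
    return std::nullopt;
}

// Target of the E8 rel32 call whose opcode byte sits at address callAddr.
uint64_t ResolveRel32Call(uint64_t callAddr, const uint8_t* callBytes) {
    assert(callBytes[0] == 0xE8);
    // Little-endian signed 32-bit displacement following the opcode.
    uint32_t raw = uint32_t(callBytes[1]) | uint32_t(callBytes[2]) << 8 |
                   uint32_t(callBytes[3]) << 16 | uint32_t(callBytes[4]) << 24;
    int32_t rel = static_cast<int32_t>(raw);
    // Relative to the next instruction: opcode + 4-byte displacement = 5 bytes.
    return callAddr + 5 + static_cast<uint64_t>(static_cast<int64_t>(rel));
}
```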

Proof-of-Concept Source

Xyrem leaves the PoC source as an exercise/repository link rather than inlining the full driver. The pieces above — the SwapContext hook, the two IOCTLs, and the usermode demo — are the load-bearing parts of the implementation.

Special Thanks

The original post thanks the broader Windows-kernel reversing community whose prior work on SwapContext, KVA Shadow, and Vanguard’s earlier mitigations made the analysis tractable.

Key Takeaways

  • Guarded Regions is not a hypervisor trick — it’s native page-table manipulation. Vanguard reserves a previously-unused PML4 index inside the game process, builds a parallel PML4 with that one index pointing at the shadow region, and switches CR3 to the parallel PML4 only when one of its whitelisted threads is being scheduled.
  • The hook point is nt!SwapContext, which Windows runs on every thread context switch. That is also the point at which the kernel itself writes CR3 for cross-process switches — piggy-backing on that boundary keeps everything coherent.
  • The shadow base is just (FreePML4Index << 39). On Xyrem’s capture it’s 0x008000000000, but the production build randomises the index, so a robust cheat has to discover the index dynamically rather than hard-coding it.
  • The PoC (“Yumekage”) reduces the entire technique to a couple of IOCTLs (setup and thread whitelisting) and a SwapContext hook. Before whitelisting, IsBadReadPtr on the shadow address reports Failed; after whitelisting, the same address reports Success and is read/write addressable. The physical pages are the same either way; the difference is entirely which PML4 the CPU walks.
  • Large kernel allocations leak through big-pool tag enumeration. Vanguard’s production build is presumed to scan the Big Pool Table for known cheat-driver tags such as 'TnoC'. A serious cheat driver removes its own entries via ExRemovePoolTag.
  • TLB consistency is non-trivial here. Vanguard’s routine toggles CR4.PGE after rewriting CR3 (when its CanFlushTLB condition holds); changing PGE flushes the entire TLB, global entries included, so no translation cached under the other PML4 can expose the shadow mapping to, or hide it from, the wrong thread.
  • The structural defence the technique relies on — “you can’t map what isn’t in the page table the CPU is walking” — is genuinely robust against external readers (other processes, DMA-card reads against the wrong CR3). It is not robust against another piece of ring-0 code that also hooks SwapContext first.

Defensive Recommendations

  • Enforce HVCI / Memory Integrity on every Windows workstation that runs a kernel-mode anti-cheat (or, more generally, that you do not want to host arbitrary self-signed drivers). HVCI is the one mitigation that meaningfully raises the bar against the “ship a kernel driver that hooks SwapContext” pattern, by enforcing W^X across kernel pages and disallowing unsigned driver loads.
  • Treat the existence of a third-party SwapContext hook as a high-severity signal. Production Windows builds do not have inline hooks on scheduler internals. An EDR (or a self-defending anti-cheat) that periodically reads the first bytes of nt!SwapContext and compares against the on-disk image can detect this entire class of trick cheaply.
  • If you maintain an anti-cheat or similar product: randomise everything an attacker would otherwise hard-code (the PML4 index, the shadow base, the pool tag), and remove your own pool entries via ExRemovePoolTag after allocation, exactly as the post recommends. The defensive code looks identical to the offensive code.
  • For the rest of the OS: the Big Pool Table is your friend on the detection side. Periodic enumeration of big-pool tags from a trusted EDR component, or scanning loaded driver images for known tag constants, catches the lazier end of this technique.
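  • Monitor CR3 loads from beneath the kernel where you can. Ordinary ETW providers, Threat-Intelligence included, do not report CR3 writes, but a hypervisor can enable CR3-load exiting and observe every MOV to CR3. A stream of CR3 values that match no known process directory base, loaded at SwapContext-shaped points, is a strong signal, and telemetry collected below ring 0 cannot be patched by the ring-0 driver it is watching.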
  • Restrict who can install kernel drivers. The entire technique requires a kernel driver to be loadable. Driver Signature Enforcement, Microsoft’s vulnerable-driver blocklist, and an organisation-level driver allowlist together kill off the easy variants.
  • Do not rely on user-mode anti-cheat / EDR primitives alone for any structural defence. A user-mode process cannot meaningfully tell whether the page-table tree the CPU is walking is the one the OS thinks it set up. Structural defences against ring-0 attackers must themselves live in ring 0 (or beneath it via VBS/HVCI).
  • For research and red-team contexts: remember that the Vanguard-style “trustworthy ring 0” model breaks if another piece of ring-0 code runs alongside it. Detection should focus on the boundary — what loads, when, with what driver signature, and via what loader — rather than on the in-memory state, which the attacker controls.
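The “compare the first bytes of nt!SwapContext against the on-disk image” check from the list can be sketched portably. This assumes the caller has already obtained both byte ranges (the live function bytes, and the bytes at the same RVA in the file on disk, with no relocations inside the compared range); that plumbing is Windows-specific and not shown, and `CheckPrologue` is a hypothetical name. It flags any difference and additionally labels two common x86-64 inline-hook shapes: a relative `jmp rel32` (E9) and `jmp qword ptr [rip+disp32]` (FF 25).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class PrologueState { Intact, ModifiedRelJmp, ModifiedIndirectJmp, Modified };

// Compares n live bytes against the on-disk bytes at the same RVA.
PrologueState CheckPrologue(const uint8_t* live, const uint8_t* disk, std::size_t n) {
    bool differs = false;
    for (std::size_t i = 0; i < n; ++i)
        if (live[i] != disk[i]) { differs = true; break; }
    if (!differs)
        return PrologueState::Intact;
    if (n >= 5 && live[0] == 0xE9)                     // jmp rel32
        return PrologueState::ModifiedRelJmp;
    if (n >= 6 && live[0] == 0xFF && live[1] == 0x25)  // jmp [rip+disp32]
        return PrologueState::ModifiedIndirectJmp;
    return PrologueState::Modified;                    // some other patch
}
```

Any result other than Intact is worth escalating; the labels only help triage.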

Conclusion

The Guarded Regions technique is a clean example of an anti-cheat using the same primitives a kernel attacker would — SwapContext hooking, PML4 manipulation, deliberate TLB flushing — to enforce a structural memory boundary the OS itself does not give you. Xyrem’s post is interesting precisely because it reverses the trick in both directions: first showing how Vanguard hides game state from the rest of the system, and then showing that the same primitive can be rebuilt by an attacker to hide cheat state from Vanguard. The defensive lesson is also the engineering lesson — structural defences in this corner of Windows live in ring 0, and whoever loads first sets the rules.

