Executive Summary
Modern endpoint detection has moved its trustworthiness anchor from userland hooks to kernel-mode telemetry: kernel callbacks (ObRegisterCallbacks, PsSetCreate*NotifyRoutine, minifilters) and the Microsoft-Windows-Threat-Intelligence (ETW-Ti) provider with STACKWALK enabled. Together they collect the call stack at the syscall boundary inside the kernel, so traditional NTDLL-bypass tricks no longer hide the originating frame. The LACUNA Chain — introduced by Mohamed Alzhrani as Part II of his HookChain research — demonstrates that this kernel-collected call stack is still spoofable, and that every documented call-stack-based detection layer can be evaded at once.
The chain combines several primitives discovered by direct reverse engineering of ntdll!RtlVirtualUnwind, scanning .pdata sections of ntdll.dll, kernelbase.dll, win32u.dll and wow64.dll, and timing analysis of ETW-Ti’s APC-based stack collection: BYOUD-Gap (zero-modification stack spoofing through .pdata gaps), the ETW-Ti APC window attack, the Win32u NOP gap chain through 1,242 categorically whitelisted leaf positions, the Ghost Gadget — an uncovered JMP [RBX] at ntdll+0xFC47B, BYOUD-RT (runtime-adaptive RSP calculation), BYOUD-MF (single-frame RSP teleport via UWOP_PUSH_MACHFRAME), and parameter encryption via a hardware-breakpoint VEH handler. The end result is a six-layer call chain that is syntactically valid, semantically plausible, and forensically clean — tested to full bypass against Bitdefender and Kaspersky Endpoint Security. The remaining detection surface is behavioral kernel callback correlation, not stack inspection.

What This Paper Actually Contributes
Before we get into it, let me be upfront about what’s new here versus what I’m building on top of. I spent months in Ghidra reversing RtlVirtualUnwind, analyzing .pdata sections across multiple Windows DLLs, and testing against controlled detection setups. Here’s what came out of that:
- BYOUD-Gap: Call-stack spoofing that requires zero .pdata modification. I found it by reversing how the unwinder handles addresses that fall between RUNTIME_FUNCTION records. These gaps exist in every Windows DLL and nobody was exploiting them.
- ETW-Ti APC Window Attack: The timing gap between an ETW-Ti event firing and its APC-based stack collection is exploitable. I documented exactly how to control when the stack snapshot happens by manipulating thread alertable state.
- Parameter Encryption in BYOUD Context: Carrying over our Part I parameter encryption into the new BYOUD world. Syscall params are encrypted at staging and decrypted inside a hardware-breakpoint VEH handler right at the syscall instruction.
- Win32u NOP Gap Chain + Ghost Gadget: I pulled win32u.dll from my lab host and scanned every byte. Zero stack-pivot gadgets, just syscall stubs and 8-byte NOP gaps. Those 1,242 NOP gaps are perfect BYOUD-Gap leaf frames. I also found 1,031 ghost functions in ntdll and a JMP [RBX] gadget at ntdll+0xFC47B inside one of them, a dual-use primitive nobody had documented.
- kernelbase Semantic Ghost Proximity: 432 ghost functions in kernelbase, including a 238-byte ghost that ends exactly at VirtualProtect’s entry point. Fake frames here are semantically indistinguishable from a real VirtualProtect return site.
- BYOUD-MF (Machine Frame RSP Teleport): Found by decompiling RtlVirtualUnwind. Opcode 10 (UWOP_PUSH_MACHFRAME) reads RSP from the stack instead of computing a delta. Four KiUser* functions have this opcode. Place a fake 40-byte machine frame on the stack and you get arbitrary RSP teleport in a single frame.
- BYOUD-RT (Runtime RSP Calculation): Reads TEB.StackBase and current RSP at call time to compute the exact frame distance. No pre-calibration needed; it works even in injected shellcode that doesn’t know its own stack depth.
- wow64.dll Ghost Proximity: 22 ghost functions in wow64.dll. Wow64PrepareForException has a 91-byte ghost ending at its entry, a fourth semantic layer for the chain.
- Lab Measurements: Empirical results against controlled detection configurations showing exactly what beats what.
Where Part I Left Off

Part I demonstrated that 94% of analyzed EDR solutions have no hooks above the NTDLL subsystem layer. HookChain exploited this with three primitives:
- IAT manipulation — redirect API calls before they reach hooked stubs
- Dynamic SSN resolution — Halo’s Gate to find unhooked neighbors and derive correct syscall numbers
- Indirect syscalls: route execution through ntdll’s own syscall;ret gadget
These defeat EDRs that rely exclusively on userland NTDLL hooks. That was the gap in 2024.
EDR vendors responded — not by adding more userland hooks, but by moving their telemetry below user-mode entirely, into the kernel. The new telemetry doesn’t care that you bypassed ntdll. It sees your call at the kernel boundary and captures the stack at the moment it crosses.
That call stack is what Part II is about.
How EDRs Responded: The Kernel Telemetry Shift

Modern enterprise EDRs now collect behavior through two mechanisms that no user-mode manipulation can suppress.
Kernel Callbacks

The Windows kernel exposes registration APIs for kernel-mode drivers to receive synchronous notifications:
| Callback | What It Monitors | Bypassed by HookChain? |
|---|---|---|
| ObRegisterCallbacks | Handle open/duplicate for processes and threads | No |
| PsSetCreateProcessNotifyRoutine | Process creation/termination | No |
| PsSetCreateThreadNotifyRoutine | Thread creation/termination | No |
| PsSetLoadImageNotifyRoutine | DLL/image loads | No |
| CmRegisterCallback | Registry operations | No |
| Minifilter FltRegisterFilter | File system I/O | No |
These fire inside the kernel. No IAT manipulation, no SSN remapping, no indirect syscall suppresses them.
ETW-Ti: The Eyes Inside the Kernel

Microsoft-Windows-Threat-Intelligence (ETW-Ti) is a kernel-mode ETW provider. Unlike user-mode ETW which malware trivially suppresses by patching ntdll!EtwEventWrite, ETW-Ti events are generated inside the kernel at the moment of each security-sensitive operation:
- KERNEL_THREATINT_TASK_ALLOCVM: NtAllocateVirtualMemory
- KERNEL_THREATINT_TASK_PROTECTVM: NtProtectVirtualMemory
- KERNEL_THREATINT_TASK_MAPVIEW: NtMapViewOfSection
- KERNEL_THREATINT_TASK_QUEUEUSERAPC: APC queuing
- KERNEL_THREATINT_TASK_SETTHREADCONTEXT: NtSetContextThread
- KERNEL_THREATINT_TASK_WRITEVM: cross-process memory writes
When STACKWALK mode is enabled, the kernel collects the full call stack and attaches it to each event. This is what kills HookChain-class evasion — the syscall still reaches the kernel, the kernel still fires the event, and your shellcode’s address appears in the collected stack.
The new problem: how to make that collected stack look legitimate.
x64 Stack Walking Internals: What EDRs Actually Read
To defeat call-stack collection, you need to understand exactly how it works. I spent a lot of time in Ghidra with ntdll.dll and ntoskrnl.exe to figure this out.
The Death of Frame Pointers on x64

Figure: RUNTIME_FUNCTION entries describe every covered function.
On x86 (32-bit), EBP formed a linked list: every frame stored the previous frame’s base pointer. Spoofing that was trivial.
On x64, the calling convention no longer requires RBP as a frame pointer. Instead, every non-leaf function is described by a RUNTIME_FUNCTION entry in the .pdata section.
The UNWIND_CODE operations that matter for spoofing:
| Operation | What It Does | RSP Delta |
|---|---|---|
| UWOP_PUSH_NONVOL | Register push | +8 |
| UWOP_ALLOC_SMALL | sub rsp, N*8+8 | +N*8+8 |
| UWOP_ALLOC_LARGE | Large allocation | variable |
| UWOP_SET_FPREG | Frame pointer set | 0 |
RtlVirtualUnwind traverses these codes in reverse for each frame, computing the RSP delta and locating the next return address. An attacker who manufactures fake frames must produce addresses that have valid RUNTIME_FUNCTION entries with correct UNWIND_CODEs — or the unwinder aborts and exposes the real stack.
The Critical Branch I Found in Ghidra

Disassembling ntdll!RtlVirtualUnwind (Windows 11 22H2, SHA256 verified), I identified a branch that changes everything:
RtlVirtualUnwind pseudocode (from Ghidra decompilation):
RuntimeFunction = RtlLookupFunctionEntry(ControlPc, &ImageBase, NULL);
if (RuntimeFunction == NULL) {
    // No RUNTIME_FUNCTION for this address = "leaf function"
    // Leaf functions never modify RSP
    // Return address is simply at [RSP]
    *EstablisherFrame = ContextRecord->Rsp;
    ContextRecord->Rip = *(PULONG64)ContextRecord->Rsp;
    ContextRecord->Rsp += 8; // just consume the return address
    return NULL;
}
When RtlLookupFunctionEntry returns NULL — meaning the address has no RUNTIME_FUNCTION coverage — the unwinder treats it as a leaf function and advances RSP by exactly 8 bytes. It doesn’t crash. It doesn’t abort. It doesn’t flag anything. It just reads the next 8 bytes from RSP as the return address and moves on.
These uncovered “gaps” exist in every DLL. They are the spaces between one function’s end address and the next function’s begin address. This is the foundation of everything that follows.
How Sysmon Collects Stacks
SysmonDrv.sys registers ObRegisterCallbacks for process handle operations (Event ID 10). When the callback fires, it calls RtlWalkFrameChain with flag=1 (user-mode frames only). The collection is synchronous — it happens in the triggering thread at the exact moment of the operation. No race window here.
How ETW-Ti Collects Stacks (Different Mechanism)

ETW-Ti does not collect synchronously. My Ghidra analysis of the ETW-Ti callback path shows something interesting:
The APC is a USER_APC, not a KERNEL_APC. It only delivers when the thread enters an alertable wait. This timing gap is what we exploit later.
The Four Generations of Call-Stack Evasion


Before getting into my own work, here’s the progression of techniques by other researchers that I’m building on top of.
My contributions extend Generation 2 (BYOUD-Gap, Win32u NOP Gap Chain, Ghost Gadget), Generation 3 (ETW-Ti APC window), and Generation 4 (BYOUD-RT, parameter encryption, BYOUD-MF).
BYOUD-Gap: Zero-Modification Stack Spoofing
Every existing call-stack spoofing technique modifies something: return addresses (Gen 2/3), .pdata entries (Gen 4 BYOUD), or synthesizes fake RUNTIME_FUNCTION records. Each one leaves a forensic artifact.
BYOUD-Gap leaves no artifact because it modifies nothing.
The Core Idea
From the Ghidra analysis above: when RtlVirtualUnwind encounters an address with no RUNTIME_FUNCTION coverage, it treats it as a leaf and advances RSP by 8. Every Windows DLL has these uncovered address ranges between functions — the gap between one function’s EndAddress and the next function’s BeginAddress. These gaps are legitimate memory: part of the DLL image, mapped read-only, backed by the PE file.
Using Gaps as Bridge Frames

The gap address acts as a leaf “function.” When the unwinder encounters it:
- No RUNTIME_FUNCTION found → treated as leaf
- RSP advances by 8 (just the return address consumed)
- Control passes to the address at [RSP], which is the next frame in your chain
This gives you a free RSP-skip of 8 bytes per gap frame. Chain N gap frames and you consume N*8 bytes of stack, hiding N frames of real execution.
Gap Availability: What I Measured from Real Binaries

I extracted these DLLs from a Windows 10.00 lab host and ran .pdata gap analysis directly against the PE binaries:
| DLL | RUNTIME_FUNCTIONs | Gaps Found | Total Gap Bytes | Ghost Functions |
|---|---|---|---|---|
| ntdll.dll | 4,725 | 3,913 | 73,745 bytes | 1,031 (48,805 B) |
| win32u.dll | 1,244 | 1,243 | 9,960 bytes | 0 |
ntdll.dll gap breakdown (3,913 total):
- Small gaps (8–64 bytes): 2,847 gaps
- Medium gaps (65–256 bytes): 892 gaps
- Large gaps (257–1,468 bytes): 174 gaps
The Ghost Function Discovery
The most significant finding from this analysis: 1,031 of ntdll’s 3,913 gaps contain real executable code — 48,805 bytes of live, runnable instructions that have no .pdata RUNTIME_FUNCTION entry. I call these ghost functions.
The largest ghost function starts at ntdll+0x000F5004 with 1,468 bytes of code — clearly a functioning routine, not alignment filler. It just isn’t registered in .pdata.
Ghost functions appear to be compiler-generated helper routines, inlined thunks, or __declspec(nothrow) functions where the compiler deliberately omitted exception metadata.
Why ghost functions are the richest BYOUD-Gap positions:
- Stable code addresses that don’t shift with alignment changes between builds
- Recognizable to reverse engineers as “inside ntdll” — nothing anomalous
- The largest ghost function alone provides 183 distinct leaf-frame addresses
Why BYOUD-Gap Goes Undetected


BYOUD-Gap modifies nothing: no .pdata writes, no return-address changes, no gadget calls. Gap addresses are in signed DLLs. The call chain passes RtlVirtualUnwind validation because leaf-frame treatment is architecturally correct. Shadow stacks record gap addresses as valid leaf exits. Gap locations predate the attack — they’re not allocated, not written, not synthesized.
ETW-Ti APC Window Attack
The Ghidra analysis confirmed that ETW-Ti stack collection uses USER_APC queuing — not synchronous collection. Between the kernel returning to user-mode (T+3) and your thread entering an alertable state (T+5), your thread is executing normally with no monitoring looking at its stack.
The call stack that gets collected at T+6 is whatever your stack looks like at T+5 — not what it looked like at T+0 when the operation occurred.
The Attack Flow

T+0: Your code calls NtAllocateVirtualMemory(shellcode_address, ...)
T+1: Syscall instruction fires, kernel sees the operation
T+2: ETW-Ti event queues a USER_APC callback to your thread
T+3: Kernel returns to user-mode; your thread resumes executing
T+4–T+5: Your thread is running in the scheduler — clean execution, no hooks active
T+5: Your thread happens to call NtDelayExecution(..., alertable=TRUE)
T+6: Thread enters alertable wait, queued APCs deliver, RtlWalkFrameChain collects the stack
T+7: ETW event is logged with the collected stack
For more precise control, you can suppress APC delivery entirely during sensitive operations by keeping the thread in a non-alertable state. APCs just pile up in the queue. Then you clean your stack, enter an alertable wait, and all the queued ETW-Ti APCs fire — seeing nothing but a legitimate call chain.
Combining with BYOUD-Gap
For the strongest variant: use BYOUD-Gap to construct a synthetic call chain before entering NtDelayExecution. The APC delivers into a BYOUD-Gap-constructed frame chain where every address is in a signed DLL, every frame passes RtlVirtualUnwind traversal, and no .pdata modification exists.
The ETW-Ti event records the right operation. The collected stack shows kernel32!BaseThreadInitThunk → [gap frames] → NtAllocateVirtualMemory. Clean.
Limitation: This requires the shellcode to control the call chain when NtDelayExecution is called — trivially achievable for injected code running in a thread you control, harder for shellcode in a hijacked thread with an existing stack.
The CET Wall and BYOUD

Intel CET (Control-flow Enforcement Technology) introduces a hardware-maintained, read-only shadow stack. Every CALL pushes the return address to both RSP and the shadow stack. Every RET validates they match. Mismatch → #CP fault.
This breaks everything in Gen 2 and Gen 3. They all manipulate return addresses on the RSP stack, which no longer matches the shadow stack.
BYOUD (klezVirus, Black Hat Europe 2025) solves this by manipulating .pdata unwind metadata instead. CET validates return addresses. CET does not validate .pdata. They are separate systems.
I don’t repeat the full BYOUD derivation — that’s klezVirus’s work. What I add are the extensions below.
BYOUD-RT: Runtime Adaptive Variant
Every published BYOUD variant requires knowing the RSP distance from the thread entry point to the current frame before constructing the fake chain. In practice this means pre-calibration: measure distances in a test environment and hard-code them.
Pre-calibration fails when:
- Shellcode is injected into a thread at unknown stack depth
- The caller’s stack depth varies at runtime
- A reflective loader creates threads with non-standard stack layouts
BYOUD-RT computes the RSP distance at call time using the Thread Environment Block. TEB.StackBase (GS:[0x08]) gives you the highest stack address, and _AddressOfReturnAddress() + 8 gives you the current RSP. The difference is your total consumed stack — the exact distance you need for the BYOUD bridge frame.
I verified that TEB.StackBase is reliable across every common injection method:
| Injection Method | TEB.StackBase Accurate? |
|---|---|
| NtCreateThreadEx (fresh thread) | Yes — set by kernel |
| NtSetContextThread (thread hijack) | Yes — thread’s own TEB |
| NtQueueUserAPC (APC injection) | Yes — runs in target thread’s TEB |
| Reflective DLL Injection | Yes — loads into existing thread |
| Process Hollowing | Yes — main thread TEB preserved |
This makes BYOUD work in any injected context without pre-calibration.
Win32u NOP Gap Chain + The Ghost Gadget
Two original discoveries from direct binary analysis of win32u.dll and ntdll.dll extracted from my lab host.
What win32u.dll Actually Contains

I extracted win32u.dll and scanned its entire executable section for stack-pivot gadgets (add rsp,N; ret, jmp [rbx], jmp [rax]).
Result: zero gadgets. Every byte in the .text section is one of:
- 24-byte win32k syscall stubs (1,244 stubs, SSNs 0x1000–0x14DB)
- 8-byte alignment NOPs between stubs
No function prologues, no matching epilog gadgets. Zero.
What win32u CAN Do: The 1,242 NOP Gap Chain
Although win32u has no stack-pivot gadgets, it has 1,242 perfectly uniform, deterministically whitelisted leaf-frame positions — the 8-byte NOPs between every pair of syscall stubs.
Each NOP gap address is simultaneously:
- Whitelisted: inside win32u.dll, explicitly excluded from all current module-of-origin rules
- Leaf frame: no RUNTIME_FUNCTION covers it, RSP advances exactly 8 bytes
- Stable: same relative position between same-SSN stubs across builds
Win32u NOP Gap Chain:
RSP → [win32u NOP gap #1] ← leaf, RSP+=8
[win32u NOP gap #2] ← leaf, RSP+=8
[win32u NOP gap #3] ← leaf, RSP+=8
...×N...
[ntdll!RtlUserThreadStart+0x21] ← thread root
Frame summary seen by Elastic detection rule:
win32u.dll | ... | ntdll.dll
Categorically WHITELISTED, not evaded probabilistically.
For a 256-byte hide: chain 32 NOP gaps.
No arithmetic errors possible — gap size is always exactly 8.
The Ghost Gadget: Uncovered JMP[RBX] in ntdll
My .pdata gap classification revealed that 1,031 of ntdll’s 3,913 gaps contain real executable code with no exception-table coverage. Scanning all ghost function content for gadgets found a JMP [RBX] at ntdll+0x000FC47B, inside an 80-byte ghost function:
ntdll+0x000FC46C [ghost function — no RUNTIME_FUNCTION entry]
+0x000 CC CC CC CC CC CC CC CC CC int3 × 9
+0x009 48 83 41 28 FE dec qword ptr [rcx+28h]
+0x00E 41 B8 FF 23 00 00 mov r8d, 23FFh
+0x014 FF 23 JMP [RBX] ← RVA 0x000FC47B
+0x016 ...
ntdll+0x000FC4BC [next RUNTIME_FUNCTION begins here]
This ghost gadget has a property no previously documented gadget possesses — it’s a dual-use primitive:
| Property | Normal JMP[RBX] | Ghost JMP[RBX] at ntdll+0xFC47B |
|---|---|---|
| Inside signed DLL | Yes | Yes |
| Has RUNTIME_FUNCTION | Yes — unwinder processes it | No — unwinder skips as leaf (RSP+8) |
| .pdata forensic artifact | Gadget RVA in named function | Nothing — no .pdata entry |
When used as a BYOUD-Gap bridge frame:
- Real execution: CALL ntdll+0xFC47B → JMP [RBX] → jumps to target
- Shadow stack: records ntdll+0xFC47B as return site (normal CET behavior)
- Traditional unwind walk: treats address as leaf, advances RSP+8, never sees the JMP
Execution redirect AND zero-artifact bridge frame in one address.
kernelbase Semantic Ghost Proximity
Extending the analysis to kernelbase.dll reveals a qualitatively different attack surface: ghost functions that sit right next to high-value Win32 API exports.
Binary Survey: ntdll vs kernelbase
| Metric | ntdll | kernelbase |
|---|---|---|
| RUNTIME_FUNCTIONs | 4,725 | 4,992 |
| .pdata gaps | 3,913 | 3,982 |
| Ghost functions | 1,031 | 432 |
| Ghost function bytes | 48,805 | 51,577 |
| add rsp,0x28;ret gadgets | 182 | 442 (2.4x more) |
| jmp [rbx] gadgets | 3 | 12 (4x more) |
kernelbase is the superior gadget source — and frames showing kernelbase internals are less scrutinized by detection rules than ntdll.
The Semantic Proximity Attack
When ETW-Ti fires on NtProtectVirtualMemory, the expected call stack contains the Win32 wrapper directly above the syscall:
Expected legitimate stack:
[0] ntdll!NtProtectVirtualMemory ← syscall stub
[1] kernelbase!VirtualProtect+0x?? ← Win32 wrapper
[2] [caller code] ← whoever called VirtualProtect
A BYOUD-Gap chain using the 238-byte ghost that ends at VirtualProtect’s entry point produces:
BYOUD-Gap fake stack:
[0] ntdll!NtProtectVirtualMemory ← real syscall stub
[1] kernelbase+0x642xx ← ghost frame (leaf, RSP+8)
[2] ntdll!RtlUserThreadStart+0x21 ← thread root
Frame [1] resolves to "20 bytes before kernelbase!VirtualProtect"
— indistinguishable from a real VirtualProtect return site.
Ghost Functions Near High-Value Exports
| Ghost Region | Size | Distance | Adjacent API |
|---|---|---|---|
| kernelbase+0x64132–0x6421F | 238 bytes | 0x0 | VirtualProtect (ends at VP entry) |
| kernelbase+0xfe4e0–0xfe51F | 64 bytes | 0x90 | CreateRemoteThread |
| kernelbase+0x41f6d–0x41f8F | 35 bytes | 0x60 | VirtualAllocEx |
| kernelbase+0x5a720–0x5a78F | 112 bytes | 0x1E0 | VirtualAlloc |
| kernelbase+0xf9ad4–0xf9b43 | 112 bytes | 0x364 | WriteProcessMemory |
The VirtualProtect ghost is the most forensically convincing BYOUD-Gap position across all analyzed binaries: 238 usable addresses, inside a signed DLL, adjacent to an API that legitimately appears in injection call stacks.
A second ghost gadget (JMP [RBX] at kernelbase+0xC4EA2) provides a second dual-use primitive.
Multi-DLL Ghost Chain
The strongest BYOUD-Gap chain draws from both DLLs:
Optimal multi-DLL BYOUD-Gap chain:
[0] NtProtectVirtualMemory ← real syscall stub
[1] kernelbase+0x6420A ← ghost in VirtualProtect's shadow
[2] kernelbase+0x64200 ← second ghost position (staggered)
[3] ntdll+0x000F5040 ← ntdll ghost function (1,468B)
[4] ntdll!RtlUserThreadStart+0x21 ← thread root
What an analyst sees:
NtProtectVirtualMemory ← VirtualProtect-area ← ntdll internals ← thread start
Indistinguishable from a real VirtualProtect call chain.
BYOUD-MF: Machine Frame RSP Teleport
All previous BYOUD-Gap variants advance RSP in small 8-byte increments. BYOUD-MF is fundamentally different — it teleports RSP to an arbitrary value in a single frame.
What I Found in RtlVirtualUnwind

Decompiling RtlVirtualUnwind reveals a handler for UNWIND_CODE opcode 10 (UWOP_PUSH_MACHFRAME) that nobody had exploited before:
The handler does not compute a delta. It reads the saved RSP field from the machine frame located at the current RSP (offset 24 when op_info is 0, offset 32 when an error code is present) and sets the context RSP to that value. This is the RSP teleportation primitive.
The Four KiUser* RUNTIME_FUNCTIONs
Binary scan of ntdll’s .pdata (4,736 entries) found exactly 4 functions with UWOP_PUSH_MACHFRAME:
| Function | RVA Range | Prolog Offset |
|---|---|---|
| KiUserApcDispatcher | 0xa3f20–0xa3f95 | 0x00 |
| KiUserCallbackDispatcher | 0xa4030–0xa406b | 0x00 |
| KiUserExceptionDispatcher | 0xa4080–0xa40dc | 0x00 |
| Unnamed dispatcher | 0xa4880–0xa4a3e | 0x00 |
prolog_offset=0x00 means any PC within these functions triggers the handler. No need to target a specific byte.
Fake Machine Frame Structure

Place this 40-byte structure on the stack:
FAKE_MACHFRAME (op_info=0, no error code):
Offset 0: RIP (8 bytes)
Offset 8: CS (8 bytes)
Offset 16: RFLAGS (8 bytes)
Offset 24: RSP (8 bytes)
Offset 32: SS (8 bytes)
When RtlVirtualUnwind processes this frame with UWOP_PUSH_MACHFRAME, it reads the RSP field (offset 24) and sets ContextRecord->Rsp to that value.
Comparison to Everything Else
| Technique | RSP Change per Frame | .pdata Write | Gadget | CET | Forensic Artifact |
|---|---|---|---|---|---|
| SilentMoonwalk Desync | RSP += N (gadget) | No | Yes | No | Gadget offsets |
| BYOUD (klezVirus) | Delta from UNWIND_INFO | Yes | No | Yes | Modified .pdata |
| BYOUD-Gap | RSP += 8 | No | No | Yes | Address in gap |
| BYOUD-MF | RSP = any value | No | No | Yes | 40-byte struct |
BYOUD-MF is the only technique that achieves arbitrary RSP assignment in a single frame without modifying .pdata and without a gadget.
Parameter Encryption in the BYOUD Context
In Part I, I introduced parameter encryption: encrypting syscall parameters before the call and decrypting them at the syscall instruction via a hardware-breakpoint VEH handler.
Here I extend this into the BYOUD context. The combination addresses two orthogonal detection surfaces:
- BYOUD-Gap / BYOUD-RT / Win32u chain: defeats call-stack inspection (who called)
- Parameter encryption: defeats parameter inspection (what was called with)
How It Works

The challenge: parameters can’t stay encrypted all the way to the kernel. The kernel must receive real values. So you decrypt at the last possible moment — inside a VEH handler that fires on a hardware breakpoint at the syscall instruction.
Parameter Encryption Flow:
1. Before syscall:
rax = encrypted(NtAllocateVirtualMemory opcode)
rcx = encrypted(BaseAddress)
rdx = encrypted(RegionSize)
r8 = encrypted(AllocType)
r9 = encrypted(Protect)
2. Hardware breakpoint fires at `syscall` instruction
3. VEH handler decrypts all registers
4. Control returns to `syscall`
5. Kernel receives plaintext parameters
Where Parameter Encryption Actually Helps
ETW-Ti records parameters after the kernel has them — so the kernel-side event has the decrypted values. But parameter encryption helps against:
- User-mode hook intercept: any remaining hook sees encrypted parameters
- Memory scanning: parameters stored encrypted at rest; PAGE_EXECUTE_READWRITE = 0x40 never appears in memory
- Call-trace parameter logging: EDR rules logging parameters at hook intercept get ciphertext
Combined with BYOUD: the stack is spoofed (who called) and parameters are opaque at rest (what was prepared).
The LACUNA Chain: A Named Zero-Detection Technique
I named the complete technique LACUNA Chain — after the Latin lacuna (pl. lacunae): a gap, void, or absent part. Every frame in the chain inhabits a lacuna: an executable code region that exists in memory but has no .pdata coverage. Lacunae are invisible to the unwind machinery, leave no forensic artifact, and — when chosen near high-value exports — are semantically indistinguishable from legitimate call-site return addresses.
wow64.dll: A Fourth Semantic Layer
Binary analysis of wow64.dll adds a fourth DLL to the semantic stack. Wow64PrepareForException has a 91-byte ghost ending exactly at its entry point. Ghidra confirms the function checks CS == 0x23 (x86 WoW64 selector) before copying a 32-bit exception context. A fake frame in this ghost region appears to an EDR as code executing immediately before WoW64 exception preparation — completely routine in any WoW64-enabled process.
The Six-Layer LACUNA Chain

| Layer | Frame Address | DLL | Type | Size | Semantic Cover |
|---|---|---|---|---|---|
| 0 | RtlVirtualUnwind+0x21 | ntdll | Real syscall stub | N/A | Exception context copy |
| 1 | +0x177C0 | wow64 | Ghost function | 91 bytes | WoW64 exception prep |
| 2 | +0x64180 | kernelbase | Ghost function | 238 bytes | Before VirtualProtect |
| 3 | +0x50F80 | ntdll | Ghost function | 1,468 bytes | Near RtlCreateUserThread |
| 4 | +0x(NOP) | win32u | NOP gap | 8 bytes | Win32k syscall boundary |
| 5 | RtlUserThreadStart+0x21 | ntdll | Real function | N/A | Thread start routine |
What an EDR Analyst Sees

Walking the LACUNA Chain produces:
Frame 0: ntdll!KiUserExceptionDispatcher+4 [exception context]
Frame 1: wow64+0x177C0 [WoW64 exception prep]
Frame 2: kernelbase+0x64180 [before VirtualProtect]
Frame 3: ntdll+0x50F80 [near RtlCreateUserThread]
Frame 4: win32u+0x(NOP gap) [excluded by whitelist]
Frame 5: ntdll!RtlUserThreadStart+0x21 [thread root]
A defender reading this sees:
"A thread that went through VirtualProtect from thread-creation code,
passed through WoW64 exception preparation, and originated at the
standard thread start."
All frames in signed Microsoft binaries.
No .pdata modification.
Syntactically valid, semantically plausible, forensically clean.
Detection Surface

| Detection Layer | Status | Why |
|---|---|---|
| Module-of-origin check | EVADED | All frames in ntdll / kernelbase / wow64 / win32u |
| Unwind walk correctness | EVADED | All lacuna frames are leaf → valid RSP+8 |
| .pdata forensic scan | EVADED | Zero modification; gaps are pre-existing |
| CET shadow stack | EVADED | Pure leaf chain; shadow stack not consulted |
| Semantic frame analysis | EVADED | WoW64 exception + VirtualProtect adjacency |
| Win32u rule exemption | EVADED | Layer 4 explicitly excluded by all rules |
| ETW-Ti STACKWALK | EVADED | APC window attack controls snapshot timing |
| Parameter inspection | EVADED | HW breakpoint VEH decryption |
| Kernel callbacks | PARTIAL | Handle operations still fire ObRegCallbacks |
The LACUNA Chain defeats all EDR layers of call-stack-based detection. Kernel callbacks remain the only surviving telemetry source — and they provide behavioral signals, not stack-based signatures, forcing defenders into a much harder tuning tradeoff with higher false-positive rates.
Lab Measurement Results

I tested each layer in isolation and the composite technique against controlled detection configurations. Lab: Windows 11 22H2, CET enabled per-process, Sysmon v15 full config, ETW-Ti consumer with STACKWALK flag.
| Technique | Sysmon Event 10 | ETW-Ti STACKWALK | Shadow/User Compare | Kernel Callbacks | Net |
|---|---|---|---|---|---|
| Direct syscall | DETECTED | DETECTED | DETECTED | DETECTED | Caught |
| HookChain (Part I only) | DETECTED | DETECTED | DETECTED | DETECTED | Caught |
| Naive frame push | Evaded | DETECTED | DETECTED | DETECTED | Caught |
| SilentMoonwalk Desync | Evaded | Evaded | DETECTED | DETECTED | Partial |
| VEH + HW BP | Evaded | Evaded | DETECTED | DETECTED | Partial |
| BYOUD (klezVirus) | Evaded | Evaded | Partial | DETECTED | Partial |
| BYOUD-Gap (ours) | Evaded | Evaded | Evaded | DETECTED | Near-full |
| BYOUD-MF (ours) | Evaded | Evaded | Partial | DETECTED | Near-full |
| Full LACUNA Chain | Evaded | Evaded | Evaded | DETECTED | Near-full |
BYOUD-Gap leaves shadow/user stacks consistent — no RSP manipulation means no frame count divergence. The only remaining signal is behavioral kernel callback correlation, which requires behavioral rules with significantly higher false-positive rates than call-stack rules.
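For defenders who want to reason about the "Shadow/User Compare" column, the comparison can be pictured as a frame-by-frame diff between two lists of return addresses. The sketch below is only an illustration: the function name, the innermost-first ordering, and the assumption that both lists have already been collected by some kernel-side component are mine, not taken from any product.

```python
from itertools import zip_longest
from typing import List, Optional, Tuple

# (depth, shadow_address, walked_address); a side is None when that list is shorter.
Divergence = Tuple[int, Optional[int], Optional[int]]


def first_divergence(shadow: List[int], walked: List[int]) -> Optional[Divergence]:
    """Compare return addresses from the shadow stack with those from the unwinder.

    Both lists are assumed to be ordered innermost frame first.
    Returns None when they agree frame for frame.
    """
    for depth, (s, w) in enumerate(zip_longest(shadow, walked)):
        if s != w:
            return depth, s, w
    return None


if __name__ == "__main__":
    shadow_stack = [0x7FF800001010, 0x7FF800002020, 0x7FF800003030]
    unwound = [0x7FF800001010, 0x7FF800009999, 0x7FF800003030]
    print(first_divergence(shadow_stack, unwound))
```

A result of None corresponds to the "consistent" case described above, which is why this check alone cannot separate a pure leaf chain from a legitimate stack and has to be paired with other signals.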
Detection Engineering: What Catches What
I’m putting this section in because I think offense and defense should live in the same paper. If you’re a defender, here’s what you need to know.
Rules That Are Dead
Stop investing in these — they’ve been defeated since Gen 2:
# DEAD: module-of-origin first-frame only
not call_trace startswith "ntdll.dll"
# DEAD: whitelisted modules
not call_trace startswith ("ntdll.dll", "win32u.dll", "wow64cpu.dll")
The Win32u NOP Gap Chain means the “win32u.dll whitelisted” rule is weaponized against the defender.
What Actually Works


New detection specific to BYOUD-Gap: A “gap” address is one inside a DLL’s mapped range but between two RUNTIME_FUNCTION entries. Legitimate programs almost never have gap addresses in their call chains. An address in a .pdata gap is highly anomalous in a call trace, even though it’s inside a signed DLL. No public EDR implements this yet.
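The gap check described above can be sketched in a few lines. This is a minimal illustration rather than any vendor's implementation: the GapIndex class, its method names, and the sample RVAs are hypothetical, and it assumes the module's RUNTIME_FUNCTION ranges are already available as non-overlapping (begin, end) RVA pairs with an exclusive end. It relies on the documented x64 rule that a leaf function cannot call other functions, so a return address below the top of the stack should not normally land in an uncovered range.

```python
from bisect import bisect_right
from typing import List, Tuple

Range = Tuple[int, int]  # (begin_rva, end_rva), end exclusive


class GapIndex:
    """Answers 'is this RVA inside the code section but outside every RUNTIME_FUNCTION?'"""

    def __init__(self, functions: List[Range], text_start: int, text_end: int):
        self.functions = sorted(functions)
        self.begins = [begin for begin, _ in self.functions]
        self.text_start = text_start
        self.text_end = text_end

    def in_gap(self, rva: int) -> bool:
        if not (self.text_start <= rva < self.text_end):
            return False  # outside the code section, so not a .pdata gap
        idx = bisect_right(self.begins, rva) - 1  # last function starting at or before rva
        if idx < 0:
            return True
        _, end = self.functions[idx]
        return rva >= end

    def suspicious_frames(self, trace: List[int]) -> List[Tuple[int, int]]:
        """Return (depth, rva) for every gap frame below the top of the stack.

        trace[0] is the sampled instruction pointer, which may legitimately
        sit in a leaf function, so it is skipped.
        """
        return [(depth, rva) for depth, rva in enumerate(trace) if depth > 0 and self.in_gap(rva)]


if __name__ == "__main__":
    index = GapIndex([(0x1000, 0x1040), (0x1050, 0x10A0), (0x10A0, 0x1100)], 0x1000, 0x1100)
    print(index.suspicious_frames([0x1044, 0x1010, 0x1048, 0x10F0]))
```

Building the sorted index once per module keeps each lookup logarithmic, which matters if the check runs on every collected call trace.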
Effective detection layers moving forward:
- Enumerate .pdata at module load time
- Flag call-trace frames in .pdata gaps
- Correlate parameter sequences across syscalls
- Monitor memory allocation patterns before code execution
- Baseline behavioral thread signatures per process class
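As a rough idea of what "enumerate .pdata at module load time" involves, the sketch below parses the raw bytes of an x64 .pdata section into sorted coverage ranges. It assumes the caller has already located and read the section; the helper names are hypothetical. The only format fact it depends on is that an x64 RUNTIME_FUNCTION record is three little-endian 32-bit RVAs (BeginAddress, EndAddress, unwind-info address).

```python
import struct
from typing import List, Tuple

# x64 RUNTIME_FUNCTION: BeginAddress, EndAddress, UnwindInfoAddress (all 32-bit RVAs).
RUNTIME_FUNCTION = struct.Struct("<III")


def parse_pdata(pdata: bytes) -> List[Tuple[int, int]]:
    """Return sorted (begin_rva, end_rva) pairs from raw .pdata bytes."""
    usable = len(pdata) - (len(pdata) % RUNTIME_FUNCTION.size)  # ignore a trailing partial record
    ranges = []
    for begin, end, _unwind in RUNTIME_FUNCTION.iter_unpack(pdata[:usable]):
        if end > begin:  # drops zero padding and malformed records
            ranges.append((begin, end))
    ranges.sort()
    return ranges


def covered_bytes(ranges: List[Tuple[int, int]]) -> int:
    """Total bytes covered by the sorted ranges, counting overlaps once."""
    total = 0
    cursor = 0
    for begin, end in ranges:
        begin = max(begin, cursor)
        if end > begin:
            total += end - begin
            cursor = end
    return total


if __name__ == "__main__":
    raw = RUNTIME_FUNCTION.pack(0x1050, 0x10A0, 0x3000) + RUNTIME_FUNCTION.pack(0x1000, 0x1040, 0x3010)
    table = parse_pdata(raw)
    print(table, covered_bytes(table))
```

Caching the parsed table per loaded image means the per-frame lookup never has to touch the PE headers again.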
Challenges for EDR Solutions

Modern EDRs defend in layers: kernel callbacks (ObRegisterCallbacks) intercept handle acquisition, heuristic engines flag dangerous memory permissions, behavioral correlators match syscall sequences, and execution-origin rules kill code running from anonymous pages. Against Bitdefender alone, we triggered five distinct detection layers before achieving full bypass — handle access rights, RWX page allocation, anonymous-memory execution, the AllocVM + WriteVM + ProtectVM + QueueAPC sequence correlator, and payload behavior in the target process.
Not one of those five detections was the call stack.
The LACUNA Chain — the ghost-frame spoofing, the BYOUD-MF teleport, the win32u NOP gap, the ETW-Ti APC window — stayed clean across every test, against every product. The call-stack layer was never the reason we got caught. Every detection came from a different surface: how the handle was opened, how memory was allocated, what syscall sequence preceded the APC, or what the shellcode did after landing.
HookChain exposed that 94% of EDR solutions do not hook the subsystem layer above NTDLL. LACUNA Chain exploits a deeper blind spot: .pdata lacunae — executable regions inside signed DLLs with no exception-handling metadata. These ghost regions are invisible to RtlLookupFunctionEntry, absent from any hook table, and structurally indistinguishable from legitimate leaf functions during stack unwinding.
This gap cannot be closed by adding more hooks. Every layer in the chain — wow64.dll, kernelbase.dll, win32u.dll, and .pdata gap regions — sits in address ranges that are structurally invisible to current call-stack inspection. Closing it requires enumerating .pdata gaps at runtime and flagging any call-trace frame that lands in a gap. No production EDR does this today.
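The enumeration step can be sketched as follows. On x64, each RUNTIME_FUNCTION entry in .pdata is three little-endian 32-bit RVAs (BeginAddress, EndAddress, UnwindInfoAddress). The sketch assumes the raw .pdata bytes and the executable section's RVA range have already been read from the mapped image; parse_runtime_functions and enumerate_gaps are illustrative names, not an existing API.

```python
import struct

# x64 RUNTIME_FUNCTION: BeginAddress, EndAddress, UnwindInfoAddress (all RVAs)
ENTRY = struct.Struct("<III")

def parse_runtime_functions(pdata: bytes):
    """Decode raw .pdata bytes into a sorted list of (begin_rva, end_rva)."""
    entries = []
    for off in range(0, len(pdata) - ENTRY.size + 1, ENTRY.size):
        begin, end, _unwind = ENTRY.unpack_from(pdata, off)
        if begin == 0 and end == 0:
            continue  # skip zero padding at the end of the section
        entries.append((begin, end))
    entries.sort()
    return entries

def enumerate_gaps(entries, exec_start, exec_end):
    """Return uncovered (start, end) RVA ranges inside [exec_start, exec_end)."""
    gaps = []
    cursor = exec_start
    for begin, end in entries:
        if begin > cursor:
            gaps.append((cursor, min(begin, exec_end)))
        cursor = max(cursor, end)
        if cursor >= exec_end:
            break
    if cursor < exec_end:
        gaps.append((cursor, exec_end))
    return gaps

# Synthetic .pdata with two entries; the unwind RVAs are placeholders.
raw = ENTRY.pack(0x1000, 0x1040, 0x5000) + ENTRY.pack(0x1050, 0x10A0, 0x5010)
funcs = parse_runtime_functions(raw)
print(enumerate_gaps(funcs, 0x1000, 0x1100))
```

Doing this once per module load and caching the result keeps the per-event cost down to a range lookup. The sketch reports every uncovered byte range, including alignment padding, so a production rule would still have to decide which gaps are worth alerting on.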
Real-World Results
The LACUNA Chain injector was tested against enterprise EDR solutions in a controlled lab environment. Both targets were running current signature and behavioral engine versions at time of testing.
Bitdefender — full bypass, shellcode executed without detection.

Kaspersky Endpoint Security — full bypass, shellcode executed without detection.

The proof-of-concept implementation is available at github.com/MazX0p/LACUNA-Chain.
Key Takeaways
- Kernel-mode telemetry is the new battleground. Userland-hook bypass alone no longer hides syscalls — ETW-Ti and kernel callbacks see the operation regardless.
- .pdata gaps are architecturally invisible. RtlVirtualUnwind treats any address without a RUNTIME_FUNCTION entry as a leaf and advances RSP by exactly 8 bytes: the legal opening that BYOUD-Gap exploits.
- Ghost functions are high-value primitives. 1,031 ghost functions in ntdll (48,805 bytes) and 432 in kernelbase provide stable, signed-DLL leaf-frame positions; some sit immediately before VirtualProtect, VirtualAlloc, and WriteProcessMemory.
- win32u.dll is categorically whitelisted. 1,242 NOP gaps between syscall stubs are excluded from every module-of-origin rule and deliver perfect 8-byte RSP transitions.
- ETW-Ti collects via USER_APC. The thread’s alertable-state transitions decouple “when the event fired” from “what the stack looks like”, a window the attacker controls.
- BYOUD-MF teleports RSP in one frame. Four KiUser* dispatchers carry UWOP_PUSH_MACHFRAME; a 40-byte fake machine frame sets ContextRecord->Rsp to any value without modifying .pdata or using a gadget.
- BYOUD-RT removes pre-calibration. Reading TEB.StackBase at call time makes the technique work in arbitrary injection contexts.
- The full chain bypassed Bitdefender and Kaspersky. Every catch was on a non-stack signal (handle rights, RWX allocations, syscall sequence correlation), never the call stack.
Defensive Recommendations
- Enumerate .pdata at module-load time and flag call-trace frames that land in gaps. This is the single new signal that breaks BYOUD-Gap. Implement it in the EDR sensor; no production product ships this today.
- Stop trusting whitelists keyed on module-of-origin. The win32u NOP gap chain weaponizes the win32u.dll / wow64cpu.dll exemptions used by Elastic-style rules. Strip those exemptions or pair them with a .pdata gap check.
- Correlate syscall sequences, not single calls. Bitdefender’s AllocVM + WriteVM + ProtectVM + QueueAPC chain detector caught the operation even when the stacks were clean; behavioral correlators are the surviving layer.
- Measure ETW-Ti APC queue depth. An unusual count of queued ETW-Ti APCs before a single alertable wait (>3) is an anomaly indicator that no current EDR publishes.
- Audit handle-access patterns via ObRegisterCallbacks. Kernel callbacks are the only survivor against LACUNA Chain; tune them on requested handle rights, not on the apparent caller.
- Baseline thread call-stack signatures per process class. Even a chain that passes module-of-origin and unwind-walk checks will diverge statistically from real call patterns; learn the distribution and alert on outliers.
- Hunt for Win32k shadow-SSDT paths reaching sensitive operations. If ObRegisterCallbacks does not fire on certain win32k paths, that is where the next bypass will live.
- Treat the call stack as one signal, not the anchor. Until production sensors enumerate .pdata gaps at runtime, stack-based rules are unreliable and must be combined with parameter, sequence, and origin telemetry.
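For the sequence-correlation recommendation, here is a rough sketch of an ordered-sequence matcher over the four operations named above. The operation labels come from the Bitdefender detection described earlier. Everything else is an assumption for illustration: the class name, keying state on (source pid, target pid), and the five-second window. It is not a description of how any vendor implements this.

```python
from collections import defaultdict

# Operation labels as named in the text; the event schema itself is hypothetical.
SEQUENCE = ("AllocVM", "WriteVM", "ProtectVM", "QueueAPC")

class SequenceCorrelator:
    """Flags the ordered AllocVM -> WriteVM -> ProtectVM -> QueueAPC chain."""

    def __init__(self, window_seconds=5.0):
        self.window = window_seconds  # assumed window, tune per environment
        # (source_pid, target_pid) -> (index of next expected op, chain start time)
        self.state = defaultdict(lambda: (0, 0.0))

    def observe(self, source_pid, target_pid, op, timestamp):
        """Feed one cross-process event; return True when the chain completes."""
        key = (source_pid, target_pid)
        idx, started = self.state[key]
        if idx and timestamp - started > self.window:
            idx = 0  # partial chain went stale, start over
        if op == SEQUENCE[idx]:
            if idx == 0:
                started = timestamp
            idx += 1
            if idx == len(SEQUENCE):
                del self.state[key]
                return True
        # Unrelated operations are ignored and do not reset progress.
        self.state[key] = (idx, started)
        return False

correlator = SequenceCorrelator()
stream = [("AllocVM", 0.0), ("WriteVM", 0.4), ("ReadVM", 0.5),
          ("ProtectVM", 0.9), ("QueueAPC", 1.2)]
for op, ts in stream:
    if correlator.observe(4120, 6644, op, ts):
        print("injection-style sequence completed at", ts)
```

Matching on the sequence rather than on any single call is what makes this signal independent of the call stack: nothing in the matcher looks at return addresses at all.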
Conclusion: The Arms Race Moved Deeper
Part I showed that userland hook bypass defeats a huge fraction of EDR deployments. That gap was real in 2024.
Part II shows what the next layer looks like.
BYOUD-Gap achieves call-stack spoofing with zero .pdata modification. The ETW-Ti APC window lets you control when the stack snapshot happens. Win32u’s 1,242 NOP gaps provide categorically whitelisted leaf frames. Ghost functions in ntdll and kernelbase provide semantically convincing cover. BYOUD-MF enables arbitrary RSP teleport in a single frame. BYOUD-RT makes everything work without pre-calibration. Parameter encryption makes the arguments opaque at rest.
Put them all together and you get the LACUNA Chain — syntactically valid, semantically plausible, forensically clean. The only remaining reliable detection is behavioral kernel callback correlation.
Three Open Problems
These are what the next researcher should focus on:
1. Gap-address detection at scale. Flagging call-trace frames in ranges not covered by .pdata is theoretically sound, but nobody has built a production-quality implementation. It requires enumerating .pdata gaps for all loaded DLLs at runtime and cross-referencing every call-trace address. Feasible, but not trivial.
2. ETW-Ti APC queue depth monitoring. If the EDR can measure how many ETW-Ti APCs were queued before the alertable-wait that delivered them, an unusual count (>3 before a single wait) is anomalous. No current EDR publishes this signal.
3. Win32k shadow SSDT surface. Which sensitive operations are reachable via win32k syscalls, and whether ObRegisterCallbacks fires on those paths, remains unmeasured publicly. That’s where the next bypass may live.
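The queue-depth signal from the second problem reduces to a per-thread counter, sketched below. Whether a sensor can actually observe these two event kinds is the open problem itself, so the event names apc_queued and alertable_wait are hypothetical placeholders; only the threshold of 3 comes from the text.

```python
from collections import defaultdict

def flag_apc_bursts(events, threshold=3):
    """Return (thread_id, depth) for each alertable wait preceded by more than
    `threshold` queued APCs on the same thread.

    events -- iterable of (thread_id, kind) in time order, where kind is
              "apc_queued" or "alertable_wait" (hypothetical event names)
    """
    pending = defaultdict(int)
    flagged = []
    for thread_id, kind in events:
        if kind == "apc_queued":
            pending[thread_id] += 1
        elif kind == "alertable_wait":
            depth = pending.pop(thread_id, 0)  # the wait drains the queue
            if depth > threshold:
                flagged.append((thread_id, depth))
    return flagged

trace = [(7, "apc_queued")] * 4 + [(7, "alertable_wait"),
                                   (8, "apc_queued"), (8, "alertable_wait")]
print(flag_apc_bursts(trace))
```

The counter resets on every alertable wait, so only bursts that accumulate before a single delivery point are reported.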
The arms race didn’t end with HookChain. It moves to a deeper layer each time one is closed. The call stack is no longer trustworthy. The .pdata section is no longer trustworthy. The only anchor that currently holds is behavioral correlation, and that’s where the next attack will focus.
Original text: “LACUNA Chain: Ghost Frames — defeats all EDR layers of call-stack-based detection” by Mohamed Alzhrani (@0xmaz) at 0xmaz.me. Licensed CC BY 4.0.

