LACUNA Chain: Ghost Frames Defeat Every Layer of EDR Call-Stack Detection

Original text: “LACUNA Chain: Ghost Frames — defeats all EDR layers of call-stack-based detection” — Mohamed Alzhrani (@0xmaz), 0xmaz.me (Jun 20, 2026). Licensed CC BY 4.0. Tables, code listings, diagrams and figures below are reproduced verbatim with attribution captions.

Executive Summary

Modern endpoint detection has moved its trustworthiness anchor from userland hooks to kernel-mode telemetry: kernel callbacks (ObRegisterCallbacks, PsSetCreate*NotifyRoutine, minifilters) and the Microsoft-Windows-Threat-Intelligence (ETW-Ti) provider with STACKWALK enabled. Together they collect the call stack at the syscall boundary inside the kernel, so traditional NTDLL-bypass tricks no longer hide the originating frame. The LACUNA Chain — introduced by Mohamed Alzhrani as Part II of his HookChain research — demonstrates that this kernel-collected call stack is still spoofable, and that every documented call-stack-based detection layer can be evaded at once.

The chain combines several primitives discovered by direct reverse engineering of ntdll!RtlVirtualUnwind, scanning .pdata sections of ntdll.dll, kernelbase.dll, win32u.dll and wow64.dll, and timing analysis of ETW-Ti’s APC-based stack collection: BYOUD-Gap (zero-modification stack spoofing through .pdata gaps), the ETW-Ti APC window attack, the Win32u NOP gap chain through 1,242 categorically whitelisted leaf positions, the Ghost Gadget — an uncovered JMP [RBX] at ntdll+0xFC47B, BYOUD-RT (runtime-adaptive RSP calculation), BYOUD-MF (single-frame RSP teleport via UWOP_PUSH_MACHFRAME), and parameter encryption via a hardware-breakpoint VEH handler. The end result is a six-layer call chain that is syntactically valid, semantically plausible, and forensically clean — tested to full bypass against Bitdefender and Kaspersky Endpoint Security. The remaining detection surface is behavioral kernel callback correlation, not stack inspection.

LACUNA Chain — EDR Detection Layer Coverage — LACUNA Chain — EDR detection layer coverage matrix. *Source: original article.*

What This Paper Actually Contributes

Before we get into it, let me be upfront about what’s new here versus what I’m building on top of. I spent months in Ghidra reversing RtlVirtualUnwind, analyzing .pdata sections across multiple Windows DLLs, and testing against controlled detection setups. Here’s what came out of that:

BYOUD-Gap — Call-stack spoofing that requires zero .pdata modification. I found it by reversing how the unwinder handles addresses that fall between RUNTIME_FUNCTION records. These gaps exist in every Windows DLL and nobody was exploiting them.
ETW-Ti APC Window Attack — The timing gap between an ETW-Ti event firing and its APC-based stack collection is exploitable. I documented exactly how to control when the stack snapshot happens by manipulating thread alertable state.
Parameter Encryption in BYOUD Context — Carrying over our Part I parameter encryption into the new BYOUD world. Syscall params are encrypted at staging and decrypted inside a hardware-breakpoint VEH handler right at the syscall instruction.
Win32u NOP Gap Chain + Ghost Gadget — I pulled win32u.dll from my lab host and scanned every byte. Zero stack-pivot gadgets — just syscall stubs and 8-byte NOP gaps. Those 1,242 NOP gaps are perfect BYOUD-Gap leaf frames. I also found 1,031 ghost functions in ntdll and a JMP [RBX] gadget at ntdll+0xFC47B inside one of them — a dual-use primitive nobody had documented.
kernelbase Semantic Ghost Proximity — 432 ghost functions in kernelbase, including a 238-byte ghost that ends exactly at VirtualProtect‘s entry point. Fake frames here are semantically indistinguishable from a real VirtualProtect return site.
BYOUD-MF (Machine Frame RSP Teleport) — Found by decompiling RtlVirtualUnwind: opcode 10 (UWOP_PUSH_MACHFRAME) reads RSP from the stack instead of computing a delta. Four KiUser* functions have this opcode. Place a fake 40-byte machine frame on the stack and you get arbitrary RSP teleport in a single frame.
BYOUD-RT (Runtime RSP Calculation) — Reads TEB.StackBase and current RSP at call time to compute the exact frame distance. No pre-calibration needed — works even in injected shellcode that doesn’t know its own stack depth.
wow64.dll Ghost Proximity — 22 ghost functions in wow64.dll. Wow64PrepareForException has a 91-byte ghost ending at its entry — a fourth semantic layer for the chain.
Lab Measurements — Empirical results against controlled detection configurations showing exactly what beats what.

Where Part I Left Off

HookChain Part I vs LACUNA Chain Part II. *Source: original article.*

Part I demonstrated that 94% of analyzed EDR solutions have no hooks above the NTDLL subsystem layer. HookChain exploited this with three primitives:

IAT manipulation — redirect API calls before they reach hooked stubs
Dynamic SSN resolution — Halo’s Gate to find unhooked neighbors and derive correct syscall numbers
Indirect syscalls — route execution through ntdll’s own syscall;ret gadget

These defeat EDRs that rely exclusively on userland NTDLL hooks. That was the gap in 2024.

EDR vendors responded — not by adding more userland hooks, but by moving their telemetry below user-mode entirely, into the kernel. The new telemetry doesn’t care that you bypassed ntdll. It sees your call at the kernel boundary and captures the stack at the moment it crosses.

That call stack is what Part II is about.

How EDRs Responded: The Kernel Telemetry Shift

Why HookChain Is No Longer Enough — Why HookChain is no longer enough — the new ETW-Ti and kernel-callback telemetry surface. *Source: original article.*

Modern enterprise EDRs now collect behavior through two mechanisms that no user-mode manipulation can suppress.

Kernel Callbacks

Kernel Callback Architecture — Kernel callback architecture. *Source: original article.*

The Windows kernel exposes registration APIs for kernel-mode drivers to receive synchronous notifications:

Callback	What It Monitors	Bypassed by HookChain?
`ObRegisterCallbacks`	Handle open/duplicate for processes and threads	No
`PsSetCreateProcessNotifyRoutine`	Process creation/termination	No
`PsSetCreateThreadNotifyRoutine`	Thread creation/termination	No
`PsSetLoadImageNotifyRoutine`	DLL/image loads	No
`CmRegisterCallback`	Registry operations	No
Minifilter `FltRegisterFilter`	File system I/O	No

Kernel callback surface that HookChain does not address. Source: original article.

These fire inside the kernel. No IAT manipulation, no SSN remapping, no indirect syscall suppresses them.

ETW-Ti: The Eyes Inside the Kernel

ETW-Ti STACKWALK Killing HookChain — ETW-Ti STACKWALK kills HookChain — the kernel still sees the originating frame. *Source: original article.*

Microsoft-Windows-Threat-Intelligence (ETW-Ti) is a kernel-mode ETW provider. Unlike user-mode ETW which malware trivially suppresses by patching ntdll!EtwEventWrite, ETW-Ti events are generated inside the kernel at the moment of each security-sensitive operation:

KERNEL_THREATINT_TASK_ALLOCVM — NtAllocateVirtualMemory
KERNEL_THREATINT_TASK_PROTECTVM — NtProtectVirtualMemory
KERNEL_THREATINT_TASK_MAPVIEW — NtMapViewOfSection
KERNEL_THREATINT_TASK_QUEUEUSERAPC — APC queuing
KERNEL_THREATINT_TASK_SETTHREADCONTEXT — NtSetContextThread
KERNEL_THREATINT_TASK_WRITEVM — cross-process memory writes

When STACKWALK mode is enabled, the kernel collects the full call stack and attaches it to each event. This is what kills HookChain-class evasion — the syscall still reaches the kernel, the kernel still fires the event, and your shellcode’s address appears in the collected stack.

The new problem: how to make that collected stack look legitimate.

x64 Stack Walking Internals: What EDRs Actually Read

To defeat call-stack collection, you need to understand exactly how it works. I spent a lot of time in Ghidra with ntdll.dll and ntoskrnl.exe to figure this out.

The Death of Frame Pointers on x64

.pdata Section Layout — .pdata section layout — `RUNTIME_FUNCTION` entries describe every covered function. *Source: original article.*

On x86 (32-bit), EBP formed a linked list — every frame stored the previous frame’s base pointer. Spoofing that was trivial.

On x64, Microsoft eliminated RBP as a frame pointer. Instead, every function is described in the .pdata section.

The UNWIND_CODE operations that matter for spoofing:

Operation	What It Does	RSP Delta
`UWOP_PUSH_NONVOL`	Register push	+8
`UWOP_ALLOC_SMALL`	`sub rsp, N*8+8`	+N*8+8
`UWOP_ALLOC_LARGE`	Large allocation	variable
`UWOP_SET_FPREG`	Frame pointer set	0

UNWIND_CODE operations that matter for spoofing. Source: original article.

RtlVirtualUnwind traverses these codes in reverse for each frame, computing the RSP delta and locating the next return address. An attacker who manufactures fake frames must produce addresses that have valid RUNTIME_FUNCTION entries with correct UNWIND_CODEs — or the unwinder aborts and exposes the real stack.

The Critical Branch I Found in Ghidra

.pdata Coverage Map of a Typical DLL — .pdata coverage map of a typical DLL — gaps between functions are unwinder-invisible. *Source: original article.*

Disassembling ntdll!RtlVirtualUnwind (Windows 11 22H2, SHA256 verified), I identified a branch that changes everything:

RtlVirtualUnwind pseudocode (from Ghidra decompilation):

    RuntimeFunction = RtlLookupFunctionEntry(ControlPc, &ImageBase, NULL);
    if (RuntimeFunction == NULL) {
       // No RUNTIME_FUNCTION for this address = "leaf function"
       // Leaf functions never modify RSP
       // Return address is simply at [RSP]
       *EstablisherFrame = ContextRecord->Rsp;
       ContextRecord->Rip = *(PULONG64)ContextRecord->Rsp;
       ContextRecord->Rsp += 8;   // just consume the return address
       return NULL;
    }

When RtlLookupFunctionEntry returns NULL — meaning the address has no RUNTIME_FUNCTION coverage — the unwinder treats it as a leaf function and advances RSP by exactly 8 bytes. It doesn’t crash. It doesn’t abort. It doesn’t flag anything. It just reads the next 8 bytes from RSP as the return address and moves on.

These uncovered “gaps” exist in every DLL. They are the spaces between one function’s end address and the next function’s begin address. This is the foundation of everything that follows.

How Sysmon Collects Stacks

SysmonDrv.sys registers ObRegisterCallbacks for process handle operations (Event ID 10). When the callback fires, it calls RtlWalkFrameChain with flag=1 (user-mode frames only). The collection is synchronous — it happens in the triggering thread at the exact moment of the operation. No race window here.

How ETW-Ti Collects Stacks (Different Mechanism)

ETW-Ti Stack Collection Timeline — ETW-Ti stack collection timeline — the USER_APC is delivered when the thread becomes alertable, not synchronously. *Source: original article.*

ETW-Ti does not collect synchronously. My Ghidra analysis of the ETW-Ti callback path shows something interesting:

The APC is a USER_APC, not a KERNEL_APC. It only delivers when the thread enters an alertable wait. This timing gap is what we exploit later.

The Four Generations of Call-Stack Evasion

Evolution of Call-Stack Evasion — Evolution of call-stack evasion across four generations. *Source: original article.*

Call-Stack Evasion Generations — Call-stack evasion generations summary. *Source: original article.*

Before getting into my own work, here’s the progression of techniques by other researchers that I’m building on top of.

My contributions extend Generation 2 (BYOUD-Gap, Win32u NOP Gap Chain, Ghost Gadget), Generation 3 (ETW-Ti APC window), and Generation 4 (BYOUD-RT, parameter encryption, BYOUD-MF).

BYOUD-Gap: Zero-Modification Stack Spoofing

Every existing call-stack spoofing technique modifies something: return addresses (Gen 2/3), .pdata entries (Gen 4 BYOUD), or synthesizes fake RUNTIME_FUNCTION records. Each one leaves a forensic artifact.

BYOUD-Gap leaves no artifact because it modifies nothing.

The Core Idea

From the Ghidra analysis above: when RtlVirtualUnwind encounters an address with no RUNTIME_FUNCTION coverage, it treats it as a leaf and advances RSP by 8. Every Windows DLL has these uncovered address ranges between functions — the gap between one function’s EndAddress and the next function’s BeginAddress. These gaps are legitimate memory: part of the DLL image, mapped read-only, backed by the PE file.

Using Gaps as Bridge Frames

BYOUD-Gap Hiding Shellcode Frames — BYOUD-Gap hiding three real shellcode frames behind chained 8-byte leaf jumps. *Source: original article.*

The gap address acts as a leaf “function.” When the unwinder encounters it:

No RUNTIME_FUNCTION found → treated as leaf
RSP advances by 8 (just the return address consumed)
Control passes to the address at [RSP] — which is the next frame in your chain

This gives you a free RSP-skip of 8 bytes per gap frame. Chain N gap frames and you consume N*8 bytes of stack, hiding N frames of real execution.

Gap Availability: What I Measured from Real Binaries

ntdll.dll .pdata Gap Classification — ntdll.dll .pdata gap classification — small, medium, large and ghost-function-bearing. *Source: original article.*

I extracted these DLLs from a Windows 10.00 lab host and ran .pdata gap analysis directly against the PE binaries:

DLL	RUNTIME_FUNCTIONs	Gaps Found	Total Gap Bytes	Ghost Functions
ntdll.dll	4,725	3,913	73,745 bytes	1,031 (48,805 B)
win32u.dll	1,244	1,243	9,960 bytes	0

Gap availability across two key DLLs. Source: original article.

ntdll.dll gap breakdown (3,913 total):

Small gaps (8–64 bytes): 2,847 gaps
Medium gaps (65–256 bytes): 892 gaps
Large gaps (257–1,468 bytes): 174 gaps

The Ghost Function Discovery

The most significant finding from this analysis: 1,031 of ntdll’s 3,913 gaps contain real executable code — 48,805 bytes of live, runnable instructions that have no .pdata RUNTIME_FUNCTION entry. I call these ghost functions.

The largest ghost function starts at ntdll+0x000F5004 with 1,468 bytes of code — clearly a functioning routine, not alignment filler. It just isn’t registered in .pdata.

Ghost functions appear to be compiler-generated helper routines, inlined thunks, or __declspec(nothrow) functions where the compiler deliberately omitted exception metadata.

Why ghost functions are the richest BYOUD-Gap positions:

Stable code addresses that don’t shift with alignment changes between builds
Recognizable to reverse engineers as “inside ntdll” — nothing anomalous
The largest ghost function alone provides 183 distinct leaf-frame addresses

Why BYOUD-Gap Goes Undetected

Detection Matrix for BYOUD-Gap — Detection matrix for BYOUD-Gap. *Source: original article.*

BYOUD-Gap vs Prior Techniques — BYOUD-Gap vs prior techniques. *Source: original article.*

BYOUD-Gap modifies nothing: no .pdata writes, no return-address changes, no gadget calls. Gap addresses are in signed DLLs. The call chain passes RtlVirtualUnwind validation because leaf-frame treatment is architecturally correct. Shadow stacks record gap addresses as valid leaf exits. Gap locations predate the attack — they’re not allocated, not written, not synthesized.

ETW-Ti APC Window Attack

The Ghidra analysis confirmed that ETW-Ti stack collection uses USER_APC queuing — not synchronous collection. Between the kernel returning to user-mode (T+3) and your thread entering an alertable state (T+5), your thread is executing normally with no monitoring looking at its stack.

The call stack that gets collected at T+6 is whatever your stack looks like at T+5 — not what it looked like at T+0 when the operation occurred.

The Attack Flow

ETW-Ti APC Window Exploitation — ETW-Ti APC window exploitation flow. *Source: original article.*

T+0: Your code calls NtAllocateVirtualMemory(shellcode_address, ...)
T+1: Syscall instruction fires, kernel sees the operation
T+2: ETW-Ti event queues a USER_APC callback to your thread
T+3: Kernel returns to user-mode; your thread resumes executing
T+4–T+5: Your thread is running in the scheduler — clean execution, no hooks active
T+5: Your thread happens to call NtDelayExecution(..., alertable=TRUE)
T+6: Thread enters alertable wait, queued APCs deliver, RtlWalkFrameChain collects the stack
T+7: ETW event is logged with the collected stack

For more precise control, you can suppress APC delivery entirely during sensitive operations by keeping the thread in a non-alertable state. APCs just pile up in the queue. Then you clean your stack, enter an alertable wait, and all the queued ETW-Ti APCs fire — seeing nothing but a legitimate call chain.

Combining with BYOUD-Gap

For the strongest variant: use BYOUD-Gap to construct a synthetic call chain before entering NtDelayExecution. The APC delivers into a BYOUD-Gap-constructed frame chain where every address is in a signed DLL, every frame passes RtlVirtualUnwind traversal, and no .pdata modification exists.

The ETW-Ti event records the right operation. The collected stack shows kernelbase!BaseThreadInitThunk → [gap frames] → NtAllocateVirtualMemory. Clean.

Limitation: This requires the shellcode to control the call chain when NtDelayExecution is called — trivially achievable for injected code running in a thread you control, harder for shellcode in a hijacked thread with an existing stack.

The CET Wall and BYOUD

How BYOUD Defeats CET — How BYOUD defeats Intel CET shadow stack validation. *Source: original article.*

Intel CET (Control-flow Enforcement Technology) introduces a hardware-maintained, read-only shadow stack. Every CALL pushes the return address to both RSP and the shadow stack. Every RET validates they match. Mismatch → #CP fault.

This breaks everything in Gen 2 and Gen 3. They all manipulate return addresses on the RSP stack, which no longer matches the shadow stack.

BYOUD (klezVirus, Black Hat Europe 2025) solves this by manipulating .pdata unwind metadata instead. CET validates return addresses. CET does not validate .pdata. They are separate systems.

I don’t repeat the full BYOUD derivation — that’s klezVirus’s work. What I add are the extensions below.

BYOUD-RT: Runtime Adaptive Variant

Every published BYOUD variant requires knowing the RSP distance from the thread entry point to the current frame before constructing the fake chain. In practice this means pre-calibration: measure distances in a test environment and hard-code them.

Pre-calibration fails when:

Shellcode is injected into a thread at unknown stack depth
The caller’s stack depth varies at runtime
A reflective loader creates threads with non-standard stack layouts

BYOUD-RT computes the RSP distance at call time using the Thread Environment Block. TEB.StackBase (GS:[0x08]) gives you the highest stack address, and _AddressOfReturnAddress() + 8 gives you the current RSP. The difference is your total consumed stack — the exact distance you need for the BYOUD bridge frame.

I verified that TEB.StackBase is reliable across every common injection method:

Injection Method	TEB.StackBase Accurate?
`NtCreateThreadEx` (fresh thread)	Yes — set by kernel
`NtSetContextThread` (thread hijack)	Yes — thread’s own TEB
`NtQueueUserAPC` (APC injection)	Yes — runs in target thread’s TEB
Reflective DLL Injection	Yes — loads into existing thread
Process Hollowing	Yes — main thread TEB preserved

TEB.StackBase reliability across injection methods. Source: original article.

This makes BYOUD work in any injected context without pre-calibration.

Win32u NOP Gap Chain + The Ghost Gadget

Two original discoveries from direct binary analysis of win32u.dll and ntdll.dll extracted from my lab host.

What win32u.dll Actually Contains

win32u.dll Stub Layout — win32u.dll stub layout — every entry, no exceptions: 24-byte syscall stubs with 8-byte NOP gaps. *Source: original article.*

I extracted win32u.dll and scanned its entire executable section for stack-pivot gadgets (add rsp,N; ret, jmp [rbx], jmp [rax]).

Result: zero gadgets. Every byte in the .text section is one of:

24-byte win32k syscall stubs (1,244 stubs, SSNs 0x1000–0x14DB)
8-byte alignment NOPs between stubs

No function prologues, no matching epilog gadgets. Zero.

What win32u CAN Do: The 1,242 NOP Gap Chain

Although win32u has no stack-pivot gadgets, it has 1,242 perfectly uniform, deterministically whitelisted leaf-frame positions — the 8-byte NOPs between every pair of syscall stubs.

Each NOP gap address is simultaneously:

Whitelisted — inside win32u.dll, explicitly excluded from all current module-of-origin rules
Leaf frame — no RUNTIME_FUNCTION covers it, RSP advances exactly 8 bytes
Stable — same relative position between same-SSN stubs across builds

Win32u NOP Gap Chain:

    RSP → [win32u NOP gap #1]   ← leaf, RSP+=8
          [win32u NOP gap #2]   ← leaf, RSP+=8
          [win32u NOP gap #3]   ← leaf, RSP+=8
          ...×N...
          [ntdll!RtlUserThreadStart+0x21]   ← thread root

Frame summary seen by Elastic detection rule:
    win32u.dll | ... | ntdll.dll
    Categorically WHITELISTED, not evaded probabilistically.
    For a 256-byte hide: chain 32 NOP gaps.
    No arithmetic errors possible — gap size is always exactly 8.

The Ghost Gadget: Uncovered JMP[RBX] in ntdll

My .pdata gap classification revealed that 1,031 of ntdll’s 3,913 gaps contain real executable code with no exception-table coverage. Scanning all ghost function content for gadgets found a JMP [RBX] at ntdll+0x000FC47B, inside an 80-byte ghost function:

ntdll+0x000FC46C  [ghost function — no RUNTIME_FUNCTION entry]
    +0x000  CC CC CC CC CC CC CC CC CC   int3 × 9
    +0x009  48 83 41 28 FE               dec  qword ptr [rcx+28h]
    +0x00E  41 B8 FF 23 00 00            mov  r8d, 23FFh
    +0x014  FF 23                        JMP [RBX]  ← RVA 0x000FC47B
    +0x016  ...
ntdll+0x000FC4BC  [next RUNTIME_FUNCTION begins here]

This ghost gadget has a property no previously documented gadget possesses — it’s a dual-use primitive:

Property	Normal JMP[RBX]	Ghost JMP[RBX] at ntdll+0xFC47B
Inside signed DLL	Yes	Yes
Has RUNTIME_FUNCTION	Yes — unwinder processes it	No — unwinder skips as leaf (RSP+8)
.pdata forensic artifact	Gadget RVA in named function	Nothing — no .pdata entry

Ghost gadget property comparison. Source: original article.

When used as a BYOUD-Gap bridge frame:

Real execution: CALL ntdll+0xFC47B → JMP [RBX] → jumps to target
Shadow stack: records ntdll+0xFC47B as return site (normal CET behavior)
Traditional unwind walk: treats address as leaf, advances RSP+8, never sees the JMP

Execution redirect AND zero-artifact bridge frame in one address.

kernelbase Semantic Ghost Proximity

Extending the analysis to kernelbase.dll reveals a qualitatively different attack surface: ghost functions that sit right next to high-value Win32 API exports.

Binary Survey: ntdll vs kernelbase

Metric	ntdll	kernelbase
RUNTIME_FUNCTIONs	4,725	4,992
.pdata gaps	3,913	3,982
Ghost functions	1,031	432
Ghost function bytes	48,805	51,577
`add rsp,0x28;ret` gadgets	182	442 (2.4x more)
`jmp [rbx]` gadgets	3	12 (4x more)

Binary survey: ntdll vs kernelbase. Source: original article.

kernelbase is the superior gadget source — and frames showing kernelbase internals are less scrutinized by detection rules than ntdll.

The Semantic Proximity Attack

When ETW-Ti fires on NtProtectVirtualMemory, the expected call stack contains the Win32 wrapper directly above the syscall:

Expected legitimate stack:
    [0] ntdll!NtProtectVirtualMemory         ← syscall stub
    [1] kernelbase!VirtualProtect+0x??       ← Win32 wrapper
    [2] [caller code]                        ← whoever called VirtualProtect

A BYOUD-Gap chain using the 238-byte ghost that ends at VirtualProtect‘s entry point produces:

BYOUD-Gap fake stack:
    [0] ntdll!NtProtectVirtualMemory         ← real syscall stub
    [1] kernelbase+0x642xx                   ← ghost frame (leaf, RSP+8)
    [2] ntdll!RtlUserThreadStart+0x21        ← thread root

Frame [1] resolves to "20 bytes before kernelbase!VirtualProtect"
    — indistinguishable from a real VirtualProtect return site.

Ghost Functions Near High-Value Exports

Ghost Region	Size	Distance	Adjacent API
`kernelbase+0x64132`–`0x6421F`	238 bytes	0x0	`VirtualProtect` (ends at VP entry)
`kernelbase+0xfe4e0`–`0xfe51F`	64 bytes	0x90	`CreateRemoteThread`
`kernelbase+0x41f6d`–`0x41f8F`	35 bytes	0x60	`VirtualAllocEx`
`kernelbase+0x5a720`–`0x5a78F`	112 bytes	0x1E0	`VirtualAlloc`
`kernelbase+0xf9ad4`–`0xf9b43`	112 bytes	0x364	`WriteProcessMemory`

Ghost functions near high-value exports. Source: original article.

The VirtualProtect ghost is the most forensically convincing BYOUD-Gap position across all analyzed binaries: 238 usable addresses, inside a signed DLL, adjacent to an API that legitimately appears in injection call stacks.

A second ghost gadget (JMP [RBX] at kernelbase+0xC4EA2) provides a second dual-use primitive.

Multi-DLL Ghost Chain

The strongest BYOUD-Gap chain draws from both DLLs:

Optimal multi-DLL BYOUD-Gap chain:
    [0] NtProtectVirtualMemory         ← real syscall stub
    [1] kernelbase+0x6420A             ← ghost in VirtualProtect's shadow
    [2] kernelbase+0x64200             ← second ghost position (staggered)
    [3] ntdll+0x000F5040               ← ntdll ghost function (1,468B)
    [4] ntdll!RtlUserThreadStart+0x21  ← thread root

What an analyst sees:
    NtProtectVirtualMemory ← VirtualProtect-area ← ntdll internals ← thread start
    Indistinguishable from a real VirtualProtect call chain.

BYOUD-MF: Machine Frame RSP Teleport

All previous BYOUD-Gap variants advance RSP in small 8-byte increments. BYOUD-MF is fundamentally different — it teleports RSP to an arbitrary value in a single frame.

What I Found in RtlVirtualUnwind

UWOP_PUSH_MACHFRAME Handler in Ghidra — UWOP_PUSH_MACHFRAME handler from Ghidra decompile — the RSP teleport primitive. *Source: original article.*

Decompiling RtlVirtualUnwind reveals a handler for UNWIND_CODE opcode 10 (UWOP_PUSH_MACHFRAME) that nobody had exploited before:

The handler reads the current RSP value, adds an offset from the unwind info, and updates the context RSP to that computed value. This is the RSP teleportation primitive.

The Four KiUser* RUNTIME_FUNCTIONs

Binary scan of ntdll’s .pdata (4,736 entries) found exactly 4 functions with UWOP_PUSH_MACHFRAME:

Function	RVA Range	Prolog Offset
`KiUserApcDispatcher`	0xa3f20–0xa3f95	0x00
`KiUserCallbackDispatcher`	0xa4030–0xa406b	0x00
`KiUserExceptionDispatcher`	0xa4080–0xa40dc	0x00
Unnamed dispatcher	0xa4880–0xa4a3e	0x00

The four KiUser* RUNTIME_FUNCTIONs with UWOP_PUSH_MACHFRAME. Source: original article.

prolog_offset=0x00 means any PC within these functions triggers the handler. No need to target a specific byte.

Fake Machine Frame Structure

FAKE_MACHFRAME Layout — FAKE_MACHFRAME layout (op_info=0, no error code). *Source: original article.*

Place this 40-byte structure on the stack:

FAKE_MACHFRAME (op_info=0, no error code):
    Offset 0:   RIP        (8 bytes)
    Offset 8:   CS         (8 bytes)
    Offset 16:  RFLAGS     (8 bytes)
    Offset 24:  RSP        (8 bytes)
    Offset 32:  SS         (8 bytes)

When RtlVirtualUnwind processes this frame with UWOP_PUSH_MACHFRAME, it reads the RSP field (offset 24) and sets ContextRecord->Rsp to that value.

Comparison to Everything Else

Technique	RSP Change per Frame	.pdata Write	Gadget	CET	Forensic Artifact
SilentMoonwalk Desync	RSP += N (gadget)	No	Yes	No	Gadget offsets
BYOUD (klezVirus)	Delta from UNWIND_INFO	Yes	No	Yes	Modified .pdata
BYOUD-Gap	RSP += 8	No	No	Yes	Address in gap
BYOUD-MF	RSP = any value	No	No	Yes	40-byte struct

Comparison of RSP manipulation techniques. Source: original article.

BYOUD-MF is the only technique that achieves arbitrary RSP assignment in a single frame without modifying .pdata and without a gadget.

Parameter Encryption in the BYOUD Context

In Part I, I introduced parameter encryption: encrypting syscall parameters before the call and decrypting them at the syscall instruction via a hardware-breakpoint VEH handler.

Here I extend this into the BYOUD context. The combination addresses two orthogonal detection surfaces:

BYOUD-Gap / BYOUD-RT / Win32u chain: defeats call-stack inspection (who called)
Parameter encryption: defeats parameter inspection (what was called with)

How It Works

Parameter Encryption Flow — Parameter encryption flow with hardware-breakpoint VEH handler. *Source: original article.*

The challenge: parameters can’t stay encrypted all the way to the kernel. The kernel must receive real values. So you decrypt at the last possible moment — inside a VEH handler that fires on a hardware breakpoint at the syscall instruction.

Parameter Encryption Flow:

1. Before syscall:
   rax = encrypted(NtAllocateVirtualMemory opcode)
   rcx = encrypted(BaseAddress)
   rdx = encrypted(RegionSize)
   r8  = encrypted(AllocType)
   r9  = encrypted(Protect)

2. Hardware breakpoint fires at `syscall` instruction
3. VEH handler decrypts all registers
4. Control returns to `syscall`
5. Kernel receives plaintext parameters

Where Parameter Encryption Actually Helps

ETW-Ti records parameters after the kernel has them — so the kernel-side event has the decrypted values. But parameter encryption helps against:

User-mode hook intercept — any remaining hook sees encrypted parameters
Memory scanning — parameters stored encrypted at rest; PAGE_EXECUTE_READWRITE = 0x40 never appears in memory
Call-trace parameter logging — EDR rules logging parameters at hook intercept get ciphertext

Combined with BYOUD: the stack is spoofed (who called) and parameters are opaque at rest (what was prepared).

The LACUNA Chain: A Named Zero-Detection Technique

I named the complete technique LACUNA Chain — after the Latin lacuna (pl. lacunae): a gap, void, or absent part. Every frame in the chain inhabits a lacuna: an executable code region that exists in memory but has no .pdata coverage. Lacunae are invisible to the unwind machinery, leave no forensic artifact, and — when chosen near high-value exports — are semantically indistinguishable from legitimate call-site return addresses.

wow64.dll: A Fourth Semantic Layer

Binary analysis of wow64.dll adds a fourth DLL to the semantic stack. Wow64PrepareForException has a 91-byte ghost ending exactly at its entry point. Ghidra confirms the function checks CS == 0x23 (x86 WoW64 selector) before copying a 32-bit exception context. A fake frame in this ghost region appears to an EDR as code executing immediately before WoW64 exception preparation — completely routine in any WoW64-enabled process.

The Six-Layer LACUNA Chain

LACUNA Chain Six-Layer Architecture — LACUNA Chain six-layer architecture. *Source: original article.*

Layer	Frame Address	DLL	Type	Size	Semantic Cover
0	`RtlVirtualUnwind+0x21`	ntdll	Real syscall stub	N/A	Exception context copy
1	`+0x177C0`	wow64	Ghost function	91 bytes	WoW64 exception prep
2	`+0x64180`	kernelbase	Ghost function	238 bytes	Before VirtualProtect
3	`+0x50F80`	ntdll	Ghost function	1,468 bytes	Near RtlCreateUserThread
4	`+0x(NOP)`	win32u	NOP gap	8 bytes	Win32k syscall boundary
5	`RtlUserThreadStart+0x21`	ntdll	Real function	N/A	Thread start routine

The six-layer LACUNA Chain. Source: original article.

What an EDR Analyst Sees

LACUNA Chain Layers. *Source: original article.*

Walking the LACUNA Chain produces:

Frame 0:  ntdll!KiUserExceptionDispatcher+4     [exception context]
Frame 1:  wow64+0x177C0                         [WoW64 exception prep]
Frame 2:  kernelbase+0x64180                    [before VirtualProtect]
Frame 3:  ntdll+0x50F80                         [near RtlCreateUserThread]
Frame 4:  win32u+0x(NOP gap)                    [excluded by whitelist]
Frame 5:  ntdll!RtlUserThreadStart+0x21         [thread root]

A defender reading this sees:
    "A thread that went through VirtualProtect from thread-creation code,
     passed through WoW64 exception preparation, and originated at the
     standard thread start."
     All frames in signed Microsoft binaries.
     No .pdata modification.
     Syntactically valid, semantically plausible, forensically clean.

Detection Surface

LACUNA Chain — All Call-Stack Detection Layers Defeated — LACUNA Chain — all call-stack detection layers defeated. *Source: original article.*

Detection Layer	Status	Why
Module-of-origin check	EVADED	All frames in ntdll / kernelbase / wow64 / win32u
Unwind walk correctness	EVADED	All lacuna frames are leaf → valid RSP+8
.pdata forensic scan	EVADED	Zero modification; gaps are pre-existing
CET shadow stack	EVADED	Pure leaf chain; shadow stack not consulted
Semantic frame analysis	EVADED	WoW64 exception + VirtualProtect adjacency
Win32u rule exemption	EVADED	Layer 4 explicitly excluded by all rules
ETW-Ti STACKWALK	EVADED	APC window attack controls snapshot timing
Parameter inspection	EVADED	HW breakpoint VEH decryption
Kernel callbacks	PARTIAL	Handle operations still fire ObRegCallbacks

LACUNA Chain detection surface. Source: original article.

The LACUNA Chain defeats all EDR layers of call-stack-based detection. Kernel callbacks remain the only surviving telemetry source — and they provide behavioral signals, not stack-based signatures, forcing defenders into a much harder tuning tradeoff with higher false-positive rates.

Lab Measurement Results

Lab Results — Technique vs Detection Layer — Lab results matrix — technique vs detection layer. *Source: original article.*

I tested each layer in isolation and the composite technique against controlled detection configurations. Lab: Windows 11 22H2, CET enabled per-process, Sysmon v15 full config, ETW-Ti consumer with STACKWALK flag.

Technique	Sysmon Event 10	ETW-Ti STACKWALK	Shadow/User Compare	Kernel Callbacks	Net
Direct syscall	DETECTED	DETECTED	DETECTED	DETECTED	Caught
HookChain (Part I only)	DETECTED	DETECTED	DETECTED	DETECTED	Caught
Naive frame push	Evaded	DETECTED	DETECTED	DETECTED	Caught
SilentMoonwalk Desync	Evaded	Evaded	DETECTED	DETECTED	Partial
VEH + HW BP	Evaded	Evaded	DETECTED	DETECTED	Partial
BYOUD (klezVirus)	Evaded	Evaded	Partial	DETECTED	Partial
BYOUD-Gap (ours)	Evaded	Evaded	Evaded	DETECTED	Near-full
BYOUD-MF (ours)	Evaded	Evaded	Partial	DETECTED	Near-full
Full LACUNA Chain	Evaded	Evaded	Evaded	DETECTED	Near-full

Lab measurement results. Source: original article.

BYOUD-Gap leaves shadow/user stacks consistent — no RSP manipulation means no frame count divergence. The only remaining signal is behavioral kernel callback correlation, which requires behavioral rules with significantly higher false-positive rates than call-stack rules.

Detection Engineering: What Catches What

I’m putting this section in because I think offense and defense should live in the same paper. If you’re a defender, here’s what you need to know.

Rules That Are Dead

Stop investing in these — they’ve been defeated since Gen 2:

# DEAD: module-of-origin first-frame only
not call_trace startswith "ntdll.dll"

# DEAD: whitelisted modules
not call_trace startswith ("ntdll.dll", "win32u.dll", "wow64cpu.dll")

The Win32u NOP Gap Chain means the “win32u.dll whitelisted” rule is weaponized against the defender.

What Actually Works

Detection Architecture — Post-BYOUD-Gap — Detection architecture post-BYOUD-Gap. *Source: original article.*

Detection Architecture — Detection architecture. *Source: original article.*

New detection specific to BYOUD-Gap: A “gap” address is one inside a DLL’s mapped range but between two RUNTIME_FUNCTION entries. Legitimate programs almost never have gap addresses in their call chains. An address in a .pdata gap is highly anomalous in a call trace, even though it’s inside a signed DLL. No public EDR implements this yet.

Effective detection layers moving forward:

Enumerate .pdata at module load time
Flag call-trace frames in .pdata gaps
Correlate parameter sequences across syscalls
Monitor memory allocation patterns before code execution
Baseline behavioral thread signatures per process class

Challenges for EDR Solutions

LACUNA Chain EDR Bypass Results — LACUNA Chain EDR bypass results — summary across tested products. *Source: original article.*

Modern EDRs defend in layers: kernel callbacks (ObRegisterCallbacks) intercept handle acquisition, heuristic engines flag dangerous memory permissions, behavioral correlators match syscall sequences, and execution-origin rules kill code running from anonymous pages. Against Bitdefender alone, we triggered five distinct detection layers before achieving full bypass — handle access rights, RWX page allocation, anonymous-memory execution, the AllocVM + WriteVM + ProtectVM + QueueAPC sequence correlator, and payload behavior in the target process.

Not one of those five detections was the call stack.

The LACUNA Chain — the ghost-frame spoofing, the BYOUD-MF teleport, the win32u NOP gap, the ETW-Ti APC window — stayed clean across every test, against every product. The call-stack layer was never the reason we got caught. Every detection came from a different surface: how the handle was opened, how memory was allocated, what syscall sequence preceded the APC, or what the shellcode did after landing.

HookChain exposed that 94% of EDR solutions do not hook the subsystem layer above NTDLL. LACUNA Chain exploits a deeper blind spot: .pdata lacunae — executable regions inside signed DLLs with no exception-handling metadata. These ghost regions are invisible to RtlLookupFunctionEntry, absent from any hook table, and structurally indistinguishable from legitimate leaf functions during stack unwinding.

This gap cannot be closed by adding more hooks. Every layer in the chain — wow64.dll, kernelbase.dll, win32u.dll, and .pdata gap regions — sits in address ranges that are structurally invisible to current call-stack inspection. Closing it requires enumerating .pdata gaps at runtime and flagging any call-trace frame that lands in a gap. No production EDR does this today.

Real-World Results

The LACUNA Chain injector was tested against enterprise EDR solutions in a controlled lab environment. Both targets were running current signature and behavioral engine versions at time of testing.

Bitdefender — full bypass, shellcode executed without detection.

Bitdefender Bypass Screenshot — Bitdefender bypass — shellcode executed without detection. *Source: original article.*

Kaspersky Endpoint Security — full bypass, shellcode executed without detection.

Kaspersky Bypass Screenshot — Kaspersky Endpoint Security bypass — shellcode executed without detection. *Source: original article.*

The proof-of-concept implementation is available at github.com/MazX0p/LACUNA-Chain.

Key Takeaways

Kernel-mode telemetry is the new battleground. Userland-hook bypass alone no longer hides syscalls — ETW-Ti and kernel callbacks see the operation regardless.
.pdata gaps are architecturally invisible. RtlVirtualUnwind treats any address without a RUNTIME_FUNCTION entry as a leaf and advances RSP by exactly 8 bytes — the legal opening that BYOUD-Gap exploits.
Ghost functions are high-value primitives. 1,031 ghost functions in ntdll (48,805 bytes) and 432 in kernelbase provide stable, signed-DLL leaf-frame positions; some sit immediately before VirtualProtect, VirtualAlloc, and WriteProcessMemory.
win32u.dll is categorically whitelisted. 1,242 NOP gaps between syscall stubs are excluded from every module-of-origin rule and deliver perfect 8-byte RSP transitions.
ETW-Ti collects via USER_APC. The thread’s alertable-state transitions decouple “when the event fired” from “what the stack looks like” — a window the attacker controls.
BYOUD-MF teleports RSP in one frame. Four KiUser* dispatchers carry UWOP_PUSH_MACHFRAME; a 40-byte fake machine frame sets ContextRecord->Rsp to any value without modifying .pdata or using a gadget.
BYOUD-RT removes pre-calibration. Reading TEB.StackBase at call time makes the technique work in arbitrary injection contexts.
The full chain bypassed Bitdefender and Kaspersky. Every catch was on a non-stack signal — handle rights, RWX allocations, syscall sequence correlation — never the call stack.

Defensive Recommendations

Enumerate .pdata at module-load time and flag call-trace frames that land in gaps. This is the single new signal that breaks BYOUD-Gap. Implement it in the EDR sensor; no production product ships this today.
Stop trusting whitelists keyed on module-of-origin. The win32u NOP gap chain weaponizes the win32u.dll / wow64cpu.dll exemptions used by Elastic-style rules. Strip those exemptions or pair them with a .pdata gap check.
Correlate syscall sequences, not single calls. Bitdefender’s AllocVM + WriteVM + ProtectVM + QueueAPC chain detector caught the operation even when the stacks were clean — behavioral correlators are the surviving layer.
Measure ETW-Ti APC queue depth. An unusual count of queued ETW-Ti APCs before a single alertable wait (>3) is an anomaly indicator that no current EDR publishes.
Audit handle-access patterns via ObRegisterCallbacks. Kernel callbacks are the only survivor against LACUNA Chain; tune them on requested handle rights, not on the apparent caller.
Baseline thread call-stack signatures per process class. Even a chain that passes module-of-origin and unwind-walk checks will diverge statistically from real call patterns — learn the distribution and alert on outliers.
Hunt for Win32k shadow-SSDT paths reaching sensitive operations. If ObRegisterCallbacks does not fire on certain win32k paths, that is where the next bypass will live.
Treat the call stack as one signal, not the anchor. Until production sensors enumerate .pdata gaps at runtime, stack-based rules are unreliable and must be combined with parameter, sequence, and origin telemetry.

Conclusion: The Arms Race Moved Deeper

Part I showed that userland hook bypass defeats a huge fraction of EDR deployments. That gap was real in 2024.

Part II shows what the next layer looks like.

BYOUD-Gap achieves call-stack spoofing with zero .pdata modification. The ETW-Ti APC window lets you control when the stack snapshot happens. Win32u’s 1,242 NOP gaps provide categorically whitelisted leaf frames. Ghost functions in ntdll and kernelbase provide semantically convincing cover. BYOUD-MF enables arbitrary RSP teleport in a single frame. BYOUD-RT makes everything work without pre-calibration. Parameter encryption makes the arguments opaque at rest.

Put them all together and you get the LACUNA Chain — syntactically valid, semantically plausible, forensically clean. The only remaining reliable detection is behavioral kernel callback correlation.

Three Open Problems

These are what the next researcher should focus on:

1. Gap-address detection at scale. Flagging call-trace frames in .pdata uncovered ranges is theoretically sound but nobody has built a production-quality implementation. It requires enumerating .pdata gaps for all loaded DLLs at runtime and cross-referencing every call-trace address. Feasible, but not trivial.

2. ETW-Ti APC queue depth monitoring. If the EDR can measure how many ETW-Ti APCs were queued before the alertable-wait that delivered them, an unusual count (>3 before a single wait) is anomalous. No current EDR publishes this signal.

3. Win32k shadow SSDT surface. Which sensitive operations are reachable via win32k syscalls, and whether ObRegisterCallbacks fires on those paths, remains unmeasured publicly. That’s where the next bypass may live.

The arms race didn’t end with HookChain. It moved to a deeper layer each time a layer was closed. The call stack is no longer trustworthy. The .pdata section is no longer trustworthy. The only anchor that currently holds is behavioral correlation — and that’s where the next attack will focus.

Original text: “LACUNA Chain: Ghost Frames — defeats all EDR layers of call-stack-based detection” by Mohamed Alzhrani (@0xmaz) at 0xmaz.me. Licensed CC BY 4.0.