
Executive Summary
Praetorian’s Adam Crosser describes Centurion, a prototype virtualized loader that treats the host Windows process as “bare metal” and runs nearly everything (payload, PE loader, TLS stack, HTTP client) as bytecode on top of a custom x86-64-inspired ISA. The native interpreter stub clocks in at roughly 18 KB; the rest is interpreted code linked together by a thin universal thunk that routes IAT calls into Win32. Unlike with traditional packers, the malicious instructions never materialize as recognizable native code in memory, and that materialization is precisely the moment modern EDR products time their scans to catch.
The post is as much a story about LLM-assisted capability development as it is about virtualization. The team had scoped a virtualized loader at three to six months of dedicated work. Instead, a fully functioning prototype — complete with mbedTLS, ECDHE key exchange and a TLS bind shell — shipped in about a week of background effort. The leverage came from picking tasks that match the model’s training distribution (compilers, interpreters, parsers, embedded crypto), reusing GCC and LLVM compiler test suites as a feedback loop, and splitting design from implementation. The architecture decisions described below — shared guest/host memory, an x86-64-shaped ISA, a software crypto coprocessor pattern lifted from MBEDTLS_ALT — are useful design references for anyone thinking about where virtualized loaders are headed next.
Why Virtualized Loaders Now
The evasion landscape has shifted. Static signatures and a sleep call used to be enough to push a custom packer past most EDRs. Today’s products read memory, watch runtime behavior and reliably catch the canonical injection patterns. The classic pack-crypt-inject playbook has a short shelf life because the original malicious bytes still have to exist somewhere in memory as native code — and EDRs have gotten good at timing scans for exactly that window.
Payload virtualization sidesteps that window entirely. If the “real” instructions are bytecode for a custom interpreter, they never appear as recognizable native x86-64. There is no obvious unpacking moment to scan for, and there is no canonical pattern to match against.
Prior Work in the Space
Several public projects have already mapped out the design space for offensive virtualization, and Crosser positions Centurion against three of them in particular.
- RISC-Y Business (mrexodia & oopsmishap, December 2023). Rather than designing a custom ISA, they targeted RISC-V — an architecture LLVM already supports — and inherited a real compiler toolchain for free. The pragmatic move was shared guest/host memory: pointers in the VM are valid in the host process and vice versa, which collapses the syscall bridge to a tiny surface (essentially “get the PEB” and “call an arbitrary host function”). The interpreter itself is compact, and the whole project demonstrated that a VM-based loader doesn’t need to be an enormous engineering investment if you pick the right building blocks.
- Fox-IT’s “Red Teaming in the Age of EDR” (September 2024) is the best public writeup of the operational case for VM-based loaders. It walks through Fox-IT’s own evolution — packers, polymorphic engines, full virtualization — and frames the argument in terms of the defender’s constraints: CPU budget, false-positive tolerance, scan timing. The core observation: virtualization fixes the fundamental weakness of packing, which is that the original code must rematerialize in memory at some point.
- Firebeam is 5pider’s VM embedded inside the Kaine agent, shipped as part of the commercial Havoc Professional framework. It is also RISC-V based, but the use case is narrower: post-exploitation plugin execution. Because the interpreter parses and executes everything inside the agent’s own memory, there are no RWX allocations and no module-stomping for the plugin layer, and Win32 calls can be routed through the agent’s existing evasion profile.
These projects converge on a shared approach: compile C (or another language) through LLVM, execute it as bytecode in a small interpreter, and bridge to native code through a thin syscall layer. They differ on scope and intent, and that is where Centurion stakes out new ground.
The Working Paradigm: “Manager of Agents”
Crosser is candid that the reason this project finally happened in 2026 is not because the techniques are new — they are well-documented and the theory is mature — but because the cost calculus changed. Building a virtualized loader requires niche knowledge that does not transfer cleanly to other work: custom ISA design, bytecode compilation, VM dispatcher internals, ABI marshaling across the interpretation boundary. In a timeboxed sprint, three to six months on one narrow capability loses every priority battle against platform improvements and client deliverables.
What LLMs change, in his framing, is letting a single engineer act as a “manager of agents”: directing several Claude (or equivalent) instances across different terminals on disparate sub-problems while attending to other deliverables. This is not pair programming. It is closer to delegating to a junior engineer who keeps working while you context-switch. The architecture still has to come from the engineer; what gets compressed is the implementation grind, not the design.
The concrete outcome: a working prototype with TLS and a bind shell in roughly one week, built as a background task while shipping features for the team’s Guard platform. The point Crosser keeps returning to — and it is worth dwelling on — is that the leverage is not faster coding. It is that “too expensive to build” projects become feasible at all.
Picking Tasks That Match the Training Distribution
One of the most underrated skills in agentic development is picking work the model has actually seen a lot of. Nicholas Carlini’s experiment at Anthropic, in which 16 parallel Claude instances produced a C compiler in 100k lines of Rust, works because compilers are one of the most thoroughly documented domains in computer science: the task and the training data are a near-perfect match.
A virtualized loader hits the same sweet spot. LLVM IR manipulation, bytecode interpreters, instruction dispatch loops, PE parsing, IAT/relocation fixups — all of these are well-trodden systems-programming territory. By contrast, ask a stock Claude Code instance to implement process injection and it will reach straight for WriteProcessMemory and CreateRemoteThread, because that is what the training data is saturated with — and that is exactly what every EDR detects out of the box. Pick the battles where the training distribution is on your side.
Two Designs in Parallel: WasmForge and Centurion
Rather than betting on a single design, Praetorian pursued two approaches simultaneously. Michael Weber built WasmForge, which uses WebAssembly as its bytecode format and can compile existing Go tooling without source modification. Crosser built Centurion, a custom ISA with its own transpiler pipeline, virtualizing the entire execution environment from the ground up in freestanding C.
WasmForge ultimately shipped for production red team use because of its operational ergonomics — pointing it at something like Sliver and getting an evasion-ready binary out the other end, with zero source changes, is hard to beat day-to-day. Centurion is the more interesting research artifact: it asks how much of the supporting infrastructure — not just the payload, but the PE loader, the TLS stack, the libc — you can push behind the interpretation layer. The result feels less like a traditional code virtualizer and more like an embedded RTOS running inside userspace.
Feedback Loops Beat Prompts
The other decisive choice was the testing harness. No human would attempt to write a brand-new compiler and runtime without a feedback loop, and agents are no different. The most impactful decision in the project, by Crosser’s account, was reusing the existing GCC and LLVM compiler test suites to validate the Centurion transpiler and runtime.
Those suites exist to exercise thousands of edge cases in code generation. They encode decades of accumulated knowledge about what breaks in compilers and runtimes. Pointed at Centurion, they gave the agent a tight automatic feedback loop: change the transpiler or the interpreter, run the suite, see the regression. When a register-allocation bug appeared, the failing test case told the agent exactly which code pattern triggered it, and it could iterate on a fix autonomously.
The generalizable lesson: if you want an agent to build something nontrivial, invest in the feedback loop first. Prompt quality matters less than people think. Test infrastructure matters more than almost anything else.
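As a minimal sketch of the feedback-loop idea, the harness below runs a candidate implementation against a trusted reference and reports the exact inputs that diverge. Everything here is hypothetical and illustrative: the real project reused the GCC and LLVM test suites, and the function names, the 64-bit add example and the test cases are assumptions made for this sketch.

```python
from typing import Callable, List, Sequence, Tuple

Case = Tuple[int, int]
MASK64 = 0xFFFFFFFFFFFFFFFF


def reference_add(a: int, b: int) -> int:
    """Trusted behavior: 64-bit wrapping addition."""
    return (a + b) & MASK64


def buggy_add(a: int, b: int) -> int:
    """Hypothetical candidate with a deliberate bug: it forgets to wrap."""
    return a + b


def run_suite(
    cases: Sequence[Case],
    reference: Callable[[int, int], int],
    candidate: Callable[[int, int], int],
) -> List[Tuple[Case, int, int]]:
    """Return (case, expected, actual) for every case where the two disagree."""
    failures = []
    for case in cases:
        expected = reference(*case)
        actual = candidate(*case)
        if expected != actual:
            failures.append((case, expected, actual))
    return failures


CASES: List[Case] = [(1, 2), (0, 0), (MASK64, 1)]
failures = run_suite(CASES, reference_add, buggy_add)
for case, expected, actual in failures:
    print(f"FAIL {case}: expected {expected:#x}, got {actual:#x}")
```

The useful property is that each failure names the input that triggered it, which is the information an agent needs to iterate on a fix without a human in the loop.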
Design Decisions
Shared Memory vs. Isolated Memory
WebAssembly takes the isolated path: the guest gets a contiguous linear memory buffer starting at offset zero, and every guest pointer is an offset into that buffer. That model is wonderful for sandboxing and terrible for offensive tooling. Malware is tightly coupled to its host OS — it talks to COM servers, makes direct syscalls, patches the PEB, performs token manipulation. Every one of those crossings shovels real host pointers across the VM boundary, which forces translation, mirror tables and a long tail of edge cases where host APIs write pointers back into guest buffers.
Centurion uses shared memory instead, mirroring RISC-Y Business. Pointers in the Centurion execution context are valid host pointers and vice versa. Reads and writes in the interpreter degrade to a memcpy. Host calls do not need argument translation. A calloc issued through an ECALL returns a pointer the guest can dereference directly. The cost is any sandboxing guarantee — but sandboxing was never the goal. Evasion was, and shared memory dramatically reduces the engineering required to get there.
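The difference between the two pointer models can be sketched with ctypes. This is an illustration only, not Centurion's or WebAssembly's actual code: the class and function names are hypothetical, and a Python buffer stands in for memory that a host API allocated on its own.

```python
import ctypes


class LinearMemory:
    """Isolated model: guest pointers are offsets into one private buffer."""

    def __init__(self, size: int) -> None:
        self._buf = ctypes.create_string_buffer(size)
        self.size = size
        self.base = ctypes.addressof(self._buf)

    def to_host(self, guest_ptr: int) -> int:
        if not 0 <= guest_ptr < self.size:
            raise ValueError("guest pointer out of bounds")
        return self.base + guest_ptr

    def to_guest(self, host_ptr: int) -> int:
        offset = host_ptr - self.base
        if not 0 <= offset < self.size:
            raise ValueError("host pointer is outside guest linear memory")
        return offset


def shared_to_host(guest_ptr: int) -> int:
    """Shared model: a guest pointer already is a host pointer."""
    return guest_ptr


# A buffer the "host API" allocated by itself, outside any guest memory.
host_buffer = ctypes.create_string_buffer(b"written by host")
host_ptr = ctypes.addressof(host_buffer)

isolated = LinearMemory(4096)
try:
    isolated.to_guest(host_ptr)
    translated = True
except ValueError:
    translated = False  # the isolated guest cannot name this address

# In the shared model the same address is dereferenced directly.
value = ctypes.string_at(shared_to_host(host_ptr))
```

In the isolated model, every pointer a host API hands back needs either a copy into linear memory or a mirror table. In the shared model that whole class of translation code disappears, at the cost of any isolation.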
Why an x86-64-Inspired ISA Instead of RISC-V
RISC-V was the obvious target given the precedent set by RISC-Y Business and Firebeam — simple, well-documented, already supported by LLVM. Centurion goes the other way: a simplified x86-64-shaped ISA with a fixed 18-byte instruction width. The register model, calling conventions and overall feel deliberately mirror x86-64; the fixed instruction width trades code density for simpler parsing and dispatch in the interpreter.
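To illustrate why a fixed width simplifies fetch and decode, here is a sketch of an 18-byte instruction record. The post gives only the width; the field layout below (opcode, two register indices, flags, displacement, immediate) and the opcode values are assumptions chosen so the fields add up to 18 bytes.

```python
import struct

# Hypothetical layout: u16 opcode, u8 dst, u8 src, u16 flags, i32 disp, i64 imm.
# "<" selects little-endian with no padding, so the size is exactly 18 bytes.
INSN = struct.Struct("<HBBHiq")
assert INSN.size == 18


def encode(opcode: int, dst: int = 0, src: int = 0,
           flags: int = 0, disp: int = 0, imm: int = 0) -> bytes:
    """Pack one instruction into its fixed-width encoding."""
    return INSN.pack(opcode, dst, src, flags, disp, imm)


def fetch(code: bytes, index: int) -> tuple:
    """Decode instruction number `index`; its offset is simply index * 18."""
    return INSN.unpack_from(code, index * INSN.size)


# Three made-up instructions; the opcode numbers carry no real meaning.
program = encode(0x01, dst=0, imm=40) + encode(0x02, dst=0, imm=2) + encode(0xFF)
second = fetch(program, 1)
```

Because every instruction starts at a multiple of 18, the interpreter never has to length-decode a prefix the way a real x86-64 decoder does, and branch targets can be expressed as instruction indices. The price is code density: an operation with no operands still occupies the full 18 bytes.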
Binary-to-Binary Translation as a Future Lever
The reason for staying close to x86-64 is operational. A meaningful portion of offensive tooling exists only as compiled artifacts: Beacon Object Files, prebuilt shellcode, third-party tools shipped as binaries. Going back to source and rebuilding through a full LLVM pipeline isn’t always realistic.
By keeping Centurion architecturally close to x86-64, the door is open to binary-to-binary translation — lifting an existing compiled x86-64 binary instruction-by-instruction into Centurion’s ISA. The register model is familiar enough that the mapping is relatively straightforward, and the fixed instruction width keeps the output predictable and easy to relocate. A production-grade translator is future work, but the architectural commitment was made with this in mind.
Bring Your Own Execution Environment (BYOEE)
The framing that gives Centurion its character — and the title of the original post — is what Crosser calls “Bring Your Own Execution Environment.” The host userland process is treated as nothing more than a minimal shell that exposes raw hardware: memory, syscalls and network sockets. Everything else — the loader, the libc, the crypto, the HTTP stack — lives inside the VM.
To make this practical, Centurion leans on libraries designed for bare-metal microcontroller targets. Two named in the post are FreeRTOS coreHTTP for HTTP requests and mbedTLS for TLS and encryption. Because both are built to run on microcontrollers without an operating system, they compile cleanly as freestanding C, without needing a C++ runtime or messy toolchain plumbing. They “just work” inside the VM.
This is a deliberate departure from WasmForge’s philosophy. WasmForge generalizes — aim it at an existing Go project, change nothing, get a binary. Centurion is built for a bespoke C C2 framework purpose-engineered for the runtime: an implant that talks raw sockets (think afd.sys), runs its own TLS stack entirely inside the VM, makes HTTP requests through coreHTTP compiled to bytecode, and crosses into native code only through a small handful of ECALLs. The userland process becomes a thin shell with a tiny native footprint.
This trade-off has real limits — networks that block raw sockets or do inline TLS inspection break this model — but where it is viable, the surface area visible to defenders shrinks substantially. And the design generalizes nicely in one important direction: BOFs. Because Centurion has a full LLVM toolchain underneath it, off-the-shelf BOF source code (recon, credential access, lateral movement) compiles to Centurion bytecode through the same pipeline as everything else — no special porting work. The C2 framework is bespoke; the post-exploitation tooling stays portable.
Crosser also sketches the BYOEE concept extending beyond network and crypto — for example, compiling an RTOS FAT filesystem library to Centurion bytecode and using it to drive an in-memory virtual filesystem for the implant. With a few header tweaks to break standard forensic parsers, the result starts to look conceptually similar to what Uroburos achieved with its hidden VFS — only realized as interpreted bytecode rather than native code. The more functionality you pull behind the interpretation boundary, the less native surface there is for defenders to analyze.
Minimizing the Native Footprint
The native interpreter stub — the only code that exists as real x86-64 instructions — lands at roughly 18 KB. Everything else is bytecode. Crosser believes there is room to shrink this further by introducing a minimal RISC-style core interpreter that implements the more complex CISC instructions as microcode inside the VM itself, pushing even more logic behind the interpretation layer.
The system uses a two-tier execution model. The PE loader that maps and links compiled binaries is itself written as freestanding C, compiled through the same transpiler and executed as bytecode. It acts as the linker-loader for everything else inside the VM — parsing PE headers, mapping sections, populating the IAT. This keeps the PE-loading logic, one of the most signature-prone components in a traditional reflective loader, entirely out of native code.
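For readers unfamiliar with what "parsing PE headers" involves, the sketch below reads the first few fields a loader needs. It is written in Python for illustration, whereas the loader described in the post is freestanding C compiled to bytecode. The field offsets follow the published PE/COFF format; the helper names and the synthetic test image are hypothetical.

```python
import struct

IMAGE_FILE_MACHINE_AMD64 = 0x8664  # Machine value for x64 in the PE/COFF format


def parse_pe_headers(image: bytes) -> dict:
    """Read the DOS header pointer, PE signature and COFF file header."""
    if image[:2] != b"MZ":
        raise ValueError("missing DOS 'MZ' signature")
    # e_lfanew sits at offset 0x3C and points at the PE signature.
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The 20-byte COFF file header follows the 4-byte signature.
    (machine, section_count, _timestamp, _symtab, _symcount,
     optional_size, characteristics) = struct.unpack_from(
        "<HHIIIHH", image, e_lfanew + 4)
    return {
        "machine": machine,
        "section_count": section_count,
        "characteristics": characteristics,
        # The section table starts right after the optional header.
        "section_table_offset": e_lfanew + 4 + 20 + optional_size,
    }


def make_synthetic_header() -> bytes:
    """Build a header-only test image; it is not a loadable executable."""
    dos = bytearray(0x40)
    dos[0:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)
    coff = struct.pack("<HHIIIHH", IMAGE_FILE_MACHINE_AMD64, 3, 0, 0, 0, 0xF0, 0x0022)
    return bytes(dos) + b"PE\x00\x00" + coff


headers = parse_pe_headers(make_synthetic_header())
```

A full loader continues from `section_table_offset` to map sections, apply relocations and fill in the IAT. Those steps are omitted here.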
The native stub is the component most exposed to static signatures, but the bytecode layer has built-in defenses. Opcodes can be randomized at build time, producing a unique ISA mapping per binary. Runtime decryption of the opcode table with a per-build key adds another layer. The bytecode on disk therefore does not correspond to any fixed instruction encoding.
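The build-time randomization can be pictured as a seeded permutation of opcode values. The sketch below is a toy under stated assumptions: the mnemonics, the one-byte opcode space and the use of Python's non-cryptographic `random` module are all illustrative, and the post does not describe how Centurion actually derives or encrypts its table.

```python
import random
from typing import Dict, List

# Hypothetical canonical instruction names for a toy ISA.
CANONICAL = ["NOP", "LOAD_IMM", "ADD", "HALT"]


def build_opcode_map(seed: int) -> Dict[str, int]:
    """Assign each mnemonic a distinct byte value chosen by a seeded shuffle."""
    values = list(range(256))
    random.Random(seed).shuffle(values)
    return {name: values[i] for i, name in enumerate(CANONICAL)}


def assemble(program: List[str], opmap: Dict[str, int]) -> bytes:
    """Encode mnemonics using one build's opcode map."""
    return bytes(opmap[name] for name in program)


def disassemble(blob: bytes, opmap: Dict[str, int]) -> List[str]:
    """Decode bytes; this only works with the map from the same build."""
    inverse = {value: name for name, value in opmap.items()}
    return [inverse[b] for b in blob]


program = ["LOAD_IMM", "ADD", "HALT"]
map_a = build_opcode_map(seed=1)
map_b = build_opcode_map(seed=2)
blob_a = assemble(program, map_a)
blob_b = assemble(program, map_b)
```

Two builds of the same program will in general produce different bytes, while each build's interpreter, holding the matching map, recovers the same instruction sequence. From a defender's point of view this is why the bytecode itself is a poor signature target and the interpreter stub is a better one.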
The bridge between virtualized and native execution is deliberately spartan. The Centurion register file maps directly onto the registers of the Windows x64 calling convention (RCX, RDX, R8 and R9 for the first four arguments, with the rest passed on the stack). When a CALL targets an address outside the bytecode region (that is, an IAT entry pointing into a real DLL), the interpreter routes it through a universal thunk that marshals the virtual register state into a native call. Win32 APIs, socket operations, heap functions: all of them cross through this one mechanism. There is no argument translation and no pointer fixup, because the shared-memory model means a pointer allocated by the VM is already a valid host pointer. The thunk is pure register marshaling.
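The routing decision can be sketched as a toy model. This is a simulation, not the author's implementation: Python callables stand in for native functions, the addresses and class names are invented, and stack arguments, return addresses and shadow space are left out entirely.

```python
from typing import Callable, Dict

# Hypothetical address range occupied by bytecode.
BYTECODE_BASE = 0x1000
BYTECODE_SIZE = 0x1000
MASK64 = 0xFFFFFFFFFFFFFFFF


class ToyVM:
    def __init__(self, host_functions: Dict[int, Callable[..., int]]) -> None:
        self.regs = {"rax": 0, "rcx": 0, "rdx": 0, "r8": 0, "r9": 0}
        self.pc = BYTECODE_BASE
        self.host_functions = host_functions  # address -> stand-in for a DLL export
        self.thunk_calls = 0

    def in_bytecode(self, address: int) -> bool:
        return BYTECODE_BASE <= address < BYTECODE_BASE + BYTECODE_SIZE

    def universal_thunk(self, target: int) -> None:
        """Marshal the four register arguments into a host call, store the result."""
        self.thunk_calls += 1
        fn = self.host_functions[target]
        args = (self.regs["rcx"], self.regs["rdx"], self.regs["r8"], self.regs["r9"])
        self.regs["rax"] = fn(*args) & MASK64

    def call(self, target: int) -> None:
        """CALL: stay in the interpreter for bytecode targets, else cross the bridge."""
        if self.in_bytecode(target):
            self.pc = target
        else:
            self.universal_thunk(target)


def fake_host_add(a: int, b: int, c: int, d: int) -> int:
    return a + b + c + d


vm = ToyVM({0x7FF000001000: fake_host_add})
vm.regs.update(rcx=1, rdx=2, r8=3, r9=4)
vm.call(0x7FF000001000)  # outside the bytecode region: goes through the thunk
vm.call(0x1800)          # inside the bytecode region: just moves the program counter
```

Note that the thunk never inspects or rewrites the argument values. That is only safe because of the shared-memory model: whatever pointer sits in a virtual register is already valid on the host side.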
A Software Crypto Coprocessor
The first real performance cliff appeared in TLS. Symmetric crypto under interpretation was fine — AES-GCM is cheap enough that even an instruction-by-instruction interpreter completes bulk operations in negligible time. The handshake was a different animal. Key exchange and certificate verification lean heavily on bignum arithmetic, and the per-operation cost there is already steep before you stack interpretation overhead on top. A handshake that runs in milliseconds natively was taking minutes, with the time entirely dominated by the inner-loop math kernels being interpreted instruction by instruction.
The fix came from mbedTLS itself. Because microcontrollers often lack the horsepower for heavy crypto math, mbedTLS exposes hooks for offloading expensive operations onto a hardware coprocessor or crypto accelerator. The MBEDTLS_ALT configuration pattern exists for exactly this case.
Centurion applies the same pattern in software. mbedTLS is split into two tiers:
- Bytecode tier (interpreted): the higher-level bignum API (bignum.c, bignum_mod.c, bignum_mod_raw.c) and the rest of the TLS state machine: control flow, allocation, conditional logic, certificate parsing, cipher suite negotiation, record encryption. Not the hot path, so interpretation overhead is acceptable.
- Native tier (ECALL accelerated): the inner-loop math kernels in bignum_core.c. The bytecode orchestrates the handshake and dispatches the math out to native through an ECALL.
Result: handshakes complete in seconds instead of minutes, the rest of TLS stays virtualized, and the architecture mirrors how mbedTLS was designed to be used on a real embedded target — only here the “hardware accelerator” happens to be the host CPU via a single ECALL. Crosser notes that the bignum acceleration is an optional compile-time flag — not every workload needs TLS, and for payloads that don’t, the interpreter omits those ECALLs to shrink the native footprint further.
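The split between orchestration and kernels can be sketched with modular exponentiation. This is a stand-in chosen for brevity, with hypothetical function names: it is not the mbedTLS API, and the ECDHE handshake in the post relies on elliptic-curve arithmetic rather than plain modexp. The point is only the shape of the division of labor.

```python
kernel_calls = 0


def native_mulmod(a: int, b: int, m: int) -> int:
    """Stand-in for an inner-loop kernel in the native tier (reached via an ECALL)."""
    global kernel_calls
    kernel_calls += 1
    return (a * b) % m


def interpreted_modexp(base: int, exponent: int, modulus: int) -> int:
    """Square-and-multiply control flow, as it might run in the bytecode tier."""
    result = 1 % modulus
    base %= modulus
    while exponent > 0:
        if exponent & 1:
            result = native_mulmod(result, base, modulus)  # multiply step
        base = native_mulmod(base, base, modulus)          # square step
        exponent >>= 1
    return result


value = interpreted_modexp(5, 117, 1009)
```

The loop, the branch on each exponent bit and the bookkeeping are cheap and few, so interpreting them costs little. The multiplications are where the time goes, and each one is a single crossing into native code. With real bignum sizes the ratio of work per crossing is far higher than in this toy.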
The Week-One Benchmark: A TLS Bind Shell Inside the VM
By the end of the first week of development, Centurion ran a TLS bind shell entirely inside the VM. The payload, compiled to Centurion bytecode, listens on a port, performs a TLS 1.2 handshake using mbedTLS with ECDHE key exchange accelerated through the ECALL coprocessor, and provides a remote shell across the encrypted connection.
Prototype applications also demonstrate coreHTTP running alongside the mbedTLS module for full HTTPS from inside the VM. Because both stacks are compiled to bytecode and execute inside the interpreter, the application never touches OS-provided HTTP or encryption APIs — no WinHTTP, no SChannel — that endpoint products commonly hook and monitor. Networking is handled through raw sockets; encryption and HTTP framing are virtualized. As noted earlier, this breaks down in environments with inline HTTP proxies or strict TLS inspection — though Crosser points out that many financial and healthcare environments exclude endpoint traffic from TLS inspection anyway.
The choice of a TLS bind shell as the first week’s benchmark is deliberate: it exercises the full stack end-to-end. The PE loader has to parse and map the payload, resolve imports through the IAT and apply relocations. The payload has to perform socket operations via the thunk layer. The handshake exercises the entire crypto stack — key exchange, certificate parsing, symmetric cipher setup. The shell itself proves command execution crosses the VM boundary cleanly. If all of that works as interpreted bytecode inside a custom VM, the architecture is sound and everything else — reverse shells, staged loaders, full C2 integration — is payload work on top of a proven foundation.
Open Questions and Future Work
Crosser closes with several engineering threads worth following.
- Shrink the native stub further, primarily by introducing a RISC-style core interpreter with CISC operations re-implemented as VM microcode.
- Auto-port existing open-source C implants. LLMs make it realistic to take an implant from an open-source C2 written in C, port it to the Centurion runtime largely automatically, and then do targeted evasion customization on top — the kind of well-documented systems work models handle well.
- Native-to-VM callbacks. A real gap today. APIs like EnumWindows expect a function pointer that is invoked from native code; the current design only handles VM-to-native calls through the thunk layer. The likely solution is an ECALL interface for registering callback handlers so the interpreter can receive and dispatch inbound native calls.
- Replace the PE format with a bespoke executable format for the bytecode binaries, to break standard reverse-engineering tools that auto-parse PE.
- Binary-to-binary translation from compiled x86-64 to Centurion ISA, leveraging the architectural similarity that was built in deliberately at the ISA design stage.
Key Takeaways
- Native footprint of ~18 KB. Centurion’s native interpreter stub is small; the PE loader, libc, HTTP client and TLS stack all run as interpreted bytecode rather than native code, sharply reducing the surface visible to static signatures and memory scans.
- Shared guest/host memory. Pointers are identical on both sides of the VM boundary, so host calls require no argument translation and the universal thunk is pure register marshaling. The trade-off is no sandbox isolation — intentional, since evasion (not isolation) is the goal.
- x86-64-shaped ISA with a fixed 18-byte instruction width, chosen specifically to enable future binary-to-binary translation of off-the-shelf x86-64 binaries (BOFs, prebuilt shellcode, compiled third-party tools).
- mbedTLS as a software crypto coprocessor. By splitting bignum control flow (interpreted) from bignum math kernels (native via ECALL), handshakes drop from minutes to seconds while the entire TLS state machine stays virtualized.
- Build-time opcode randomization. Per-binary ISA mappings and runtime-decrypted opcode tables mean the bytecode on disk does not match any fixed instruction encoding, frustrating static signaturing of the bytecode payload itself.
- LLM-assisted development is now the unlock. The team built a working TLS-capable VM-based loader in roughly a week as a background task — a project they had previously scoped at three to six months of dedicated engineering. The leverage comes from matching tasks to the model’s training distribution (compilers, parsers, interpreters) and investing heavily in the feedback loop (here, the GCC and LLVM test suites).
- BYOEE shifts the analysis problem. Treating the host process as bare metal and bringing your own loader, libc, crypto and HTTP stack inside the interpreter is a useful framing for the next generation of virtualized loaders — less “code virtualizer”, more “embedded RTOS in userspace.”
Defensive Recommendations
If virtualized loaders look like the next wave, defenders need to plan for what they can and cannot see. The following hardening and detection ideas follow directly from the architectural choices described above.
- Stop relying on native-code signatures for the malicious behavior. If the “payload” lives only as bytecode for a per-binary randomized ISA, there is nothing fixed to signature. Shift detection effort toward the small native interpreter stub, the syscall and ECALL pattern, and runtime telemetry from the host process.
- Hunt for embedded-RTOS-style libraries in user-mode binaries. Strings or constants from coreHTTP, mbedTLS or other freestanding-C microcontroller libraries showing up inside an apparently ordinary user-mode Windows binary is unusual. YARA-style hunts for mbedTLS constants, coreHTTP banners, FreeRTOS strings, or curve25519/secp256r1 table constants in non-crypto processes are cheap and high-signal.
- Watch raw socket activity. A BYOEE-style implant deliberately bypasses WinHTTP/SChannel by talking directly to afd.sys. ETW telemetry on AFD/Winsock direct usage, especially from processes that have no business doing raw networking, is a meaningful signal.
- Enforce TLS inspection where you can. Crosser explicitly notes that BYOEE breaks in environments doing inline TLS inspection. For high-value internal networks, push for TLS termination at the proxy: the implant’s in-VM mbedTLS handshake is invisible until it tries to actually talk to the network.
- Constrain the dispatch surface. The universal thunk routes everything — Win32, sockets, heap — through a small number of native call sites. EDRs that hook function entry points see a process where a tiny code region issues nearly all the API calls. Look for processes whose Win32 call distribution collapses to a handful of source addresses; that pattern is unusual in legitimate software.
- Limit code-execution opportunities for unsigned binaries. WDAC / App Control with strict signer policies still helps. If the native stub cannot run, none of the bytecode does either.
- Inspect IAT shape and import patterns. A BYOEE binary deliberately keeps imports minimal — the real Win32 surface is reached at runtime through the thunk. Aggressively small import tables on otherwise normal-looking PE files are themselves a hunting signal.
- Build behavioral detections around the ECALL pattern. When mbedTLS bignum work is the only thing that crosses to native at a regular cadence during a TLS handshake, the temporal pattern (tight bursts of identical native math calls during what looks like a network connection) is detectable in EDR timelines even if no individual call is malicious.
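The call-site concentration idea from the list above can be prototyped in a few lines. This is a sketch under assumptions: the event format (API name plus the address of the calling code), the sample data and the choice of the top three sites are all hypothetical, and any alerting threshold would need tuning against real telemetry.

```python
from collections import Counter
from typing import Iterable, Tuple

# One event per observed API call: (api_name, address of the calling code).
Event = Tuple[str, int]


def call_site_concentration(events: Iterable[Event], top_n: int = 3) -> float:
    """Fraction of API calls issued from the top_n most common call sites."""
    sites = Counter(site for _api, site in events)
    total = sum(sites.values())
    if total == 0:
        return 0.0
    top = sum(count for _site, count in sites.most_common(top_n))
    return top / total


# Synthetic data: an ordinary program calls APIs from many places,
# while a thunk-based design funnels almost everything through one site.
ordinary = [("CreateFileW", 0x1000 + i * 0x40) for i in range(20)]
thunked = [("CreateFileW", 0x5000)] * 19 + [("ExitProcess", 0x6000)]

ordinary_score = call_site_concentration(ordinary)
thunked_score = call_site_concentration(thunked)
```

A score near 1.0 across a diverse set of APIs is the pattern of interest. Legitimate runtimes with their own call gates (some language runtimes and interpreters) may score high as well, so this works better as a hunting lead than as a standalone verdict.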
Conclusion
Centurion is a prototype, and Crosser is upfront about that — WasmForge is the system the team actually ships for engagements. But the research direction matters. The architecture answers a useful question: how much of the supporting stack can be pulled behind the interpretation layer? When the answer is “essentially all of it, including the loader and the crypto”, the design space for virtualized loaders opens up considerably. Equally important is what the project says about how capability development is changing. A project that was deprioritized for eight years got built in a week as a background task. That dynamic — previously specialist, months-long efforts becoming feasible side projects — will play out across both offensive and defensive tooling, and the cost of building these systems is dropping fast.
Original text: “Centurion: Bring Your Own Execution Environment” by Adam Crosser at Praetorian.

