BusyWork: Replacing Sleep with Real Work to Break Behavioral Detection

Original text: “BusyWork: Replacing Sleep with Real Work to Break Behavioral Detection” at patchi.fyi (07 Jun 2026; the byline shows only the site handle, so the author is not publicly attributed). Library source: github.com/PatchRequest/BusyWork. Short illustrative code excerpts are reproduced with attribution; longer routines are summarised. Consult the upstream repo for full sources.

Executive Summary

A thread that calls a few Win32 APIs, allocates a buffer, and then Sleeps for a constant interval ten times in a row is one of the most identifiable behavioural patterns in modern endpoint telemetry. EDR products, anti-cheat engines like EasyAntiCheat and BattlEye, and even sandbox detonation harnesses all key on it — long fixed sleeps look like evasion, and short repetitive sleep/wake transitions look like beacons.

BusyWork is a small Rust library that deletes the sleep entirely. Each “pause” is instead a randomly chosen handful of real tasks — SHA-256 chains, registry queries, directory enumerations, DNS lookups, virtual-memory walks — drawn from a registry of 76 tasks across seven categories. Parameters jitter by ±30% on every call, dispatch goes through function pointers the compiler can’t devirtualise, and each category sits behind a Cargo feature flag so the binary only ships the code the operator actually wants. The compiled artefact contains no Sleep, Duration, Instant or SystemTime references — the wall-clock pause is an emergent property of doing real work, not a configured interval.

The Problem with Sleep

Implants and game cheats need to wait between actions: poll for tasking, throttle injection retries, pace exfiltration, sync with a render loop. The obvious tool is Sleep() on Windows or std::thread::sleep() in Rust. Both leak signal in two ways.

  1. Static timing primitives. The binary now contains Duration, Instant, SystemTime, or imports of Sleep / WaitForSingleObject with large constants. Sandbox engines specifically watch for long sleeps as an evasion tell.
  2. Behavioural cadence. A thread that repeatedly transitions Running → WaitForSingleObject → Running at similar intervals with little compute in between is highly distinctive. Kernel anti-cheats hunt the exact shape of this curve.

BusyWork neutralises both. There are no timing primitives anywhere in the compiled library, and every “pause” issues genuinely different syscalls, allocations, and computations — the same surface area legitimate applications produce constantly.

One-Line API, Randomised Internals

The public surface is intentionally small — two free functions for the common case, plus a builder for finer control. The free-function form lives in src/lib.rs:

// From: src/lib.rs
pub fn busywork(intensity: Intensity) {
    BusyWork::new(intensity).run();
}

pub fn busywork_with(intensity: Intensity, categories: Categories) {
    BusyWork::new(intensity).allow(categories).run();
}

A call to busywork(Intensity::Medium) picks roughly five random tasks out of the 76 available, jitters their parameters by ±30%, and executes them. The next invocation will pick a different subset, with different sizes and iteration counts. No two calls produce the same API-call sequence, allocation footprint, or wall-clock duration.

The builder (src/builder.rs) carries an Intensity, an allow mask, a deny mask and a jitter toggle. The run() method intersects the allow set with the categories that were actually compiled in (Categories::available()) and removes any explicitly denied categories before dispatching. That makes it cheap to drop side-effecting categories on a per-call-site basis — e.g. BusyWork::new(High).deny(Categories::NETWORK).run() for a context where DNS or HTTP would stand out.
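The mask arithmetic described for run() can be sketched with plain integers. This is a minimal illustration rather than the library's code: effective_categories is a hypothetical helper, and the constants only mirror the bit layout of the Categories flags quoted from src/categories.rs.

```rust
// Hypothetical stand-ins for the library's Categories bitflags.
// The bit positions follow the constants quoted from src/categories.rs.
const COMPUTE: u32 = 0b000_0001;
const MEMORY: u32 = 0b000_0010;
const NETWORK: u32 = 0b010_0000;
const ALL: u32 = 0b111_1111;

/// Categories a call may draw from: the call-site allow mask,
/// intersected with what was compiled in, minus anything denied.
fn effective_categories(allow: u32, available: u32, deny: u32) -> u32 {
    allow & available & !deny
}
```

With every category allowed and compiled in, denying NETWORK leaves the other six bits set. Allowing a category that was not compiled in has no effect, because the intersection with the available mask removes it.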

Intensity Levels and Parameter Scaling

Four intensity levels define base values for the task count, inner-loop iterations, buffer sizes, and call depth (how many times compound routines such as compress / decompress cycles repeat). The values scale roughly geometrically: Ultra runs 10× the tasks, 1,000× the iterations and 1,024× the buffer size of Low:

Intensity   task_count   iteration_count   buffer_size   call_depth
Low                  2                50         1,024            2
Medium               5               500        16,384            4
High                10             5,000       262,144            8
Ultra               20            50,000     1,048,576           16
BusyWork intensity-level base parameters before jitter is applied. Source: original article (src/intensity.rs).

These are pre-jitter base values. Every parameter is multiplied by an independent random factor before use.

Jitter: ±30% on Every Parameter

The jitter function is a one-liner that multiplies the base value by a random factor between 0.7 and 1.3, rounds, and clamps to at least one:

// From: src/jitter.rs
pub fn apply(base: usize, rng: &mut impl Rng) -> usize {
    let factor: f64 = rng.gen_range(0.7..=1.3);
    (base as f64 * factor).round().max(1.0) as usize
}

It is applied independently to every parameter of every task at every dispatch. A Medium call asking for 5 tasks will usually run 4, 5 or 6 (7 is reachable only if the factor lands exactly on the 1.3 bound). A 16,384-byte buffer may be anywhere from 11,469 to 21,299 bytes. Even when the dispatcher happens to pick the same task twice in a row, that task almost certainly runs at different sizes and iteration counts the second time. The aggregate effect is that two BusyWork invocations are very unlikely to leave the same syscall trace.
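The reachable range for any base value can be checked without a random number generator by evaluating the same arithmetic at the two ends of the factor interval. The sketch below is an illustration only: apply_with_factor and jitter_bounds are hypothetical names, and the factor is passed in explicitly instead of being drawn from an RNG as in the quoted jitter::apply.

```rust
/// Same arithmetic as the quoted jitter::apply, but with the factor
/// supplied by the caller (hypothetical helper, not part of the library).
fn apply_with_factor(base: usize, factor: f64) -> usize {
    (base as f64 * factor).round().max(1.0) as usize
}

/// Smallest and largest results for a base value, evaluated at the
/// 0.7 and 1.3 ends of the jitter interval. Rounding is monotonic,
/// so these are the true minimum and maximum.
fn jitter_bounds(base: usize) -> (usize, usize) {
    (apply_with_factor(base, 0.7), apply_with_factor(base, 1.3))
}
```

For the Medium buffer size, jitter_bounds(16_384) gives (11_469, 21_299). The clamp matters only for very small inputs: a base of 0 still yields 1.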

The Dispatch Loop

The dispatcher (src/dispatch.rs) is the conventional fan-out shape: get a thread-local RNG, fetch the full registry, filter to the categories the call-site allowed, bail if the filtered set is empty, then run a loop of jitter::apply(base.task_count) iterations. Each iteration picks a task uniformly at random from the filtered slice, builds a fresh TaskParams with three independently jittered sizes, and calls the task through its fn pointer:

// Pattern from src/dispatch.rs (simplified)
let task = eligible.choose(&mut rng).unwrap();
let params = TaskParams {
    iterations:  jitter::apply(base.iteration_count, &mut rng),
    buffer_size: jitter::apply(base.buffer_size,     &mut rng),
    call_depth:  jitter::apply(base.call_depth,      &mut rng),
};
(task.func)(&params, &mut rng);

The function-pointer call is load-bearing for evasion. Because the compiler does not see through the indirection, it cannot inline tasks, devirtualise the dispatch, or rearrange call ordering. The same property frustrates static analysis tools: control flow from busywork() on through to VirtualQuery or BCryptHashData only resolves at runtime, and only after a fresh random selection.

The Task Registry: 76 Tasks Across Seven Categories

Each task is a TaskDescriptor — a name, a category bit, and a function pointer with a fixed signature:

// From: src/tasks/mod.rs
pub type TaskFn = fn(&TaskParams, &mut ThreadRng);

pub struct TaskDescriptor {
    pub name: &'static str,
    pub category: Categories,
    pub func: TaskFn,
}

The seven categories are bitflags!-style constants, each gated behind a Cargo feature:

// From: src/categories.rs
bitflags! {
    pub struct Categories: u32 {
        const COMPUTE    = 0b000_0001;
        const MEMORY     = 0b000_0010;
        const FILESYSTEM = 0b000_0100;
        const REGISTRY   = 0b000_1000;
        const WINAPI     = 0b001_0000;
        const NETWORK    = 0b010_0000;
        const CRYPTO     = 0b100_0000;
    }
}

By default all seven are enabled. Disabling a Cargo feature strips the whole category’s source — including its transitive dependencies (the windows crate, sha2, flate2, etc.) — from the compiled binary.

Compute (14 tasks)

Pure CPU: SHA-256 and MD5 hash chains, prime sieves, matrix multiplication, quicksort and bubble-sort of random arrays, deflate compress / decompress cycles, Fibonacci sequences, XOR ciphers, Collatz, string and bitwise ops, a Leibniz-series pi approximation, and Heap’s permutation algorithm. Every routine routes its result through std::hint::black_box so the optimiser cannot prove the work dead and delete it. A representative example — the SHA-256 chain — rehashes a 64-byte buffer in-place for params.iterations rounds, feeding each digest back into the input, with the final state escaped through black_box(&data).

Memory (10 tasks)

Allocation and data movement: alloc/touch/free cycles that touch every page, memcpy chains, in-place sort, fill-and-verify patterns, heap fragmentation, ring-buffer operations, repeated binary search, buffer reversal, two-buffer interleave, and scatter/gather with random indices. The heap-fragmentation routine allocates ~500 small buffers, drops about half at random, then reallocates new sizes into the gaps — an inexpensive way to perturb the process heap layout in a way that mirrors normal long-running application behaviour. As above, the accumulated byte count escapes via black_box so the optimiser keeps the allocations.

Filesystem (12 tasks)

Strictly read-only directory walks and file reads against well-known paths: C:\Windows\System32, the user temp directory, Program Files, C:\Windows\Fonts, System32\drivers, Prefetch, Logs (one level of recursion), and the user profile. File reads target hosts, services, protocol, win.ini, system.ini. Stat operations cover 20 common DLLs (kernel32.dll, ntdll.dll, user32.dll, etc.). Nothing is created, modified or deleted.

Registry (10 tasks)

Read-only queries through the windows crate: enumerating HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall, system CurrentVersion info, services, timezone, environment variables, TCP/IP parameters and interface metadata, hardware (CPU name, frequency, vendor), fonts, startup Run keys in both HKLM and HKCU, and file associations from HKEY_CLASSES_ROOT.

A nice touch: registry paths are precomputed as const &[u16] arrays at compile time using a const block (a small loop that copies an ASCII byte string into a UTF-16 buffer). The runtime never converts an &str to a wide string — the wide strings are baked into .rdata. That keeps both the runtime ahead-of-call profile flat and the static-strings table from leaking obvious indicators like "SOFTWARE\Microsoft\Windows\..." as easily greppable ASCII.
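The compile-time conversion is ordinary const evaluation. The sketch below shows one way to write it, under stated assumptions: ascii_to_utf16, PATH_ASCII and PATH_WIDE are hypothetical names, the path is a placeholder, and the trailing NUL is added because Win32 wide-string APIs expect one. The library's own const block may be shaped differently.

```rust
/// Copy an ASCII byte string into a NUL-terminated UTF-16 array.
/// Being a const fn, it runs entirely at compile time when used in a const.
const fn ascii_to_utf16<const N: usize>(s: &[u8]) -> [u16; N] {
    // N must leave room for exactly one terminating NUL.
    assert!(s.len() + 1 == N);
    let mut out = [0u16; N];
    let mut i = 0;
    while i < s.len() {
        // ASCII only: each byte maps to the same UTF-16 code unit.
        assert!(s[i] < 0x80);
        out[i] = s[i] as u16;
        i += 1;
    }
    out
}

// Placeholder path for illustration, not one of the library's keys.
const PATH_ASCII: &[u8] = b"SOFTWARE\\Example\\Key";
const PATH_WIDE: [u16; PATH_ASCII.len() + 1] = ascii_to_utf16(PATH_ASCII);
```

The resulting array is stored as UTF-16LE data, so it is missed only by ASCII-only string scans. A scan that understands 16-bit little-endian text (for example GNU strings with -e l) still recovers such paths.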

Windows API (16 tasks)

Calls through the windows crate: EnumWindows, CreateToolhelp32Snapshot process enumeration, GetSystemInfo, GlobalMemoryStatusEx, clipboard read, GetSystemMetrics across 10 indices, foreground window info, cursor position, desktop dimensions, logical drives and drive types, volume info, disk free space, FindFirstFile on C:\Windows\System32\*.dll, module handle resolution for 12 common DLLs, virtual-memory walk via VirtualQuery, system / Windows directory queries, and process / thread ID reads.

The VirtualQuery walk iterates the process address space region-by-region, escapes BaseAddress, RegionSize, State and Type through black_box on each iteration, and advances addr by the returned region size (clamped to 4 KiB if the kernel reports zero, and broken out of cleanly on overflow). It produces a long but completely benign sequence of NtQueryVirtualMemory syscalls indistinguishable from those a diagnostic tool would issue.

Network (7 tasks)

DNS lookups against 24 common domains, HTTP GET / HEAD / POST / PUT / PATCH against public test endpoints (httpbin.org, ip-api.com, ifconfig.me, …), NTP queries against 7 time servers, TCP connect probes (handshake then immediate close), and DNS resolution against varied ports. Every socket gets a 3-second timeout via raw setsockopt calls so an unreachable host never stalls the task. The helper is unspectacular and worth quoting because it’s the kind of glue most Rust networking code avoids:

// From: src/tasks/network.rs
fn set_socket_timeouts(socket: &impl AsRawSocket, ms: u32) {
    let raw = socket.as_raw_socket() as usize;
    let val = ms.to_ne_bytes();
    unsafe {
        setsockopt(raw, SOL_SOCKET, SO_RCVTIMEO, val.as_ptr(), 4);
        setsockopt(raw, SOL_SOCKET, SO_SNDTIMEO, val.as_ptr(), 4);
    }
}

Crypto (7 tasks)

Windows BCrypt: random bytes from the system-preferred RNG via BCryptGenRandom, SHA-256 / SHA-512 / SHA-1 / MD5 chains using BCryptCreateHash + BCryptHashData + BCryptFinishHash, AES-256 encryption with random keys, and RNG draws against alternative providers (FIPS186DSARNG, DUALECRNG). All of it consists of API calls a legitimate cryptographic library would also be making.

Feature Flags and Binary Size

Each category is a Cargo feature. cat-compute gates the sha2, md-5 and flate2 dependencies; cat-registry, cat-winapi, and cat-crypto each gate the windows crate with the specific Win32 sub-features they need. cat-memory, cat-filesystem, and cat-network have no extra dependencies and ride on std alone. The default feature set turns all seven on; a build with only cat-compute and cat-memory produces a binary with no windows-crate dependency at all, which is useful when operating from a context that wouldn’t normally link Win32 APIs (a Wasm module that’s about to be loaded, an embedded Rust runtime, a tiny static cheat).

The shape of the feature gates in Cargo.toml is the familiar pattern:

[features]
default      = ["cat-compute", "cat-memory", "cat-filesystem",
                "cat-registry", "cat-winapi", "cat-network", "cat-crypto"]
cat-compute  = ["dep:sha2", "dep:md-5", "dep:flate2"]
cat-registry = ["dep:windows", "windows/Win32_System_Registry",
                "windows/Win32_Foundation"]
# ...etc, see Cargo.toml in the upstream repo for the full list

Why This Works Against Behavioural Detection

Several independent properties stack:

  • No timing signature in the artefact. Greppable Duration / Instant / SystemTime / Sleep imports do not appear. Wall-clock pause is emergent, not configured. Thread profilers see compute and I/O, not a sleep state.
  • Combinatorial code paths. With 76 tasks and 5 picks per Medium call, each drawn independently and uniformly, there are over 2.5 billion ordered task sequences (76^5). That is before jitter, which makes each sequence produce different sizes and iteration counts too.
  • Calls to real targets. Filesystem tasks read actual system directories. Registry tasks query actual hives. Network tasks resolve actual hostnames. These are exactly the API calls legitimate software issues, so flagging them would drown an EDR in false positives across every Windows endpoint.
  • Function-pointer indirection. Tasks dispatch through fn pointers in a slice. The compiler cannot devirtualise; static analysis cannot enumerate paths.
  • black_box against dead-code elimination. Without std::hint::black_box the optimiser would prove the SHA-256 chain’s output is unused and delete the loop. The intrinsic keeps the work observably alive while remaining a single instruction’s worth of overhead.
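For scale, the size of the selection space can be worked out directly. Because the dispatcher picks each task independently from the filtered slice, the relevant count for a five-task call is the number of ordered sequences with repetition, 76^5. The number of unordered five-task subsets is shown alongside for comparison. Both function names are hypothetical, and the sketch assumes all 76 tasks are eligible.

```rust
/// Ordered sequences of k independent picks from n tasks: n^k.
fn ordered_sequences(n: u64, k: u32) -> u64 {
    n.pow(k)
}

/// Unordered subsets of k distinct tasks out of n: C(n, k).
/// Each intermediate value is itself a binomial coefficient,
/// so the integer division is always exact.
fn combinations(n: u64, k: u64) -> u64 {
    let mut result = 1u64;
    for i in 0..k {
        result = result * (n - i) / (i + 1);
    }
    result
}
```

ordered_sequences(76, 5) is 2,535,525,376 and combinations(76, 5) is 18,474,840. Disabling categories shrinks n and therefore both figures.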

Limitations and Detection Surface

  • Network traffic is distinctive in context. HTTP to httpbin.org, NTP to public time servers, and DNS lookups are individually legitimate but anomalous for a process that does not normally network. Confining categories to COMPUTE | MEMORY avoids this at the cost of variety.
  • Read-only doesn’t mean invisible. Walking C:\Windows\System32 or enumerating the Uninstall key is mundane in aggregate but, correlated with process identity, may flag a game overlay or injected DLL doing things it has no business doing.
  • Duration is non-deterministic. Low ≈ 0.19 ms, Medium ≈ 4.5 ms, High ≈ 1.5 s, Ultra ≈ 58 s on the author’s hardware. There is no busywork(at_least_5s); callers needing a minimum pause must measure elapsed time and loop.
  • The rand crate is a static fingerprint. Not suspicious by itself, but a binary that has rand + Win32 APIs + no UI is mildly out-of-distribution.

Key Takeaways

  • Deleting sleep deletes one of the loudest behavioural signals modern EDRs and anti-cheats key on. Replacing it with real work is cheap and surprisingly effective.
  • Combinatorial randomness beats clever timing. 76 tasks × jittered task counts × jittered parameters yield a wall-clock duration and a syscall trace that two invocations are very unlikely to share.
  • Function-pointer dispatch is doing real evasion work, not just polymorphism. It blocks compiler devirtualisation and static control-flow recovery.
  • std::hint::black_box is mandatory for any “fake but real” computation in optimised Rust — without it the optimiser will delete your decoy.
  • Cargo feature gates make per-engagement variants trivial. Compile a network-free build for an OPSEC-sensitive call site, a no-Win32 build for embedded contexts, etc.
  • Compile-time UTF-16 paths (via const blocks) hide registry strings from naive string-table searches without runtime cost.
  • Read-only access patterns matter. Every BusyWork task that touches the filesystem or registry is strictly read-only — no IoC creation, no detonation, no rollback needed.

Defensive Recommendations

  • Don’t lean on sleep cadence as a primary behavioural feature. The countermeasure is one Cargo crate away. Cadence is still useful as a soft signal but should not gate alerts.
  • Correlate API calls with process identity and parent chain, not with absolute frequency. A game overlay enumerating HKLM\SOFTWARE\...\Uninstall is anomalous in a way that explorer.exe enumerating it is not.
  • Watch for the static-string deficit, not the static-string presence. A Win32 binary with significant Win32 API import resolution but no Sleep, WaitForSingleObject, Duration or Instant string references is itself unusual.
  • Track thread-level allocation and read entropy in addition to call frequency. BusyWork’s heap-fragmentation task is observable as a burst of small allocations followed by a burst of frees — not unusual alone, distinctive when correlated with sudden Win32 enumeration calls from the same thread.
  • For anti-cheat teams: instrument BCrypt-hash chains and VirtualQuery-walk patterns at the user-mode hook layer. Both are over-represented in this library relative to typical game code.
  • For sandbox / detonation engines: measure thread-resident page-touch ratios, not absolute runtime. A thread that “pauses” via genuine work touches recently allocated pages; a thread that pauses via real sleep doesn’t.
  • Hunt for cargo-feature-gated binaries. A build that mixes only cat-compute and cat-memory linked into a non-utility process is a mild but real indicator the author was tuning for evasion.
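The first recommendation treats cadence as a soft signal. One simple way to score it, offered here as an assumption and not as something the original article prescribes, is the coefficient of variation of the gaps between a thread's wake-ups: a fixed-interval sleeper scores near zero, while jittered or work-driven pauses score higher. The function name and the millisecond timestamp input are hypothetical.

```rust
/// Coefficient of variation (standard deviation / mean) of the gaps
/// between consecutive wake timestamps, given in milliseconds.
/// Returns None when there are fewer than two gaps or the mean gap
/// is not positive.
fn cadence_cv(wake_times_ms: &[f64]) -> Option<f64> {
    if wake_times_ms.len() < 3 {
        return None;
    }
    // Gaps between consecutive wake-ups.
    let gaps: Vec<f64> = wake_times_ms.windows(2).map(|w| w[1] - w[0]).collect();
    let mean = gaps.iter().sum::<f64>() / gaps.len() as f64;
    if mean <= 0.0 {
        return None;
    }
    // Population variance of the gaps.
    let var = gaps.iter().map(|g| (g - mean).powi(2)).sum::<f64>() / gaps.len() as f64;
    Some(var.sqrt() / mean)
}
```

A score like this is cheap to compute per thread and is best fed into a weighted model next to process identity and parent chain, so that a high value lowers suspicion without clearing a process outright.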

Conclusion

BusyWork is a clean, small demonstration of the principle that the most cost-effective evasion against behavioural detection is doing something rather than nothing. Seventy-six legitimate-looking tasks, ±30% parameter jitter, and a function-pointer dispatch combine to make a “pause” indistinguishable from a normal application’s idle period — while keeping the artefact free of every timing primitive that static scanners and sandbox engines key on. The technique generalises beyond Cobalt-Strike-style implants into anti-cheat-resistant game tooling, sandbox-aware first-stage loaders, and any context where the absence of Sleep is preferable to the presence of NtDelayExecution.
