z386: An Open-Source FPGA 80386 Driven by the Original Intel Microcode

Original: This article is an independent English-language rewrite of “z386: An Open-Source 80386 Built Around Original Microcode” by nand2mario, published on Small Things Retro on May 23, 2026.

All hardware research, RTL design decisions, performance measurements, block diagrams, die-shot annotations and benchmarks belong to the original author. Three of the original images (Doom II screenshot, 80386 block diagram, 80386 die with units labelled) are reproduced here at their original positions; the article’s three vector diagrams (front-end pipeline, microcode word layout, cache path) are SVGs that the core-jmp.org install currently does not accept, so they are linked back to the source rather than re-hosted. For the full code listings, RTL discussion and every walk-through step, read the source.

Source: nand2mario.github.io/posts/2026/z386 · Project: github.com/nand2mario/z386

Doom II running on the z386 FPGA core. *Source: original article.*

Executive Summary

z386 is an open-source FPGA implementation of Intel’s 80386 that takes an unusually principled approach: rather than re-implementing the x86 instruction set in fresh RTL, it runs the original recovered Intel microcode on a reconstructed datapath that matches what the microcode expects. The author’s previous project, z8086, set the pattern; z386 applies it to the much harder target of a full protected-mode 386. The compactness numbers are striking: ~8,000 lines of code, 18 K ALUTs, 5 K registers, 116 K BRAM, and 85 MHz on a Cyclone V. That is smaller than ao486, the established FPGA 486 core, on every size metric in the author’s comparison, at a slightly lower clock. The core runs on multiple FPGA families, including Altera Cyclone V and Gowin GW5A.

For reverse engineers and hardware-security researchers, the value is two-fold. First, this is the cleanest publicly available silicon archaeology on the 386: a working CPU that depends on (and therefore validates) reenigne’s recovered 386 microcode and the SingleStepTests/80386 instruction-level corpus. Second, the testing methodology — per-instruction fuzz tests, hand-written protected-mode programs (interrupts, gates, VM86, transitions), and reference comparison against 86Box — is reusable for anyone who needs to validate an x86 implementation or analyse 386 behaviour. The core boots SeaBIOS, runs DOS 6/7/7.1, the DOS/4GW and DOS/32A extenders, HIMEM and EMM386, and plays Doom and Doom II.

From z8086 to z386

The leap from a 16-bit 8086 to a fully-featured 32-bit 386 is a step-function of complexity: paging, protected mode, four privilege levels, gate descriptors, VM86, page faults, exceptions inside exceptions, a TLB, and a cache. The author’s framing is that z8086 was a one-person evening project; z386 is a four-month deep dive that required new tooling, new test infrastructure, and new respect for the original engineers.

z386 — high-level view

Intel’s own 80386 documentation describes the CPU as eight cooperating units; z386 mirrors that organisation almost one-for-one:

Intel 80386 block diagram: the 80386 as eight cooperating units. *Source: Intel, The Intel 80386 — Architecture and Implementation, Figure 8 (via original article).*

Intel 80386 die shot with the same eight functional units labelled. *Base image: Intel 80386 DX die, Wikimedia Commons (via original article).*

The eight units are Prefetch, Decoder, Microcode Sequencer, ALU/Shifter, Segmentation, Protection, Paging, and BIU / Cache / Memory. The headline performance and size numbers (reproduced cell-for-cell from the source):

Metric	z386	ao486
Lines of code (cloc)	8K	17.6K
ALUTs	18K	21K
Registers	5K	6.5K
BRAM	116K	131K
FPGA clock	85 MHz	90 MHz
3DBench FPS	34	43
Doom (original) FPS, max details	16.5	21.0

z386 vs ao486 — size and performance. Source: original article.
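The gap in the table is easier to judge in relative terms. The sketch below copies the clock and frame-rate figures from the table; the ratios and the per-MHz normalisation are our own derived arithmetic, not measurements from the original article.

```python
# Figures copied from the comparison table above.
z386 = {"clock_mhz": 85, "3dbench_fps": 34, "doom_fps": 16.5}
ao486 = {"clock_mhz": 90, "3dbench_fps": 43, "doom_fps": 21.0}


def relative(metric):
    """z386 result as a fraction of the ao486 result."""
    return z386[metric] / ao486[metric]


def fps_per_mhz(core, metric):
    """Frame rate normalised by FPGA clock, a rough per-clock efficiency figure."""
    return core[metric] / core["clock_mhz"]


for metric in ("3dbench_fps", "doom_fps"):
    print(
        f"{metric}: z386 reaches {relative(metric):.0%} of ao486; "
        f"per MHz {fps_per_mhz(z386, metric):.3f} vs {fps_per_mhz(ao486, metric):.3f}"
    )
```

Both benchmarks land at roughly 79% of ao486, and the per-MHz figures show that the clock difference (85 vs 90 MHz) explains only a small part of the gap.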

Instruction prefetch

The 386 fetches code into a 16-byte raw queue and feeds it byte-at-a-time to the decoder. z386 preserves that byte-serial decode model but exposes a wider window when the decoder needs a multi-byte displacement or immediate, so that complex addressing modes do not cost extra clocks waiting for bytes that have already been prefetched. The author publishes the front-end pipeline as an SVG diagram in the source — view it on the original blog (we link rather than re-host because core-jmp.org’s WP install currently rejects SVG uploads).
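As a rough behavioural illustration of the idea, the toy model below keeps a 16-byte queue with a byte-serial path for opcode bytes and a wider path for displacements and immediates. The class and method names are hypothetical, and the model says nothing about how the z386 RTL actually implements the window.

```python
from collections import deque


class PrefetchQueue:
    """Toy model of a 16-byte raw code queue (illustrative, not the z386 RTL)."""

    CAPACITY = 16

    def __init__(self):
        self._bytes = deque()

    def free_space(self):
        return self.CAPACITY - len(self._bytes)

    def fill(self, data):
        """Accept as many fetched bytes as fit; return how many were taken."""
        taken = data[: self.free_space()]
        self._bytes.extend(taken)
        return len(taken)

    def pop_byte(self):
        """Byte-serial path: hand one prefix/opcode byte to the decoder."""
        return self._bytes.popleft() if self._bytes else None

    def pop_window(self, width):
        """Wide path: consume a whole little-endian displacement or immediate.

        Returns None (a stall) when fewer than `width` bytes are queued.
        """
        if len(self._bytes) < width:
            return None
        raw = bytes(self._bytes.popleft() for _ in range(width))
        return int.from_bytes(raw, "little")


queue = PrefetchQueue()
queue.fill(bytes([0xB8, 0x78, 0x56, 0x34, 0x12]))  # MOV EAX, 0x12345678
opcode = queue.pop_byte()
immediate = queue.pop_window(4)
print(hex(opcode), hex(immediate))
```

In this model the four immediate bytes are consumed in one step instead of four, which is the saving the paragraph above describes.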

Decode

Decode in z386 follows the 386’s own organisation: a Control PLA identifies the instruction’s structural shape (opcode form, addressing modes, prefix interactions) and an Entry PLA routes execution to the correct microcode entry point. The output is a 3-entry FIFO of decoded instructions ready for the sequencer.

Microcode sequencer — the control program

This is the heart of the project. The 386 contains a microcode ROM whose contents are the “real” control program for the CPU; every x86 instruction is implemented as a sequence of microcode operations that drive the datapath. z386 uses a 37-bit microcode word with 2,560 entries, recovered from reenigne’s ongoing 386 microcode disassembly. The word splits into source, destination, ALU operand selectors, ALU/jump op, sub-bus, and an “RNI” (Run Next Instruction) flag — structurally similar to what was published for the 8086 by the same researcher. See the SVG layout diagram in the source: microcode_word.svg.
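To make the numbers concrete, the sketch below computes the ROM size implied by 2,560 words of 37 bits and shows how such a word could be split into named fields. The field names follow the list above, but the widths and bit order are placeholders invented for illustration; they are not the real 386 encoding, which is documented in the source diagram.

```python
WORD_BITS = 37
ROM_ENTRIES = 2560

# PLACEHOLDER layout: widths and ordering are hypothetical, chosen only so that
# they add up to 37 bits. Consult microcode_word.svg for the real layout.
HYPOTHETICAL_FIELDS = [
    ("source", 6),
    ("destination", 6),
    ("alu_operand_sel", 6),
    ("alu_jump_op", 12),
    ("sub_bus", 6),
    ("rni", 1),
]
assert sum(width for _, width in HYPOTHETICAL_FIELDS) == WORD_BITS


def unpack(word):
    """Split a 37-bit integer into named fields, most significant field first."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("word does not fit in 37 bits")
    fields = {}
    shift = WORD_BITS
    for name, width in HYPOTHETICAL_FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


rom_bits = WORD_BITS * ROM_ENTRIES
print(rom_bits, "bits =", rom_bits // 8, "bytes")  # 94720 bits = 11840 bytes
```

At under 12 KB, the whole control program fits comfortably in FPGA block RAM.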

A two-line excerpt from the disassembled microcode (reproduced verbatim from the source) shows the style:

003  SRCREG                           PASS    RNI          0
004  SIGMA  -> DSTREG

One consequence of running the original microcode is a hard minimum CPI of 2 — even register-to-register moves cost two cycles because of delay slots built into the original control program. The trade-off is faithfulness: the same microcode that ran on real silicon now runs on the FPGA, and behavioural divergence becomes a much narrower search space.
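The CPI floor translates directly into a throughput ceiling. The short calculation below is our own arithmetic from the 85 MHz clock and the minimum CPI of 2 quoted in this article; real workloads sit well below this bound because many instructions take more than two clocks.

```python
def peak_mips(clock_mhz, min_cpi):
    """Upper bound on millions of instructions per second at a given CPI floor."""
    return clock_mhz / min_cpi


print(peak_mips(85, 2))  # 42.5
```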

Cache

The real 386 did not have an on-die cache; that came with the 486. To keep the core usable on modest FPGAs without bottlenecking on SDRAM, z386 adds a small L1 cache that sits behind the segmentation/paging units. The cache path diagram is published as cache_path.svg in the source. The parameters (verbatim from the article):

Parameter	z386 cache
Size	16 KB
Line size	16 bytes, 4 DWORDs
Associativity	4-way set associative
Replacement	PLRU
Policy	unified I+D, read-allocate, write-through
Write buffer	2 entries
Fill	SDRAM burst fill, early restart
Lookup	VIPT preread, physical tag compare
Hit latency	zero-wait hit response after preread

z386 L1 cache parameters. Source: original article.
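The table only says “PLRU”, without naming a variant. One common choice for a 4-way cache is tree-based pseudo-LRU, which needs three bits per set. The sketch below assumes that variant purely for illustration; the class name is hypothetical and z386 may use a different scheme.

```python
class TreePLRU4:
    """Tree pseudo-LRU for one 4-way set: three bits pointing at the victim side."""

    def __init__(self):
        self.root = 0   # 0: victim among ways 0-1, 1: victim among ways 2-3
        self.left = 0   # victim within ways 0-1 (0 or 1)
        self.right = 0  # victim within ways 2-3 (0 -> way 2, 1 -> way 3)

    def touch(self, way):
        """Record an access: flip the bits on the path to point away from `way`."""
        if way < 2:
            self.root = 1
            self.left = 1 - way
        else:
            self.root = 0
            self.right = 1 - (way - 2)

    def victim(self):
        """Follow the bits from the root to pick the way to replace."""
        if self.root == 0:
            return self.left
        return 2 + self.right


plru = TreePLRU4()
for way in (0, 2, 1, 3):
    plru.touch(way)
print(plru.victim())  # 0, the least recently used way for this access order
```

Tree PLRU does not always match true LRU, but it never evicts the most recently touched way, and it costs three bits per set where exact LRU needs five.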

The VIPT (virtually indexed, physically tagged) arrangement overlaps address translation with the cache lookup: the set index is taken from the virtual address, so the cache arrays can be read before translation has finished, while the tag compare uses the translated physical address, so a hit always refers to the correct physical line. It is the pattern explained in James Bottomley’s Linux Journal article “Understanding Caching”, which the original post references.
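The parameters in the table determine the cache geometry, and working it through shows why VIPT fits this size. The sketch below assumes the conventional offset/index/tag split and the 386’s 4 KB page size; how z386 actually slices the address is not stated here, and the two example addresses are made up.

```python
CACHE_BYTES = 16 * 1024
LINE_BYTES = 16
WAYS = 4
PAGE_BYTES = 4096  # 386 page size

sets = CACHE_BYTES // (LINE_BYTES * WAYS)
offset_bits = LINE_BYTES.bit_length() - 1
index_bits = sets.bit_length() - 1
page_offset_bits = PAGE_BYTES.bit_length() - 1


def split(addr):
    """Return (tag, set index, byte offset) under the assumed conventional split."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> offset_bits) & (sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


# If offset and index bits fit inside the page offset, paging never changes them,
# so the virtual index equals the physical index and aliases cannot arise.
alias_free = offset_bits + index_bits <= page_offset_bits

virtual = 0x00401A34   # hypothetical virtual address
physical = 0x0009CA34  # hypothetical translation: same low 12 bits
print(sets, index_bits, alias_free)
print(split(virtual)[1] == split(physical)[1])
```

With 256 sets of 16-byte lines, the index uses address bits 4 to 11, exactly the bits that translation leaves untouched. Only the tag needs the physical address.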

Testing

A 386 with cached protected mode is too complex for any single test suite to fully cover. nand2mario stacks five different test surfaces:

Single-instruction fuzz tests from gloriouscow’s SingleStepTests/80386 corpus, comparing per-instruction state transitions byte-for-byte against a reference.
Protected-mode isolation tests: a custom generator drives 86Box and z386 through matched protected-mode sequences, dumping diffs.
Hand-written protected-mode programs exercising gates, interrupts, VM86, ring transitions, and page faults.
The broader test386.asm compatibility suite.
Real software: SeaBIOS, DOS 6/7/7.1, DOS/4GW, DOS/32A, HIMEM, EMM386, Doom, Doom II, Cannon Fodder, 3DBench, Speed600, FastDoom.
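The first two layers come down to the same operation: run one instruction from a known initial state and diff the resulting state against a reference. A minimal sketch of that comparison follows. The dictionary layout and the `step` callback are hypothetical stand-ins and do not reproduce the actual SingleStepTests schema or the author’s harness.

```python
def diff_state(expected, actual):
    """Return {field: (expected, actual)} for every field that differs."""
    keys = sorted(set(expected) | set(actual))
    return {
        key: (expected.get(key), actual.get(key))
        for key in keys
        if expected.get(key) != actual.get(key)
    }


def run_case(case, step):
    """Run one single-instruction case; `step` maps an initial state to a final state."""
    return diff_state(case["final"], step(dict(case["initial"])))


# Hypothetical test case: INC EAX (one byte, opcode 0x40) at EIP 0x100.
case = {
    "initial": {"eax": 1, "eip": 0x100},
    "final": {"eax": 2, "eip": 0x101},
}


def buggy_step(state):
    """Stand-in for the device under test: increments EAX but forgets to advance EIP."""
    state["eax"] += 1
    return state


print(run_case(case, buggy_step))  # {'eip': (257, 256)}
```

An empty diff means the case passes. A non-empty one names the diverging fields, which is the starting point for a waveform search.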

Simulation runs in Verilator with waveform tracing; Ghidra was used to analyse DOS extender behaviour when reproducing bugs.

z386 and ao486

The article ends with a deliberate comparison against ao486 — the most prominent existing FPGA x86 implementation, targeting the 486. The two projects sit at different points on a fundamental design axis (verbatim reproduction of the comparison table):

Topic	z386 / 386 style	ao486 / 486 style
Main organization	Large cooperating units	Finer-grained pipeline stages
Control model	Original 386 microcode ROM drives reconstructed hardware	Staged command flow implements x86 behavior
Front end	16-byte raw prefetch queue, 3-entry decoded-instruction queue	32-byte raw prefetch queue and instruction aligner feeding D1/D2-style pipeline stages
Memory model	Segmentation, paging, cache, and bus timing contracts	Same architectural pieces, different implementation
Performance risk	Coarse steps, high CPI, and Fmax pressure	Pipeline hazards, forwarding, and stage scheduling

z386 vs ao486 — architectural style and performance risk. Source: original article.

In short: z386 is “original CPU, faithfully reconstructed,” while ao486 is “clean-sheet x86 implementation with modern pipeline engineering.” The current performance gap (34 vs 43 in 3DBench) reflects that trade-off rather than a bug.

Current Status

As of the article’s publication: z386 is open source on GitHub, runs on Altera Cyclone V (including DE10-Nano / MiSTer) and Gowin GW5A FPGAs, boots SeaBIOS and DOS, runs DOS extenders and the major DOS games tested, and performs at roughly the level of a fast (~70 MHz) cached 386 or a low-end 486. Windows is not yet supported. The code base is small enough (8 KLOC) to read end-to-end, which is the project’s most underrated property as teaching material.

Why this is interesting for security researchers

Verifiable silicon archaeology. The 386 microcode that this project depends on is the same control program that ran inside real Intel parts. A working FPGA core validates the dump, and any divergence between z386 and 86Box or real hardware helps narrow down where the disassembly is still wrong.
Reproducible CPU testing methodology. The SingleStepTests / protected-mode / hand-written / real-software stacked approach is reusable for anyone building or auditing an x86 implementation (including security-relevant ones: emulators, sandbox cores, deobfuscators).
Microcode-aware reverse engineering. The division of labour between Control PLA, Entry PLA, and microcode ROM in z386 is an early form of the decode-then-microcode pattern that later x86 CPUs still use for their complex instructions. The 386 is the cleanest publicly inspectable example of that pattern.
Substrate for offensive research on legacy stacks. SeaBIOS + DOS extenders are not just retro — they live on inside firmware bring-up paths, embedded systems, and certain industrial control stacks. A small, hackable 386 core is a useful platform for studying their behaviour without paying for vintage hardware.

Key Takeaways

z386 runs the original Intel 386 microcode on a reconstructed FPGA datapath — not a behavioural x86 emulator.
Small and fast: 8 K LOC, 18 K ALUTs, 5 K registers, 116 K BRAM, 85 MHz on Cyclone V.
Performance: 34 FPS in 3DBench and 16.5 FPS in original Doom (vs ao486’s 43 and 21.0). Roughly a fast 386 / low-end 486.
Boots and runs: SeaBIOS, DOS 6/7/7.1, DOS/4GW, DOS/32A, HIMEM, EMM386, Doom, Doom II, Cannon Fodder, 3DBench, Speed600, FastDoom.
The cache is an addition the real 386 didn’t have: a 16 KB, 4-way L1 with 16-byte lines, VIPT lookup with physical tag compare, and a write-through policy.
Testing stacks SingleStepTests/80386, custom 86Box-comparison protected-mode tests, hand-written protected-mode programs, test386.asm, and real software.
Open source: github.com/nand2mario/z386.

Practical Reading List (for security researchers)

reenigne’s 386 microcode disassembly — the source of the 37-bit microcode words z386 runs.
gloriouscow’s SingleStepTests/80386 corpus — a reusable per-instruction state-transition test set for any x86 implementation.
James Bottomley, “Understanding Caching” (Linux Journal) — the canonical explainer for VIPT.
Intel, The Intel 80386 — Architecture and Implementation — the eight-unit organisation z386 mirrors.
86Box as a reference x86 emulator for cross-validation, plus Ghidra for DOS-extender analysis when reproducing bugs.
nand2mario’s z8086 — the smaller, friendlier predecessor project that introduces the “run-the-original-microcode” pattern.

Conclusion

z386 is an unusually disciplined piece of silicon archaeology: small enough to read in an afternoon, faithful enough to run real DOS software at speed, and structured to validate the recovered Intel microcode it depends on. For reverse engineers, hardware-security researchers and anyone interested in how decode/microcode actually works inside an x86 CPU, this is the cleanest public reference at this level of detail in 2026. The original Small Things Retro post is worth reading in full — especially for the SVG diagrams of the front-end pipeline, the microcode word, and the cache path, which we have linked back to rather than re-hosting.

This article is an independent English-language rewrite of “z386: An Open-Source 80386 Built Around Original Microcode” by nand2mario, originally published on the Small Things Retro blog on May 23, 2026. All RTL design, microcode disassembly work, benchmarks, and diagrams remain the work of the original author and the named upstream contributors (reenigne, gloriouscow, Ken Shirriff). Please cite nand2mario when referencing this material.
