Weird X86 Trivia From Building A CPU Emulator In C++

If you’ve read my first post about assembly language, you might expect that this is another post on how to understand assembly language. I will write more about that at some point, but this post is not that. Instead, this post is going to talk about some of the weird things and random trivia I learned while writing an x86 and amd64 emulator.

The emulator I wrote was for Time Travel Debugging (TTD). One piece of TTD is a CPU emulator, which is used to record the entire execution of a process at an instruction level. The first version of TTD was called iDNA, and the emulator for iDNA was written almost completely in assembly code, which was fast but difficult to maintain and extend. I wasn’t involved in the first version of TTD, but I was involved in the second version where we rewrote the emulation portion (and eventually most other parts). The new one was written in C++, and we aimed to achieve most of the performance of the assembly language version while having a more maintainable code base.

Writing a CPU emulator is, in my opinion, the best way to REALLY understand how a CPU works. You need to pay attention to every detail. This post is a somewhat random collection of the things I learned. If you have a lot of experience with x86 these might be old news, but maybe a few are things you haven’t seen before.

Useless x86 encoding trivia

The x86 encoding scheme is a bit funny in that there are often multiple ways to encode the exact same instruction. The int 3 instruction can be encoded as CD 03, but can also be encoded in a single byte of CC. The one-byte form is very useful because int 3 is used as a software breakpoint. A debugger sets a breakpoint by overwriting the start of an instruction, and a one-byte breakpoint never spills past the instruction it replaces. That way, it’s always possible to set a breakpoint at any point in a function, even on a one-byte instruction that lands at the end of a memory page (with no page mapped after it). Many of these alternate encodings are designed to be shorter for some common case. For instance, adding an immediate value to EAX or RAX with the ADD instruction can be expressed in a compact form that’s shorter than the more general instruction.
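To make the size difference concrete, here is a small sketch that writes out the byte sequences for the encodings mentioned above. The opcode bytes follow the Intel SDM opcode tables as I understand them; the array names are made up for illustration, and the immediate 0x12345678 is just an arbitrary value.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Two encodings of the software breakpoint.
constexpr std::array<std::uint8_t, 1> kInt3Short{{0xCC}};        // dedicated one-byte opcode
constexpr std::array<std::uint8_t, 2> kInt3Long{{0xCD, 0x03}};   // int imm8, with imm8 = 3

// Two encodings of "add eax, 0x12345678" (immediates are little-endian).
// Short form: opcode 05 followed by imm32, the destination EAX is implied.
constexpr std::array<std::uint8_t, 5> kAddEaxShort{{0x05, 0x78, 0x56, 0x34, 0x12}};
// General form: opcode 81 /0 followed by imm32. The ModRM byte C0 means
// mod=11 (register operand), reg=000 (the /0 opcode extension), rm=000 (EAX).
constexpr std::array<std::uint8_t, 6> kAddEaxGeneral{{0x81, 0xC0, 0x78, 0x56, 0x34, 0x12}};

static_assert(kInt3Short.size() + 1 == kInt3Long.size(),
              "the dedicated breakpoint opcode saves one byte");
static_assert(kAddEaxShort.size() + 1 == kAddEaxGeneral.size(),
              "the EAX-specific form saves the ModRM byte");
```

An emulator's decoder has to accept every one of these forms and treat the equivalent ones identically, even though a compiler will normally emit only the shorter one.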

Prefix Bytes

Instructions in x86 can take “prefix bytes” which modify the behavior of the instruction. The “REX” set of prefixes is commonly used in 64-bit code to access a wider range of registers compared to 32-bit code (and make code sequences easier to recognize). An x86 CPU will happily take one of these prefixes, even if it doesn’t have any effect. Put a “REX” byte on an 8-bit add of two low registers such as AL and BL, and it does nothing. In fact, you can put TWO of these prefixes on. Many disassemblers (including the one in WinDbg) will get confused, but the CPU will execute it just fine.
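As a rough illustration of how a decoder might cope with redundant REX bytes, the sketch below skips every byte in the REX range (40 through 4F in 64-bit mode) and remembers only the last one. That "last one wins" rule is an assumption based on my reading of the Intel SDM, which says a REX prefix must immediately precede the opcode and that other placements are ignored. The `RexScan` and `scan_rex` names are hypothetical, and a real decoder also has to handle the legacy prefixes, which are left out here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Result of skipping leading REX bytes (hypothetical helper type).
struct RexScan {
    std::size_t prefix_count = 0;   // how many REX bytes were skipped
    bool has_rex = false;           // true if at least one REX byte was seen
    std::uint8_t rex = 0;           // the REX byte assumed to take effect
    std::size_t opcode_offset = 0;  // index of the first non-REX byte
};

// In 64-bit mode, bytes 0x40..0x4F are REX prefixes.
constexpr bool is_rex(std::uint8_t b) { return (b & 0xF0) == 0x40; }

// Skip any run of REX bytes at the start of `code`.
// Assumption: only the REX byte right before the opcode is effective.
inline RexScan scan_rex(const std::uint8_t* code, std::size_t size) {
    RexScan result;
    while (result.opcode_offset < size && is_rex(code[result.opcode_offset])) {
        result.has_rex = true;
        result.rex = code[result.opcode_offset];
        ++result.prefix_count;
        ++result.opcode_offset;
    }
    return result;
}
```

For example, the bytes 48 48 01 D8 carry a doubled REX.W prefix in front of 01 D8 (add with register operands); `scan_rex` reports two prefixes, an effective REX of 48, and an opcode at offset 2.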

Odd flag quirks

The INC and DEC instructions have a slightly odd aspect that’s worth noting. You might think that INC EAX does the same thing as ADD EAX, 1, but they are slightly different. An ADD instruction will update the carry flag but the INC instruction does not! This is an easy thing to miss, and when writing the TTD emulator I got this wrong initially, until I caught it with some unit tests.
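Here is a minimal sketch of how an emulator might model that difference. It is not the TTD implementation; the `Flags` struct and the `emulate_add32` and `emulate_inc32` names are invented for this example, and only four of the arithmetic flags are tracked (the parity and auxiliary carry flags are omitted for brevity).

```cpp
#include <cassert>
#include <cstdint>

// A reduced flags model: just enough to show the CF difference.
struct Flags {
    bool cf = false;  // carry
    bool zf = false;  // zero
    bool sf = false;  // sign
    bool of = false;  // signed overflow
};

// ADD r32, r/m32: updates every flag in the model, including CF.
inline std::uint32_t emulate_add32(std::uint32_t a, std::uint32_t b, Flags& f) {
    const std::uint32_t r = a + b;            // unsigned arithmetic wraps modulo 2^32
    f.cf = r < a;                             // wrapped around, so there was a carry out
    f.zf = r == 0;
    f.sf = (r >> 31) != 0;
    f.of = (((a ^ r) & (b ^ r)) >> 31) != 0;  // both operands differ in sign from the result
    return r;
}

// INC r32: same result and flags as ADD with 1, except CF is preserved.
inline std::uint32_t emulate_inc32(std::uint32_t a, Flags& f) {
    const bool saved_cf = f.cf;
    const std::uint32_t r = emulate_add32(a, 1, f);
    f.cf = saved_cf;                          // INC leaves the carry flag untouched
    return r;
}
```

Starting from cleared flags, `emulate_add32(0xFFFFFFFF, 1, f)` wraps to 0 and sets both CF and ZF, while `emulate_inc32(0xFFFFFFFF, f)` wraps to 0 and sets ZF but leaves CF clear. A unit test comparing those two cases is the kind of check that catches this bug.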

More surprises with shift instructions

Consider this instruction: SHR EAX, 0x20. You might think that it will clear the EAX register. In reality, the value of EAX is unchanged! The count is masked against 1FH, essentially using only the lowest five bits of the shift count, and the low five bits of 0x20 are all zero. If the REX.W prefix is used, the mask will be 3FH, meaning the maximum shift is 63 bits.
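The masking rule is easy to express in code. The sketch below models only the result value of SHR (flag updates are left out), and the function names are made up for this example. As a side benefit, masking the count also keeps the C++ shift well defined, since shifting a 32-bit value by 32 or more is undefined behavior in C++.

```cpp
#include <cassert>
#include <cstdint>

// SHR with a 32-bit operand: only the low five bits of the count are used.
constexpr std::uint32_t emulate_shr32(std::uint32_t value, std::uint8_t count) {
    return value >> (count & 0x1F);
}

// SHR with a 64-bit operand (REX.W): the low six bits of the count are used.
constexpr std::uint64_t emulate_shr64(std::uint64_t value, std::uint8_t count) {
    return value >> (count & 0x3F);
}

// SHR EAX, 0x20 masks the count to 0, so the value is unchanged.
static_assert(emulate_shr32(0x12345678u, 0x20) == 0x12345678u, "count 0x20 masks to 0");
// With a 64-bit operand the same count really does shift by 32.
static_assert(emulate_shr64(0x1234567800000000ull, 0x20) == 0x12345678ull, "count 0x20 is kept");
```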

Segment overrides

While segmented memory might make you think we are back in the days of 16-bit code, it turns out that segments are alive and well in 32-bit and 64-bit code, and they can have real effects. We tend not to think about them very much because for the most part every OS uses a mostly-flat memory model and all of the segments have a base address of 0. The exception to this tends to be for thread local storage, where one of the “extra segment registers” is used, either FS or GS (or both). What can complicate things is the fact that usermode code doesn’t have access to the CPU configuration that determines the base address of the FS or GS segments. So if you want to know what flat address corresponds to GS:0x12345678, there’s no way to determine that directly unless the OS has a way of querying this information.

On Windows, these registers are used for referring to the TEB (Thread Environment Block), and these structures conveniently have a “self” pointer with a flat address to the start of the structure, which also happens to be the base of the segment.

Segment overrides: More trivia

In 32-bit mode, the actual value of the segment register is used to reference a segment descriptor (defined by the Global Descriptor Table and Local Descriptor Table). But in 64-bit mode, the base is controlled by two MSRs, the FS Base (IA32_FS_BASE in the Intel SDM) and GS Base (IA32_GS_BASE). A side effect of this scheme is that the actual values of FS and GS don’t matter at all in 64-bit mode.
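One way an emulator could model the 64-bit behavior is sketched below. The `Cpu64State` struct, the `Segment` enum, and `linear_address` are hypothetical names, not taken from TTD. The two base fields stand in for the IA32_FS_BASE and IA32_GS_BASE MSRs, and the selector fields are stored but deliberately never read when computing an address.

```cpp
#include <cassert>
#include <cstdint>

// Which segment override, if any, applies to a memory operand.
enum class Segment { None, FS, GS };

// The slice of CPU state relevant to segment overrides in 64-bit mode.
struct Cpu64State {
    std::uint16_t fs_selector = 0;  // visible register value, unused for addressing here
    std::uint16_t gs_selector = 0;  // visible register value, unused for addressing here
    std::uint64_t fs_base = 0;      // models IA32_FS_BASE
    std::uint64_t gs_base = 0;      // models IA32_GS_BASE
};

// Flat address for a memory operand: segment base plus offset.
// Without an FS or GS override the base is treated as 0 (flat model).
inline std::uint64_t linear_address(const Cpu64State& cpu, Segment seg, std::uint64_t offset) {
    switch (seg) {
        case Segment::FS: return cpu.fs_base + offset;
        case Segment::GS: return cpu.gs_base + offset;
        case Segment::None: break;
    }
    return offset;
}
```

With `gs_base` set to some value B, `linear_address(cpu, Segment::GS, 0x30)` yields B + 0x30 no matter what `gs_selector` holds, which mirrors the point that the selector value itself is irrelevant in this mode.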

Roll credits

This turned out to be a pretty random list of x86 trivia. Most of it is totally useless unless you want to write an emulator (which I highly recommend if you ever get the chance). I still think it’s sort of interesting, and it gives a little bit of insight into how things “really work.” Some of this I learned through trial and error, but I had some great mentors while writing an x86 emulator, one of whom was Darek Mihocka, who has been doing emulators long enough that he owns emulators.com. I’d never claim to be an expert myself, but if this sort of thing is interesting to you, make sure to check out the fantastic resources on Agner Fog’s website. As usual, if I made any mistakes or if you have any questions, let me know on Twitter or Mastodon!