Lessons from Writing an x86 Emulator

By Mark Howell, 11 July 2024 (6 min read)

If you’ve read my first post about assembly language, you might expect that this is another post on how to understand assembly language. I will write more about that at some point, but this post is not that. Instead, it covers some of the weird things and random trivia I learned while writing an x86 and amd64 emulator.

The emulator I wrote was for Time Travel Debugging (TTD). One piece of TTD is a CPU emulator, which is used to record the entire execution of a process at an instruction level. The first version of TTD was called iDNA, and the emulator for iDNA was written almost completely in assembly code, which was fast but difficult to maintain and extend. I wasn’t involved in the first version of TTD, but I was involved in the second version, where we rewrote the emulation portion (and eventually most other parts). The new one was written in C++, and we aimed to achieve most of the performance of the assembly language version while having a more maintainable code base.

Writing a CPU emulator is, in my opinion, the best way to REALLY understand how a CPU works. You need to pay attention to every detail. This post is a somewhat random collection of the things I learned. If you have a lot of experience with x86 these might be old news, but maybe a few are things you haven’t seen before.

Useless x86 encoding trivia

The x86 encoding scheme is a bit funny in that there are often multiple ways to encode the exact same instruction. The int 3 instruction can be encoded as CD 03, but can also be encoded as the single byte CC. The single-byte form is very useful because int 3 is used as a software breakpoint: a one-byte breakpoint can overwrite the first byte of any instruction without spilling into the next one. That way, it’s always possible to set a breakpoint at any point in a function, even on an instruction that lands at the end of a memory page (with no page mapped after it). Many of these alternate encodings are designed to be shorter for some common case. For instance, adding an immediate value to EAX or RAX with the ADD instruction can be expressed in a compact form that’s shorter than the more general instruction.
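
To make the alternate encodings concrete, here is a small Python sketch of the byte sequences involved. The INT 3 bytes come from the text above; the two ADD EAX encodings (opcode 05 followed by a 32-bit immediate, and the general 81 /0 form with a ModRM byte of C0) are filled in from the Intel instruction reference as an illustration, and the helper names are hypothetical.

```python
import struct


def imm32(value: int) -> bytes:
    """Little-endian 32-bit immediate, the way x86 stores it."""
    return struct.pack("<I", value & 0xFFFFFFFF)


# Two encodings of the same breakpoint instruction.
INT3_LONG = bytes([0xCD, 0x03])  # INT imm8 with imm8 = 3
INT3_SHORT = bytes([0xCC])       # dedicated one-byte INT 3


def add_eax_short(value: int) -> bytes:
    # 05 id: ADD EAX, imm32 (accumulator-only short form)
    return bytes([0x05]) + imm32(value)


def add_eax_general(value: int) -> bytes:
    # 81 /0 id: ADD r/m32, imm32
    # ModRM C0 = mod 11 (register), reg 000 (the /0), rm 000 (EAX)
    return bytes([0x81, 0xC0]) + imm32(value)


if __name__ == "__main__":
    print(INT3_LONG.hex(" "), "vs", INT3_SHORT.hex(" "))
    print(add_eax_short(0x12345678).hex(" "))
    print(add_eax_general(0x12345678).hex(" "))
```

Both ADD forms do the same thing to EAX; the short form simply saves the ModRM byte.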

Prefix Bytes

Instructions in x86 can take “prefix bytes” which modify the behavior of the instruction. The “REX” prefixes are commonly used in 64-bit code to access a wider range of registers than 32-bit code can (and make code sequences easier to recognize). An x86 CPU will happily take one of these prefixes, even if it doesn’t have any effect. Put a “REX” byte on an 8-bit add, and it does nothing. In fact, you can put TWO of these prefixes on. Many disassemblers (including the one in WinDbg) will get confused, but the CPU will execute it just fine.
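
A decoder has to tolerate these redundant prefixes. The sketch below is a simplified illustration, not how TTD does it: it handles only REX bytes (0x40 through 0x4F in 64-bit mode), ignores legacy prefixes entirely, and the function name is made up for the example.

```python
from typing import Tuple

REX_FIRST, REX_LAST = 0x40, 0x4F  # REX prefix byte range in 64-bit mode


def split_rex_prefixes(code: bytes) -> Tuple[bytes, bytes]:
    """Return (leading REX bytes, remaining instruction bytes)."""
    i = 0
    while i < len(code) and REX_FIRST <= code[i] <= REX_LAST:
        i += 1
    return code[:i], code[i:]


# 00 C8 is ADD AL, CL. The 0x48 (REX.W) bytes in front are redundant
# here, because REX.W has no effect on an 8-bit operation.
plain = bytes([0x00, 0xC8])
doubled = bytes([0x48, 0x48, 0x00, 0xC8])

if __name__ == "__main__":
    rex, rest = split_rex_prefixes(doubled)
    print(len(rex), "REX bytes, then:", rest.hex(" "))
```

A real decoder would also have to decide which REX byte takes effect when several are present; this sketch deliberately leaves that out.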

Odd flag quirks

The INC and DEC instructions have a slightly odd aspect that’s worth noting. You might think that INC EAX does the same thing as ADD EAX, 1, but they are slightly different. An ADD instruction will update the carry flag but the INC instruction does not! This is an easy thing to miss, and when writing the TTD emulator I got this wrong initially, until I caught it with some unit tests.
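
The difference is easy to see in a toy model. This is a minimal Python sketch with hypothetical function names, tracking only the carry flag; the other arithmetic flags, which both instructions update, are left out to keep it short.

```python
MASK32 = 0xFFFFFFFF


def add32(dest: int, src: int):
    """ADD: return (result, carry flag). CF is set on unsigned overflow."""
    total = (dest & MASK32) + (src & MASK32)
    return total & MASK32, total > MASK32


def inc32(dest: int, carry_in: bool):
    """INC: return (result, carry flag). CF passes through unchanged."""
    return (dest + 1) & MASK32, carry_in


if __name__ == "__main__":
    print(add32(0xFFFFFFFF, 1))      # wraps to 0 and sets CF
    print(inc32(0xFFFFFFFF, False))  # wraps to 0, CF stays as it was
```

This is exactly the kind of case a unit test should pin down: the same wrapped result, but a different carry flag.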

More surprises with shift instructions

Consider this instruction: SHR EAX, 0x20. You might think that it will clear the EAX register. In reality, the value of EAX is unchanged! The count is masked against 1FH, so only the lowest five bits of the shift count are used, and 0x20 becomes a shift by zero. If the REX.W prefix is used, the mask will be 3FH, meaning the maximum shift is 63 bits.
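
The masking rule fits in a few lines. This is a sketch of the result value only (flags are not modeled), and the function name and the `bits` parameter are assumptions made for the example.

```python
def shr(value: int, count: int, bits: int = 32) -> int:
    """Logical shift right with x86-style count masking."""
    # 64-bit operands (REX.W) mask the count with 0x3F; others use 0x1F.
    count_mask = 0x3F if bits == 64 else 0x1F
    value &= (1 << bits) - 1
    return value >> (count & count_mask)


if __name__ == "__main__":
    print(hex(shr(0x12345678, 0x20)))      # count masks to 0
    print(hex(shr(0x12345678, 0x21)))      # count masks to 1
    print(hex(shr(1 << 63, 63, bits=64)))  # largest 64-bit shift
```

Note that Python’s own `>>` would happily shift by 32 and return 0, which is precisely the naive behavior an emulator must not copy.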

Segment overrides

While segmented memory might make you think we are back in the days of 16-bit code, it turns out that segments are alive and well in 32-bit and 64-bit code, and they can have real effects. We tend not to think about them very much because for the most part every OS uses a mostly-flat memory model and all of the segments have a base address of 0. The exception to this tends to be for thread local storage, where one of the “extra segment registers” is used, either FS or GS (or both). What can complicate things is the fact that user-mode code doesn’t have access to the CPU configuration that determines the base address of the FS or GS segments. So if you want to know what flat address corresponds to GS:0x12345678, there’s no way to determine that directly unless the OS has a way of querying this information.

On Windows, these registers are used for referring to the TEB (Thread Environment Block), and these structures conveniently have a “self” pointer with a flat address to the start of the structure, which also happens to be the base of the segment.
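
Here is a rough Python sketch of how that self pointer can be used to recover the segment base. It assumes the self pointer sits at offset 0x30 of the TEB on 64-bit Windows (read through GS); the fake TEB buffer, the base address, and every function name are stand-ins invented for the example.

```python
import struct

TEB_SELF_OFFSET_X64 = 0x30  # assumed offset of the TEB self pointer on x64
MASK64 = (1 << 64) - 1


def recover_gs_base(read_gs_qword) -> int:
    """Read the self pointer through GS; its value is the segment base."""
    return read_gs_qword(TEB_SELF_OFFSET_X64)


def flat_address(segment_base: int, offset: int) -> int:
    """Translate a segment-relative offset into a flat address."""
    return (segment_base + offset) & MASK64


# Stand-in for a real thread: a fake TEB that stores its own address.
FAKE_TEB_BASE = 0x00007FF612340000
fake_teb = bytearray(0x100)
struct.pack_into("<Q", fake_teb, TEB_SELF_OFFSET_X64, FAKE_TEB_BASE)


def read_fake_gs_qword(offset: int) -> int:
    """Hypothetical GS-relative read, backed by the fake TEB buffer."""
    return struct.unpack_from("<Q", fake_teb, offset)[0]


if __name__ == "__main__":
    base = recover_gs_base(read_fake_gs_qword)
    print(hex(flat_address(base, 0x58)))
```

Once the base is known, any GS-relative access can be turned into an ordinary flat address.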

Segment overrides: More trivia

In 32-bit mode, the actual value of the segment register is used to reference a segment descriptor (defined by the Global Descriptor Table and Local Descriptor Table). But in 64-bit mode, the base is controlled by two MSRs, the FS Base (IA32_FS_BASE in the Intel SDM) and GS Base (IA32_GS_BASE). A side effect of this scheme is that the actual values of FS and GS don’t matter at all in 64-bit mode.
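
In emulator terms, that means address calculation in 64-bit mode never needs to look at the selector values. The following is a minimal sketch of that idea, with a hypothetical class and field names; it models only the behavior described above.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1


@dataclass
class SegmentState64:
    fs_selector: int = 0  # kept for completeness, unused below
    gs_selector: int = 0  # kept for completeness, unused below
    fs_base: int = 0      # models IA32_FS_BASE
    gs_base: int = 0      # models IA32_GS_BASE

    def linear_address(self, segment: str, offset: int) -> int:
        # CS, DS, ES and SS are treated as flat (base 0) in 64-bit mode;
        # only FS and GS contribute a base, and it comes from the MSRs.
        base = {"fs": self.fs_base, "gs": self.gs_base}.get(segment, 0)
        return (base + offset) & MASK64


if __name__ == "__main__":
    a = SegmentState64(gs_selector=0x2B, gs_base=0x7FF600000000)
    b = SegmentState64(gs_selector=0x53, gs_base=0x7FF600000000)
    print(a.linear_address("gs", 0x30) == b.linear_address("gs", 0x30))
```

Two states with different selectors but the same base resolve to the same address, which is the side effect the paragraph above describes.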

Roll credits

This turned out to be a pretty random list of x86 trivia. Most of it is totally useless unless you want to write an emulator (which I highly recommend if you ever get the chance). I still think it’s sort of interesting, and it gives a little bit of insight into how things “really work.” Some of this I learned through trial and error, but I had some great mentors while writing an x86 emulator, one of whom was Darek Mihocka, who has been doing emulators long enough that he owns emulators.com. I’d never claim to be an expert myself, but if this sort of thing is interesting to you, make sure to check out the fantastic resources on Agner Fog’s website. As usual, if I made any mistakes or if you have any questions, let me know on Twitter or Mastodon!