Reading Time: 8 minutes

TL;DR We know that ROP attacks are a useful exploiting technique, but how much executional power can an adversary have using ROP attacks? In our attempt to answer this question, we present ROPLANG, a virtual language that maps all programing operations that simulates a Turing machine (add, compare, branch, etc.) to ROP gadgets. We first move this ROPLANG to the ROP3 tool and, after running our experiments, we conclude that an adversary has enough ROP gadgets to simulate any virtual operation in Windows 7 and 10, branching operations being the least frequent. ROP3 is released under the GNU GPLv3 license, so you can expand the tool with your own operations. A more detailed discussion on the virtual language and our experiments can be found in our recent publication accepted at the 15th IEEE Workshop on Offensive Technologies (WOOT’21).


Control-hijacking attacks are one of the most popular category of memory exploits. These attacks use code injection or its evolution, code-reuse attacks, to hijack the legitimate control flow of a victim program and execute malicious code. In particular, code-reuse techniques allow an adversary to hijack the control flow of a victim program to perform malicious activities without injected code and thus bypass typical defenses such us write-xor-execute (W⊕X).

Return-Oriented-Programming (ROP) is a code-reuse attack technique introduced in 2007 by Shacham for the x86 architecture (as an evolution of the return-to-libc attacks). ROP considers and links together (relatively short) code snippets already present in the process’s memory address space, named ROP gadgets. Each code snippet ends with an instruction that changes the control flow of the program (e.g., a ret instruction on Intel architectures), allowing an attacker who controls the stack to chain them together, controlling the order of code execution through the stack values. A chain of ROP gadgets is normally called a ROP chain.

ROP attacks have been demonstrated to be feasible on numerous architectures, such as RISC, Linux/x86m Solaris/SPARC architectures, and recently on RISC-V. Additionally, ROP attacks are known to be Turing-complete. However, a question that still remains open is the quantification of the executional power of an adversary, which is determined by the code that exists in the memory of a victim program.

To answer this question, we first define a Turing-complete set of operations that make up a virtual language. This virtual language, called ROPLANG, defines a set of operations that are then mapped to specific ROP gadgets, thus representing a ROP chain in an abstract way. We then use these operations to quantify the executional power of an adversary in a given environment. In particular, we evaluate the executional power on Windows 7 and Windows 10, in their x86 and x86-64 versions, as Windows remains the predominant platform targeted by attackers.

ROPLANG

A virtual operation in ROPLANG is simulated using a specific sequence of assembly instructions in the execution of the vulnerable program made up of ROP gadgets. These operations can be divided into the following categories:

  1. Arithmetic operations: the basic arithmetic operation we consider are addition (add), subtraction (sub), and negation (neg), which can be simulated using a variety of instructions sequences. For instance, to simulate a sum we can rely on the own Intel x86 instruction add, on the instruction inc, or on the combination of clc and adc instructions..
  2. Assignment operations: these operations allow us to assign value to variables. A variable in this context is a CPU logical (also called general-purpose) register, while a value can be an immediate or other logical register as well. In this category, we have considered the move (mov) and load constant (lc) operations.
  3. Dereference operations: load (ld) and store (st) operations represent memory dereferences, which are useful for visiting a memory location to read or write it.
  4. Logical operations: this category includes the operations xor, and, or, and not.
  5. Branching operations: conditional branching operations require some tricky steps. In particular, we need to stitch together several ROP gadgets to obtain an operation equivalent to conditional branching. We first need to undertake some operation to perform the desired comparison. The carry flag (CF) is sufficient to obtain the full set of standard comparisons. As comparison operations, we define equality comparison (eqc), and less than comparison (ltc). All the comparison operations will clear (or set) the CF according to the comparison performed. After the comparison is done, we can make the conditional change to the stack pointer based on the result of the comparison. In brief, we define the following operations:
    1. First, we need to get the CF flag in a general-purpose register. In this regard, we can use assembly instructions that work explicitly with the CF, such as left/right rotations with carry (rcl, rcr), or addition with carry (adc). We named this operation as get carry flag operation (gcf). This operation needs a compare operation as a parameter.
    2. When the gcf operation is done, we have a general-purpose register that contains a value of 1 or 0. We then transform it to contain an arbitrary value δ or 0, where δ represents the offset that we want to add to the stack pointer register if the condition checked in the first step is met. In this regard, we define the load stack delta operation (lsd) that allows us to set δ or 0 in a general-purpose register.
    3. Finally, we modify the stack pointer register value appropriately. We define a stack pointer addition (spa) that performs an add to the stack pointer register of any other general-purpose register. Of course, the addition becomes a subtraction operation if the register contains a negative value

Regarding unconditional branching, a virtual operation can be simply constructed based on a similar way to conditional branching. We define the jmp operation so that it makes use of lc to load a δ offset in a register, and then uses spa to unconditionally change the control flow of the ROP chain.

ROP3

The ROP3 tool is developed in the Python programming language and relies on the Capstone disassembly framework to find gadgets, operations, and ROP chains. Thanks to Capstone, the disassembled instruction has a very fine level of detail, such as the list of implicit read/written registers, which allows the tool to make decisions based on the types of instructions and operands.

Each of the virtual operations that make up ROPLANG is natively supported by ROP3. Specifically, a virtual operation is defined in a separated file using YAML syntax. An operation is made up of one or more sets of instructions, whereas a single instruction is defined by its mnemonic and the operands (if any).

At a higher abstraction level, in ROP3 a ROP chain is the concatenation of a series of operations. Typically, there are data dependencies between the logical registers used by operations of a ROP chain. For example, a ROP chain can first load a constant value in any logical register (lc(reg1) with ROPLANG syntax) and then move that value to another register (mov(reg2, reg1)).

To search for ROP chains, ROP3 works as follows. The tool first parses the plain text file provided by the user as input, in which the operations to chain are defined with ROPLANG operations. It then finds all gadgets that comply with each operation, providing the source and destination logical registers of each operation and builds a tree structure, considering the operation order as defined in the input file. Finally, it solves the data dependencies between operations traversing the tree recursively in depth-first order with backtracking. A more detailed explanation of how the backtracking algorithm works is provided in our WOOT’21 paper.

Backtracking algorithm to find a ROP chain

Our tool is released under the GNU/GPLv3 license to foster research in this area. You can get the source code from GitHub. The dataset used for the experimental evaluation below and instructions for reproducing our results are provided in this Zenodo record.

Evaluation

In this section, we first use ROP3 to evaluate how many ROP gadgets for any arbitrary operation arise in real-world programs. In particular, we focus on different flavors of the Windows OS, checking whether the default dynamic link libraries (DLLs) shipped with the Windows OS contain ROP gadgets for any arbitrary operation, as specified by ROPLANG. We then check if there are enough gadgets to build a fully operating (classic) Turing machine.

As the number of DLLs that ship with the Windows OS is in terms of hundreds, and it is also different in Windows versions, we have considered only the subset of system DLLs contained in KnownDlls that are common in all versions of Windows considered for experimentation. This subset of DLLs is made up of a total of 20 DLL files, including DLL files that are frequently used by Windows programs. Apart from these, we have considered other DLLs like msvcrt.dll, psapi.dll, ws2_32.dll, and ntdll.dll in the selected subset.

We tested the two most used versions of the Windows OS (Windows 10, 75.68%, and Windows 7, 18.03%; numbers of the year 2020 according to this link), grouped by architecture, are: [32 bits] Windows 7 Professional 6.1.7601 Service Pack 1 Build 7601 and Windows 10 Education 10.0.14393 Build 14393; [64 bits] Windows 7 Professional 6.1.7601 Service Pack 1 Build 7601 and Windows 10 Pro 1703 Build 15063.726.

Regarding the ROP3 configuration, we configure it to search for gadgets of 10 bytes in length. Although the tool supports searching for different ROP gadget endings, we have only considered near returns (instruction ret) because the other endings introduce more complexity when building a ROP chain. In addition, we have counted ROP gadgets made up of the same sequence of instructions once, regardless of the memory addresses they are mapped to.

Note that our results are biased by the current definition of ROPLANG operations. We have considered each ROP gadget made up of several instructions as a single gadget, instead of handling each instruction as an individual gadget. Additionally, we have expanded the spa operation to also consider adding immediate values such as 4, 8, 16, and 32 (spa-4, spa-8, spa-16, and spa-32 operations, respectively) to increase the probability of finding appropriate ROP gadgets. In the same way, we have divided the gcf operation in two, embedding in them the type of comparison that is being carried out: equality comparison (gcf-eqc) and less-than comparison (gcf-ltc) operations.

For each operation, we calculate its percentage of occurrence within each DLL and plotted it on a heatmap. In addition, we have also annotated the number of results found by ROP3, setting only the most significant digit and the order of magnitude when the number of results is greater than 104 for the sake of readability. DLLs have been sorted by size in each heatmap.

Number of ROP gadgets per ROPLANG ’s operation in Windows 7
Number of ROP gadgets per ROPLANG ’s operation in Windows 10

Our experiments show that the branching virtual operations are the least frequent operations, regardless of architecture. Moreover, in 32-bit there are no results for any of the comparison operations as defined by the current ROPLANG operations, regardless of the version of Windows. In fact, these operations only appear in a 64-bit Windows 7 SP1 DLL. The case of the unconditional branching operation is very interesting, since we have (few) results in 32-bit versions, while none in 64-bit.

For the rest of virtual operations, the results are diverse, although we can find at least one result for each virtual operation. Results tend to be higher at the bottom part of the figures, which corresponds with the largest DLLs. As other authors claim, the longer the binary code, the more likely it is to find ROP gadgets to perform any arbitrary operation.

Regarding the Windows versions, the results show the number of virtual operations found in Windows 10 is always greater than in Windows 7. This may be motivated by the difference in size between the DLLs, since in Windows 10 the DLLs are normally larger than in Windows 7. Additionally, we have empirically verified that the size in 64-bit is always greater than in 32-bit (except msvcrt.dll) for both versions of Windows, which also explains the slight variations in the results between the architecture word sizes.

It is worth mentioning also that 64-bit mode in Intel introduced a new form of addressing called relative Instruction Pointer addressing (RIP-relative addressing), which is the default for many 64-bit instructions that reference memory addresses in any of its operands. Therefore, none of the 64-bits instructions contain absolute memory addresses. In contrast, 32-bit instructions within a DLL can contain references to memory addresses with regard to their base addresses, which are randomized in every Windows startup due to ASLR. Therefore, unlike 64-bit results, 32-bit results are highly dependent on the base addresses of the DLLs, and can change when the base addresses are different. More experimentation is needed to evaluate how ASLR can affect the prevalence of ROP gadgets on 32-bit Windows systems.


And that’s all, folks! In this post we briefly explained the main idea behind our paper “Evaluation of the Executional Power in Windows using Return Oriented Programming”, recently presented at the 15th IEEE Workshop on Offensive Technologies (WOOT’21). You can get our ROP3 tool for free and expand our ROPLANG language as you wish. If you want to reproduce our experiments, follow this link for further instructions. We are open to new ideas and collaborations to research offensive security, do not hesitate to contact us!