This research project is funded by the Ministry of Science, Innovation and Universities of Spain and the European Union within the Recovery, Transformation and Resilience Plan (RTRP) under the 2021 call «Proyectos Estratégicos Orientados a la Transición Ecológica y a la Transición Digital» (reference code TED2021-131115A-I00). The project received €129,835.00 in funding and is to be carried out from December 1, 2022 to November 30, 2023.
Project Goals
The main goal of this research project is to design and develop an analysis system that collects relevant Indicators of Compromise (IoC) from malware, obtained through memory forensic analysis, and provides them through a Web information system. The software system will consist of several components grouped into a single analysis workflow. This workflow will feed a database that third parties can query through a REST API, allowing them to locate malicious artifacts in live systems quickly and to respond to potential security incidents as soon as possible. The entire system will be designed in a modular way so that it can be developed incrementally and its modules can be distributed across different nodes. The purpose of working in a modular and distributed way is to build independent tools, adapted to specific needs, that are applicable to other analysis workflows and not only in the context of this project. The following subgoals (SG) are formulated:
- SG1: Extraction of IoC using memory forensics. This subgoal is related to the main objective of the project. To detect malware using memory forensics, we first need a way to launch multiple runs of malware samples, extract their binary code (for instance, by dumping the entire address space of the process associated with the malware sample), and analyze it appropriately. This binary code will be statically analyzed to locate specific malicious behaviors. We will work at page granularity (4096 bytes), which is the granularity used by the Windows memory subsystem. Similarity digests will also be computed per page and stored as a way to identify similar binary code. Our idea is to incorporate our tool Windows Memory Extractor directly into a virtual machine as a Python agent and to interact with it programmatically from the host machine to dump a given running process. The malicious behaviors will be formalized and abstracted as far as possible, using binary program analysis techniques such as symbolic execution, to facilitate future searches. We will also study the state of the art in malware analysis and memory forensics to identify other works that can complement our extraction of IoC, and integrate them into the overall analysis workflow. The component developed as part of this subgoal will also be decoupled from the other server components so that it can be distributed as a kind of endpoint detection and response (EDR) tool that interacts with our Web information system.
- SG2: Extraction of other IoC using common malware analysis tools. This subgoal complements the previous one. Here, we will study the state of the art in malware analysis tools and consider them for inclusion in our analysis system. Applying common static analysis (strings, properties of the Windows PE header, etc.) and dynamic analysis (interaction with the OS and with the Internet while the malware sample runs) will give us a fuller picture of the malicious purpose of the malware samples.
- SG3: Development of the pipelined analysis and the Web information systems. This subgoal is directly related to the main objective of the project. To obtain the final software system, we must first clearly define the workflow of the analysis system and the design of the database that will be populated with the IoC extracted during the analysis. Finally, we will develop a Web information system that provides secure endpoints so that external users can query our system using either our own client component or theirs.
- SG4: Optimization of similarity digest comparisons and IoC database. This subgoal is indirectly related to the main objective of the project. The Web information system must be as quick as possible when providing information about similarity digests related to malware. However, the database will be very large, since we plan to work at page granularity (4096 bytes). In addition, the data structures must make it possible to find similar binary code as quickly as possible. We will investigate different clustering techniques for similarity digests (such as UPGMA or its weighted variant WPGMA) and the best configuration for our purposes, and we will apply database optimization techniques to make the Web system faster.
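As an illustration of the page-level processing described in SG1 and SG4, the following minimal Python sketch splits a raw dump into 4096-byte pages, computes a digest per page, and groups the pages with UPGMA (average-linkage) clustering. It is not the project's implementation. The function names, the zero-padding of a short final page, the merge threshold, and above all `toy_digest` (a set of byte n-grams compared with Jaccard distance) are assumptions made for this example; the project does not state which similarity digest algorithm it uses.

```python
from itertools import combinations

PAGE_SIZE = 4096  # page granularity named in the text


def split_pages(dump: bytes, page_size: int = PAGE_SIZE) -> list[bytes]:
    """Split a raw dump into fixed-size pages.

    Assumption: a short final page is zero-padded to a full page.
    """
    return [
        dump[off:off + page_size].ljust(page_size, b"\x00")
        for off in range(0, len(dump), page_size)
    ]


def toy_digest(page: bytes, n: int = 4) -> frozenset[bytes]:
    """Hypothetical stand-in for a similarity digest: the set of byte n-grams."""
    return frozenset(page[i:i + n] for i in range(len(page) - n + 1))


def distance(a: frozenset, b: frozenset) -> float:
    """Jaccard distance between two toy digests (0.0 identical, 1.0 disjoint)."""
    union = len(a | b)
    return 1.0 - len(a & b) / union if union else 0.0


def upgma(digests: list[frozenset], threshold: float) -> list[list[int]]:
    """Average-linkage (UPGMA) clustering of digests.

    Repeatedly merges the two clusters with the smallest mean pairwise
    distance and stops once that distance exceeds `threshold`.
    Returns clusters as sorted lists of page indices.
    """
    # Distances between individual pages, keyed by (smaller, larger) index.
    d = {
        (i, j): distance(digests[i], digests[j])
        for i, j in combinations(range(len(digests)), 2)
    }

    def avg(c1: list[int], c2: list[int]) -> float:
        # UPGMA linkage: unweighted mean over all cross-cluster pairs.
        total = sum(d[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(len(digests))]
    while len(clusters) > 1:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: avg(clusters[p[0]], clusters[p[1]]),
        )
        if avg(clusters[x], clusters[y]) > threshold:
            break
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Synthetic three-page "dump": page 1 is a slightly modified copy of page 0,
# page 2 has an unrelated byte pattern.
page_a = bytes(range(256)) * 16
page_b = b"\xff" * 8 + page_a[8:]
page_c = bytes((i * 7) % 256 for i in range(PAGE_SIZE))

pages = split_pages(page_a + page_b + page_c)
clusters = upgma([toy_digest(p) for p in pages], threshold=0.5)
print(clusters)  # [[0, 1], [2]]
```

This naive version recomputes average distances on every merge, so it only suits small inputs. At the database sizes anticipated in SG4, an optimized clustering implementation and a real similarity digest would be needed.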