{"id":796,"date":"2024-11-21T14:39:49","date_gmt":"2024-11-21T14:39:49","guid":{"rendered":"https:\/\/reversea.me\/?p=796"},"modified":"2025-05-21T09:26:04","modified_gmt":"2025-05-21T09:26:04","slug":"identifying-runtime-libraries-in-statically-linked-binaries-with-mantilla","status":"publish","type":"post","link":"https:\/\/reversea.me\/index.php\/identifying-runtime-libraries-in-statically-linked-binaries-with-mantilla\/","title":{"rendered":"Identifying Runtime Libraries in Statically Linked Binaries with MANTILLA"},"content":{"rendered":"<span class=\"span-reading-time rt-reading-time\" style=\"display: block;\"><span class=\"rt-label rt-prefix\">Reading Time: <\/span> <span class=\"rt-time\"> 5<\/span> <span class=\"rt-label rt-postfix\">minutes<\/span><\/span>\n<p><strong>TL;DR<\/strong>: Statically linked binaries can include vulnerabilities if not updated with the latest versions of libraries. Similarly, embedding libraries within the binary reduces dependency on the environment while running the binary. This makes identifying linked libraries in malware binaries essential for effective analysis. To help in this process, we present <a href=\"https:\/\/github.com\/reverseame\/MANTILLA\">MANTILLA<\/a>, a tool designed to identify runtime libraries in statically linked Linux binaries using static analysis and machine learning. MANTILLA extracts architecture-independent features from the binaries and uses a K-Nearest Neighbors (KNN) model to determine which libraries are linked. In <a href=\"https:\/\/www.noconname.org\/\">our talk at NoConName 2024 (on Nov 19, 2024)<\/a>, we shared deeper insights into how MANTILLA works, its architecture-agnostic features, and how it can be used for both malware analysis and vulnerability detection. Our evaluation shows high accuracy across multiple architectures, demonstrating the value of MANTILLA for both tasks.
If this post leaves you wanting more, we recommend reading our recently published scientific article (<a href=\"https:\/\/doi.org\/10.1016\/j.future.2024.107602\">here<\/a>).<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>For an attacker, a vulnerable and unpatched application is an irresistible target. Vulnerabilities often persist in applications due to outdated third-party dependencies, especially when binaries are statically linked. <em>Static linking<\/em> makes binaries self-contained and portable, but it also complicates updating libraries and makes reverse engineering more challenging. Interestingly, these features are precisely why malware authors prefer static linking, as it ensures compatibility across target platforms and adds complexity to analysis efforts.<\/p>\n\n\n\n<p>To help address these challenges, we developed <a href=\"https:\/\/github.com\/reverseame\/MANTILLA\">MANTILLA<\/a>, a tool to identify runtime libraries in statically linked Linux binaries. This identification helps filter out library functions, allowing analysts to focus on the core behavior of the malware and detect vulnerabilities in outdated libraries.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What is MANTILLA?<\/h2>\n\n\n\n<p>MANTILLA, a system for <em>runtiMe librAries ideNtification in sTatIcally-Linked Linux binAries<\/em>, is designed to automatically identify runtime libraries within a binary using static analysis and KNN classification.
Figure 1 shows a high-level overview of MANTILLA. It relies on the <a href=\"https:\/\/rada.re\/n\/\">radare2<\/a> reverse engineering framework to extract a variety of features that are independent of the binary&#8217;s architecture, such as cyclomatic complexity, instruction count, and entropy.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"343\" src=\"https:\/\/reversea.me\/wp-content\/uploads\/2024\/11\/overview-1024x343.png\" alt=\"\" class=\"wp-image-797\" srcset=\"https:\/\/reversea.me\/wp-content\/uploads\/2024\/11\/overview-1024x343.png 1024w, https:\/\/reversea.me\/wp-content\/uploads\/2024\/11\/overview-300x101.png 300w, https:\/\/reversea.me\/wp-content\/uploads\/2024\/11\/overview-768x257.png 768w, https:\/\/reversea.me\/wp-content\/uploads\/2024\/11\/overview-1536x515.png 1536w, https:\/\/reversea.me\/wp-content\/uploads\/2024\/11\/overview-2048x687.png 2048w, https:\/\/reversea.me\/wp-content\/uploads\/2024\/11\/overview-1440x483.png 1440w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\"><em>Figure 1: High-level system overview of MANTILLA<\/em>.<\/figcaption><\/figure><\/div>\n\n\n<p>The system then uses these features to classify the binary through a supervised machine learning model. Specifically, MANTILLA uses K-Nearest Neighbors (KNN) to predict the runtime library linked in the binary, with final decisions made using a majority voting system across all functions in the binary.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">How does it work?<\/h2>\n\n\n\n<p>MANTILLA operates in two phases:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Feature extraction<\/strong>: We extract features from each function in a given binary. These features include metrics such as cyclomatic complexity, number of basic blocks, function size, entropy, and more.
Importantly, the features are chosen to be architecture-independent, allowing MANTILLA to work on different CPU architectures.<\/li>\n\n\n\n<li><strong>Prediction<\/strong>: Using the extracted features, we apply a KNN model to predict the runtime library for each function. A majority voting mechanism is used to determine the final prediction for the entire binary, ensuring robust classification even when individual functions may have ambiguous results.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Evaluation and results<\/h2>\n\n\n\n<p>We evaluated MANTILLA on a dataset of binaries built for different architectures: <code>MIPSeb<\/code>, <code>ARMel<\/code>, <code>Intel x86<\/code>, and <code>Intel x86-64<\/code>. These binaries were linked with different runtime libraries: <code>uClibc<\/code>, <code>glibc<\/code>, and <code>musl<\/code>. Additionally, we tested MANTILLA on real-world binaries, including IoT malware samples. In all tests, MANTILLA achieved very high accuracy, with results of over 95% in runtime library identification and almost 100% in architecture identification.<\/p>\n\n\n\n<p>We also evaluated the performance of MANTILLA using K-fold cross-validation on this dataset of statically linked binaries, after removing the symbols (i.e., on stripped binaries). Specifically, we examined how well MANTILLA can identify runtime libraries in binaries compiled with different architectures and libraries. In particular, we focused on the KNN classification model, tuning key parameters to optimize performance.<\/p>\n\n\n\n<p>We first computed distances to the K nearest neighbors (KNN) using the Euclidean distance metric. To fine-tune the model, we set a distance threshold (<em>d<\/em>) and tried various settings for the number of neighbors <em>K<\/em>. We tried values of <em>K = {1, \u2026, 5}<\/em> and <em>d = {1, \u2026, 7}<\/em> to explore trade-offs between the number of neighbors and the threshold distance.
The results, shown in Figure 2, reveal a clear trend: the system performs better when more neighbors are considered and a lower distance threshold is applied. As the threshold is increased, the model starts classifying unrelated features as part of the same runtime library, negatively impacting overall performance.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"689\" src=\"https:\/\/reversea.me\/wp-content\/uploads\/2024\/11\/results_stripped_binaries-1-1024x689.png\" alt=\"\" class=\"wp-image-802\" srcset=\"https:\/\/reversea.me\/wp-content\/uploads\/2024\/11\/results_stripped_binaries-1-1024x689.png 1024w, https:\/\/reversea.me\/wp-content\/uploads\/2024\/11\/results_stripped_binaries-1-300x202.png 300w, https:\/\/reversea.me\/wp-content\/uploads\/2024\/11\/results_stripped_binaries-1-768x517.png 768w, https:\/\/reversea.me\/wp-content\/uploads\/2024\/11\/results_stripped_binaries-1-1536x1034.png 1536w, https:\/\/reversea.me\/wp-content\/uploads\/2024\/11\/results_stripped_binaries-1-2048x1378.png 2048w, https:\/\/reversea.me\/wp-content\/uploads\/2024\/11\/results_stripped_binaries-1-1440x969.png 1440w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\"><em>Figure 2: Evaluation metrics for the KNN prediction model considering K = {1, \u2026 , 5} and distance thresholds d = {1, \u2026 , 7}, on stripped binaries<\/em><\/figcaption><\/figure><\/div>\n\n\n<p>Based on these findings, we conclude that a configuration with <em>K &gt; 1<\/em> and a low distance threshold provides the best results. In particular, the optimal configuration achieved a 100% hit rate, thanks to the majority voting rule, and the best performance was observed for <em>K = 5<\/em>. 
This configuration maintained high accuracy even with a more relaxed threshold value for the distance metric, ensuring that MANTILLA could consistently predict the correct runtime library across all test cases.<\/p>\n\n\n\n<p>Furthermore, we applied MANTILLA to a dataset containing thousands of Linux-based IoT malware samples. The results showed that the majority of these malware binaries were linked against <code>uClibc<\/code>, a lightweight C library often used in embedded systems. MANTILLA&#8217;s analysis thus helped confirm runtime library usage trends across the IoT malware landscape. More experimental results and discussion are given <a href=\"https:\/\/doi.org\/10.1016\/j.future.2024.107602\" data-type=\"link\" data-id=\"https:\/\/doi.org\/10.1016\/j.future.2024.107602\">in our paper<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Real-World Impact<\/h2>\n\n\n\n<p>MANTILLA can help malware and forensic analysts understand the libraries used within a binary, filter out library functions, and focus efforts on analyzing malware-specific code. This is especially useful in malware reverse engineering, where distinguishing between benign and malicious functionality can be extremely difficult due to static linking.<\/p>\n\n\n\n<p>Additionally, identifying the runtime library is critical for detecting vulnerabilities. If a binary contains an outdated version of a library, it may be susceptible to known attacks. By identifying the specific runtime library, MANTILLA helps assess whether a statically linked binary is at risk.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Future Work<\/h2>\n\n\n\n<p>Although MANTILLA provides good accuracy, further improvements are possible. We plan to extend the system to identify runtime libraries on other operating systems and support additional architectures, such as <code>PowerPC<\/code> or <code>SPARC<\/code>.
Additionally, we will explore the possibility of providing MANTILLA as a software-as-a-service offering.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Funding Acknowledgments<\/strong><\/h2>\n\n\n\n<p>This research was supported in part by grant TED2021-131115A-I00 (MIMFA), funded by MICIU\/AEI\/10.13039\/501100011033 and by the European Union NextGenerationEU\/PRTR; by grant <em>Proyecto Estrat\u00e9gico Ciberseguridad EINA UNIZAR<\/em>, funded by the Spanish National Cybersecurity Institute (INCIBE) and the European Union NextGenerationEU\/PRTR; and by grant <em>Programa de Proyectos Estrat\u00e9gicos de Grupos de Investigaci\u00f3n<\/em> (DisCo research group, ref. T21-23R), funded by the University, Industry and Innovation Department of the Aragonese Government.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"90\" src=\"https:\/\/reversea.me\/wp-content\/uploads\/2024\/11\/00_BandaLogos_INCIBE_es-100-1024x90.jpg\" alt=\"\" class=\"wp-image-829\" srcset=\"https:\/\/reversea.me\/wp-content\/uploads\/2024\/11\/00_BandaLogos_INCIBE_es-100-1024x90.jpg 1024w, https:\/\/reversea.me\/wp-content\/uploads\/2024\/11\/00_BandaLogos_INCIBE_es-100-300x26.jpg 300w, https:\/\/reversea.me\/wp-content\/uploads\/2024\/11\/00_BandaLogos_INCIBE_es-100-768x67.jpg 768w, https:\/\/reversea.me\/wp-content\/uploads\/2024\/11\/00_BandaLogos_INCIBE_es-100-1536x135.jpg 1536w, https:\/\/reversea.me\/wp-content\/uploads\/2024\/11\/00_BandaLogos_INCIBE_es-100-2048x179.jpg 2048w, https:\/\/reversea.me\/wp-content\/uploads\/2024\/11\/00_BandaLogos_INCIBE_es-100-1440x126.jpg 1440w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p><em>And that\u2019s all, guys &amp; gals! In this blog post, we have summarized our paper on MANTILLA, a system for identifying runtime libraries in statically linked Linux binaries.
The paper provides a detailed overview of the system&#8217;s features, methodology, and evaluation. We hope that MANTILLA can serve as a useful tool for researchers and analysts working in the fields of binary analysis and malware forensics. Feel free to explore the tool, check out the source code on GitHub, and contribute to the ongoing efforts to improve static binary analysis. Thanks for reading!<\/em><\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h4 class=\"wp-block-heading\">Declaration of Generative AI Technologies in the Writing Process<\/h4>\n\n\n\n<p>During the preparation of this post, the author used ChatGPT (GPT-4o model) to improve readability and language. After using this tool, the author reviewed and edited the content as necessary and takes full responsibility for the content of this publication.<\/p>\n","protected":false},"excerpt":{"rendered":"<p><span class=\"span-reading-time rt-reading-time\" style=\"display: block;\"><span class=\"rt-label rt-prefix\">Reading Time: <\/span> <span class=\"rt-time\"> 5<\/span> <span class=\"rt-label rt-postfix\">minutes<\/span><\/span>TL;DR: Statically linked binaries can include vulnerabilities if not updated with the latest versions of libraries. Similarly, embedding libraries within the binary reduces dependency on the environment while running the binary. This makes identifying linked libraries in malware binaries essential for effective analysis.
To help in this process, we present MANTILLA, a tool designed to [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[17,27,15],"tags":[28,46,47],"class_list":["post-796","post","type-post","status-publish","format-standard","hentry","category-malware","category-reverse-engineering","category-tools","tag-malware","tag-program-binary-analysis","tag-static-analysis","no-featured-image"],"_links":{"self":[{"href":"https:\/\/reversea.me\/index.php\/wp-json\/wp\/v2\/posts\/796","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/reversea.me\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/reversea.me\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/reversea.me\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/reversea.me\/index.php\/wp-json\/wp\/v2\/comments?post=796"}],"version-history":[{"count":10,"href":"https:\/\/reversea.me\/index.php\/wp-json\/wp\/v2\/posts\/796\/revisions"}],"predecessor-version":[{"id":830,"href":"https:\/\/reversea.me\/index.php\/wp-json\/wp\/v2\/posts\/796\/revisions\/830"}],"wp:attachment":[{"href":"https:\/\/reversea.me\/index.php\/wp-json\/wp\/v2\/media?parent=796"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/reversea.me\/index.php\/wp-json\/wp\/v2\/categories?post=796"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/reversea.me\/index.php\/wp-json\/wp\/v2\/tags?post=796"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}