Carrillo-Mondéjar, Javier; Rodríguez, Ricardo J.
Identifying Runtime Libraries in Statically Linked Linux Binaries (Journal Article)
In: Future Generation Computer Systems, vol. 164, pp. 107602, 2025, ISSN: 0167-739X.
Tags: Binary code analysis, IoT, malware, Runtime library identification, Statically linked binaries
@article{CarrilloR-FGCS-25,
title = {Identifying Runtime Libraries in Statically Linked Linux Binaries},
author = {Javier Carrillo-Mondéjar and Ricardo J. Rodríguez},
url = {http://webdiis.unizar.es/~ricardo/files/papers/CarrilloR-FGCS-25.pdf},
doi = {10.1016/j.future.2024.107602},
issn = {0167-739X},
year = {2025},
date = {2025-01-01},
journal = {Future Generation Computer Systems},
volume = {164},
pages = {107602},
abstract = {Vulnerabilities in unpatched applications can originate from third-party dependencies in statically linked applications, as they must be relinked each time to take advantage of libraries that have been updated to fix any vulnerability. Despite this, malware binaries are often statically linked to ensure they run on target platforms and to complicate malware analysis. In this sense, identification of libraries in malware analysis becomes crucial to help filter out those library functions and focus on malware function analysis. In this paper, we introduce \texttt{MANTILLA}, a system for identifying runtime libraries in statically linked Linux-based binaries. Our system is based on radare2 to identify functions and extract their features (independent of the underlying architecture of the binary) through static binary analysis and on the K-nearest neighbors supervised machine learning model and a majority rule to predict final values. \texttt{MANTILLA} is evaluated on a dataset consisting of binaries built for different architectures (\texttt{MIPSeb}, \texttt{ARMel}, \texttt{Intel x86}, and \texttt{Intel x86-64}) and different runtime libraries (\texttt{uClibc}, \texttt{glibc}, and \texttt{musl}), achieving very high accuracy. We also evaluate it in two case studies. First, using a dataset of binary files belonging to the \texttt{binutils} collection and second, using an IoT malware dataset. In both cases, good accuracy results are obtained both in terms of runtime library detection ($94.4\%$ and $95.5\%$, respectively) and architecture identification ($100\%$ and $98.6\%$, respectively).},
keywords = {Binary code analysis, IoT, malware, Runtime library identification, Statically linked binaries},
pubstate = {published},
tppubtype = {article}
}