Mlot, Esteban Damián Gutiérrez; Saldana, Jose; Rodríguez, Ricardo J.; Kotsiuba, Igor; Gañan, Carlos H.
A dataset to train intrusion detection systems based on machine learning models for electrical substations (Journal Article)
In: Data in Brief, vol. 57, pp. 111153, 2024, ISSN: 2352-3409.
Tags: critical infrastructure, cybersecurity, IEC104, IEC60870-5-104, IEC61850, testbed
@article{MlotSRKG-DIB-24,
title = {A dataset to train intrusion detection systems based on machine learning models for electrical substations},
author = {Gutiérrez Mlot, Esteban Damián and Jose Saldana and Ricardo J. Rodríguez and Igor Kotsiuba and Carlos H. Gañan},
url = {https://webdiis.unizar.es/~ricardo/files/papers/GutierrezMlotSRKG-DIB-24.pdf},
doi = {10.1016/j.dib.2024.111153},
issn = {2352-3409},
year = {2024},
date = {2024-12-01},
journal = {Data in Brief},
volume = {57},
pages = {111153},
abstract = {The growing integration of Information and Communication Technology into Operational Technology environments in electrical substations exposes them to new cybersecurity threats. This paper presents a comprehensive dataset of substation traffic, aimed at improving the training and benchmarking of Intrusion Detection Systems (IDS) installed in these facilities that are based on machine learning techniques. The dataset includes raw network captures and flows from real substations, filtered and anonymized to ensure privacy. It covers the main protocols and standards used in substation environments: IEC61850, IEC104, NTP, and PTP. Additionally, the dataset includes traces obtained during several cyberattacks, which were simulated in a controlled laboratory environment, providing a rich resource for developing and testing machine learning models for cybersecurity applications in substations. A set of complementary tools for dataset creation and preprocessing are also included to standardize the methodology, ensuring consistency and reproducibility. In summary, the dataset addresses the critical need for high-quality, targeted data for tuning IDS at electrical substations and contributes to the advancement of secure and reliable power distribution networks.},
keywords = {critical infrastructure, cybersecurity, IEC104, IEC60870-5-104, IEC61850, testbed},
pubstate = {published},
tppubtype = {article}
}