Hamza Abdi
Automatically Labeling Cyber Threat Intelligence reports using Natural Language Processing
Abdi, Hamza; Bagley, Steven R.; Furnell, Steven; Twycross, Jamie
Authors
Dr STEVEN BAGLEY steven.bagley@nottingham.ac.uk
ASSISTANT PROFESSOR
Professor STEVEN FURNELL STEVEN.FURNELL@NOTTINGHAM.AC.UK
PROFESSOR OF CYBER SECURITY
Dr JAMIE TWYCROSS JAMIE.TWYCROSS@NOTTINGHAM.AC.UK
ASSOCIATE PROFESSOR
Abstract
Attribution provides valuable intelligence in the face of Advanced Persistent Threat (APT) attacks. By accurately identifying the actors behind the attacks, we can gain more insight into their motivations, capabilities, and potential future targets. Cyber Threat Intelligence (CTI) reports are relied upon to attribute these attacks effectively. These reports are compiled by security experts and provide valuable information about threat actors and their attacks.
We are interested in building a fully automated APT attribution framework. An essential step in doing so is the automated processing and extraction of information from CTI reports. However, CTI reports are largely unstructured, making extraction and analysis of the information a difficult task.
To begin this work, we introduce a method for automatically labeling a CTI report with the main threat actor attributed within it. The method uses a custom Natural Language Processing (NLP) model based on the spaCy library. The study also reports the performance and effectiveness of the various PDF-to-text Python libraries used in this work. To evaluate our model, we experimented on a dataset of 605 English documents, which were randomly collected from various sources on the internet and manually labeled. Our method achieved an accuracy of 97%. Finally, we discuss the challenges associated with processing these documents automatically and propose some methods for tackling them.
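As a rough illustration of what labeling a report with its main threat actor can involve, the sketch below counts alias mentions in already-extracted report text and returns the most frequently mentioned actor. It is not the spaCy-based model described in the abstract: it swaps in plain regular-expression matching from the Python standard library, and the alias table, `count_actor_mentions`, and `label_report` are hypothetical names introduced only for this example.

```python
import re
from collections import Counter

# Hypothetical alias table (canonical name -> aliases), for illustration only.
# It is not the label set used in the paper.
ACTOR_ALIASES = {
    "APT28": ["APT28", "Fancy Bear", "Sofacy"],
    "APT29": ["APT29", "Cozy Bear"],
    "Lazarus Group": ["Lazarus Group", "Lazarus", "Hidden Cobra"],
}


def count_actor_mentions(text):
    """Count case-insensitive alias mentions per canonical actor name."""
    counts = Counter()
    for actor, aliases in ACTOR_ALIASES.items():
        # Try longer aliases first so "Lazarus Group" is not also
        # counted a second time as a bare "Lazarus".
        ordered = sorted(aliases, key=len, reverse=True)
        pattern = r"\b(?:" + "|".join(re.escape(a) for a in ordered) + r")\b"
        hits = len(re.findall(pattern, text, flags=re.IGNORECASE))
        if hits:
            counts[actor] = hits
    return counts


def label_report(text):
    """Return the most-mentioned actor, or None if no alias appears."""
    counts = count_actor_mentions(text)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


sample = (
    "Fancy Bear, also tracked as APT28, reused Sofacy tooling. "
    "Unlike Cozy Bear, the group relied on spear phishing."
)
print(label_report(sample))
```

A frequency heuristic like this assumes the attributed actor is also the one mentioned most often, which will not hold for reports that compare several groups; a trained entity recognizer, as the abstract describes, is one way to go beyond a fixed alias list.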
Citation
Abdi, H., Bagley, S. R., Furnell, S., & Twycross, J. (2023, August). Automatically Labeling Cyber Threat Intelligence reports using Natural Language Processing. Presented at DocEng 2023 - Proceedings of the 2023 ACM Symposium on Document Engineering, Limerick, Ireland
Presentation Conference Type | Edited Proceedings |
---|---|
Conference Name | DocEng 2023 - Proceedings of the 2023 ACM Symposium on Document Engineering |
Start Date | Aug 22, 2023 |
End Date | Aug 25, 2023 |
Acceptance Date | Jul 1, 2023 |
Online Publication Date | Aug 22, 2023 |
Publication Date | Aug 22, 2023 |
Deposit Date | Aug 31, 2023 |
Publicly Available Date | Sep 5, 2023 |
Publisher | Association for Computing Machinery (ACM) |
Book Title | DocEng ’23 : Proceedings of the 2023 ACM Symposium on Document Engineering |
ISBN | 9798400700279 |
DOI | https://doi.org/10.1145/3573128.3609348 |
Public URL | https://nottingham-repository.worktribe.com/output/24148195 |
Publisher URL | https://dl.acm.org/doi/10.1145/3573128.3609348 |
Files
Automatically Labeling Cyber Threat Intelligence reports
(373 Kb)
PDF
Publisher Licence URL
https://creativecommons.org/licenses/by/4.0/