
Research Repository


Reinforcement Q-Learning-Based Adaptive Encryption Model for Cyberthreat Mitigation in Wireless Sensor Networks

Premakumari, Sreeja Balachandran Nair; Sundaram, Gopikrishnan; Rivera, Marco; Wheeler, Patrick; Guzmán, Ricardo E. Pérez






Abstract

The increasing prevalence of cyber threats in wireless sensor networks (WSNs) necessitates adaptive and efficient security mechanisms to ensure robust data transmission while addressing resource constraints. This paper proposes a reinforcement learning-based adaptive encryption framework that dynamically scales encryption levels based on real-time network conditions and threat classification. The proposed model leverages a deep learning-based anomaly detection system to classify network states into low, moderate, or high threat levels, which guides encryption policy selection. The framework integrates dynamic Q-learning for optimizing energy efficiency in low-threat conditions and double Q-learning for robust security adaptation in high-threat environments. A Hybrid Policy Derivation Algorithm is introduced to balance encryption complexity and computational overhead by dynamically switching between these learning models. The proposed system is formulated as a Markov Decision Process (MDP), where encryption level selection is driven by a reward function that optimizes the trade-off between energy efficiency and security robustness. The adaptive learning strategy employs an 𝜖-greedy exploration-exploitation mechanism with an exponential decay rate to enhance convergence in dynamic WSN environments. The model also incorporates a dynamic hyperparameter tuning mechanism that optimally adjusts learning rates and exploration parameters based on real-time network feedback. Experimental evaluations conducted in a simulated WSN environment demonstrate the effectiveness of the proposed framework, achieving a 30.5% reduction in energy consumption, a 92.5% packet delivery ratio (PDR), and a 94% mitigation efficiency against multiple cyberattack scenarios, including DDoS, black-hole, and data injection attacks. Additionally, the framework reduces latency by 37% compared to conventional encryption techniques, ensuring minimal communication delays. 
These results highlight the scalability and adaptability of reinforcement learning-driven adaptive encryption in resource-constrained networks, paving the way for real-world deployment in next-generation IoT and WSN applications.
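The control loop described in the abstract — anomaly-driven threat states, encryption-level actions, a reward trading off energy against security, ε-greedy exploration with exponential decay, and a switch between standard Q-learning (low/moderate threat) and double Q-learning (high threat) — can be sketched in simplified form. All names, state/action sets, and reward weights below are illustrative assumptions, not the authors' implementation:

```python
import math
import random

THREATS = ["low", "moderate", "high"]   # states from the anomaly detector
ENC_LEVELS = [0, 1, 2]                  # actions: light .. heavy encryption

def reward(threat, level):
    """Toy reward: penalize energy use (heavier encryption costs more)
    and penalize under-encrypting relative to the current threat."""
    energy_cost = 0.2 * level
    security_gap = max(THREATS.index(threat) - level, 0)
    return -energy_cost - 1.0 * security_gap

class AdaptiveEncryptionAgent:
    def __init__(self, alpha=0.1, gamma=0.9, eps0=1.0, decay=0.01):
        # Two Q-tables; the second is only used for double Q-learning.
        self.qa = {(s, a): 0.0 for s in THREATS for a in ENC_LEVELS}
        self.qb = dict(self.qa)
        self.alpha, self.gamma = alpha, gamma
        self.eps0, self.decay, self.t = eps0, decay, 0

    def epsilon(self):
        # epsilon-greedy exploration rate with exponential decay
        return self.eps0 * math.exp(-self.decay * self.t)

    def act(self, s):
        self.t += 1
        if random.random() < self.epsilon():
            return random.choice(ENC_LEVELS)        # explore
        q = {a: self.qa[(s, a)] + self.qb[(s, a)] for a in ENC_LEVELS}
        return max(q, key=q.get)                    # exploit

    def learn(self, s, a, r, s2):
        if s == "high":
            # Double Q-learning update: decouple action selection from
            # evaluation to avoid overestimation under high threat.
            if random.random() < 0.5:
                a_star = max(ENC_LEVELS, key=lambda x: self.qa[(s2, x)])
                target = r + self.gamma * self.qb[(s2, a_star)]
                self.qa[(s, a)] += self.alpha * (target - self.qa[(s, a)])
            else:
                b_star = max(ENC_LEVELS, key=lambda x: self.qb[(s2, x)])
                target = r + self.gamma * self.qa[(s2, b_star)]
                self.qb[(s, a)] += self.alpha * (target - self.qb[(s, a)])
        else:
            # Standard Q-learning update in low/moderate threat,
            # where the reward is dominated by energy efficiency.
            best = max(self.qa[(s2, x)] for x in ENC_LEVELS)
            target = r + self.gamma * best
            self.qa[(s, a)] += self.alpha * (target - self.qa[(s, a)])
```

The hybrid policy switch in the paper is reduced here to a branch on the threat state inside `learn`; the actual Hybrid Policy Derivation Algorithm and hyperparameter tuning mechanism are more involved than this sketch.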

Citation

Premakumari, S. B. N., Sundaram, G., Rivera, M., Wheeler, P., & Guzmán, R. E. (2025). Reinforcement Q-Learning-Based Adaptive Encryption Model for Cyberthreat Mitigation in Wireless Sensor Networks. Sensors, 25(7), Article 2056. https://doi.org/10.3390/s25072056

Journal Article Type Article
Acceptance Date Mar 18, 2025
Online Publication Date Mar 26, 2025
Publication Date Mar 26, 2025
Deposit Date Apr 25, 2025
Publicly Available Date Apr 29, 2025
Journal Sensors
Print ISSN 1424-8220
Electronic ISSN 1424-8220
Publisher MDPI
Peer Reviewed Yes
Volume 25
Issue 7
Article Number 2056
DOI https://doi.org/10.3390/s25072056
Public URL https://nottingham-repository.worktribe.com/output/47260812
Publisher URL https://www.mdpi.com/1424-8220/25/7/2056

