Explainable Deep Reinforcement Learning for Autonomous Cyber Defense Systems

Journal of Advanced Engineering Technology and Management

ISSN (Online): 3049-3684  

Volume: 1 Issue: 1 | Open Access | 09 April 2025


Sreelatha Madhuri, Student, Sarvajanik College of Engineering & Technology

Abstract

Autonomous cyber defense systems are increasingly required to respond to sophisticated and rapidly evolving cyber threats. Deep Reinforcement Learning (DRL) has emerged as a promising approach to dynamic threat mitigation because it can learn effective defense policies through interaction with complex environments. However, the black-box nature of DRL models limits their adoption in high-stakes cybersecurity applications, where interpretability, trust, and accountability are essential. This paper proposes an Explainable Deep Reinforcement Learning (XDRL) framework designed specifically for autonomous cyber defense systems. The framework integrates a Deep Q-Network (DQN) with a post-hoc explanation module that combines SHAP-based feature attribution with policy summarization via decision-tree distillation. Experimental evaluation in a simulated enterprise network environment demonstrates that the proposed model achieves a 23% higher threat mitigation rate than baseline rule-based systems while providing human-interpretable explanations for over 92% of defense actions. The results indicate that explainability can be integrated into DRL-based cyber defense without significantly degrading performance.
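The paper's full method is not reproduced in this abstract, but the policy-summarization step it describes can be sketched in a minimal form: a shallow surrogate decision tree is trained to mimic the DQN's greedy action choices, and its agreement with the network (fidelity) is measured. The linear stand-in Q-function, the feature and action counts, and all variable names below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Stand-in for a trained DQN: a fixed linear map from network-state
# features (e.g. traffic volume, failed logins) to Q-values over
# defense actions (e.g. block IP, isolate host, raise alert, no-op).
N_FEATURES, N_ACTIONS = 6, 4
W = rng.normal(size=(N_FEATURES, N_ACTIONS))

def dqn_policy(states):
    """Greedy action of the (stand-in) Q-network for a batch of states."""
    return np.argmax(states @ W, axis=1)

# Sample states the agent visited and query the teacher policy on them.
states = rng.normal(size=(2000, N_FEATURES))
actions = dqn_policy(states)

# Distill the policy into a shallow, human-readable decision tree.
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(states, actions)

# Fidelity: fraction of states where the surrogate tree reproduces
# the DQN's action; a common sanity check for distilled explanations.
fidelity = (tree.predict(states) == actions).mean()
```

A depth-limited tree trades some fidelity for readability; in practice the depth is tuned until the tree is small enough for an analyst to audit while still agreeing with the policy on most visited states.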

Keywords

Deep Reinforcement Learning, Explainable AI, Cybersecurity, Autonomous Defense Systems, DQN, Intrusion Response


