SECURE: Benchmarking Large Language Models for Cybersecurity
Dipkamal Bhusal, Md Tanvirul Alam, Le Nguyen, Ashim Mahara, Zachary Lightcap, Rodney Frazier, Romy Fieblinger, Grace Long Torales, Benjamin A. Blakely, and Nidhi Rastogi
Proceedings of the Annual Computer Security Applications Conference, 2024
SECURE introduces a broad cybersecurity benchmark spanning knowledge extraction, understanding, and reasoning tasks, and shows that current large language models remain inconsistent on applied cybersecurity problems.