Publications | Md Tanvirul Alam

Selected Publications

NeurIPS 2024
Spotlight

CTIBench: A Benchmark for Evaluating LLMs in Cyber Threat Intelligence

Md Tanvirul Alam, Dipkamal Bhusal, Le Nguyen, and Nidhi Rastogi

In Advances in Neural Information Processing Systems, 2024

Datasets and Benchmarks Track

CTIBench introduces a task-focused benchmark for cyber threat intelligence that probes factual recall, applied reasoning, and robustness, showing that even strong language models remain unreliable on practical CTI workflows.

CVPR 2026
Findings

SPHINX: A Synthetic Environment for Visual Perception and Reasoning

Md Tanvirul Alam, Saksham Aggarwal, Justin Yang Chae, and Nidhi Rastogi

In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2026

SPHINX builds a synthetic visual reasoning environment with procedurally generated puzzles and verifiable answers, enabling precise evaluation of multimodal reasoning and showing that RLVR can improve large vision-language models on these tasks.

arXiv 2026

Minerva: Reinforcement Learning with Verifiable Rewards for Cyber Threat Intelligence LLMs

Md Tanvirul Alam, Aritran Piplai, Ionut Cardei, Nidhi Rastogi, and Peter J. Worth Jr

arXiv preprint, 2026

Minerva studies RL with verifiable rewards for cyber threat intelligence tasks, using structured CTI outputs and deterministic verifiers to improve model accuracy and robustness beyond supervised fine-tuning.

RAID 2023

Looking Beyond IoCs: Automatically Extracting Attack Patterns from External CTI

Md Tanvirul Alam, Dipkamal Bhusal, Youngja Park, and Nidhi Rastogi

In International Symposium on Research in Attacks, Intrusions and Defenses, 2023

Looking Beyond IoCs introduces LADDER, a framework for extracting text-based attack patterns from CTI reports and mapping them to MITRE ATT&CK so analysts can reason about evolving threats beyond brittle indicators.

Other Publications

2025

WAITI 2025

AthenaBench: A Dynamic Benchmark for Evaluating LLMs in Cyber Threat Intelligence

Md Tanvirul Alam, Dipkamal Bhusal, Salman Ahmad, Nidhi Rastogi, and Peter Worth

WAITI Workshop, 2025

AthenaBench extends CTIBench with a stronger data pipeline, refined metrics, and a new risk-mitigation task, revealing that current frontier models still struggle on reasoning-heavy CTI problems.

MATH-AI 2025

Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning

Md Tanvirul Alam and Nidhi Rastogi

MATH-AI Workshop, 2025

This paper tests RLVR on combinatorial reasoning problems with fully verifiable solutions and finds that better scores often come from exploiting shortcuts rather than learning genuinely transferable reasoning strategies.

ACSAC 2025

R+R: Revisiting Static Feature-Based Android Malware Detection using Machine Learning

Md Tanvirul Alam, Dipkamal Bhusal, and Nidhi Rastogi

Proceedings of the Annual Computer Security Applications Conference, 2025

R+R revisits static feature-based Android malware detection with modern experimental controls, showing where traditional machine learning pipelines still work well and where they fail under realistic distribution shifts.

RAID 2025

ADAPT: A Pseudo-labeling Approach to Combat Concept Drift in Malware Detection

Md Tanvirul Alam, Aritran Piplai, and Nidhi Rastogi

International Symposium on Research in Attacks, Intrusions and Defenses, 2025

ADAPT tackles concept drift in malware detection by using pseudo-labeling to refresh models on shifting data distributions, improving robustness when labeled updates are scarce.

MATH-AI 2025

Towards Understanding Self-play for LLM Reasoning

Justin Yang Chae, Md Tanvirul Alam, and Nidhi Rastogi

MATH-AI Workshop, 2025

This work studies how self-play improves language-model reasoning by analyzing the problems generated during training and the mechanisms that help reasoning ability emerge without external supervision.

2024

ACSAC 2024

SECURE: Benchmarking Large Language Models for Cybersecurity

Dipkamal Bhusal, Md Tanvirul Alam, Le Nguyen, Ashim Mahara, Zachary Lightcap, Rodney Frazier, Romy Fieblinger, Grace Long Torales, Benjamin A. Blakely, and Nidhi Rastogi

Proceedings of the Annual Computer Security Applications Conference, 2024

SECURE introduces a broad cybersecurity benchmark spanning knowledge extraction, understanding, and reasoning tasks, and shows that current large language models remain inconsistent on applied cyber problems.

EuroS&PW 2024

Actionable Cyber Threat Intelligence using Knowledge Graphs and Large Language Models

Romy Fieblinger, Md Tanvirul Alam, and Nidhi Rastogi

In IEEE European Symposium on Security and Privacy Workshops, 2024

This paper combines open-source language models with knowledge-graph construction to extract actionable cyber threat intelligence from unstructured reports and study how well structured CTI can be built at scale.

EuroS&P 2024

PASA: Attack Agnostic Unsupervised Adversarial Detection using Prediction and Attribution Sensitivity Analysis

Dipkamal Bhusal, Md Tanvirul Alam, Monish Kumar Manikya Veerabhadran, Michael Clifford, Sara Rampazzi, and Nidhi Rastogi

IEEE European Symposium on Security and Privacy, 2024

PASA detects adversarial examples without attack-specific tuning by measuring how unstable both predictions and feature attributions become under noise, using thresholds learned only from benign data.

2022

arXiv 2022

CyNER: A Python Library for Cybersecurity Named Entity Recognition

Md Tanvirul Alam, Dipkamal Bhusal, Youngja Park, and Nidhi Rastogi

arXiv preprint, 2022

CyNER packages transformer models, heuristic extractors, and generic NER components into an open-source library for extracting cybersecurity entities and indicators of compromise from heterogeneous threat reports.

2020

arXiv 2020

Bangla Text Classification using Transformers

Tanvirul Alam, Akib Khan, and Firoj Alam

arXiv preprint, 2020

This paper fine-tunes multilingual transformer models for Bangla sentiment analysis, emotion detection, news categorization, and authorship attribution, reporting state-of-the-art gains across six benchmark datasets.

W-NUT 2020

Punctuation Restoration using Transformer Models for High-and Low-Resource Languages

Tanvirul Alam, Akib Khan, and Firoj Alam

In Proceedings of the 6th Workshop on Noisy User-generated Text, 2020

This paper studies punctuation restoration for both high-resource and low-resource settings, explores transformer-based models for ASR post-processing, and introduces an augmentation strategy that improves robustness while establishing a public baseline for Bangla.

SPECOM 2020

Lightweight CNN for Robust Voice Activity Detection

Tanvirul Alam and Akib Khan

In International Conference on Speech and Computer, 2020

Voice activity detection is a critical preprocessing step for speech systems. This work introduces a lightweight CNN architecture and improves noisy-condition robustness with strong regularization and knowledge distillation, yielding large relative improvements in equal error rate (EER) over a parameter-matched DNN baseline.