Calibration Without Comprehension: Diagnosing the Limits of Fine-Tuning LLMs for Vulnerability Detection in Systems Software


Security teams are increasingly exploring whether large language models can automatically detect vulnerabilities in source code, a task with serious consequences if done poorly. This paper delivers a sobering assessment: even fine-tuned models that score well on benchmarks may be learning surface-level patterns rather than genuine security reasoning. Using carefully curated Linux kernel samples with a strict temporal split to prevent data leakage, the authors show that fine-tuning shifts output calibration without changing the underlying decision logic. The implications are significant for any organization considering LLM-assisted code review, penetration testing, or automated vulnerability triage in production systems.