SPECA: Specification-Anchored Agentic Auditing for Security Verification

NyxFoundation/speca

364

+19/day

PythonSecurity💎 Hidden Gem

SPECA: Specification-to-Checklist Agentic Auditing Framework

AI Analysis

An agentic framework that audits code against natural-language specifications rather than patterns.

Built for Security engineers and protocol developers auditing complex, specification-governed systems.

From the README

SPECA: A Specification-to-Checklist Agentic Auditing Framework

Paper: Masato Kamba, Hirotake Murakami, Akiyoshi Sannai. Beyond Code Reasoning: A Specification-Anchored Audit Framework for Expert-Augmented Security Verification. arXiv preprint arXiv:2604.26495, 2026.

Abstract

Security-critical software is routinely audited by tools that reason about vulnerabilities as repository-local code patterns. Yet specification-governed systems — protocol stacks, consensus implementations, cryptographic libraries — are constrained by invariants and correctness conditions defined in natural-language specifications. When a vulnerability arises from what the specification requires rather than how code is written, code-level approaches lack the representational vocabulary to detect it, and their false positives resist systematic diagnosis.

SPECA is a specification-anchored security audit framework that derives explicit, typed security properties from natural-language specifications and audits implementations through structured proof-attempt reasoning grounded in each property. The framework yields three capabilities absent from code-driven auditing:

Spec-dependent detections that no code-local pattern matcher can express.
Controlled cross-implementation comparison under a shared property vocabulary.
False positives that decompose into interpretable, pipeline-phase-traceable root causes.

Headline Results

Sherlock Ethereum Fusaka Audit Contest (366 submissions, 10 implementations): SPECA recovers all 15 in-scope H/M/L vulnerabilities (5H/2M/8L) and independently discovers 4 bugs confirmed by developer fix commits — including a cryptographic invariant violation absent from all 366 adjudicated contest submissions.
RepoAudit C/C++ benchmark (15 projects, 35 non-disputed ground-truth bugs): SPECA matches the best published precision (88.9%, Sonnet 4.5) while surfacing 12 author-validated candidate bugs beyond the established ground truth — two confirmed by upstream maintainers.
All false positives in the deep analysis (N=16) trace to three interpretable root causes — trust boundary misunderstanding (50%), code reading error (37.5%), specification misinterpretation (12.5%) — each mapped to a specific pipeline phase.

See Evaluation for full numbers and charts.

Why SPECA?
Quick Start
Demo
Architecture
Phases
Running on GitHub Actions
Configuration
Evaluation — RQ1 Sherlock + RQ2 RepoAudit, with charts
Reproducing the Benchmarks
Contributing
Citation
License

Why SPECA?

Existing LLM-based auditors begin from the code and work outward — scanning a repository for bug-pattern templates, dataflow anoma