Spurious Counterexample Detection in Program Verification
Overview
This project focuses on investigating the capabilities of large language models (LLMs) in identifying spurious counterexamples during program verification processes. The core challenge addressed is determining whether counterexamples generated during verification are genuine violations or artifacts of the proof technique.
Benchmark Construction Methodology
Our benchmark construction employs two distinct approaches to generate spurious counterexamples:
Approach 1: Invariant Generation-Based Method
Utilize the invariant generation tool LaM4Inv to generate clauses via large language models
Apply an SMT solver to obtain inductive counterexamples
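This second step can be illustrated with a self-contained sketch. The real pipeline queries an SMT solver; here a brute-force search over a small finite domain stands in for it, and the loop, candidate invariant, and domain are all hypothetical:

```python
# Sketch: find an inductive counterexample (CTI) for a candidate invariant,
# i.e. a state s inside the loop such that Inv(s) holds but Inv(step(s)) does not.
# Hypothetical loop: while (x < 10): x += 2; candidate invariant: x <= 10.

def inv(x):            # candidate invariant (hypothetical)
    return x <= 10

def guard(x):          # loop guard (hypothetical)
    return x < 10

def step(x):           # loop body (hypothetical)
    return x + 2

def find_cti(domain):
    """Return a state s with inv(s) and guard(s) but not inv(step(s)), or None."""
    for x in domain:
        if inv(x) and guard(x) and not inv(step(x)):
            return x
    return None

print(find_cti(range(-20, 20)))  # x = 9: inv(9) and guard(9) hold, but step gives 11 > 10
```

An SMT solver replaces the enumeration with a single satisfiability query over Inv(s) ∧ guard(s) ∧ ¬Inv(step(s)).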
Approach 2: Boundary State Analysis Method
Use OMT solving on the entire inductive invariant obtained from LaM4Inv to derive boundary values for each variable
Determine the corresponding boundary states
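The boundary-value derivation can be sketched in the same spirit. The pipeline applies OMT solving to the invariant produced by LaM4Inv; this sketch instead brute-forces per-variable minima and maxima of a hypothetical invariant over a bounded domain:

```python
from itertools import product

# Hypothetical inductive invariant over (x, y): 0 <= x <= 8 and x + y == 8.
def invariant(x, y):
    return 0 <= x <= 8 and x + y == 8

# Enumerate all models of the invariant in a bounded domain, then take
# per-variable minima/maxima, as an OMT solver would via optimization objectives.
models = [(x, y) for x, y in product(range(-10, 11), repeat=2) if invariant(x, y)]
bounds = {
    "x": (min(m[0] for m in models), max(m[0] for m in models)),
    "y": (min(m[1] for m in models), max(m[1] for m in models)),
}
print(bounds)  # {'x': (0, 8), 'y': (0, 8)}
```

Each boundary value, paired with the values of the other variables in the corresponding model, gives one boundary state.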
Ground-Truth Labeling
For states satisfying the program’s precondition:
Insert the state back at the beginning of the original C program’s loop
For states not satisfying the precondition:
Insert the state at the end of the loop body
Insert an assertion into the loop body: assert(!(target state));
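The placement rules above can be sketched as a small instrumentation helper. The function name, the state representation, and the sample loop body are hypothetical; the actual benchmark scripts may locate the loop differently:

```python
def instrument_loop_body(body_lines, state, satisfies_pre):
    """Place assert(!(target state)); inside a C loop body.

    body_lines: statements of the loop body, as strings (hypothetical format).
    state: dict mapping variable name to value, e.g. {"x": 8}.
    satisfies_pre: whether the state satisfies the program's precondition.
    """
    cond = " && ".join(f"{v} == {val}" for v, val in sorted(state.items()))
    check = f"assert(!({cond}));"
    if satisfies_pre:
        return [check] + body_lines   # beginning of the loop body
    return body_lines + [check]       # end of the loop body

print(instrument_loop_body(["x = x + 1;"], {"x": 8}, True))
# ['assert(!(x == 8));', 'x = x + 1;']
```

If the target state is reachable at the chosen program point, the assertion is violated and the verifier reports a counterexample.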
Run Ultimate Automizer:
FALSE (counterexample found) → state is reachable → must be GENUINE
TRUE (no counterexample) → state is unreachable → must be SPURIOUS
UNKNOWN (not used) → cannot decide
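The verdict-to-label mapping above can be written down directly (the label name for the UNKNOWN case is illustrative, since such programs are not used in the benchmark):

```python
# Map an Ultimate Automizer verdict on assert(!(target state)) to a label
# for the candidate counterexample state.
def label_state(verdict):
    return {
        "FALSE": "GENUINE",     # assertion violated: the state is reachable
        "TRUE": "SPURIOUS",     # assertion holds: the state is unreachable
        "UNKNOWN": "UNDECIDED", # verifier cannot decide; excluded from the benchmark
    }[verdict]

print(label_state("FALSE"))  # GENUINE
```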
Research Goals
This project aims to evaluate and enhance the ability of LLMs to distinguish between genuine and spurious counterexamples in formal program verification, contributing to more robust and reliable verification methodologies.
Source files of the SPURIOUS data in omt_satisfy_pre_models (already manually corrected):
File: 176_verification_result.json
File: 93_verification_result.json
Programs where model_values is absent, i.e., where no counterexample was extracted:
[110, 137, 162, 163, 164, 172, 234, 236, 240, 243, 262, 264, 266, 269, 313, 67] (16 programs in total). Therefore, our test set contains 316 - 16 = 300 programs, i.e., 300 programs with counterexamples.
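A quick check of the counts above (the program IDs are taken directly from the list):

```python
# Programs with no extracted counterexample (model_values absent).
missing = [110, 137, 162, 163, 164, 172, 234, 236, 240, 243, 262, 264, 266, 269, 313, 67]

print(len(missing))        # 16 programs excluded
print(316 - len(missing))  # 300 programs with counterexamples remain
```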