Spurious Counterexample Detection in Program Verification
Overview
This project focuses on investigating the capabilities of large language models (LLMs) in identifying spurious counterexamples during program verification processes. The core challenge addressed is determining whether counterexamples generated during verification are genuine violations or artifacts of the proof technique.
Benchmark Construction Methodology
Our benchmark construction employs two distinct approaches to generate spurious counterexamples:
Approach 1: Invariant Generation-Based Method
Utilize the invariant generation tool LaM4Inv to generate clauses via large language models
Apply an SMT solver to obtain inductive counterexamples
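This second step can be illustrated with a self-contained sketch. The real pipeline queries an SMT solver; here a brute-force search over a small finite domain stands in for it, and the loop, candidate invariant, and domain are all hypothetical:

```python
# Sketch: find an inductive counterexample (CTI) for a candidate invariant,
# i.e. a state s inside the loop such that Inv(s) holds but Inv(step(s)) does not.
# Hypothetical loop: while (x < 10): x += 2; candidate invariant: x <= 10.

def inv(x):            # candidate invariant (hypothetical)
    return x <= 10

def guard(x):          # loop guard (hypothetical)
    return x < 10

def step(x):           # loop body (hypothetical)
    return x + 2

def find_cti(domain):
    """Return a state s with inv(s) and guard(s) but not inv(step(s)), or None."""
    for x in domain:
        if inv(x) and guard(x) and not inv(step(x)):
            return x
    return None

print(find_cti(range(-20, 20)))  # x = 9: inv(9) and guard(9) hold, but step gives 11 > 10
```

An SMT solver replaces the enumeration with a single satisfiability query over Inv(s) ∧ guard(s) ∧ ¬Inv(step(s)).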
Approach 2: Boundary State Analysis Method
Use OMT solving on the entire inductive invariant obtained from LaM4Inv to derive boundary values for each variable
Determine the corresponding boundary states
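The boundary-value derivation can be sketched in the same spirit. The pipeline applies OMT solving to the invariant produced by LaM4Inv; this sketch instead brute-forces per-variable minima and maxima of a hypothetical invariant over a bounded domain:

```python
from itertools import product

# Hypothetical inductive invariant over (x, y): 0 <= x <= 8 and x + y == 8.
def invariant(x, y):
    return 0 <= x <= 8 and x + y == 8

# Enumerate all models of the invariant in a bounded domain, then take
# per-variable minima/maxima, as an OMT solver would via optimization objectives.
models = [(x, y) for x, y in product(range(-10, 11), repeat=2) if invariant(x, y)]
bounds = {
    "x": (min(m[0] for m in models), max(m[0] for m in models)),
    "y": (min(m[1] for m in models), max(m[1] for m in models)),
}
print(bounds)  # {'x': (0, 8), 'y': (0, 8)}
```

Each boundary value, paired with the values of the other variables in the corresponding model, gives one boundary state.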
Ground-Truth Labeling
For states satisfying the program’s precondition:
Insert the state back at the beginning of the original C program’s loop
For states not satisfying the precondition:
Insert the state at the end of the loop body
Insert an assertion into the loop body: assert(!(target state));
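The placement rules above can be sketched as a small instrumentation helper. The function name, the state representation, and the sample loop body are hypothetical; the actual benchmark scripts may locate the loop differently:

```python
def instrument_loop_body(body_lines, state, satisfies_pre):
    """Place assert(!(target state)); inside a C loop body.

    body_lines: statements of the loop body, as strings (hypothetical format).
    state: dict mapping variable name to value, e.g. {"x": 8}.
    satisfies_pre: whether the state satisfies the program's precondition.
    """
    cond = " && ".join(f"{v} == {val}" for v, val in sorted(state.items()))
    check = f"assert(!({cond}));"
    if satisfies_pre:
        return [check] + body_lines   # beginning of the loop body
    return body_lines + [check]       # end of the loop body

print(instrument_loop_body(["x = x + 1;"], {"x": 8}, True))
# ['assert(!(x == 8));', 'x = x + 1;']
```

If the target state is reachable at the chosen program point, the assertion is violated and the verifier reports a counterexample.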
Run Ultimate Automizer:
FALSE (counterexample found) → state is reachable → must be GENUINE
TRUE (no counterexample) → state is unreachable → must be SPURIOUS
UNKNOWN (not used) → cannot decide
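The verdict-to-label mapping above can be written down directly (the label name for the UNKNOWN case is illustrative, since such programs are not used in the benchmark):

```python
# Map an Ultimate Automizer verdict on assert(!(target state)) to a label
# for the candidate counterexample state.
def label_state(verdict):
    return {
        "FALSE": "GENUINE",     # assertion violated: the state is reachable
        "TRUE": "SPURIOUS",     # assertion holds: the state is unreachable
        "UNKNOWN": "UNDECIDED", # verifier cannot decide; excluded from the benchmark
    }[verdict]

print(label_state("FALSE"))  # GENUINE
```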
Research Goals
This project aims to evaluate and enhance the ability of LLMs to distinguish between genuine and spurious counterexamples in formal program verification, contributing to more robust and reliable verification methodologies.
Source files of the SPURIOUS data in omt_satisfy_pre_models (already manually corrected):
File: 176_verification_result.json
File: 93_verification_result.json
Programs where model_values is absent, i.e., where no counterexample was extracted:
[110, 137, 162, 163, 164, 172, 234, 236, 240, 243, 262, 264, 266, 269, 313, 67] (16 programs in total). Therefore, our test set contains 316 - 16 = 300 programs, i.e., 300 programs with counterexamples.
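A quick check of the counts above (the program IDs are taken directly from the list):

```python
# Programs with no extracted counterexample (model_values absent).
missing = [110, 137, 162, 163, 164, 172, 234, 236, 240, 243, 262, 264, 266, 269, 313, 67]

print(len(missing))        # 16 programs excluded
print(316 - len(missing))  # 300 programs with counterexamples remain
```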