Topic 3 Algorithm Tool Integration Repository

Reproduction Steps

Tool Documentation Index

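| Tool type | Documentation | Description |
| --- | --- | --- |
| Log-only tools | `log_tools/README.md` | Covers log anomaly detection, log-based fault localization, the LogBERT baseline, and related content |
| Metric-only tools | `metric_tools/README.md` | Follows the log tool documentation format; describes the metric-only algorithms, how to run them, and the three-task evaluation |
| Metric + log fusion tools | `metric_log_tools/README.md` | Follows the log tool documentation format; describes the metric+log algorithms, how to run them, and the three-task evaluation |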

Log-only tools

`scripts/log_tool.sh` contains the script calls that test the log-only algorithm tools and reproduce the baseline.

# Ours
python evaluate_log_tools.py \
    --groundtruth datasets/benchmark-groundtruth-test-0316-0325.csv \
    --services-file log_tools/services.txt \
    --model-dir log_tools/models/fault_type_cls \
    --output-json outputs/log_tool_result.json > outputs/log_tool_result.txt


# Baseline: LogBERT
python evaluate_log_tools.py \
  --groundtruth datasets/benchmark-groundtruth-test-0316-0325.csv \
  --services-file log_tools/services.txt \
  --ad-baseline logbert \
  --loc-baseline logbert \
  --cls-baseline logbert \
  --logbert-model-dir log_tools/logbert/models \
  --logbert-cls-model-path log_tools/logbert/models/fault_type_cls/logbert_fault_classifier.pkl

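Unified input conventions for reproduced baselines (summary)

See `metric_tools/README.md` and `metric_log_tools/README.md` for the full runnable commands. Going forward, every baseline tool provides two input modes:

Metric-only tools (metric-only): use metric data only.
Metric + log fusion tools (metric+log): use both metric and log data.

Trace is not a default input; there is no need to pass trace data unless you are debugging the historical three-modality experiments.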

| Baseline | metric-only | metric+log |
| --- | --- | --- |
| Hades | `tianchi_hades_driver.py --data_type kpi` | `tianchi_hades_driver.py --data_type fuse` |
| AnoFusion | `tianchi_anofusion_driver.py --top-log-tokens 0` | `tianchi_anofusion_driver.py --top-log-tokens <N>` |
| DiagFusion | `tianchi_diagfusion_driver.py --modalities metric` | `tianchi_diagfusion_driver.py --modalities metric,log` |
| ART AD | `evaluate_opsaug_ad.py --modalities metric` | `evaluate_opsaug_ad.py --modalities metric,log` |
| ART FT/RCL | `evaluate_opsaug_diag.py --modalities metric` | `evaluate_opsaug_diag.py --modalities metric,log` |
| DejaVu-style | `tianchi_dejavu_driver.py --modalities metric` | `tianchi_dejavu_driver.py --modalities metric,log` |
| MULAN-style | `tianchi_mulan_driver.py --modalities metric` | `tianchi_mulan_driver.py --modalities metric,log` |
| Nezha-style | `tianchi_nezha_driver.py --modalities metric` | `tianchi_nezha_driver.py --modalities metric,log` |
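
When several baselines are run in sequence, the table above can be encoded as a small lookup. The following is a minimal sketch, not part of this repository: the script names and flags are copied from the table, while `BASELINE_COMMANDS`, `build_command`, the `python` launcher, and the assumption that the scripts sit in the current directory are all hypothetical. AnoFusion is left out because its metric+log mode takes a caller-chosen `<N>`.

```python
from __future__ import annotations

import shlex

# (script, metric-only flags, metric+log flags), copied from the table above.
# The dictionary itself is a hypothetical helper, not repository code.
BASELINE_COMMANDS = {
    "Hades": ("tianchi_hades_driver.py", "--data_type kpi", "--data_type fuse"),
    "DiagFusion": ("tianchi_diagfusion_driver.py",
                   "--modalities metric", "--modalities metric,log"),
    "ART AD": ("evaluate_opsaug_ad.py",
               "--modalities metric", "--modalities metric,log"),
    "ART FT/RCL": ("evaluate_opsaug_diag.py",
                   "--modalities metric", "--modalities metric,log"),
}


def build_command(baseline: str, mode: str) -> list[str]:
    """Return an argv list for a baseline in 'metric-only' or 'metric+log' mode."""
    script, metric_only, metric_log = BASELINE_COMMANDS[baseline]
    if mode == "metric-only":
        flags = metric_only
    elif mode == "metric+log":
        flags = metric_log
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Assumption: the script is reachable from the current working directory.
    return ["python", script, *shlex.split(flags)]
```

The returned list can be passed to `subprocess.run` once any additional arguments documented in the per-tool READMEs have been appended.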

All of the tools above that accept `--modalities` now default to `metric,log` and do not fetch trace data.
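
A `metric,log` default of this kind can be sketched with `argparse`. This is an illustration only, not the drivers' actual argument handling: `parse_modalities`, `build_parser`, and the set of accepted modality names are assumptions.

```python
from __future__ import annotations

import argparse

# Assumed set of modality names; trace is accepted only when passed explicitly.
ALLOWED_MODALITIES = ("metric", "log", "trace")


def parse_modalities(value: str) -> list[str]:
    """Split a comma-separated modality string and reject unknown names."""
    items = [part.strip() for part in value.split(",") if part.strip()]
    unknown = [item for item in items if item not in ALLOWED_MODALITIES]
    if not items or unknown:
        raise argparse.ArgumentTypeError(f"invalid modalities: {value!r}")
    return items


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Hypothetical baseline driver")
    # argparse applies `type` to string defaults, so the default is parsed
    # exactly like a value given on the command line.
    parser.add_argument(
        "--modalities",
        type=parse_modalities,
        default="metric,log",
        help="comma-separated subset of metric,log,trace (default: metric,log)",
    )
    return parser


if __name__ == "__main__":
    # With no flag given, only metric and log are selected.
    print(build_parser().parse_args([]).modalities)
```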

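Schedule

Key milestone: end of February

This repository is to be completed by the end of February; Alibaba engineers will run integration testing in early March.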

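What each student needs to prepare for their algorithm

After cloning this repository, each student adds the following files for the part they own under the corresponding {metric/trace/log}_tools folder:

The algorithm code itself and the (trained) model parameter files
A README plus the environment files needed for reproduction, such as requirements.txt, describing how to run the code and which libraries it requires
An experiment report covering an introduction to the algorithm, test results on the dataset, and a comparison with baseline methods
An API wrapper for your model in server.py, modeled on the log_anomaly_detect tool. Main requirement: read input data through the Alibaba Cloud SLS interface rather than from local input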
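
One way to keep the wrapper independent of local files is to pass the data source in as a function. The sketch below illustrates that pattern under stated assumptions; it does not reproduce `server.py` or the `log_anomaly_detect` interface, which are not shown here. `TimeWindow`, `LogFetcher`, `make_detect_endpoint`, and the toy scorer are hypothetical names, and `fake_fetch` stands in for a fetcher that would query SLS in the real deployment.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class TimeWindow:
    start_ts: int  # inclusive, Unix seconds
    end_ts: int    # exclusive, Unix seconds


# Hypothetical fetcher signature: (service, window) -> raw log lines.
# In the real setup this would be backed by the SLS interface.
LogFetcher = Callable[[str, TimeWindow], Iterable[str]]


def make_detect_endpoint(
    fetch_logs: LogFetcher,
    score: Callable[[list[str]], float],
    threshold: float = 0.5,
) -> Callable[[str, int, int], dict]:
    """Bind a data source and a model scorer into one callable endpoint."""

    def detect(service: str, start_ts: int, end_ts: int) -> dict:
        if end_ts <= start_ts:
            raise ValueError("end_ts must be greater than start_ts")
        lines = list(fetch_logs(service, TimeWindow(start_ts, end_ts)))
        value = score(lines)
        return {
            "service": service,
            "num_logs": len(lines),
            "score": value,
            "is_anomaly": value >= threshold,
        }

    return detect


def fake_fetch(service: str, window: TimeWindow) -> list[str]:
    # In-memory stand-in so the sketch runs without network access.
    return [f"{service} ERROR timeout", f"{service} INFO ok"]


def error_ratio(lines: list[str]) -> float:
    # Toy scorer: fraction of lines that contain "ERROR".
    return sum("ERROR" in line for line in lines) / len(lines) if lines else 0.0


detect = make_detect_endpoint(fake_fetch, error_ratio)
result = detect("cart", 0, 60)
```

Because the fetcher is injected, the same `detect` function can be exercised locally with a stand-in and wired to the SLS-backed fetcher for integration testing.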

For integration questions, contact @刘进步 on DingTalk.

Test data environment

Testing will run on rca-benchmark, with the following settings:

REGION=cn-hongkong
WORKSPACE=rca-benchmark
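
If a tool reads these two settings from environment variables, they can be loaded once at startup. The sketch below assumes exactly that; how the repository actually consumes `REGION` and `WORKSPACE` is not specified here, and `BenchmarkEnv` and `load_benchmark_env` are hypothetical names.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class BenchmarkEnv:
    region: str
    workspace: str


def load_benchmark_env(environ: Optional[Mapping[str, str]] = None) -> BenchmarkEnv:
    """Read REGION and WORKSPACE, falling back to the values listed above."""
    env = os.environ if environ is None else environ
    return BenchmarkEnv(
        region=env.get("REGION", "cn-hongkong"),
        workspace=env.get("WORKSPACE", "rca-benchmark"),
    )
```

Passing a plain dictionary as `environ` makes the function easy to test without touching the process environment.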

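This workspace can only be accessed by Alibaba interns from a development machine on the internal network; other students should consult @窦梓峻 and @刘玉河.

For data questions, contact @曙云 on DingTalk.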