This dataset captures performance metrics averaged over 20-second intervals from approximately one million Pods belonging to over 90,000 applications across selected clusters. The data was collected during weekday evening peak hours in October 2024, outside of major promotional sale periods.

Structure

.
├── README.md
└── data
    ├── raw_data.csv  # full dataset
    └── raw_data_sampled.csv  # 100 randomly sampled rows from the dataset
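
As a starting point, the sample file can be inspected with Python's standard `csv` module. The sketch below is illustrative only: it assumes the CSV is comma-delimited and that its header row uses the column names listed in this README. The helper name `read_rows` and the in-memory demo rows are hypothetical and are not taken from the dataset.

```python
import csv
import io


def read_rows(source):
    """Read CSV rows from an open text file (or any iterable of lines) as dicts."""
    return list(csv.DictReader(source))


# Tiny in-memory stand-in so the example runs anywhere. For the real sample,
# pass open("data/raw_data_sampled.csv", newline="", encoding="utf-8") instead.
demo = io.StringIO(
    "pod_id,app_id,business_type,cpu_model\n"
    "1,10,e-commerce,Skylake\n"
    "2,11,search-recommendation,Ice Lake\n"
)
rows = read_rows(demo)

# Every value is read as a string; convert numeric columns explicitly when needed.
print(len(rows), sorted(rows[0]))
```

Starting with `raw_data_sampled.csv` keeps the first look cheap before reading the full `raw_data.csv`.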

Dataset structure

Each row in the dataset corresponds to a Pod and includes basic metadata (e.g., machine type, application) as well as performance metrics.

1. Basic Information

  • pod_id: Anonymized Pod identifier, a unique numeric ID assigned to each Pod.
  • app_id: Anonymized application identifier, a unique numeric ID assigned to each application.
  • business_type: Business type, including e-commerce and search-recommendation.
  • cpu_model: CPU microarchitecture model, such as Skylake or Ice Lake.
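
The metadata fields above lend themselves to simple grouping. As a minimal sketch, assuming rows have already been loaded as dictionaries keyed by these column names, the following counts Pods per business type and CPU model and counts distinct applications. The helper names and the sample values are hypothetical, and the exact strings stored in `business_type` and `cpu_model` may differ in the real data.

```python
from collections import Counter


def pods_per_group(rows):
    """Count rows (one per Pod) for each (business_type, cpu_model) pair."""
    return Counter((row["business_type"], row["cpu_model"]) for row in rows)


def distinct_apps(rows):
    """Number of distinct applications among the given rows."""
    return len({row["app_id"] for row in rows})


# Hypothetical rows for illustration, not real dataset values.
sample_rows = [
    {"pod_id": "1", "app_id": "10", "business_type": "e-commerce", "cpu_model": "Skylake"},
    {"pod_id": "2", "app_id": "10", "business_type": "e-commerce", "cpu_model": "Skylake"},
    {"pod_id": "3", "app_id": "11", "business_type": "search-recommendation", "cpu_model": "Ice Lake"},
]

group_counts = pods_per_group(sample_rows)
for (business, model), n in group_counts.most_common():
    print(business, model, n)
print("applications:", distinct_apps(sample_rows))
```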

2. CPU Performance Metrics of Pod

  • pod_cpu_used_cores: Number of CPU cores used by the Pod.
  • pod_cpu_cpi: Cycles per instruction (CPI) for the Pod.
  • pod_cpu_lv1_retiring, etc.: Top-Down Microarchitecture Analysis (TMA) performance metrics.
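
CPI is the ratio of elapsed CPU cycles to retired instructions, so lower values mean more work per cycle, and its reciprocal is instructions per cycle (IPC). The sketch below shows one possible way to compare CPI across CPU models using the median per `cpu_model`. It assumes `pod_cpu_cpi` is stored as a numeric string and that missing values appear as empty fields. Both are assumptions about the file, and the helper names and sample values are hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_cpi_by_model(rows):
    """Median pod_cpu_cpi per cpu_model, skipping rows with a missing CPI."""
    by_model = defaultdict(list)
    for row in rows:
        raw = row.get("pod_cpu_cpi")
        if not raw:  # None or empty string: treat as missing (assumption)
            continue
        by_model[row["cpu_model"]].append(float(raw))
    return {model: median(values) for model, values in by_model.items()}


def cpi_to_ipc(cpi):
    """Instructions per cycle is the reciprocal of cycles per instruction."""
    return 1.0 / cpi


# Hypothetical rows for illustration, not real dataset values.
sample_rows = [
    {"cpu_model": "Skylake", "pod_cpu_cpi": "1.0"},
    {"cpu_model": "Skylake", "pod_cpu_cpi": "2.0"},
    {"cpu_model": "Skylake", "pod_cpu_cpi": "3.0"},
    {"cpu_model": "Ice Lake", "pod_cpu_cpi": "0.8"},
    {"cpu_model": "Ice Lake", "pod_cpu_cpi": ""},
]

medians = median_cpi_by_model(sample_rows)
for model, cpi in medians.items():
    print(model, cpi, cpi_to_ipc(cpi))
```

The median is used here rather than the mean because CPI distributions across Pods can be skewed. A true aggregate CPI would need raw cycle and instruction counts, which this README does not list.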

3. Node-Level Metrics

  • node_cpu_usage: CPU utilization for the entire node.
  • node_uncore_imc_bw_util: Memory bandwidth utilization for the entire node.
  • node_uncore_imc_latency_ns: Memory access latency for the entire node (in nanoseconds).
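
One way to explore the node-level metrics is to check how strongly memory bandwidth utilization and memory latency move together. The sketch below computes a Pearson correlation coefficient, which does not depend on whether utilization is stored as a fraction or a percentage. The helper name `pearson` and the sample values are hypothetical. Note that each row describes a Pod, so Pods sharing a node may repeat the same node-level values.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical rows for illustration, not real dataset values.
sample_rows = [
    {"node_uncore_imc_bw_util": "0.1", "node_uncore_imc_latency_ns": "80"},
    {"node_uncore_imc_bw_util": "0.2", "node_uncore_imc_latency_ns": "90"},
    {"node_uncore_imc_bw_util": "0.3", "node_uncore_imc_latency_ns": "100"},
]

bandwidth = [float(r["node_uncore_imc_bw_util"]) for r in sample_rows]
latency = [float(r["node_uncore_imc_latency_ns"]) for r in sample_rows]
r_value = pearson(bandwidth, latency)
print(round(r_value, 3))
```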