Alibaba Block Traces
These traces are published by Alibaba Group to help researchers understand the real-world workload in the cloud.
They were collected from a production cluster of the elastic block service of Alibaba Cloud (i.e., storage for virtual disks). The cluster is located in the Beijing region, one of the most popular regions of Alibaba Cloud.
The dataset covers 1000 virtual disks randomly sampled from that cluster, with all of their I/O activity recorded over the month of January 2020. These virtual disks are ultra disk products. Ultra disks are backed by a storage cluster and offer high data reliability. Compared to standard SSD and enhanced SSD disks, ultra disks are cheaper and offer lower random I/O performance. Typical applications of ultra disks include running operating systems, big data processing software, and web servers.
Download
The data are available for download from Alibaba OSS. You will get the download link after taking a short survey. If you have any questions or ideas about the trace data, feel free to contact us. The current maintainer is Chao Shi <chao.shi AT alibaba-inc.com>. We are happy to see research work based on the trace data.
Note that the tarball is very large: 181GB gzip-compressed and 751GB uncompressed. Make sure you have enough space on your disk.
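Given the 751GB uncompressed size, one option is to read a file straight out of the compressed tarball instead of extracting it first. The following is a minimal sketch under assumptions that this document does not confirm: the helper name `iter_member_lines` is hypothetical, the archive is assumed to hold each CSV file as a regular member (possibly inside a directory), and the content is assumed to be plain ASCII text.

```python
import os
import tarfile


def iter_member_lines(tar_path, wanted_basename):
    """Yield text lines of one archive member without extracting it to disk.

    Assumption: the member is matched by base name only, because the
    directory layout inside the tarball is not documented here.
    """
    with tarfile.open(tar_path, mode="r:gz") as tar:
        for member in tar:
            if member.isfile() and os.path.basename(member.name) == wanted_basename:
                fileobj = tar.extractfile(member)
                for raw_line in fileobj:
                    # Assumption: the CSV content is ASCII.
                    yield raw_line.decode("ascii").rstrip("\r\n")
                return
    raise FileNotFoundError(f"{wanted_basename} not found in {tar_path}")
```

A call such as `iter_member_lines("alibaba_block_traces_2020.tar.gz", "io_traces.csv")` would then yield one line at a time. Decompression still has to pass over the whole archive up to the requested member, so this saves disk space rather than time.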
Here are MD5 checksums of the tarball and files inside.
| Filename | MD5 checksum |
| --- | --- |
| alibaba_block_traces_2020.tar.gz | 95780fc531a60fd4ca0513ef88ef469c |
| io_traces.csv | c60dd8f771738d4d8df56271e56dd308 |
| device_size.csv | 6641abe8a0f3625f13776120d2884e84 |
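The checksums above can be verified after download. Below is a minimal sketch that hashes a file in chunks so that a file of this size never has to fit in memory. The function names `md5_of_file` and `matches_published_md5` are hypothetical helpers, not part of any tooling shipped with the traces, and the paths you pass in depend on where you saved the files.

```python
import hashlib
import os

# Checksums copied from the list above.
PUBLISHED_MD5 = {
    "alibaba_block_traces_2020.tar.gz": "95780fc531a60fd4ca0513ef88ef469c",
    "io_traces.csv": "c60dd8f771738d4d8df56271e56dd308",
    "device_size.csv": "6641abe8a0f3625f13776120d2884e84",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path):
    """Compare a local file against the published checksum for its base name."""
    expected = PUBLISHED_MD5[os.path.basename(path)]
    return md5_of_file(path) == expected
```

Hashing the 181GB tarball takes a while, so it may be worth running the check once right after the download completes.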
Schema
There are two files in CSV format. Their formats are defined as follows.
io_traces.csv
Each row is a read or write operation.
| Column | Type | Example | Description |
| --- | --- | --- | --- |
| device_id | uint32 | 0 | ID of the virtual disk |
| opcode | char | R | Either ‘R’ or ‘W’, indicating whether this operation is a read or a write |
| offset | uint64 | 126703644672 | Offset of this operation, in bytes |
| length | uint32 | 4096 | Length of this operation, in bytes |
| timestamp | uint64 | 1577808000000626 | Time at which the server received this operation, in microseconds |
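As an illustration of the io_traces.csv schema, here is a minimal parsing sketch. It rests on assumptions that this document does not state: fields are comma-separated, columns appear in the order listed above, and the timestamp counts microseconds since the Unix epoch. The names `IoRecord`, `parse_io_traces`, and `timestamp_to_utc` are hypothetical.

```python
import csv
from datetime import datetime, timezone
from typing import Iterable, Iterator, NamedTuple


class IoRecord(NamedTuple):
    device_id: int
    opcode: str        # 'R' or 'W'
    offset: int        # bytes
    length: int        # bytes
    timestamp_us: int  # microseconds


def parse_io_traces(lines: Iterable[str]) -> Iterator[IoRecord]:
    """Parse rows lazily so that the full trace never has to fit in memory."""
    for row in csv.reader(lines):
        # Skip blank lines and a header row, in case the file has one.
        if not row or row[0] == "device_id":
            continue
        device_id, opcode, offset, length, timestamp = row
        if opcode not in ("R", "W"):
            raise ValueError(f"unexpected opcode: {opcode!r}")
        yield IoRecord(int(device_id), opcode, int(offset), int(length), int(timestamp))


def timestamp_to_utc(timestamp_us: int) -> datetime:
    """Convert a timestamp to UTC, assuming microseconds since the Unix epoch."""
    seconds, micros = divmod(timestamp_us, 1_000_000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=micros)


sample = ["0,R,126703644672,4096,1577808000000626"]
for record in parse_io_traces(sample):
    print(record.opcode, record.length, timestamp_to_utc(record.timestamp_us).isoformat())
    # prints: R 4096 2019-12-31T16:00:00.000626+00:00
```

Under the epoch assumption, the example timestamp falls at midnight on January 1, 2020 in UTC+8, which is consistent with a trace recorded in Beijing over January 2020.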
device_size.csv
Each row is a device with its capacity.
| Column | Type | Example | Description |
| --- | --- | --- | --- |
| device_id | uint32 | 0 | ID of the virtual disk |
| capacity | uint64 | 536870912000 | Capacity of the virtual disk, in bytes |
All virtual disk IDs are remapped to the range 0 to 999.
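The two files can be joined on device_id, for example to sanity-check that an operation stays within its disk. The sketch below assumes comma-separated fields in the column order listed above. The helper names `load_device_sizes`, `ids_in_expected_range`, and `is_within_capacity` are hypothetical, and whether every operation in the traces actually satisfies the bound is not something this document states.

```python
import csv


def load_device_sizes(lines):
    """Build a {device_id: capacity_in_bytes} mapping from device_size.csv rows."""
    sizes = {}
    for row in csv.reader(lines):
        # Skip blank lines and a header row, in case the file has one.
        if not row or row[0] == "device_id":
            continue
        device_id, capacity = row
        sizes[int(device_id)] = int(capacity)
    return sizes


def ids_in_expected_range(sizes, num_devices=1000):
    """Check that every ID lies in 0..999, the remapped range described above."""
    return all(0 <= device_id < num_devices for device_id in sizes)


def is_within_capacity(sizes, device_id, offset, length):
    """Return True if the byte range [offset, offset + length) fits on the disk."""
    return offset + length <= sizes[device_id]


sizes = load_device_sizes(["0,536870912000"])
print(sizes[0] / 2**30)                                   # 500.0 (GiB)
print(is_within_capacity(sizes, 0, 126703644672, 4096))   # True
```

The example values are the ones from the schema tables: the sample disk has a capacity of exactly 500 GiB, and the sample read at offset 126703644672 lies well inside it.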
Research outcome
Here is a list of research work based on the trace data. If your paper uses the data, it would be great to let us know and add your work to this list.
The Alibaba Innovative Research (AIR) program sponsors research every year in various areas of computer science that solves real problems in industry scenarios. If you have interesting ideas and would like to participate in this program, feel free to contact Chao Shi <chao.shi@alibaba-inc.com>.
Acknowledgements
Thanks to Qiuping Wang and Jinhong Li from the Chinese University of Hong Kong for analyzing and validating the data at an early stage.
License
The trace data and document are licensed under CC-4.0.