This repo contains starter code for training and evaluating machine learning
models over the YouTube-8M dataset.
It is the starter code for our
3rd YouTube-8M Video Understanding Challenge on Kaggle
and is part of the International Conference on Computer Vision (ICCV) 2019 selected
workshop session. The code gives an end-to-end working example for reading the
dataset, training a TensorFlow model, and evaluating the performance of the
model.
The starter code requires Tensorflow. If you haven’t installed it yet, follow
the instructions on tensorflow.org. This
code has been tested with Tensorflow 1.14. Going forward, we will continue to
target the latest released version of Tensorflow.
Please verify that you have Python 3.6+ and Tensorflow 1.14 or higher installed
by running the following commands:
python --version
python -c 'import tensorflow as tf; print(tf.__version__)'
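If you prefer a single script to the two commands above, the check can be sketched as follows. This is a minimal sketch, not part of the starter code: the helper names `parse_version` and `meets_minimum` are hypothetical, and the minimums (Python 3.6, Tensorflow 1.14) are taken from the text above.

```python
import sys


def parse_version(version):
    """Turn a version string such as '1.14.0' into a tuple of ints, e.g. (1, 14, 0).

    Parsing stops at the first piece that does not start with a digit, so a
    suffix such as '-rc1' is ignored.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(version, minimum):
    """Return True if the version string is at least the `minimum` tuple."""
    return parse_version(version) >= minimum


if __name__ == "__main__":
    print("Python OK:", sys.version_info >= (3, 6))
    try:
        import tensorflow as tf
        print("Tensorflow OK:", meets_minimum(tf.__version__, (1, 14)))
    except ImportError:
        print("Tensorflow is not installed")
```

Comparing tuples of integers rather than raw strings avoids the pitfall where "1.9" sorts after "1.14" as text.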
Download Dataset Locally
Please see our
dataset website for
up-to-date download instructions.
In this document, we assume you have downloaded the entire frame-level feature
dataset to ~/yt8m/2/frame and the segment-level validation/test dataset to
~/yt8m/3/frame. So the structure should look like
Clone this git repo:
mkdir -p ~/yt8m/code
cd ~/yt8m/code
git clone https://github.com/google/youtube-8m.git
Train a video-level model on frame-level features and run inference at the segment level.
Train using train.py, selecting a frame-level model (e.g.
FrameLevelLogisticModel), and instructing the trainer to use
--frame_features. TLDR - frame-level features are compressed, and this flag
uncompresses them.
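The text above does not spell out what "uncompressing" involves. As a rough illustration only, the sketch below assumes the frame-level features are stored as 8-bit integers (0 to 255) and are mapped linearly back to floats in roughly the range [-2, 2]. The function name `dequantize` and the default range are assumptions made for this example; check the starter code for the conversion it actually applies.

```python
def dequantize(values, min_value=-2.0, max_value=2.0):
    """Map 8-bit quantized values (0..255) back to floats.

    Assumption for illustration: a linear mapping over [min_value, max_value],
    shifted by half a quantization step so each float sits near the middle of
    its bucket.
    """
    quantized_range = max_value - min_value
    scale = quantized_range / 255.0
    bias = quantized_range / 512.0 + min_value
    return [v * scale + bias for v in values]


if __name__ == "__main__":
    # Lowest, middle and highest byte values.
    print(dequantize([0, 128, 255]))
```

Storing one byte per feature dimension instead of a 32-bit float is what keeps the frame-level dataset manageable in size, at the cost of this extra conversion step at read time.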
NOTE: The evaluation script can be slow the first time it runs, because it
reads the TFRecord data and builds a label cache. Once the label cache is
built, later evaluations will be much faster.
Tensorboard
You can use Tensorboard to compare your frame-level or video-level models, like:
We find it useful to keep the tensorboard instance always running, as we train
and evaluate different models.
Using GPUs
If your Tensorflow installation has GPU support, e.g., installed with pip install tensorflow-gpu, this code will make use of all of your compatible GPUs.
You can verify your installation by running
python -c 'import tensorflow as tf; tf.Session()'
This will print out something like the following for each of your compatible
GPUs.
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: Tesla M40
major: 5 minor: 2 memoryClockRate (GHz) 1.112
pciBusID 0000:04:00.0
Total memory: 11.25GiB
Free memory: 11.09GiB
...
If at least one GPU was found, the forward and backward passes will be computed
with the GPUs, whereas the CPU will be used primarily for the input and output
pipelines. If you have multiple GPUs, the current default behavior is to use
only one of them.
Running on Google’s Cloud Machine Learning Platform
Requirements
This option requires you to have an appropriately configured Google Cloud
Platform account. To create and configure your account, please make sure you
follow the instructions
here.
Please also verify that you have Python 3.6+ and Tensorflow 1.14 or higher
installed by running the following commands:
python --version
python -c 'import tensorflow as tf; print(tf.__version__)'
Accessing Files on Google Cloud
You can browse the storage buckets you created on Google Cloud, for example, to
access the trained models, prediction CSV files, etc. by visiting the
Google Cloud storage browser.
Alternatively, you can use the ‘gsutil’ command to download the files directly.
For example, to download the output of the inference code from the previous
section to your local machine, run:
All gcloud commands should be run from the directory immediately above the
source code. You should be able to see the source code directory if you run
‘ls’.
As you are developing your own models, you will want to test them quickly to
flush out simple problems without having to submit them to the cloud.
Here is an example command line for frame-level training:
The following commands will train a model on Google Cloud over frame-level
features.
BUCKET_NAME=gs://${USER}_yt8m_train_bucket
# (One Time) Create a storage bucket to store training logs and checkpoints.
gsutil mb -l us-east1 $BUCKET_NAME
# Submit the training job.
JOB_NAME=yt8m_train_$(date +%Y%m%d_%H%M%S); gcloud --verbosity=debug ai-platform jobs \
submit training $JOB_NAME \
--package-path=youtube-8m --module-name=youtube-8m.train \
--staging-bucket=$BUCKET_NAME --region=us-east1 \
--config=youtube-8m/cloudml-gpu.yaml \
-- --train_data_pattern='gs://youtube8m-ml/2/frame/train/train*.tfrecord' \
--frame_features --model=FrameLevelLogisticModel \
--feature_names='rgb,audio' --feature_sizes='1024,128' \
--train_dir=$BUCKET_NAME/yt8m_train_frame_level_logistic_model --start_new_model
In the ‘gcloud’ command above, the ‘package-path’ flag refers to the directory
containing the ‘train.py’ script and, more generally, the Python package that
should be deployed to the cloud worker. The ‘module-name’ flag refers to the
specific Python module that should be executed (in this case the train module).
It may take several minutes before the job starts running on Google Cloud. When
it starts you will see outputs like the following:
training step 270| Hit@1: 0.68 PERR: 0.52 Loss: 638.453
training step 271| Hit@1: 0.66 PERR: 0.49 Loss: 635.537
training step 272| Hit@1: 0.70 PERR: 0.52 Loss: 637.564
At this point you can disconnect your console by pressing “ctrl-c”. The model
will continue to train indefinitely in the Cloud. Later, you can check on its
progress or halt the job by visiting the
Google Cloud ML Jobs console.
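If you save the job output to a file, the training lines shown earlier can be turned into numbers for quick plotting or comparison. The following is a minimal sketch that assumes the log format is exactly as in the sample output above; the names `LOG_PATTERN` and `parse_training_line` are hypothetical and not part of the starter code.

```python
import re

# Matches lines such as:
#   training step 270| Hit@1: 0.68 PERR: 0.52 Loss: 638.453
LOG_PATTERN = re.compile(
    r"training step (?P<step>\d+)\s*\|\s*Hit@1: (?P<hit1>[\d.]+)\s+"
    r"PERR: (?P<perr>[\d.]+)\s+Loss: (?P<loss>[\d.]+)"
)


def parse_training_line(line):
    """Return the metrics in a training log line as a dict, or None if no match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    return {
        "step": int(match.group("step")),
        "hit_at_1": float(match.group("hit1")),
        "perr": float(match.group("perr")),
        "loss": float(match.group("loss")),
    }


if __name__ == "__main__":
    sample = "training step 270| Hit@1: 0.68 PERR: 0.52 Loss: 638.453"
    print(parse_training_line(sample))
```

Lines that do not match the pattern return None, so the function can be applied to a whole log file and the unmatched lines filtered out.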
You can train many jobs at once and use tensorboard to compare their performance
visually.
tensorboard --logdir=$BUCKET_NAME --port=8080
Once tensorboard is running, you can access it at the following url:
http://localhost:8080. If you are using Google Cloud
Shell, you can instead click the Web Preview button on the upper left corner of
the Cloud Shell window and select “Preview on port 8080”. This will bring up a
new browser tab with the Tensorboard view.
Evaluation and Inference
Here’s how to evaluate a model on the validation dataset:
Note the confusing use of ‘training’ in the above gcloud commands. Despite the
name, the ‘training’ argument really just offers a cloud hosted
python/tensorflow service. From the point of view of the Cloud Platform, there
is no distinction between our training and inference jobs. The Cloud ML platform
also offers specialized functionality for prediction with Tensorflow models, but
discussing that is beyond the scope of this readme.
Once these jobs start executing, you will see outputs similar to the following
for the evaluation code:
You can create your dataset files from your own videos. Our
feature extractor code creates tfrecord files,
identical to our dataset files. You can use our starter code to train on the
tfrecord files output by the feature extractor. In addition, you can fine-tune
your YouTube-8M models on your new dataset.
Training without this Starter Code
You are welcome to use our dataset without using our starter code. However, if
you’d like to compete on Kaggle, you must make sure that you can produce a
prediction CSV file in the same format as the one produced by our inference.py.
In particular, the predictions CSV file must have two fields: Class Id,Segment Ids
where Class Id must be one of the class ids listed in segment_label_ids.csv and
Segment Ids is a space-delimited list of <video ID>:<segment start time> entries
sorted in descending order of confidence score.
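For readers producing this file without the starter code, the row layout described above can be sketched as follows. This is an illustration under stated assumptions, not the code in inference.py: the function names are hypothetical, the video IDs and scores are made up, and the header row is passed in by the caller because the exact header text should be checked against a file produced by inference.py.

```python
import csv
import io


def format_row(class_id, scored_segments):
    """Build one CSV row for a class.

    `scored_segments` is a list of (video_id, segment_start_time, confidence)
    tuples. Segments are sorted by confidence, highest first, and joined with
    single spaces as <video ID>:<segment start time>.
    """
    ranked = sorted(scored_segments, key=lambda item: item[2], reverse=True)
    segment_ids = " ".join("%s:%s" % (video_id, start) for video_id, start, _ in ranked)
    return [str(class_id), segment_ids]


def write_predictions(rows, header):
    """Return the CSV text for `rows`, a list of (class_id, scored_segments) pairs."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    for class_id, scored_segments in rows:
        writer.writerow(format_row(class_id, scored_segments))
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical class id, video ids and scores, for illustration only.
    rows = [(3, [("abcd", 70, 0.2), ("efgh", 5, 0.9)])]
    print(write_predictions(rows, header=("Class", "Segments")))
```

Sorting inside `format_row` keeps the ordering requirement in one place, so callers can pass segments in any order.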
YouTube-8M Tensorflow Starter Code
Running on Your Own Machine
Try the starter code
Evaluate the model by
This will provide some comprehensive metrics, e.g., gAP, mAP, etc., for your models.
Produce CSV (kaggle_solution.csv) by doing inference:
(Optional) If you wish to see how the models are evaluated in the Kaggle system, you can do so by
Testing Locally
Training on the Cloud over Frame-Level Features
And here’s how to perform inference with a model on the test set:
and the following for the inference code:
Export Your Model for MediaPipe Inference
To run inference with your model in the MediaPipe inference demo, you need to export your checkpoint to a SavedModel.
Example command:
Create Your Own Dataset Files
More Documents
More documents can be found in the docs folder.
About This Project
This project is meant to help people quickly get started working with the YouTube-8M dataset. This is not an official Google product.