sunzhongwei/jittor-comp-human

Combined Skeleton and Skin Prediction Model

This project implements a combined model that predicts both skeleton joint locations and skinning weights from 3D point cloud data. The model uses a shared transformer backbone and predicts skinning weights from the predicted skeleton, so both tasks are trained end to end.

Features

Combined Learning: Jointly trains for skeleton prediction and skin weight estimation.
Shared Backbone: Employs a shared Point_Transformer2 as a feature extractor for both tasks.
Dependent Skin Prediction: Skin weight prediction leverages the predicted skeleton joints, ensuring consistency.
Flexible Prediction Modes: Supports predicting only skeleton, only skin, or both.
Evaluation Metrics: Includes Mean Squared Error (MSE), L1 Loss, and Joint-to-Joint (J2J) distance for evaluation.
Visualization: Provides utilities for rendering and visualizing predicted skeletons, point clouds, and skin weights.
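
The metrics listed above can be sketched in NumPy as follows. This is only an illustration: the mse and l1 helpers are plain element-wise means, and the j2j function assumes a symmetric nearest-joint (chamfer-style) distance, which may differ from the J2J defined in models/metrics.py (for example in normalization).

```python
import numpy as np

def mse(pred, target):
    """Mean squared error over all elements."""
    return float(np.mean((pred - target) ** 2))

def l1(pred, target):
    """Mean absolute error over all elements."""
    return float(np.mean(np.abs(pred - target)))

def j2j(joints_a, joints_b):
    """Symmetric nearest-joint distance between two (J, 3) joint sets.

    Assumption: illustrative definition only; the repository's J2J may
    normalize or match joints differently.
    """
    # Pairwise Euclidean distances, shape (J_a, J_b).
    dist = np.linalg.norm(joints_a[:, None, :] - joints_b[None, :, :], axis=-1)
    a_to_b = dist.min(axis=1).mean()  # each joint in a to its nearest joint in b
    b_to_a = dist.min(axis=0).mean()  # each joint in b to its nearest joint in a
    return float((a_to_b + b_to_a) / 2)

# Stand-in data: two joints, prediction shifted by 0.1 along y.
gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
pred = gt + np.array([0.0, 0.1, 0.0])
print(mse(pred, gt), l1(pred, gt), j2j(pred, gt))
```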

Model Architecture

The core of the model is the CombinedSkeletonSkinModel defined in models/combined.py.

shared_transformer: A Point_Transformer2 (from PCT.networks.cls.pct) extracts shared features from the input vertices.
skeleton_head: A multi-layer perceptron (MLP) branch takes the shared features and predicts the flattened 3D coordinates of the skeleton joints.
joint_mlp and vertex_mlp: joint_mlp takes the predicted joint locations and vertex_mlp takes the input vertices. Each input is concatenated with the shared features and mapped to a latent representation (joints_latent and vertices_latent, respectively).
Skin Weight Calculation: Skin weights are computed with a scaled dot-product, attention-like operation between vertices_latent and joints_latent, followed by a softmax over joints so that each vertex's weights sum to 1.
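
A minimal NumPy sketch of the skin weight calculation, independent of the Jittor code in models/combined.py. The shapes (N vertices, J joints, latent size D) and the sqrt(D) scaling are assumptions borrowed from standard attention, not confirmed details of the model.

```python
import numpy as np

def skin_weights(vertices_latent, joints_latent):
    """Softmax over joints of scaled dot products.

    vertices_latent: (N, D) array, one latent vector per vertex.
    joints_latent:   (J, D) array, one latent vector per joint.
    Returns an (N, J) array whose rows sum to 1.
    """
    d = vertices_latent.shape[-1]
    # Assumption: scale by sqrt(D), as in standard attention.
    scores = vertices_latent @ joints_latent.T / np.sqrt(d)
    # Subtract the row maximum for numerical stability before exponentiating.
    scores = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(scores)
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
weights = skin_weights(rng.normal(size=(5, 8)), rng.normal(size=(3, 8)))
print(weights.shape, weights.sum(axis=-1))
```

Because the softmax runs over the joint axis, every vertex gets a non-negative weight for each joint, and the weights for one vertex sum to 1.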

Getting Started

Prerequisites

Jittor (0.1.18 or newer recommended)
NumPy
SciPy
tqdm

You can install Jittor by following the instructions on the official Jittor GitHub page or by running:

pip install jittor

Other dependencies can be installed via pip:

pip install numpy scipy tqdm

Data Preparation

The model expects data to be organized in a specific structure. The data_root argument points to the root directory containing your dataset. Data lists (e.g., train_data_list.txt, predict_data_list.txt) should specify the paths to individual data samples relative to data_root.

Each data sample should ideally contain:

vertices: Point cloud data.
joints: Ground truth skeleton joint locations (for training).
skin: Ground truth skinning weights (for training).
origin_vertices: Original untransformed vertices (for prediction output).

The dataset/dataset.py and dataset/sampler.py scripts handle data loading and sampling.
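
The relationship between data_root and a data list can be sketched as below. The resolve_samples helper is hypothetical, and the list format (one relative path per line, blank lines ignored) is an assumption; dataset/dataset.py is the authority on the real format.

```python
from pathlib import Path

def resolve_samples(data_root, list_file):
    """Join each entry of a data list with data_root.

    Assumption: the list holds one relative path per line.
    """
    root = Path(data_root)
    lines = Path(list_file).read_text(encoding="utf-8").splitlines()
    # Skip blank lines and strip surrounding whitespace.
    return [root / line.strip() for line in lines if line.strip()]
```

With this layout, calling resolve_samples("/path/to/your/data", "data/train_list.txt") would return one path under data_root per listed sample.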

Training

To train the combined model, use the launch/train_combined.sh script.

./launch/train_combined.sh

Before running, ensure you edit launch/train_combined.sh to configure your desired training parameters, such as data paths, output directory, epochs, batch size, learning rate, and loss weights.

Example parameters you might set in the script:

--train_data_list data/train_list.txt
--val_data_list data/val_list.txt
--data_root /path/to/your/data
--output_dir output/combined_training
--epochs 500
--batch_size 16
--learning_rate 1e-4
--optimizer adamw
--skeleton_weight 1.0
--skin_weight 1.0
--feat_dim 256
--apply_rotation False
--apply_z_scaling False

Training Arguments (configured in `launch/train_combined.sh`):

--train_data_list (required): Path to the list file containing training data samples.
--val_data_list (optional): Path to the list file containing validation data samples.
--data_root: Root directory of your dataset.
--output_dir: Directory to save models, logs, and visualizations.
--epochs: Number of training epochs.
--batch_size: Batch size for training.
--learning_rate: Initial learning rate.
--optimizer: Optimizer to use (sgd, adam, adamw).
--skeleton_weight: Weight for the skeleton loss in the total loss.
--skin_weight: Weight for the skin loss in the total loss.
--feat_dim: Feature dimension for the shared transformer.
--pretrained_model: Path to a pre-trained model checkpoint to resume training or fine-tune.
--apply_z_scaling: Whether to apply z-axis scaling during data augmentation.
--apply_rotation: Whether to apply random rotations during data augmentation.
--print_freq: How often to print training progress (batches).
--save_freq: How often to save model checkpoints (epochs).
--val_freq: How often to run validation (epochs).
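
The role of --skeleton_weight and --skin_weight can be illustrated with a short sketch. It assumes the total loss is a plain weighted sum of the two task losses, which is the natural reading of the argument descriptions but is not confirmed against train_combined.py. The combined_loss helper is hypothetical.

```python
def combined_loss(skeleton_loss, skin_loss, skeleton_weight=1.0, skin_weight=1.0):
    """Weighted sum of the two task losses (assumed form of the total loss)."""
    return skeleton_weight * skeleton_loss + skin_weight * skin_loss

# Halving skin_weight halves the skin term's contribution to the total.
total = combined_loss(0.04, 0.10, skeleton_weight=1.0, skin_weight=0.5)
print(total)
```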

Prediction

To make predictions using a trained model, use the launch/predict_combined.sh script.

./launch/predict_combined.sh

Before running, ensure you edit launch/predict_combined.sh to configure your prediction parameters, including data paths, the path to your trained model, output directory, and prediction mode.

Example parameters you might set in the script:

--predict_data_list data/predict_list.txt
--data_root /path/to/your/data
--pretrained_model output/combined_training/best_combined_model.pkl
--predict_output_dir prediction_results
--mode both
--feat_dim 256

Prediction Arguments (configured in `launch/predict_combined.sh`):

--predict_data_list (required): Path to the list file containing data samples for prediction.
--data_root: Root directory of your dataset.
--pretrained_model (required): Path to the pre-trained model checkpoint for prediction.
--predict_output_dir (required): Directory to save prediction results.
--mode: Prediction mode (skeleton, skin, or both). Determines what outputs are saved.
--feat_dim: Feature dimension used during training (must match the trained model).
--batch_size: Batch size for prediction. Note: Currently must be 1 due to unpadded origin_vertices.
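
For orientation, the documented prediction arguments could be declared with argparse roughly as follows. This is a hypothetical parser, not the one in predict_combined.py; the defaults in particular are assumptions.

```python
import argparse

def build_parser():
    """Hypothetical parser mirroring the documented prediction arguments."""
    parser = argparse.ArgumentParser(description="Combined model prediction (sketch)")
    parser.add_argument("--predict_data_list", required=True)
    parser.add_argument("--data_root", default="data")  # assumed default
    parser.add_argument("--pretrained_model", required=True)
    parser.add_argument("--predict_output_dir", required=True)
    parser.add_argument("--mode", choices=["skeleton", "skin", "both"], default="both")
    parser.add_argument("--feat_dim", type=int, default=256)  # assumed default
    parser.add_argument("--batch_size", type=int, default=1)
    return parser

args = build_parser().parse_args([
    "--predict_data_list", "data/predict_list.txt",
    "--pretrained_model", "output/combined_training/best_combined_model.pkl",
    "--predict_output_dir", "prediction_results",
    "--mode", "skeleton",
])

# The mode decides which outputs are written.
save_skeleton = args.mode in ("skeleton", "both")
save_skin = args.mode in ("skin", "both")
print(save_skeleton, save_skin)
```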

Output

Training Output:

training_log.txt: A log file detailing training progress, losses, and validation results.
best_skeleton_model.pkl: Model checkpoint with the lowest skeleton (J2J) loss on the validation set.
best_skin_model.pkl: Model checkpoint with the lowest skin (L1) loss on the validation set.
best_combined_model.pkl: Model checkpoint with the lowest combined loss on the validation set.
checkpoint_epoch_X.pkl: Periodic model checkpoints.
final_combined_model.pkl: The model saved at the end of training.
tmp/combined/epoch_X/: Directory containing visualization outputs (skeleton renderings, point clouds, skin heatmaps) during validation.

Prediction Output: For each sample in your predict_data_list, a directory will be created under predict_output_dir (e.g., prediction_results/<class_id>/<sample_id>). This directory will contain:

predict_skeleton.npy: Predicted skeleton joint locations (NumPy array, written if mode is skeleton or both).
predict_skin.npy: Skinning weights resampled onto the original vertices (NumPy array, written if mode is skin or both).
transformed_vertices.npy: The original (untransformed) vertices, saved under this name despite the "transformed" prefix (NumPy array).
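
The prediction files can be read back with NumPy. The sketch below writes stand-in arrays to a temporary directory so that it runs on its own. The load_prediction helper is hypothetical, and the shapes, (J, 3) for joints, (N, J) for skin weights and (N, 3) for vertices, are assumptions rather than documented facts.

```python
import tempfile
from pathlib import Path

import numpy as np

def load_prediction(sample_dir):
    """Load whichever prediction files exist in one sample directory."""
    sample_dir = Path(sample_dir)
    result = {}
    for name in ("predict_skeleton", "predict_skin", "transformed_vertices"):
        path = sample_dir / f"{name}.npy"
        if path.exists():  # skeleton and skin files depend on --mode
            result[name] = np.load(path)
    return result

# Stand-in data with assumed shapes: J=4 joints, N=6 vertices.
with tempfile.TemporaryDirectory() as tmp:
    np.save(Path(tmp) / "predict_skeleton.npy", np.zeros((4, 3)))
    np.save(Path(tmp) / "predict_skin.npy", np.full((6, 4), 0.25))
    np.save(Path(tmp) / "transformed_vertices.npy", np.zeros((6, 3)))
    pred = load_prediction(tmp)

# Most influential joint per vertex, and a sanity check on the row sums.
dominant_joint = pred["predict_skin"].argmax(axis=1)
row_sums = pred["predict_skin"].sum(axis=1)
print(dominant_joint.shape, row_sums)
```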

Code Structure

launch/train_combined.sh: Shell script to execute the training process.
launch/predict_combined.sh: Shell script to execute the prediction process.
train_combined.py: Python script containing the main training logic.
predict_combined.py: Python script containing the main prediction logic.
models/combined.py: Defines the CombinedSkeletonSkinModel architecture.
dataset/dataset.py: Handles data loading and batching.
dataset/sampler.py: Defines the SamplerMix for data sampling.
dataset/asset.py: Likely contains asset-related utilities (e.g., for loading specific data formats).
dataset/format.py: Defines id_to_name and parents for skeleton structure.
dataset/exporter.py: Contains utilities for exporting and visualizing results.
models/metrics.py: Defines evaluation metrics like J2J.
models/basics.py: Contains basic neural network modules like MLP.
models/transformers.py: Might contain custom transformer implementations, if any.
PCT/networks/cls/pct.py: Contains the Point_Transformer and Point_Transformer2 used as the backbone.

Notes

Batch Size in Prediction: The predict_combined.py script currently forces batch_size=1 during prediction because origin_vertices are not padded, so point clouds with different vertex counts cannot be combined into one batch.
Jittor Flags: jt.flags.use_cuda = 1 is set so that computation runs on the GPU; this assumes a working CUDA setup.
Random Seeds: seed_all(123) is called to make runs reproducible.
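
The batch-size restriction can be illustrated with plain NumPy: arrays with different vertex counts cannot be stacked into a single (B, N, 3) batch. The pad_to helper shows one possible workaround (zero padding plus a validity mask). It is hypothetical and not part of this repository.

```python
import numpy as np

# Two meshes with different vertex counts (stand-in data).
origin_a = np.zeros((100, 3))
origin_b = np.zeros((250, 3))

try:
    np.stack([origin_a, origin_b])
    stacked = True
except ValueError:
    # Arrays of different shapes cannot form one (B, N, 3) batch.
    stacked = False

def pad_to(vertices, length):
    """Zero-pad an (N, 3) array to (length, 3) and return a validity mask."""
    n = vertices.shape[0]
    padded = np.zeros((length, 3), dtype=vertices.dtype)
    padded[:n] = vertices
    mask = np.arange(length) < n  # True for real vertices, False for padding
    return padded, mask

padded_a, mask_a = pad_to(origin_a, 250)
batch = np.stack([padded_a, origin_b])  # now a regular (2, 250, 3) batch
print(stacked, batch.shape, int(mask_a.sum()))
```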