VeGiantModel is a Torch-based, high-efficiency training library developed by the Applied Machine Learning team at ByteDance. This repository supports ongoing research into making giant-model (such as GPT, BERT, and T5) training easy, efficient, and effective. VeGiantModel builds on top of Megatron and DeepSpeed, and improves communication efficiency by integrating the high-performance communication library BytePS and by providing customized pipeline partitioning.
import torch
import torch.nn as nn

from veGiantModel.module import ColumnParallelLinear, RowParallelLinear

# `Config` and `Activation` come from the surrounding model code.
class PositionWiseFeedForward(nn.Module):
    """Feed-forward network applied independently at each position."""

    def __init__(self, config: Config):
        super().__init__()
        self.config = config
        if self.config.use_mp_linear_in_ffn:
            # Model-parallel linears: fc1 splits its output dimension across
            # ranks (column parallel), fc2 splits its input dimension (row
            # parallel), so no gather is needed between the two layers.
            assert ColumnParallelLinear is not None
            assert RowParallelLinear is not None
            self.fc1 = ColumnParallelLinear(config.dim, config.dim_ff, use_ft=False)
            self.fc2 = RowParallelLinear(config.dim_ff, config.dim, use_ft=False)
        else:
            self.fc1 = nn.Linear(config.dim, config.dim_ff)
            self.fc2 = nn.Linear(config.dim_ff, config.dim)
        self.act = Activation(config.act)
        self.dropout = nn.Dropout(config.p_drop_hidden)

    def forward(self, x) -> torch.Tensor:
        # (bsz, seq_len, dim) -> (bsz, seq_len, dim_ff / model_parallel_size) -> (bsz, seq_len, dim)
        fc1_out = self.act(self.fc1(x))
        if self.config.dropout_in_ffn:
            fc1_out = self.dropout(fc1_out)
        fc2_out = self.fc2(fc1_out)
        if self.config.use_ffn_output_dropout:
            fc2_out = self.dropout(fc2_out)
        return fc2_out
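A minimal usage sketch of the feed-forward block above, using the plain `nn.Linear` branch (no model parallelism). `SimpleConfig` and the GELU activation are stand-ins for the repo's `Config` and `Activation` classes, not part of the veGiantModel API:

```python
import torch
import torch.nn as nn

class SimpleConfig:
    # Hypothetical stand-in for the repo's Config
    dim = 16          # hidden size
    dim_ff = 64       # feed-forward inner size
    p_drop_hidden = 0.1

class FeedForward(nn.Module):
    """Non-model-parallel branch of the block above."""
    def __init__(self, config):
        super().__init__()
        self.fc1 = nn.Linear(config.dim, config.dim_ff)
        self.fc2 = nn.Linear(config.dim_ff, config.dim)
        self.act = nn.GELU()  # stand-in for Activation(config.act)
        self.dropout = nn.Dropout(config.p_drop_hidden)

    def forward(self, x):
        # (bsz, seq_len, dim) -> (bsz, seq_len, dim_ff) -> (bsz, seq_len, dim)
        return self.dropout(self.fc2(self.act(self.fc1(x))))

ffn = FeedForward(SimpleConfig())
out = ffn(torch.randn(2, 8, 16))  # output shape: (2, 8, 16)
```

With model parallelism enabled, only the inner `dim_ff` dimension is sharded across ranks, so the input and output shapes seen by the caller are unchanged.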
Examples
GPT Pretraining
The examples/gpt/pretrain_gpt2_distributed.sh script runs 345M-parameter GPT pretraining on a single 8-GPU node. It largely follows the Megatron GPT script, with a few notable differences, and shows good compatibility with existing Megatron/DeepSpeed training jobs: adopting VeGiantModel requires only small changes.
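For orientation, a Megatron-style 345M GPT launch under the DeepSpeed launcher typically looks like the sketch below. All flags and file names here are illustrative assumptions, not the actual contents of pretrain_gpt2_distributed.sh:

```shell
# Illustrative sketch only: a 345M GPT-2 configuration (24 layers,
# hidden size 1024, 16 attention heads) launched on one 8-GPU node.
deepspeed --num_gpus 8 pretrain_gpt2.py \
    --num-layers 24 \
    --hidden-size 1024 \
    --num-attention-heads 16 \
    --seq-length 1024 \
    --deepspeed \
    --deepspeed_config ds_config.json
```

Consult the script itself for the exact arguments VeGiantModel adds on top of the Megatron baseline.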