How to Avoid the Staggering Cost of Training State-of-the-art Large Language Models
, Deep Learning Algorithms Engineer, NVIDIA
, Deep Learning Data Scientist, NVIDIA
As state-of-the-art language models grow in capability, so does their size, which is now measured in billions or even trillions of parameters. Consequently, the time and cost of training these models have never been higher, making it impractical to train multiple candidate models in search of the best-performing hyperparameters. For example, a 175-billion-parameter GPT-3 model takes 35 days to converge on 128 DGX A100 80GB nodes, so training models at this scale is extremely difficult in setups with limited compute and time. We'll explain how to automatically select the hyperparameters that maximize training speed for such large language models using NVIDIA's NeMo-Megatron containers. For a 5-billion-parameter GPT-3 model, for instance, the best configuration trains 11.61 times faster than the worst hyperparameter setting, saving considerable compute time and energy in the process. Learn how to arrive at such configurations in as little time as possible.
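To make the idea concrete, here is a minimal sketch of the kind of throughput-driven search such a tool performs over the parallelism knobs that dominate large-model training speed. Everything here is an illustrative assumption rather than the actual NeMo-Megatron API: the candidate values, the `NUM_GPUS` constant, and the `measure_throughput` function (which in a real search would launch a short profiling run of a few training steps) are all hypothetical stand-ins.

```python
import itertools

# Candidate values for the knobs that most affect training throughput.
# These specific ranges are illustrative assumptions, not the tool's
# actual search space.
TENSOR_PARALLEL = [1, 2, 4, 8]
PIPELINE_PARALLEL = [1, 2, 4]
MICRO_BATCH_SIZE = [1, 2, 4, 8]

NUM_GPUS = 64  # assumed cluster size for this sketch


def measure_throughput(tp: int, pp: int, mbs: int) -> float:
    """Hypothetical stand-in for a short profiling run: launch a few
    training steps with this configuration and return the measured
    throughput in tokens per second."""
    # Toy cost model so the sketch runs end to end: deeper parallelism
    # adds communication overhead, larger micro-batches improve
    # per-GPU utilization. Replace with a real benchmark.
    comm_overhead = 1.0 + 0.05 * tp + 0.10 * pp
    return mbs * NUM_GPUS / comm_overhead


best_config, best_tput = None, 0.0
for tp, pp, mbs in itertools.product(TENSOR_PARALLEL, PIPELINE_PARALLEL,
                                     MICRO_BATCH_SIZE):
    # Skip layouts that require more GPUs than the cluster provides.
    if tp * pp > NUM_GPUS:
        continue
    tput = measure_throughput(tp, pp, mbs)
    if tput > best_tput:
        best_config, best_tput = (tp, pp, mbs), tput

print(f"best (tensor, pipeline, micro_batch) = {best_config}, "
      f"throughput = {best_tput:.1f} tokens/s")
```

A practical search would not brute-force a full training run per candidate: it would prune infeasible layouts up front and profile each remaining configuration for only a handful of steps, so the whole search costs a small fraction of one end-to-end training run.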