      Accelerating DNN Inference with End-to-End Compilation

      , Co-Founder and CEO, CentML
      , Research Software Development Engineer, CentML, Inc.
      Optimizing deep learning (DL) workloads (e.g., large language models and diffusion models) is crucial for achieving low latency and high throughput on various hardware platforms, such as different series of NVIDIA GPUs. The major part of this talk introduces Hidet, an open-source DL compiler written purely in Python and developed by CentML. Hidet can be used by DL developers as an end-to-end model compiler that integrates with (and is therefore used directly from) frontend ML frameworks (e.g., PyTorch) and compiles models into efficient executable binaries. In doing so, Hidet automatically performs graph-level optimizations, then generates and optimizes the required CUDA kernels for the user's specific GPU. Hidet can also be used by systems engineers as a tensor-program compiler to implement efficient CUDA kernels directly in Python via a streamlined parallel programming model, drastically boosting their productivity. In addition to Hidet, this talk briefly touches upon other open-source tools and products that CentML has built, including DeepView, a DL training performance profiler and predictor.
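      As a concrete illustration of the end-to-end path described in the abstract, the sketch below uses Hidet as a backend for torch.compile. This is a minimal sketch rather than part of the session materials: the toy model, input shape, and device placement are illustrative assumptions, and it assumes the hidet Python package is installed and a CUDA-capable NVIDIA GPU is available.

      import torch
      import torch.nn as nn
      import hidet  # importing hidet makes the 'hidet' backend available to torch.compile

      # A small stand-in model; any PyTorch nn.Module can be compiled the same way.
      model = nn.Sequential(
          nn.Conv2d(3, 16, kernel_size=3, padding=1),
          nn.ReLU(),
          nn.AdaptiveAvgPool2d(1),
          nn.Flatten(),
          nn.Linear(16, 10),
      ).cuda().eval()
      x = torch.randn(1, 3, 224, 224, device='cuda')

      # torch.compile hands the captured graph to Hidet, which applies graph-level
      # optimizations and then generates and tunes CUDA kernels for the local GPU.
      model_opt = torch.compile(model, backend='hidet')

      with torch.no_grad():
          y = model_opt(x)  # first call triggers compilation; later calls reuse the compiled kernels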
      Event: GTC 24
      Date: March 2024
      Level: Advanced Technical
      NVIDIA technology: CUDA, NCCL, Nsight Compute, Nsight Systems
      Topic: Deep Learning Frameworks
      Industry: HPC / Scientific Computing
      Language: English
      Location: