Accelerating DNN Inference with End-to-End Compilation
, Co-Founder and CEO, CentML
, Research Software Development Engineer, CentML, Inc.
Optimizing deep learning (DL) workloads (e.g., large language models and diffusion models) is crucial to achieving low latency and high throughput on various hardware platforms, such as different series of NVIDIA GPUs. The major part of this talk introduces Hidet, an open-source DL compiler written purely in Python and developed by CentML. DL developers can use Hidet as an end-to-end model compiler that integrates with (and is invoked directly from) frontend ML frameworks such as PyTorch and compiles models into efficient executable binaries. In doing so, Hidet automatically performs graph-level optimizations, then generates and optimizes the CUDA kernels needed for the user's specific GPUs. System engineers can also use Hidet as a tensor-program compiler to implement efficient CUDA kernels directly in Python via a streamlined parallel programming model, drastically boosting their productivity. In addition to Hidet, this talk briefly touches on other open-source tools and products built by CentML, including DeepView, a DL training performance profiler and predictor.
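
As a rough illustration of the end-to-end path described above, the sketch below uses Hidet's PyTorch integration, which registers itself as a torch.compile backend once the hidet package is installed. The model choice, input shape, and weights setting are arbitrary assumptions for the example, not part of the talk.

# Minimal sketch: compiling a PyTorch model with Hidet as a torch.compile backend.
# Assumes `pip install hidet` and a CUDA-capable GPU; ResNet-18 and the input
# shape below are illustrative choices only.
import torch
import torchvision

model = torchvision.models.resnet18(weights=None).cuda().eval()
x = torch.randn(1, 3, 224, 224, device='cuda')

# 'hidet' becomes available as a torch.compile backend when the hidet package is installed.
compiled_model = torch.compile(model, backend='hidet')

with torch.no_grad():
    y = compiled_model(x)  # first call triggers compilation (graph-level optimization + CUDA kernel generation/tuning)
    y = compiled_model(x)  # subsequent calls execute the compiled binaries
print(y.shape)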
Event: GTC 24
Date: March 2024
Level: Advanced Technical
NVIDIA technology: CUDA, NCCL, Nsight Compute, Nsight Systems