      Accelerating DNN Inference with End-to-End Compilation

      , Co-Founder and CEO, CentML
      , Research Software Development Engineer, CentML, Inc.
      Optimizing deep learning (DL) workloads (e.g., large language models and diffusion models) is crucial for achieving low latency and high throughput on various hardware platforms, such as different series of NVIDIA GPUs. The major part of this talk introduces Hidet, an open-source DL compiler written purely in Python and developed by CentML. Hidet can be used by DL developers as an end-to-end model compiler that integrates with (and is therefore used directly from) frontend ML frameworks (e.g., PyTorch) and compiles models into efficient executable binaries. In doing so, Hidet automatically performs graph-level optimizations, then generates and optimizes the required CUDA kernels for the user's specific GPU. Hidet can also be used by systems engineers as a tensor-program compiler to implement efficient CUDA kernels directly in Python via a streamlined parallel programming model, drastically boosting their productivity. In addition to Hidet, this talk briefly touches upon other open-source tools and products that CentML has built, including DeepView, a DL training performance profiler and predictor.
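      As a concrete illustration of the end-to-end path described in the abstract, the sketch below uses Hidet as a backend for torch.compile. This is a minimal sketch rather than part of the session materials: the toy model, input shape, and device placement are illustrative assumptions, and it assumes the hidet Python package is installed and a CUDA-capable NVIDIA GPU is available.

      import torch
      import torch.nn as nn
      import hidet  # importing hidet makes the 'hidet' backend available to torch.compile

      # A small stand-in model; any PyTorch nn.Module can be compiled the same way.
      model = nn.Sequential(
          nn.Conv2d(3, 16, kernel_size=3, padding=1),
          nn.ReLU(),
          nn.AdaptiveAvgPool2d(1),
          nn.Flatten(),
          nn.Linear(16, 10),
      ).cuda().eval()
      x = torch.randn(1, 3, 224, 224, device='cuda')

      # torch.compile hands the captured graph to Hidet, which applies graph-level
      # optimizations and then generates and tunes CUDA kernels for the local GPU.
      model_opt = torch.compile(model, backend='hidet')

      with torch.no_grad():
          y = model_opt(x)  # first call triggers compilation; later calls reuse the compiled kernels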
      Event: GTC 24
      Date: March 2024
      Level: Advanced Technical
      NVIDIA technology: CUDA, NCCL, Nsight Compute, Nsight Systems
      Topic: Deep Learning Frameworks
      Industry: HPC / Scientific Computing
      Language: English
      Location: