      Optimizing and Scaling LLMs With TensorRT-LLM for Text Generation

      , Senior Solution Architect, NVIDIA
      , Machine Learning Engineer, Grammarly
      , Machine Learning Engineer, Grammarly
      The landscape of large language models (LLMs) is evolving quickly. As model parameters and sizes grow, optimizing and deploying LLMs for inference becomes increasingly complex. This calls for a framework with a well-designed, easily extensible API that frees developers from low-level memory management and CUDA calls. Learn how we used NVIDIA’s suite of solutions to optimize LLMs and deploy them in multi-GPU environments.
      Event: GTC 24
      Date: March 2024
      Topic: AI Inference
      Industry: Consumer Internet
      Level: Intermediate Technical
      NVIDIA technology: TensorRT
      Language: English
      Location: