      Optimizing and Scaling LLMs With TensorRT-LLM for Text Generation

      , Senior Solution Architect, NVIDIA
      , Machine Learning Engineer, Grammarly
      , Machine Learning Engineer, Grammarly
      The landscape of large language models (LLMs) is evolving quickly. As model parameters and sizes grow, optimizing and deploying LLMs for inference becomes increasingly complex. This calls for a framework with a well-designed, easily extensible API that frees developers from low-level memory management and CUDA calls. Learn how we used NVIDIA’s suite of solutions to optimize LLMs and deploy them in multi-GPU environments.
      Event: GTC 24
      Date: March 2024
      Topic: AI Inference
      Industry: Consumer Internet
      Level: Intermediate Technical
      NVIDIA technology: TensorRT
      Language: English
      Location: