Universal Model Serving via Triton and TensorRT
, Machine Learning Engineer, Snap, Inc.
Serving models trained in different deep learning frameworks (TensorFlow, PyTorch, etc.) is tedious, because it requires maintaining separate serving infrastructures such as TF Serving and TorchServe. The Triton Inference Server provides a unified serving backend for models trained with different frameworks, and using a single universal serving platform greatly reduces the friction of model deployment. We'll show you how to deploy a TensorFlow SavedModel instance, as well as an ensemble model containing a Python/PyTorch preprocessing model and an ONNX inference model. We'll also demonstrate how to use the Triton Model Analyzer to find optimal serving parameters. Finally, we'll show how to incorporate TensorRT into serving to significantly reduce serving cost without compromising quality of service.
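To illustrate the unified client workflow the session builds on, here is a minimal sketch of a Triton inference request. It assumes a Triton server on its default HTTP port (8000) and a deployed model named my_savedmodel with an input tensor input_1 and an output tensor predictions; these names and shapes are hypothetical placeholders, not the session's actual models.

```python
# Minimal sketch of a Triton inference request over HTTP.
# Model name, tensor names, and shapes are placeholders for illustration.
import numpy as np
import tritonclient.http as httpclient

# Connect to a locally running Triton server (default HTTP port is 8000).
client = httpclient.InferenceServerClient(url="localhost:8000")

# Describe the input tensor and attach the request data.
image = np.zeros((1, 224, 224, 3), dtype=np.float32)  # placeholder batch
infer_input = httpclient.InferInput("input_1", list(image.shape), "FP32")
infer_input.set_data_from_numpy(image)

# Ask for the output tensor we want returned.
requested_output = httpclient.InferRequestedOutput("predictions")

# The request looks the same regardless of which backend (TensorFlow, ONNX,
# ensemble, TensorRT) serves the model on the server side.
response = client.infer(
    model_name="my_savedmodel",
    inputs=[infer_input],
    outputs=[requested_output],
)
print(response.as_numpy("predictions").shape)
```

Because the client only sees model names and tensor signatures, the same request code works unchanged when the model behind the name is swapped for an ensemble or a TensorRT-optimized variant.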
Event: GTC 24
Date: March 2024
Topic: AI Inference
NVIDIA technology: Cloud / Data Center GPU, TensorRT, Triton