Posted to github@beam.apache.org by "damccorm (via GitHub)" <gi...@apache.org> on 2023/02/06 19:50:14 UTC

[GitHub] [beam] damccorm commented on a diff in pull request #25226: Add TensorRT runinference example for Text Classification

damccorm commented on code in PR #25226:
URL: https://github.com/apache/beam/pull/25226#discussion_r1097844820


##########
website/www/site/layouts/partials/section-menu/en/documentation.html:
##########
@@ -225,6 +225,7 @@
     <li><a href="/documentation/ml/anomaly-detection/">Anomaly Detection</a></li>
     <li><a href="/documentation/ml/large-language-modeling">Large Language Model Inference in Beam</a></li>
     <li><a href="/documentation/ml/per-entity-training">Per Entity Training in Beam</a></li>
+    <li><a href="/documentation/ml/tensorrt-runinference">TensorRT Text Classification Inference</a></li>

Review Comment:
   ```suggestion
       <li><a href="/documentation/ml/tensorrt-runinference">TensorRT Inference</a></li>
   ```
   
   This will help the item render more naturally and illustrates the main point of the document



##########
website/www/site/content/en/documentation/ml/tensorrt-runinference.md:
##########
@@ -0,0 +1,150 @@
+---
+title: "TensorRT RunInference"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Use TensorRT with RunInference
+- [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt) is an SDK that facilitates high-performance machine learning inference. It is designed to work with deep learning frameworks such as TensorFlow, PyTorch, and MXNet. It focuses specifically on optimizing and running a trained neural network to run inference efficiently on NVIDIA GPUs. TensorRT can maximize inference throughput while preserving model accuracy through multiple optimizations, including model quantization, layer and tensor fusion, kernel auto-tuning, multi-stream execution, and efficient tensor memory usage.
+
+- In Apache Beam 2.43.0, Beam introduced the [TensorRTEngineHandler](https://beam.apache.org/releases/pydoc/2.43.0/apache_beam.ml.inference.tensorrt_inference.html#apache_beam.ml.inference.tensorrt_inference.TensorRTEngineHandlerNumPy), which lets you deploy a TensorRT engine in a Beam pipeline. The RunInference transform simplifies the ML inference pipeline creation process by allowing developers to use Sklearn, PyTorch, TensorFlow and now TensorRT models in production pipelines without needing lots of boilerplate code.
+
+The following example demonstrates how to use TensorRT with the RunInference API, using a BERT-based text classification model in a Beam pipeline.
+
+## Build a TensorRT engine for inference
+To use TensorRT with Apache Beam, you need a converted TensorRT engine file from a trained model. We take a trained BERT based text classification model that does sentiment analysis, that is, it classifies any text into two classes: positive or negative. The trained model is available [from HuggingFace](https://huggingface.co/textattack/bert-base-uncased-SST-2). To convert the PyTorch Model to TensorRT engine, you need to first convert the model to ONNX and then from ONNX to TensorRT.
+
+### Conversion to ONNX
+
+You can use the HuggingFace `transformers` library to convert a PyTorch model to ONNX. For details, see the blog post [Convert Transformers to ONNX with Hugging Face Optimum](https://huggingface.co/blog/convert-transformers-to-onnx). The blog post explains which required packages to install. The following code that is used for the conversion.

Review Comment:
   ```suggestion
   You can use the HuggingFace `transformers` library to convert a PyTorch model to ONNX. For details, see the blog post [Convert Transformers to ONNX with Hugging Face Optimum](https://huggingface.co/blog/convert-transformers-to-onnx). The blog post explains which required packages to install. The following code is used for the conversion.
   ```
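
The hunk above stops at the sentence introducing the conversion code, so the code itself is not part of this excerpt. For orientation only, a PyTorch-to-ONNX export for the model named in the page could look roughly like the sketch below; the output path, sequence length, opset version, and the choice to export only `input_ids` and `attention_mask` are assumptions, not the code from the page under review.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "textattack/bert-base-uncased-SST-2"
SEQ_LEN = 128                      # assumed maximum sequence length
ONNX_PATH = "bert_sst2.onnx"       # assumed output path

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

# A dummy input is only needed to trace the graph during export.
dummy = tokenizer(
    "This is a sample sentence.",
    padding="max_length",
    max_length=SEQ_LEN,
    return_tensors="pt")

torch.onnx.export(
    model,
    (dummy["input_ids"], dummy["attention_mask"]),
    ONNX_PATH,
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch"},
        "attention_mask": {0: "batch"},
        "logits": {0: "batch"},
    },
    opset_version=13)
```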



##########
website/www/site/content/en/documentation/ml/tensorrt-runinference.md:
##########
@@ -0,0 +1,150 @@
+---
+title: "TensorRT RunInference"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Use TensorRT with RunInference
+- [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt) is an SDK that facilitates high-performance machine learning inference. It is designed to work with deep learning frameworks such as TensorFlow, PyTorch, and MXNet. It focuses specifically on optimizing and running a trained neural network to run inference efficiently on NVIDIA GPUs. TensorRT can maximize inference throughput while preserving model accuracy through multiple optimizations, including model quantization, layer and tensor fusion, kernel auto-tuning, multi-stream execution, and efficient tensor memory usage.
+
+- In Apache Beam 2.43.0, Beam introduced the [TensorRTEngineHandler](https://beam.apache.org/releases/pydoc/2.43.0/apache_beam.ml.inference.tensorrt_inference.html#apache_beam.ml.inference.tensorrt_inference.TensorRTEngineHandlerNumPy), which lets you deploy a TensorRT engine in a Beam pipeline. The RunInference transform simplifies the ML inference pipeline creation process by allowing developers to use Sklearn, PyTorch, TensorFlow and now TensorRT models in production pipelines without needing lots of boilerplate code.
+
+The following example demonstrates how to use TensorRT with the RunInference API, using a BERT-based text classification model in a Beam pipeline.
+
+## Build a TensorRT engine for inference
+To use TensorRT with Apache Beam, you need a converted TensorRT engine file from a trained model. We take a trained BERT based text classification model that does sentiment analysis, that is, it classifies any text into two classes: positive or negative. The trained model is available [from HuggingFace](https://huggingface.co/textattack/bert-base-uncased-SST-2). To convert the PyTorch Model to TensorRT engine, you need to first convert the model to ONNX and then from ONNX to TensorRT.

Review Comment:
   ```suggestion
   To use TensorRT with Apache Beam, you need a converted TensorRT engine file from a trained model. We take a trained BERT based text classification model that does sentiment analysis and classifies any text into two classes: positive or negative. The trained model is available [from HuggingFace](https://huggingface.co/textattack/bert-base-uncased-SST-2). To convert the PyTorch Model to TensorRT engine, you need to first convert the model to ONNX and then from ONNX to TensorRT.
   ```
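
The sentence under review also says the ONNX model is then converted to a TensorRT engine, and that code is likewise outside this excerpt. As a rough illustration, an engine could be built with the TensorRT Python API along the following lines; the file paths, sequence length, and two-input layout continue the assumptions from the export sketch above, and the page itself may well use `trtexec` or different build settings.

```python
import tensorrt as trt

ONNX_PATH = "bert_sst2.onnx"    # assumed path from the export sketch above
ENGINE_PATH = "bert_sst2.trt"   # assumed output path for the serialized engine
SEQ_LEN = 128                   # must match the sequence length used at export time

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open(ONNX_PATH, "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("Failed to parse the ONNX model.")

config = builder.create_builder_config()
# The batch axis was exported as dynamic, so an optimization profile pins
# the shapes the engine is tuned for (batch size 1 here).
profile = builder.create_optimization_profile()
for name in ("input_ids", "attention_mask"):
    profile.set_shape(name, (1, SEQ_LEN), (1, SEQ_LEN), (1, SEQ_LEN))
config.add_optimization_profile(profile)

serialized_engine = builder.build_serialized_network(network, config)
with open(ENGINE_PATH, "wb") as f:
    f.write(serialized_engine)
```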


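Once an engine file exists, the `TensorRTEngineHandler` mentioned in the page's introduction plugs into `RunInference` roughly as sketched below. This is not the example from the page under review: the engine path, batch sizes, and in particular the way the tokenized inputs are packed per element are assumptions and have to match the bindings of the engine that was actually built.

```python
import apache_beam as beam
import numpy as np
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.tensorrt_inference import TensorRTEngineHandlerNumPy
from transformers import AutoTokenizer

ENGINE_PATH = "bert_sst2.trt"   # assumed path to the engine built above
SEQ_LEN = 128                   # must match the engine's optimization profile

tokenizer = AutoTokenizer.from_pretrained("textattack/bert-base-uncased-SST-2")

def tokenize(text):
    # Pack input_ids and attention_mask for one sentence into a single int32
    # array; how elements must be laid out for the handler depends on the
    # engine's input bindings, so treat this as illustrative only.
    encoded = tokenizer(
        text, padding="max_length", max_length=SEQ_LEN, truncation=True,
        return_tensors="np")
    return np.stack(
        [encoded["input_ids"][0], encoded["attention_mask"][0]]).astype(np.int32)

model_handler = TensorRTEngineHandlerNumPy(
    min_batch_size=1,
    max_batch_size=1,
    engine_path=ENGINE_PATH)

with beam.Pipeline() as pipeline:
    _ = (
        pipeline
        | "CreateExamples" >> beam.Create([
            "A wonderfully warm and funny film.",
            "The plot never comes together."])
        | "Tokenize" >> beam.Map(tokenize)
        | "RunInference" >> RunInference(model_handler)
        | "PrintResults" >> beam.Map(print))
```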

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org