Posted to github@beam.apache.org by "shub-kris (via GitHub)" <gi...@apache.org> on 2023/01/31 07:19:27 UTC

[GitHub] [beam] shub-kris opened a new pull request, #25226: Add TensorRT runinference example for Text Classification

shub-kris opened a new pull request, #25226:
URL: https://github.com/apache/beam/pull/25226

   This PR adds an example that shows how to use TensorRT with RunInference for a text classification model. The pipeline performs the following steps:
   
   - Reads in-memory data
   - Preprocesses the data (tokenization)
   - Runs inference with the text classification TensorRT engine
   - Postprocesses the RunInference outputs and prints the results (a sketch of this pipeline shape follows the list)
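   
   A minimal sketch of the pipeline shape is shown below. It is not the exact code added in this PR; the engine path, tokenization details, and label mapping are illustrative assumptions.
   
   ```
   # Minimal sketch, assuming a TensorRT engine built from the SST-2 BERT model
   # is available locally and that label index 1 means "positive".
   import numpy as np
   
   import apache_beam as beam
   from apache_beam.ml.inference.base import RunInference
   from apache_beam.ml.inference.tensorrt_inference import TensorRTEngineHandlerNumPy
   from transformers import AutoTokenizer
   
   
   class Preprocess(beam.DoFn):
     """Tokenizes a sentence into the numpy input the engine expects."""
     def __init__(self, tokenizer):
       self._tokenizer = tokenizer
   
     def process(self, element):
       tokens = self._tokenizer(
           element, return_tensors="np", padding="max_length", max_length=128)
       # The exact tensor layout depends on how the engine was exported;
       # here only the input_ids tensor is passed through.
       yield tokens["input_ids"].astype("int32")
   
   
   class Postprocess(beam.DoFn):
     """Maps a RunInference PredictionResult to a class label."""
     def process(self, element):
       # element.inference holds the engine outputs; assumed here to be a
       # single logits vector of length 2.
       yield "positive" if int(np.argmax(element.inference)) == 1 else "negative"
   
   
   model_handler = TensorRTEngineHandlerNumPy(
       min_batch_size=1,
       max_batch_size=1,
       engine_path="bert-sst2-model.trt",  # placeholder path to the engine
   )
   tokenizer = AutoTokenizer.from_pretrained("textattack/bert-base-uncased-SST-2")
   
   with beam.Pipeline() as pipeline:
     _ = (
         pipeline
         | "CreateInputs" >> beam.Create(["Hello, my dog is cute", "I hate you"])
         | "Preprocess" >> beam.ParDo(Preprocess(tokenizer=tokenizer))
         | "RunInference" >> RunInference(model_handler=model_handler)
         | "PostProcess" >> beam.ParDo(Postprocess())
         | "Print" >> beam.Map(print))
   ```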
   
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make the review process smoother](https://beam.apache.org/contribute/get-started-contributing/#make-the-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   




[GitHub] [beam] AnandInguva commented on pull request #25226: Add TensorRT runinference example for Text Classification

Posted by "AnandInguva (via GitHub)" <gi...@apache.org>.
AnandInguva commented on PR #25226:
URL: https://github.com/apache/beam/pull/25226#issuecomment-1419567752

   CCing @damccorm for final review. Thanks.




[GitHub] [beam] shub-kris commented on pull request #25226: Add TensorRT runinference example for Text Classification

Posted by "shub-kris (via GitHub)" <gi...@apache.org>.
shub-kris commented on PR #25226:
URL: https://github.com/apache/beam/pull/25226#issuecomment-1409883304

   @damccorm please have a look. I will add the documentation page soon.




[GitHub] [beam] rszper commented on a diff in pull request #25226: Add TensorRT runinference example for Text Classification

Posted by "rszper (via GitHub)" <gi...@apache.org>.
rszper commented on code in PR #25226:
URL: https://github.com/apache/beam/pull/25226#discussion_r1092283439


##########
website/www/site/content/en/documentation/ml/overview.md:
##########
@@ -91,4 +91,5 @@ You can find examples of end-to-end AI/ML pipelines for several use cases:
 * [Online Clustering in Beam](/documentation/ml/online-clustering): Demonstrates how to set up a real-time clustering pipeline that can read text from Pub/Sub, convert the text into an embedding using a transformer-based language model with the RunInference API, and cluster the text using BIRCH with stateful processing.
 * [Anomaly Detection in Beam](/documentation/ml/anomaly-detection): Demonstrates how to set up an anomaly detection pipeline that reads text from Pub/Sub in real time and then detects anomalies using a trained HDBSCAN clustering model with the RunInference API.
 * [Large Language Model Inference in Beam](/documentation/ml/large-language-modeling): Demonstrates a pipeline that uses RunInference to perform translation with the T5 language model which contains 11 billion parameters.
-* [Per Entity Training in Beam](/documentation/ml/per-entity-training): Demonstrates a pipeline that trains a Decision Tree Classifier per education level for predicting if the salary of a person is >= 50k.
\ No newline at end of file
+* [Per Entity Training in Beam](/documentation/ml/per-entity-training): Demonstrates a pipeline that trains a Decision Tree Classifier per education level for predicting if the salary of a person is >= 50k.
+* [TensorRT Text Classification Inference](/documentation/ml/tensorrt-runinference): Demonstrates a pipeline to utilize TensorRT with the RunInference using a BERT-based text classification model.

Review Comment:
   ```suggestion
   * [TensorRT Text Classification Inference](/documentation/ml/tensorrt-runinference): Demonstrates a pipeline that uses TensorRT with the RunInference transform and a BERT-based text classification model.
   ```



##########
sdks/python/apache_beam/examples/inference/tensorrt_text_classification.py:
##########
@@ -0,0 +1,124 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""A pipeline to demonstrate usage of TensorRT with RunInference
+for a text classification model. This pipeline reads in memory data,

Review Comment:
   ```suggestion
   for a text classification model. This pipeline reads in-memory data,
   ```



##########
sdks/python/apache_beam/examples/inference/tensorrt_text_classification.py:
##########
@@ -0,0 +1,124 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""A pipeline to demonstrate usage of TensorRT with RunInference
+for a text classification model. This pipeline reads in memory data,
+does some preprocessing and then uses RunInference for getting prediction
+from the text classification TensorRT engine. Afterwards, it post process

Review Comment:
   ```suggestion
   from the text classification TensorRT engine. Next, it postprocesses
   ```



##########
website/www/site/content/en/documentation/ml/tensorrt-runinference.md:
##########
@@ -0,0 +1,149 @@
+---
+title: "TensorRT RunInference"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Using TensorRT with RunInference
+- [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt) is an SDK that facilitates high-performance machine learning inference. It is designed to work with deep learning frameworks such as TensorFlow, PyTorch, and MXNet. It focuses specifically on optimizing and running a trained neural network for inference efficiently on NVIDIA GPUs. TensorRT can maximize inference throughput with multiple optimizations while preserving model accuracy including model quantization, layer and tensor fusions, kernel auto-tuning, multi-stream executions, and efficient tensor memory usage.

Review Comment:
   ```suggestion
   - [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt) is an SDK that facilitates high-performance machine learning inference. It is designed to work with deep learning frameworks such as TensorFlow, PyTorch, and MXNet. It focuses specifically on optimizing and running a trained neural network to efficiently run inference on NVIDIA GPUs. TensorRT can maximize inference throughput with multiple optimizations while preserving model accuracy including model quantization, layer and tensor fusions, kernel auto-tuning, multi-stream executions, and efficient tensor memory usage.
   ```



##########
sdks/python/apache_beam/examples/inference/tensorrt_text_classification.py:
##########
@@ -0,0 +1,124 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""A pipeline to demonstrate usage of TensorRT with RunInference
+for a text classification model. This pipeline reads in memory data,
+does some preprocessing and then uses RunInference for getting prediction
+from the text classification TensorRT engine. Afterwards, it post process
+the RunInference outputs to print the input and the predicted class label.
+It also prints different metrics provided by RunInference.
+"""
+
+import argparse
+import logging
+
+import numpy as np
+
+import apache_beam as beam
+from apache_beam.ml.inference.base import RunInference
+from apache_beam.ml.inference.tensorrt_inference import TensorRTEngineHandlerNumPy
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.options.pipeline_options import SetupOptions
+from transformers import AutoTokenizer
+
+
+class Preprocess(beam.DoFn):
+  """Processes the input sentence to tokenize them.

Review Comment:
   ```suggestion
     """Processes the input sentences to tokenize them.
   ```



##########
sdks/python/apache_beam/examples/inference/tensorrt_text_classification.py:
##########
@@ -0,0 +1,124 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""A pipeline to demonstrate usage of TensorRT with RunInference
+for a text classification model. This pipeline reads in memory data,
+does some preprocessing and then uses RunInference for getting prediction
+from the text classification TensorRT engine. Afterwards, it post process
+the RunInference outputs to print the input and the predicted class label.
+It also prints different metrics provided by RunInference.

Review Comment:
   ```suggestion
   It also prints metrics provided by RunInference.
   ```



##########
website/www/site/content/en/documentation/ml/tensorrt-runinference.md:
##########
@@ -0,0 +1,149 @@
+---
+title: "TensorRT RunInference"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Using TensorRT with RunInference
+- [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt) is an SDK that facilitates high-performance machine learning inference. It is designed to work with deep learning frameworks such as TensorFlow, PyTorch, and MXNet. It focuses specifically on optimizing and running a trained neural network for inference efficiently on NVIDIA GPUs. TensorRT can maximize inference throughput with multiple optimizations while preserving model accuracy including model quantization, layer and tensor fusions, kernel auto-tuning, multi-stream executions, and efficient tensor memory usage.
+
+- In Apache Beam 2.43.0, Beam introduced the `TensorRTEngineHandler`, which lets you deploy a TensorRT engine in a Beam pipeline. The RunInference transform simplifies the ML pipeline creation process by allowing developers to use Sklearn, PyTorch, TensorFlow and now TensorRT models in production pipelines without needing lots of boilerplate code.
+
+Below, you can find an example that demonstrates how to utilize TensorRT with the RunInference API using a BERT-based text classification model in a Beam pipeline.
+
+# Build a TensorRT engine for inference
+To use TensorRT with Apache Beam, you need a converted TensorRT engine file from a trained model. We take a trained BERT based text classification model that does sentiment analysis, i.e. classifies any text into two classes: positive or negative. The trained model is easily available [here](https://huggingface.co/textattack/bert-base-uncased-SST-2). In order to convert the PyTorch Model to TensorRT engine, you need to first convert the model to ONNX and then from ONNX to TensorRT.

Review Comment:
   ```suggestion
   To use TensorRT with Apache Beam, you need a converted TensorRT engine file from a trained model. We take a trained BERT based text classification model that does sentiment analysis, that is, it classifies any text into two classes: positive or negative. The trained model is available [from Hugging Face](https://huggingface.co/textattack/bert-base-uncased-SST-2). To convert the PyTorch Model to TensorRT engine, you need to first convert the model to ONNX and then from ONNX to TensorRT.
   ```



##########
website/www/site/content/en/documentation/ml/tensorrt-runinference.md:
##########
@@ -0,0 +1,149 @@
+---
+title: "TensorRT RunInference"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Using TensorRT with RunInference
+- [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt) is an SDK that facilitates high-performance machine learning inference. It is designed to work with deep learning frameworks such as TensorFlow, PyTorch, and MXNet. It focuses specifically on optimizing and running a trained neural network for inference efficiently on NVIDIA GPUs. TensorRT can maximize inference throughput with multiple optimizations while preserving model accuracy including model quantization, layer and tensor fusions, kernel auto-tuning, multi-stream executions, and efficient tensor memory usage.
+
+- In Apache Beam 2.43.0, Beam introduced the `TensorRTEngineHandler`, which lets you deploy a TensorRT engine in a Beam pipeline. The RunInference transform simplifies the ML pipeline creation process by allowing developers to use Sklearn, PyTorch, TensorFlow and now TensorRT models in production pipelines without needing lots of boilerplate code.
+
+Below, you can find an example that demonstrates how to utilize TensorRT with the RunInference API using a BERT-based text classification model in a Beam pipeline.
+
+# Build a TensorRT engine for inference
+To use TensorRT with Apache Beam, you need a converted TensorRT engine file from a trained model. We take a trained BERT based text classification model that does sentiment analysis, i.e. classifies any text into two classes: positive or negative. The trained model is easily available [here](https://huggingface.co/textattack/bert-base-uncased-SST-2). In order to convert the PyTorch Model to TensorRT engine, you need to first convert the model to ONNX and then from ONNX to TensorRT.
+
+### Conversion to ONNX
+
+Using HuggingFace `transformers` libray, one can easily convert a PyTorch model to ONNX. A detailed blogpost can be found [here](https://huggingface.co/blog/convert-transformers-to-onnx) which also mentions the required packages to install. The code that we used for the conversion can be found below.
+
+```
+from pathlib import Path
+import transformers
+from transformers.onnx import FeaturesManager
+from transformers import AutoConfig, AutoTokenizer, AutoModelForMaskedLM, AutoModelForSequenceClassification
+
+
+# load model and tokenizer
+model_id = "textattack/bert-base-uncased-SST-2"
+feature = "sequence-classification"
+model = AutoModelForSequenceClassification.from_pretrained(model_id)
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+# load config
+model_kind, model_onnx_config = FeaturesManager.check_supported_model_or_raise(model, feature=feature)
+onnx_config = model_onnx_config(model.config)
+
+# export
+onnx_inputs, onnx_outputs = transformers.onnx.export(
+        preprocessor=tokenizer,
+        model=model,
+        config=onnx_config,
+        opset=12,
+        output=Path("bert-sst2-model.onnx")
+)
+```
+
+### From ONNX to TensorRT engine
+
+In order to convert an ONNX model to a TensorRT engine you can use the following command from `CLI`:

Review Comment:
   ```suggestion
   To convert an ONNX model to a TensorRT engine, use the following command from the `CLI`:
   ```



##########
website/www/site/content/en/documentation/ml/tensorrt-runinference.md:
##########
@@ -0,0 +1,149 @@
+---
+title: "TensorRT RunInference"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Using TensorRT with RunInference
+- [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt) is an SDK that facilitates high-performance machine learning inference. It is designed to work with deep learning frameworks such as TensorFlow, PyTorch, and MXNet. It focuses specifically on optimizing and running a trained neural network for inference efficiently on NVIDIA GPUs. TensorRT can maximize inference throughput with multiple optimizations while preserving model accuracy including model quantization, layer and tensor fusions, kernel auto-tuning, multi-stream executions, and efficient tensor memory usage.
+
+- In Apache Beam 2.43.0, Beam introduced the `TensorRTEngineHandler`, which lets you deploy a TensorRT engine in a Beam pipeline. The RunInference transform simplifies the ML pipeline creation process by allowing developers to use Sklearn, PyTorch, TensorFlow and now TensorRT models in production pipelines without needing lots of boilerplate code.
+
+Below, you can find an example that demonstrates how to utilize TensorRT with the RunInference API using a BERT-based text classification model in a Beam pipeline.
+
+# Build a TensorRT engine for inference
+To use TensorRT with Apache Beam, you need a converted TensorRT engine file from a trained model. We take a trained BERT based text classification model that does sentiment analysis, i.e. classifies any text into two classes: positive or negative. The trained model is easily available [here](https://huggingface.co/textattack/bert-base-uncased-SST-2). In order to convert the PyTorch Model to TensorRT engine, you need to first convert the model to ONNX and then from ONNX to TensorRT.
+
+### Conversion to ONNX
+
+Using HuggingFace `transformers` libray, one can easily convert a PyTorch model to ONNX. A detailed blogpost can be found [here](https://huggingface.co/blog/convert-transformers-to-onnx) which also mentions the required packages to install. The code that we used for the conversion can be found below.
+
+```
+from pathlib import Path
+import transformers
+from transformers.onnx import FeaturesManager
+from transformers import AutoConfig, AutoTokenizer, AutoModelForMaskedLM, AutoModelForSequenceClassification
+
+
+# load model and tokenizer
+model_id = "textattack/bert-base-uncased-SST-2"
+feature = "sequence-classification"
+model = AutoModelForSequenceClassification.from_pretrained(model_id)
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+# load config
+model_kind, model_onnx_config = FeaturesManager.check_supported_model_or_raise(model, feature=feature)
+onnx_config = model_onnx_config(model.config)
+
+# export
+onnx_inputs, onnx_outputs = transformers.onnx.export(
+        preprocessor=tokenizer,
+        model=model,
+        config=onnx_config,
+        opset=12,
+        output=Path("bert-sst2-model.onnx")
+)
+```
+
+### From ONNX to TensorRT engine
+
+In order to convert an ONNX model to a TensorRT engine you can use the following command from `CLI`:
+```
+trtexec --onnx=<path to onnx model> --saveEngine=<path to save TensorRT engine> --useCudaGraph --verbose
+```
+
+For using `trtexec`, you can follow this [blogpost](https://developer.nvidia.com/blog/simplifying-and-accelerating-machine-learning-predictions-in-apache-beam-with-nvidia-tensorrt/) which builds a docker image built from a DockerFile. The dockerFile that we used is similar to it and can be found below:

Review Comment:
   ```suggestion
   To use `trtexec`, follow the steps in the blog post [Simplifying and Accelerating Machine Learning Predictions in Apache Beam with NVIDIA TensorRT](https://developer.nvidia.com/blog/simplifying-and-accelerating-machine-learning-predictions-in-apache-beam-with-nvidia-tensorrt/). The post explains how to build a docker image from a Docker file. We use the following Docker file, which is similar to the file used in the blog post:
   ```



##########
website/www/site/content/en/documentation/ml/tensorrt-runinference.md:
##########
@@ -0,0 +1,149 @@
+---
+title: "TensorRT RunInference"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Using TensorRT with RunInference

Review Comment:
   ```suggestion
   # Use TensorRT with RunInference
   ```



##########
website/www/site/content/en/documentation/ml/tensorrt-runinference.md:
##########
@@ -0,0 +1,149 @@
+---
+title: "TensorRT RunInference"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Using TensorRT with RunInference
+- [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt) is an SDK that facilitates high-performance machine learning inference. It is designed to work with deep learning frameworks such as TensorFlow, PyTorch, and MXNet. It focuses specifically on optimizing and running a trained neural network for inference efficiently on NVIDIA GPUs. TensorRT can maximize inference throughput with multiple optimizations while preserving model accuracy including model quantization, layer and tensor fusions, kernel auto-tuning, multi-stream executions, and efficient tensor memory usage.
+
+- In Apache Beam 2.43.0, Beam introduced the `TensorRTEngineHandler`, which lets you deploy a TensorRT engine in a Beam pipeline. The RunInference transform simplifies the ML pipeline creation process by allowing developers to use Sklearn, PyTorch, TensorFlow and now TensorRT models in production pipelines without needing lots of boilerplate code.
+
+Below, you can find an example that demonstrates how to utilize TensorRT with the RunInference API using a BERT-based text classification model in a Beam pipeline.
+
+# Build a TensorRT engine for inference
+To use TensorRT with Apache Beam, you need a converted TensorRT engine file from a trained model. We take a trained BERT based text classification model that does sentiment analysis, i.e. classifies any text into two classes: positive or negative. The trained model is easily available [here](https://huggingface.co/textattack/bert-base-uncased-SST-2). In order to convert the PyTorch Model to TensorRT engine, you need to first convert the model to ONNX and then from ONNX to TensorRT.
+
+### Conversion to ONNX
+
+Using HuggingFace `transformers` libray, one can easily convert a PyTorch model to ONNX. A detailed blogpost can be found [here](https://huggingface.co/blog/convert-transformers-to-onnx) which also mentions the required packages to install. The code that we used for the conversion can be found below.
+
+```
+from pathlib import Path
+import transformers
+from transformers.onnx import FeaturesManager
+from transformers import AutoConfig, AutoTokenizer, AutoModelForMaskedLM, AutoModelForSequenceClassification
+
+
+# load model and tokenizer
+model_id = "textattack/bert-base-uncased-SST-2"
+feature = "sequence-classification"
+model = AutoModelForSequenceClassification.from_pretrained(model_id)
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+# load config
+model_kind, model_onnx_config = FeaturesManager.check_supported_model_or_raise(model, feature=feature)
+onnx_config = model_onnx_config(model.config)
+
+# export
+onnx_inputs, onnx_outputs = transformers.onnx.export(
+        preprocessor=tokenizer,
+        model=model,
+        config=onnx_config,
+        opset=12,
+        output=Path("bert-sst2-model.onnx")
+)
+```
+
+### From ONNX to TensorRT engine
+
+In order to convert an ONNX model to a TensorRT engine you can use the following command from `CLI`:
+```
+trtexec --onnx=<path to onnx model> --saveEngine=<path to save TensorRT engine> --useCudaGraph --verbose
+```
+
+For using `trtexec`, you can follow this [blogpost](https://developer.nvidia.com/blog/simplifying-and-accelerating-machine-learning-predictions-in-apache-beam-with-nvidia-tensorrt/) which builds a docker image built from a DockerFile. The dockerFile that we used is similar to it and can be found below:
+
+```
+ARG BUILD_IMAGE=nvcr.io/nvidia/tensorrt:22.05-py3
+
+FROM ${BUILD_IMAGE}
+
+ENV PATH="/usr/src/tensorrt/bin:${PATH}"
+
+WORKDIR /workspace
+
+RUN apt-get update && \
+    apt-get install -y software-properties-common && \
+    add-apt-repository universe && \
+    apt-get update && \
+    apt-get install -y python3.8-venv
+
+RUN pip install --no-cache-dir apache-beam[gcp]==2.43.0
+COPY --from=apache/beam_python3.8_sdk:2.43.0 /opt/apache/beam /opt/apache/beam
+
+RUN pip install --upgrade pip \
+    && pip install torch==1.13.1 \
+    && pip install torchvision>=0.8.2 \
+    && pip install pillow>=8.0.0 \
+    && pip install transformers>=4.18.0 \
+    && pip install cuda-python
+
+ENTRYPOINT [ "/opt/apache/beam/boot" ]
+```
+The blogpost also contains the instructions on how to test the TensorRT engine locally.
+
+
+## Running TensorRT Engine with RunInference in a Beam Pipeline

Review Comment:
   ```suggestion
   ## Run TensorRT engine with RunInference in a Beam pipeline
   ```



##########
website/www/site/content/en/documentation/ml/tensorrt-runinference.md:
##########
@@ -0,0 +1,149 @@
+---
+title: "TensorRT RunInference"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Using TensorRT with RunInference
+- [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt) is an SDK that facilitates high-performance machine learning inference. It is designed to work with deep learning frameworks such as TensorFlow, PyTorch, and MXNet. It focuses specifically on optimizing and running a trained neural network for inference efficiently on NVIDIA GPUs. TensorRT can maximize inference throughput with multiple optimizations while preserving model accuracy including model quantization, layer and tensor fusions, kernel auto-tuning, multi-stream executions, and efficient tensor memory usage.
+
+- In Apache Beam 2.43.0, Beam introduced the `TensorRTEngineHandler`, which lets you deploy a TensorRT engine in a Beam pipeline. The RunInference transform simplifies the ML pipeline creation process by allowing developers to use Sklearn, PyTorch, TensorFlow and now TensorRT models in production pipelines without needing lots of boilerplate code.
+
+Below, you can find an example that demonstrates how to utilize TensorRT with the RunInference API using a BERT-based text classification model in a Beam pipeline.
+
+# Build a TensorRT engine for inference
+To use TensorRT with Apache Beam, you need a converted TensorRT engine file from a trained model. We take a trained BERT based text classification model that does sentiment analysis, i.e. classifies any text into two classes: positive or negative. The trained model is easily available [here](https://huggingface.co/textattack/bert-base-uncased-SST-2). In order to convert the PyTorch Model to TensorRT engine, you need to first convert the model to ONNX and then from ONNX to TensorRT.
+
+### Conversion to ONNX
+
+Using HuggingFace `transformers` libray, one can easily convert a PyTorch model to ONNX. A detailed blogpost can be found [here](https://huggingface.co/blog/convert-transformers-to-onnx) which also mentions the required packages to install. The code that we used for the conversion can be found below.
+
+```
+from pathlib import Path
+import transformers
+from transformers.onnx import FeaturesManager
+from transformers import AutoConfig, AutoTokenizer, AutoModelForMaskedLM, AutoModelForSequenceClassification
+
+
+# load model and tokenizer
+model_id = "textattack/bert-base-uncased-SST-2"
+feature = "sequence-classification"
+model = AutoModelForSequenceClassification.from_pretrained(model_id)
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+# load config
+model_kind, model_onnx_config = FeaturesManager.check_supported_model_or_raise(model, feature=feature)
+onnx_config = model_onnx_config(model.config)
+
+# export
+onnx_inputs, onnx_outputs = transformers.onnx.export(
+        preprocessor=tokenizer,
+        model=model,
+        config=onnx_config,
+        opset=12,
+        output=Path("bert-sst2-model.onnx")
+)
+```
+
+### From ONNX to TensorRT engine
+
+In order to convert an ONNX model to a TensorRT engine you can use the following command from `CLI`:
+```
+trtexec --onnx=<path to onnx model> --saveEngine=<path to save TensorRT engine> --useCudaGraph --verbose
+```
+
+For using `trtexec`, you can follow this [blogpost](https://developer.nvidia.com/blog/simplifying-and-accelerating-machine-learning-predictions-in-apache-beam-with-nvidia-tensorrt/) which builds a docker image built from a DockerFile. The dockerFile that we used is similar to it and can be found below:
+
+```
+ARG BUILD_IMAGE=nvcr.io/nvidia/tensorrt:22.05-py3
+
+FROM ${BUILD_IMAGE}
+
+ENV PATH="/usr/src/tensorrt/bin:${PATH}"
+
+WORKDIR /workspace
+
+RUN apt-get update && \
+    apt-get install -y software-properties-common && \
+    add-apt-repository universe && \
+    apt-get update && \
+    apt-get install -y python3.8-venv
+
+RUN pip install --no-cache-dir apache-beam[gcp]==2.43.0
+COPY --from=apache/beam_python3.8_sdk:2.43.0 /opt/apache/beam /opt/apache/beam
+
+RUN pip install --upgrade pip \
+    && pip install torch==1.13.1 \
+    && pip install torchvision>=0.8.2 \
+    && pip install pillow>=8.0.0 \
+    && pip install transformers>=4.18.0 \
+    && pip install cuda-python
+
+ENTRYPOINT [ "/opt/apache/beam/boot" ]
+```
+The blogpost also contains the instructions on how to test the TensorRT engine locally.
+
+
+## Running TensorRT Engine with RunInference in a Beam Pipeline
+
+Now that you have the TensorRT engine, you can use TensorRT engine with RunInferece in a Beam pipeline that can be run both locally and on GCP.

Review Comment:
   ```suggestion
   Now that you have the TensorRT engine, you can use TensorRT engine with RunInference in a Beam pipeline that can run both locally and on Google Cloud.
   ```



##########
sdks/python/apache_beam/examples/inference/tensorrt_text_classification.py:
##########
@@ -0,0 +1,124 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""A pipeline to demonstrate usage of TensorRT with RunInference
+for a text classification model. This pipeline reads in memory data,
+does some preprocessing and then uses RunInference for getting prediction

Review Comment:
   ```suggestion
   preprocesses the data, and then uses RunInference to generate predictions
   ```



##########
website/www/site/content/en/documentation/ml/tensorrt-runinference.md:
##########
@@ -0,0 +1,149 @@
+---
+title: "TensorRT RunInference"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Using TensorRT with RunInference
+- [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt) is an SDK that facilitates high-performance machine learning inference. It is designed to work with deep learning frameworks such as TensorFlow, PyTorch, and MXNet. It focuses specifically on optimizing and running a trained neural network for inference efficiently on NVIDIA GPUs. TensorRT can maximize inference throughput with multiple optimizations while preserving model accuracy including model quantization, layer and tensor fusions, kernel auto-tuning, multi-stream executions, and efficient tensor memory usage.
+
+- In Apache Beam 2.43.0, Beam introduced the `TensorRTEngineHandler`, which lets you deploy a TensorRT engine in a Beam pipeline. The RunInference transform simplifies the ML pipeline creation process by allowing developers to use Sklearn, PyTorch, TensorFlow and now TensorRT models in production pipelines without needing lots of boilerplate code.
+
+Below, you can find an example that demonstrates how to utilize TensorRT with the RunInference API using a BERT-based text classification model in a Beam pipeline.
+
+# Build a TensorRT engine for inference
+To use TensorRT with Apache Beam, you need a converted TensorRT engine file from a trained model. We take a trained BERT based text classification model that does sentiment analysis, i.e. classifies any text into two classes: positive or negative. The trained model is easily available [here](https://huggingface.co/textattack/bert-base-uncased-SST-2). In order to convert the PyTorch Model to TensorRT engine, you need to first convert the model to ONNX and then from ONNX to TensorRT.
+
+### Conversion to ONNX
+
+Using HuggingFace `transformers` libray, one can easily convert a PyTorch model to ONNX. A detailed blogpost can be found [here](https://huggingface.co/blog/convert-transformers-to-onnx) which also mentions the required packages to install. The code that we used for the conversion can be found below.
+
+```
+from pathlib import Path
+import transformers
+from transformers.onnx import FeaturesManager
+from transformers import AutoConfig, AutoTokenizer, AutoModelForMaskedLM, AutoModelForSequenceClassification
+
+
+# load model and tokenizer
+model_id = "textattack/bert-base-uncased-SST-2"
+feature = "sequence-classification"
+model = AutoModelForSequenceClassification.from_pretrained(model_id)
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+# load config
+model_kind, model_onnx_config = FeaturesManager.check_supported_model_or_raise(model, feature=feature)
+onnx_config = model_onnx_config(model.config)
+
+# export
+onnx_inputs, onnx_outputs = transformers.onnx.export(
+        preprocessor=tokenizer,
+        model=model,
+        config=onnx_config,
+        opset=12,
+        output=Path("bert-sst2-model.onnx")
+)
+```
+
+### From ONNX to TensorRT engine
+
+In order to convert an ONNX model to a TensorRT engine you can use the following command from `CLI`:
+```
+trtexec --onnx=<path to onnx model> --saveEngine=<path to save TensorRT engine> --useCudaGraph --verbose
+```
+
+For using `trtexec`, you can follow this [blogpost](https://developer.nvidia.com/blog/simplifying-and-accelerating-machine-learning-predictions-in-apache-beam-with-nvidia-tensorrt/) which builds a docker image built from a DockerFile. The dockerFile that we used is similar to it and can be found below:
+
+```
+ARG BUILD_IMAGE=nvcr.io/nvidia/tensorrt:22.05-py3
+
+FROM ${BUILD_IMAGE}
+
+ENV PATH="/usr/src/tensorrt/bin:${PATH}"
+
+WORKDIR /workspace
+
+RUN apt-get update && \
+    apt-get install -y software-properties-common && \
+    add-apt-repository universe && \
+    apt-get update && \
+    apt-get install -y python3.8-venv
+
+RUN pip install --no-cache-dir apache-beam[gcp]==2.43.0
+COPY --from=apache/beam_python3.8_sdk:2.43.0 /opt/apache/beam /opt/apache/beam
+
+RUN pip install --upgrade pip \
+    && pip install torch==1.13.1 \
+    && pip install torchvision>=0.8.2 \
+    && pip install pillow>=8.0.0 \
+    && pip install transformers>=4.18.0 \
+    && pip install cuda-python
+
+ENTRYPOINT [ "/opt/apache/beam/boot" ]
+```
+The blogpost also contains the instructions on how to test the TensorRT engine locally.
+
+
+## Running TensorRT Engine with RunInference in a Beam Pipeline
+
+Now that you have the TensorRT engine, you can use TensorRT engine with RunInferece in a Beam pipeline that can be run both locally and on GCP.
+
+The following code example is a part of the pipeline, where you use `TensorRTEngineHandlerNumPy` to load the TensorRT engine and set other inference parameters.
+
+```
+  model_handler = TensorRTEngineHandlerNumPy(
+      min_batch_size=1,
+      max_batch_size=1,
+      engine_path=known_args.trt_model_path,
+  )
+
+  task_sentences = [
+      "Hello, my dog is cute",
+      "I hate you",
+      "Shubham Krishna is a good coder",
+  ] * 4000
+
+  tokenizer = AutoTokenizer.from_pretrained(known_args.model_id)
+
+  with beam.Pipeline(options=pipeline_options) as pipeline:
+    _ = (
+        pipeline
+        | "CreateInputs" >> beam.Create(task_sentences)
+        | "Preprocess" >> beam.ParDo(Preprocess(tokenizer=tokenizer))
+        | "RunInference" >> RunInference(model_handler=model_handler)
+        | "PostProcess" >> beam.ParDo(Postprocess(tokenizer=tokenizer)))
+```
+
+The full code can be found [here]().

Review Comment:
   Link text (currently [here]) should be the title of the page being linked to. If it's code on GitHub, use [on GitHub].



##########
website/www/site/content/en/documentation/ml/tensorrt-runinference.md:
##########
@@ -0,0 +1,149 @@
+---
+title: "TensorRT RunInference"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Using TensorRT with RunInference
+- [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt) is an SDK that facilitates high-performance machine learning inference. It is designed to work with deep learning frameworks such as TensorFlow, PyTorch, and MXNet. It focuses specifically on optimizing and running a trained neural network for inference efficiently on NVIDIA GPUs. TensorRT can maximize inference throughput with multiple optimizations while preserving model accuracy including model quantization, layer and tensor fusions, kernel auto-tuning, multi-stream executions, and efficient tensor memory usage.
+
+- In Apache Beam 2.43.0, Beam introduced the `TensorRTEngineHandler`, which lets you deploy a TensorRT engine in a Beam pipeline. The RunInference transform simplifies the ML pipeline creation process by allowing developers to use Sklearn, PyTorch, TensorFlow and now TensorRT models in production pipelines without needing lots of boilerplate code.
+
+Below, you can find an example that demonstrates how to utilize TensorRT with the RunInference API using a BERT-based text classification model in a Beam pipeline.

Review Comment:
   ```suggestion
   The following example demonstrates how to use TensorRT with the RunInference API using a BERT-based text classification model in a Beam pipeline.
   ```



##########
sdks/python/apache_beam/examples/inference/tensorrt_text_classification.py:
##########
@@ -0,0 +1,124 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""A pipeline to demonstrate usage of TensorRT with RunInference
+for a text classification model. This pipeline reads in memory data,
+does some preprocessing and then uses RunInference for getting prediction
+from the text classification TensorRT engine. Afterwards, it post process
+the RunInference outputs to print the input and the predicted class label.
+It also prints different metrics provided by RunInference.
+"""
+
+import argparse
+import logging
+
+import numpy as np
+
+import apache_beam as beam
+from apache_beam.ml.inference.base import RunInference
+from apache_beam.ml.inference.tensorrt_inference import TensorRTEngineHandlerNumPy
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.options.pipeline_options import SetupOptions
+from transformers import AutoTokenizer
+
+
+class Preprocess(beam.DoFn):
+  """Processes the input sentence to tokenize them.
+
+  The input sentences are tokenized as the

Review Comment:
   ```suggestion
     The input sentences are tokenized because the
   ```



##########
website/www/site/content/en/documentation/ml/tensorrt-runinference.md:
##########
@@ -0,0 +1,149 @@
+---
+title: "TensorRT RunInference"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Using TensorRT with RunInference
+- [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt) is an SDK that facilitates high-performance machine learning inference. It is designed to work with deep learning frameworks such as TensorFlow, PyTorch, and MXNet. It focuses specifically on optimizing and running a trained neural network for inference efficiently on NVIDIA GPUs. TensorRT can maximize inference throughput with multiple optimizations while preserving model accuracy including model quantization, layer and tensor fusions, kernel auto-tuning, multi-stream executions, and efficient tensor memory usage.
+
+- In Apache Beam 2.43.0, Beam introduced the `TensorRTEngineHandler`, which lets you deploy a TensorRT engine in a Beam pipeline. The RunInference transform simplifies the ML pipeline creation process by allowing developers to use Sklearn, PyTorch, TensorFlow and now TensorRT models in production pipelines without needing lots of boilerplate code.
+
+Below, you can find an example that demonstrates how to utilize TensorRT with the RunInference API using a BERT-based text classification model in a Beam pipeline.
+
+# Build a TensorRT engine for inference
+To use TensorRT with Apache Beam, you need a converted TensorRT engine file from a trained model. We take a trained BERT based text classification model that does sentiment analysis, i.e. classifies any text into two classes: positive or negative. The trained model is easily available [here](https://huggingface.co/textattack/bert-base-uncased-SST-2). In order to convert the PyTorch Model to TensorRT engine, you need to first convert the model to ONNX and then from ONNX to TensorRT.
+
+### Conversion to ONNX
+
+Using HuggingFace `transformers` libray, one can easily convert a PyTorch model to ONNX. A detailed blogpost can be found [here](https://huggingface.co/blog/convert-transformers-to-onnx) which also mentions the required packages to install. The code that we used for the conversion can be found below.

Review Comment:
   ```suggestion
   You can use the Hugging Face `transformers` library to convert a PyTorch model to ONNX. For details, see the blog post [Convert Transformers to ONNX with Hugging Face Optimum](https://huggingface.co/blog/convert-transformers-to-onnx). The blog post also explains which packages you need to install. The following code is used for the conversion.
   ```



##########
website/www/site/content/en/documentation/ml/tensorrt-runinference.md:
##########
@@ -0,0 +1,149 @@
+---
+title: "TensorRT RunInference"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Using TensorRT with RunInference
+- [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt) is an SDK that facilitates high-performance machine learning inference. It is designed to work with deep learning frameworks such as TensorFlow, PyTorch, and MXNet. It focuses specifically on optimizing and running a trained neural network for inference efficiently on NVIDIA GPUs. TensorRT can maximize inference throughput with multiple optimizations while preserving model accuracy including model quantization, layer and tensor fusions, kernel auto-tuning, multi-stream executions, and efficient tensor memory usage.
+
+- In Apache Beam 2.43.0, Beam introduced the `TensorRTEngineHandler`, which lets you deploy a TensorRT engine in a Beam pipeline. The RunInference transform simplifies the ML pipeline creation process by allowing developers to use Sklearn, PyTorch, TensorFlow and now TensorRT models in production pipelines without needing lots of boilerplate code.
+
+Below, you can find an example that demonstrates how to utilize TensorRT with the RunInference API using a BERT-based text classification model in a Beam pipeline.
+
+# Build a TensorRT engine for inference
+To use TensorRT with Apache Beam, you need a converted TensorRT engine file from a trained model. We take a trained BERT based text classification model that does sentiment analysis, i.e. classifies any text into two classes: positive or negative. The trained model is easily available [here](https://huggingface.co/textattack/bert-base-uncased-SST-2). In order to convert the PyTorch Model to TensorRT engine, you need to first convert the model to ONNX and then from ONNX to TensorRT.
+
+### Conversion to ONNX
+
+Using HuggingFace `transformers` libray, one can easily convert a PyTorch model to ONNX. A detailed blogpost can be found [here](https://huggingface.co/blog/convert-transformers-to-onnx) which also mentions the required packages to install. The code that we used for the conversion can be found below.
+
+```
+from pathlib import Path
+import transformers
+from transformers.onnx import FeaturesManager
+from transformers import AutoConfig, AutoTokenizer, AutoModelForMaskedLM, AutoModelForSequenceClassification
+
+
+# load model and tokenizer
+model_id = "textattack/bert-base-uncased-SST-2"
+feature = "sequence-classification"
+model = AutoModelForSequenceClassification.from_pretrained(model_id)
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+# load config
+model_kind, model_onnx_config = FeaturesManager.check_supported_model_or_raise(model, feature=feature)
+onnx_config = model_onnx_config(model.config)
+
+# export
+onnx_inputs, onnx_outputs = transformers.onnx.export(
+        preprocessor=tokenizer,
+        model=model,
+        config=onnx_config,
+        opset=12,
+        output=Path("bert-sst2-model.onnx")
+)
+```
+
+### From ONNX to TensorRT engine
+
+In order to convert an ONNX model to a TensorRT engine you can use the following command from `CLI`:
+```
+trtexec --onnx=<path to onnx model> --saveEngine=<path to save TensorRT engine> --useCudaGraph --verbose
+```
+
+For using `trtexec`, you can follow this [blogpost](https://developer.nvidia.com/blog/simplifying-and-accelerating-machine-learning-predictions-in-apache-beam-with-nvidia-tensorrt/) which builds a docker image built from a DockerFile. The dockerFile that we used is similar to it and can be found below:
+
+```
+ARG BUILD_IMAGE=nvcr.io/nvidia/tensorrt:22.05-py3
+
+FROM ${BUILD_IMAGE}
+
+ENV PATH="/usr/src/tensorrt/bin:${PATH}"
+
+WORKDIR /workspace
+
+RUN apt-get update && \
+    apt-get install -y software-properties-common && \
+    add-apt-repository universe && \
+    apt-get update && \
+    apt-get install -y python3.8-venv
+
+RUN pip install --no-cache-dir apache-beam[gcp]==2.43.0
+COPY --from=apache/beam_python3.8_sdk:2.43.0 /opt/apache/beam /opt/apache/beam
+
+RUN pip install --upgrade pip \
+    && pip install torch==1.13.1 \
+    && pip install torchvision>=0.8.2 \
+    && pip install pillow>=8.0.0 \
+    && pip install transformers>=4.18.0 \
+    && pip install cuda-python
+
+ENTRYPOINT [ "/opt/apache/beam/boot" ]
+```
+The blogpost also contains the instructions on how to test the TensorRT engine locally.

Review Comment:
   ```suggestion
   The blog post also contains instructions explaining how to test the TensorRT engine locally.
   ```



##########
website/www/site/content/en/documentation/ml/tensorrt-runinference.md:
##########
@@ -0,0 +1,149 @@
+---
+title: "TensorRT RunInference"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Using TensorRT with RunInference
+- [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt) is an SDK that facilitates high-performance machine learning inference. It is designed to work with deep learning frameworks such as TensorFlow, PyTorch, and MXNet. It focuses specifically on optimizing and running a trained neural network for inference efficiently on NVIDIA GPUs. TensorRT can maximize inference throughput with multiple optimizations while preserving model accuracy including model quantization, layer and tensor fusions, kernel auto-tuning, multi-stream executions, and efficient tensor memory usage.
+
+- In Apache Beam 2.43.0, Beam introduced the `TensorRTEngineHandler`, which lets you deploy a TensorRT engine in a Beam pipeline. The RunInference transform simplifies the ML pipeline creation process by allowing developers to use Sklearn, PyTorch, TensorFlow and now TensorRT models in production pipelines without needing lots of boilerplate code.
+
+Below, you can find an example that demonstrates how to utilize TensorRT with the RunInference API using a BERT-based text classification model in a Beam pipeline.
+
+# Build a TensorRT engine for inference

Review Comment:
   ```suggestion
   ## Build a TensorRT engine for inference
   ```



##########
website/www/site/content/en/documentation/ml/tensorrt-runinference.md:
##########
@@ -0,0 +1,149 @@
+---
+title: "TensorRT RunInference"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Using TensorRT with RunInference
+- [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt) is an SDK that facilitates high-performance machine learning inference. It is designed to work with deep learning frameworks such as TensorFlow, PyTorch, and MXNet. It focuses specifically on optimizing and running a trained neural network for inference efficiently on NVIDIA GPUs. TensorRT can maximize inference throughput with multiple optimizations while preserving model accuracy including model quantization, layer and tensor fusions, kernel auto-tuning, multi-stream executions, and efficient tensor memory usage.
+
+- In Apache Beam 2.43.0, Beam introduced the `TensorRTEngineHandler`, which lets you deploy a TensorRT engine in a Beam pipeline. The RunInference transform simplifies the ML pipeline creation process by allowing developers to use Sklearn, PyTorch, TensorFlow and now TensorRT models in production pipelines without needing lots of boilerplate code.
+
+Below, you can find an example that demonstrates how to utilize TensorRT with the RunInference API using a BERT-based text classification model in a Beam pipeline.
+
+# Build a TensorRT engine for inference
+To use TensorRT with Apache Beam, you need a TensorRT engine file converted from a trained model. We take a trained BERT-based text classification model that does sentiment analysis, that is, it classifies any text into two classes: positive or negative. The trained model is available [here](https://huggingface.co/textattack/bert-base-uncased-SST-2). To convert the PyTorch model to a TensorRT engine, you first convert the model to ONNX and then convert from ONNX to TensorRT.
+
+### Conversion to ONNX
+
+Using the HuggingFace `transformers` library, one can easily convert a PyTorch model to ONNX. A detailed blog post can be found [here](https://huggingface.co/blog/convert-transformers-to-onnx), which also mentions the required packages to install. The code that we used for the conversion can be found below.
+
+```
+from pathlib import Path
+import transformers
+from transformers.onnx import FeaturesManager
+from transformers import AutoConfig, AutoTokenizer, AutoModelForMaskedLM, AutoModelForSequenceClassification
+
+
+# load model and tokenizer
+model_id = "textattack/bert-base-uncased-SST-2"
+feature = "sequence-classification"
+model = AutoModelForSequenceClassification.from_pretrained(model_id)
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+# load config
+model_kind, model_onnx_config = FeaturesManager.check_supported_model_or_raise(model, feature=feature)
+onnx_config = model_onnx_config(model.config)
+
+# export
+onnx_inputs, onnx_outputs = transformers.onnx.export(
+        preprocessor=tokenizer,
+        model=model,
+        config=onnx_config,
+        opset=12,
+        output=Path("bert-sst2-model.onnx")
+)
+```
+
+### From ONNX to TensorRT engine
+
+To convert an ONNX model to a TensorRT engine, you can use the following command from the command line:
+```
+trtexec --onnx=<path to onnx model> --saveEngine=<path to save TensorRT engine> --useCudaGraph --verbose
+```
+
+To use `trtexec`, you can follow this [blog post](https://developer.nvidia.com/blog/simplifying-and-accelerating-machine-learning-predictions-in-apache-beam-with-nvidia-tensorrt/), which builds a Docker image from a Dockerfile. The Dockerfile that we used is similar to it and can be found below:
+
+```
+ARG BUILD_IMAGE=nvcr.io/nvidia/tensorrt:22.05-py3
+
+FROM ${BUILD_IMAGE}
+
+ENV PATH="/usr/src/tensorrt/bin:${PATH}"
+
+WORKDIR /workspace
+
+RUN apt-get update && \
+    apt-get install -y software-properties-common && \
+    add-apt-repository universe && \
+    apt-get update && \
+    apt-get install -y python3.8-venv
+
+RUN pip install --no-cache-dir apache-beam[gcp]==2.43.0
+COPY --from=apache/beam_python3.8_sdk:2.43.0 /opt/apache/beam /opt/apache/beam
+
+RUN pip install --upgrade pip \
+    && pip install torch==1.13.1 \
+    && pip install torchvision>=0.8.2 \
+    && pip install pillow>=8.0.0 \
+    && pip install transformers>=4.18.0 \
+    && pip install cuda-python
+
+ENTRYPOINT [ "/opt/apache/beam/boot" ]
+```
+The blogpost also contains the instructions on how to test the TensorRT engine locally.
+
+
+## Running TensorRT Engine with RunInference in a Beam Pipeline
+
+Now that you have the TensorRT engine, you can use it with RunInference in a Beam pipeline that can be run both locally and on GCP.
+
+The following code example is a part of the pipeline, where you use `TensorRTEngineHandlerNumPy` to load the TensorRT engine and set other inference parameters.

Review Comment:
   ```suggestion
   The following code example is a part of the pipeline. You use `TensorRTEngineHandlerNumPy` to load the TensorRT engine and to set other inference parameters.
   ```





[GitHub] [beam] AnandInguva commented on a diff in pull request #25226: Add TensorRT runinference example for Text Classification

Posted by "AnandInguva (via GitHub)" <gi...@apache.org>.
AnandInguva commented on code in PR #25226:
URL: https://github.com/apache/beam/pull/25226#discussion_r1097381899


##########
sdks/python/apache_beam/examples/inference/tensorrt_text_classification.py:
##########
@@ -0,0 +1,125 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""A pipeline to demonstrate usage of TensorRT with RunInference
+for a text classification model. This pipeline reads data from a text
+file, preprocesses the data, and then uses RunInference to generate
+predictions from the text classification TensorRT engine. Next,
+it postprocesses the RunInference outputs to print the input and
+the predicted class label.
+It also prints metrics provided by RunInference.
+"""
+
+import argparse
+import logging
+
+import numpy as np
+
+import apache_beam as beam
+from apache_beam.ml.inference.base import RunInference
+from apache_beam.ml.inference.tensorrt_inference import TensorRTEngineHandlerNumPy
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.options.pipeline_options import SetupOptions
+from transformers import AutoTokenizer
+
+
+class Preprocess(beam.DoFn):
+  """Processes the input sentences to tokenize them.
+
+  The input sentences are tokenized because the
+  model is expecting tokens.
+  """
+  def __init__(self, tokenizer: AutoTokenizer):
+    self._tokenizer = tokenizer
+
+  def process(self, element):
+    inputs = self._tokenizer(
+        element, return_tensors="np", padding="max_length", max_length=128)
+    return inputs.input_ids
+
+
+class Postprocess(beam.DoFn):
+  """Processes the PredictionResult to get the predicted class.
+
+  The logits are the output of the TensorRT engine.
+  We can get the class label by getting the index of
+  maximum logit using argmax.
+  """
+  def __init__(self, tokenizer: AutoTokenizer):
+    self._tokenizer = tokenizer
+
+  def process(self, element):
+    decoded_input = self._tokenizer.decode(
+        element.example, skip_special_tokens=True)
+    logits = element.inference[0]
+    argmax = np.argmax(logits)
+    output = "Positive" if argmax == 1 else "Negative"
+    print(f"Input: {decoded_input}, \t Sentiment: {output}")
+
+
+def parse_known_args(argv):
+  """Parses args for the workflow."""
+  parser = argparse.ArgumentParser()
+  parser.add_argument(
+      '--input',
+      dest='input',
+      required=True,
+      help='Path to the text file containing sentences.')
+  parser.add_argument(
+      '--trt_model_path',
+      dest='trt_model_path',
+      required=True,
+      help='Path to the pre-built textattack/bert-base-uncased-SST-2 '
+      'TensorRT engine.')
+  parser.add_argument(
+      '--model_id',
+      dest='model_id',
+      default="textattack/bert-base-uncased-SST-2",
+      help="name of model.")
+  return parser.parse_known_args(argv)
+
+
+def run(
+    argv=None,
+    save_main_session=True,
+):
+  known_args, pipeline_args = parse_known_args(argv)
+  pipeline_options = PipelineOptions(pipeline_args)
+  pipeline_options.view_as(SetupOptions).save_main_session = save_main_session
+
+  model_handler = TensorRTEngineHandlerNumPy(
+      min_batch_size=1,
+      max_batch_size=1,
+      engine_path=known_args.trt_model_path,
+  )
+
+  tokenizer = AutoTokenizer.from_pretrained(known_args.model_id)
+
+  with beam.Pipeline(options=pipeline_options) as pipeline:
+    _ = (
+        pipeline
+        | "ReadSentences" >> beam.io.ReadFromText(known_args.input)
+        | "Preprocess" >> beam.ParDo(Preprocess(tokenizer=tokenizer))
+        | "RunInference" >> RunInference(model_handler=model_handler)
+        | "PostProcess" >> beam.ParDo(Postprocess(tokenizer=tokenizer)))

Review Comment:
   ```suggestion
           | "PostProcess" >> beam.ParDo(Postprocess(tokenizer=tokenizer))
           | "LogResult" >> beam.Map(logging.info)
           )
   ```



##########
sdks/python/apache_beam/examples/inference/tensorrt_text_classification.py:
##########
@@ -0,0 +1,125 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""A pipeline to demonstrate usage of TensorRT with RunInference
+for a text classification model. This pipeline reads data from a text
+file, preprocesses the data, and then uses RunInference to generate
+predictions from the text classification TensorRT engine. Next,
+it postprocesses the RunInference outputs to print the input and
+the predicted class label.
+It also prints metrics provided by RunInference.
+"""
+
+import argparse
+import logging
+
+import numpy as np
+
+import apache_beam as beam
+from apache_beam.ml.inference.base import RunInference
+from apache_beam.ml.inference.tensorrt_inference import TensorRTEngineHandlerNumPy
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.options.pipeline_options import SetupOptions
+from transformers import AutoTokenizer
+
+
+class Preprocess(beam.DoFn):
+  """Processes the input sentences to tokenize them.
+
+  The input sentences are tokenized because the
+  model is expecting tokens.
+  """
+  def __init__(self, tokenizer: AutoTokenizer):
+    self._tokenizer = tokenizer
+
+  def process(self, element):
+    inputs = self._tokenizer(
+        element, return_tensors="np", padding="max_length", max_length=128)
+    return inputs.input_ids
+
+
+class Postprocess(beam.DoFn):
+  """Processes the PredictionResult to get the predicted class.
+
+  The logits are the output of the TensorRT engine.
+  We can get the class label by getting the index of
+  maximum logit using argmax.
+  """
+  def __init__(self, tokenizer: AutoTokenizer):
+    self._tokenizer = tokenizer
+
+  def process(self, element):
+    decoded_input = self._tokenizer.decode(
+        element.example, skip_special_tokens=True)
+    logits = element.inference[0]
+    argmax = np.argmax(logits)
+    output = "Positive" if argmax == 1 else "Negative"
+    print(f"Input: {decoded_input}, \t Sentiment: {output}")

Review Comment:
   Can we yield the required values here and use logging.info in the pipeline to print? (commented below as well)
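   
   For example, `process` could yield the formatted string instead of printing it, and the pipeline could then log it with `beam.Map(logging.info)`. Just a sketch of the idea, reusing the names from the code above:
   ```
     def process(self, element):
       decoded_input = self._tokenizer.decode(
           element.example, skip_special_tokens=True)
       logits = element.inference[0]
       argmax = np.argmax(logits)
       output = "Positive" if argmax == 1 else "Negative"
       yield f"Input: {decoded_input}, \t Sentiment: {output}"
   ```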
   
   



##########
sdks/python/apache_beam/examples/inference/tensorrt_text_classification.py:
##########
@@ -0,0 +1,125 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""A pipeline to demonstrate usage of TensorRT with RunInference
+for a text classification model. This pipeline reads data from a text
+file, preprocesses the data, and then uses RunInference to generate
+predictions from the text classification TensorRT engine. Next,
+it postprocesses the RunInference outputs to print the input and
+the predicted class label.
+It also prints metrics provided by RunInference.
+"""
+
+import argparse
+import logging
+
+import numpy as np
+
+import apache_beam as beam
+from apache_beam.ml.inference.base import RunInference
+from apache_beam.ml.inference.tensorrt_inference import TensorRTEngineHandlerNumPy
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.options.pipeline_options import SetupOptions
+from transformers import AutoTokenizer
+
+
+class Preprocess(beam.DoFn):
+  """Processes the input sentences to tokenize them.
+
+  The input sentences are tokenized because the
+  model is expecting tokens.
+  """
+  def __init__(self, tokenizer: AutoTokenizer):
+    self._tokenizer = tokenizer
+
+  def process(self, element):
+    inputs = self._tokenizer(
+        element, return_tensors="np", padding="max_length", max_length=128)
+    return inputs.input_ids
+
+
+class Postprocess(beam.DoFn):
+  """Processes the PredictionResult to get the predicted class.
+
+  The logits are the output of the TensorRT engine.
+  We can get the class label by getting the index of
+  maximum logit using argmax.
+  """
+  def __init__(self, tokenizer: AutoTokenizer):
+    self._tokenizer = tokenizer
+
+  def process(self, element):
+    decoded_input = self._tokenizer.decode(
+        element.example, skip_special_tokens=True)
+    logits = element.inference[0]
+    argmax = np.argmax(logits)
+    output = "Positive" if argmax == 1 else "Negative"
+    print(f"Input: {decoded_input}, \t Sentiment: {output}")
+
+
+def parse_known_args(argv):
+  """Parses args for the workflow."""
+  parser = argparse.ArgumentParser()
+  parser.add_argument(
+      '--input',
+      dest='input',
+      required=True,
+      help='Path to the text file containing sentences.')
+  parser.add_argument(
+      '--trt_model_path',
+      dest='trt_model_path',
+      required=True,
+      help='Path to the pre-built textattack/bert-base-uncased-SST-2 '
+      'TensorRT engine.')
+  parser.add_argument(
+      '--model_id',
+      dest='model_id',
+      default="textattack/bert-base-uncased-SST-2",
+      help="name of model.")
+  return parser.parse_known_args(argv)
+
+
+def run(
+    argv=None,
+    save_main_session=True,
+):
+  known_args, pipeline_args = parse_known_args(argv)
+  pipeline_options = PipelineOptions(pipeline_args)
+  pipeline_options.view_as(SetupOptions).save_main_session = save_main_session
+
+  model_handler = TensorRTEngineHandlerNumPy(
+      min_batch_size=1,
+      max_batch_size=1,
+      engine_path=known_args.trt_model_path,
+  )
+
+  tokenizer = AutoTokenizer.from_pretrained(known_args.model_id)
+
+  with beam.Pipeline(options=pipeline_options) as pipeline:
+    _ = (
+        pipeline
+        | "ReadSentences" >> beam.io.ReadFromText(known_args.input)
+        | "Preprocess" >> beam.ParDo(Preprocess(tokenizer=tokenizer))
+        | "RunInference" >> RunInference(model_handler=model_handler)
+        | "PostProcess" >> beam.ParDo(Postprocess(tokenizer=tokenizer)))
+  metrics = pipeline.result.metrics().query(beam.metrics.MetricsFilter())
+  print(metrics)

Review Comment:
   Can we remove the print? If that is necessary, we can use logging.info or logging.debug.





[GitHub] [beam] damccorm merged pull request #25226: Add TensorRT runinference example for Text Classification

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm merged PR #25226:
URL: https://github.com/apache/beam/pull/25226




[GitHub] [beam] github-actions[bot] commented on pull request #25226: Add TensorRT runinference example for Text Classification

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #25226:
URL: https://github.com/apache/beam/pull/25226#issuecomment-1419613011

   R: @pabloem for final approval




[GitHub] [beam] AnandInguva commented on pull request #25226: Add TensorRT runinference example for Text Classification

Posted by "AnandInguva (via GitHub)" <gi...@apache.org>.
AnandInguva commented on PR #25226:
URL: https://github.com/apache/beam/pull/25226#issuecomment-1410604387

   cc: @rszper




[GitHub] [beam] pabloem commented on pull request #25226: Add TensorRT runinference example for Text Classification

Posted by "pabloem (via GitHub)" <gi...@apache.org>.
pabloem commented on PR #25226:
URL: https://github.com/apache/beam/pull/25226#issuecomment-1419722613

   I'll let @damccorm merge. Thanks!




[GitHub] [beam] AnandInguva commented on a diff in pull request #25226: Add TensorRT runinference example for Text Classification

Posted by "AnandInguva (via GitHub)" <gi...@apache.org>.
AnandInguva commented on code in PR #25226:
URL: https://github.com/apache/beam/pull/25226#discussion_r1092104874


##########
website/www/site/content/en/documentation/ml/tensorrt-runinference.md:
##########
@@ -0,0 +1,149 @@
+---
+title: "TensorRT RunInference"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Using TensorRT with RunInference
+- [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt) is an SDK that facilitates high-performance machine learning inference. It is designed to work with deep learning frameworks such as TensorFlow, PyTorch, and MXNet. It focuses specifically on optimizing and running a trained neural network for inference efficiently on NVIDIA GPUs. TensorRT can maximize inference throughput with multiple optimizations while preserving model accuracy including model quantization, layer and tensor fusions, kernel auto-tuning, multi-stream executions, and efficient tensor memory usage.
+
+- In Apache Beam 2.43.0, Beam introduced the `TensorRTEngineHandler`, which lets you deploy a TensorRT engine in a Beam pipeline. The RunInference transform simplifies the ML pipeline creation process by allowing developers to use Sklearn, PyTorch, TensorFlow and now TensorRT models in production pipelines without needing lots of boilerplate code.

Review Comment:
   ```suggestion
   - In Apache Beam 2.43.0, Beam introduced the `TensorRTEngineHandler`, which lets you deploy a TensorRT engine in a Beam pipeline. The RunInference transform simplifies the ML inference pipeline creation process by allowing developers to use Sklearn, PyTorch, TensorFlow and now TensorRT models in production pipelines without needing lots of boilerplate code.
   ```



##########
sdks/python/apache_beam/examples/inference/tensorrt_text_classification.py:
##########
@@ -0,0 +1,124 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""A pipeline to demonstrate usage of TensorRT with RunInference
+for a text classification model. This pipeline reads in-memory data,
+does some preprocessing, and then uses RunInference to get predictions
+from the text classification TensorRT engine. Afterwards, it post-processes
+the RunInference outputs to print the input and the predicted class label.
+It also prints different metrics provided by RunInference.
+"""
+
+import argparse
+import logging
+
+import numpy as np
+
+import apache_beam as beam
+from apache_beam.ml.inference.base import RunInference
+from apache_beam.ml.inference.tensorrt_inference import TensorRTEngineHandlerNumPy
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.options.pipeline_options import SetupOptions
+from transformers import AutoTokenizer
+
+
+class Preprocess(beam.DoFn):
+  """Processes the input sentence to tokenize them.
+
+  The input sentences are tokenized as the
+  model is expecting tokens.
+  """
+  def __init__(self, tokenizer: AutoTokenizer):
+    self._tokenizer = tokenizer
+
+  def process(self, element):
+    inputs = self._tokenizer(
+        element, return_tensors="np", padding="max_length", max_length=128)
+    return inputs.input_ids
+
+
+class Postprocess(beam.DoFn):
+  """Processes the PredictionResult to get the predicted class.
+
+  The logits are the output of the TensorRT engine.
+  We can get the class label by getting the index of
+  maximum logit using argmax.
+  """
+  def __init__(self, tokenizer: AutoTokenizer):
+    self._tokenizer = tokenizer
+
+  def process(self, element):
+    decoded_input = self._tokenizer.decode(
+        element.example, skip_special_tokens=True)
+    logits = element.inference[0]
+    argmax = np.argmax(logits)
+    output = "Positive" if argmax == 1 else "Negative"
+    print(f"Input: {decoded_input}, \t Sentiment: {output}")
+
+
+def parse_known_args(argv):
+  """Parses args for the workflow."""
+  parser = argparse.ArgumentParser()
+  parser.add_argument(
+      '--trt-model-path',
+      dest='trt_model_path',
+      required=True,
+      help='Path to the TensorRT engine.')
+  parser.add_argument(
+      '--model-id',
+      dest='model_id',
+      default="textattack/bert-base-uncased-SST-2",
+      help="name of model.")
+  return parser.parse_known_args(argv)
+
+
+def run(
+    argv=None,
+    save_main_session=True,
+):
+  known_args, pipeline_args = parse_known_args(argv)
+  pipeline_options = PipelineOptions(pipeline_args)
+  pipeline_options.view_as(SetupOptions).save_main_session = save_main_session
+
+  model_handler = TensorRTEngineHandlerNumPy(
+      min_batch_size=1,
+      max_batch_size=1,
+      engine_path=known_args.trt_model_path,
+  )
+
+  task_sentences = [

Review Comment:
   (just a suggestion) How long does it take to pickle this list? If it takes some time, we can create a text file and use the `ReadFromText` PTransform for inputs.
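   
   Something along these lines, for example (the file path is only a placeholder):
   ```
   sentences = (
       pipeline
       | "ReadSentences" >> beam.io.ReadFromText("gs://<bucket>/sentences.txt")
       | "Preprocess" >> beam.ParDo(Preprocess(tokenizer=tokenizer)))
   ```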



##########
sdks/python/apache_beam/examples/inference/tensorrt_text_classification.py:
##########
@@ -0,0 +1,124 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""A pipeline to demonstrate usage of TensorRT with RunInference
+for a text classification model. This pipeline reads in-memory data,
+does some preprocessing, and then uses RunInference to get predictions
+from the text classification TensorRT engine. Afterwards, it post-processes
+the RunInference outputs to print the input and the predicted class label.
+It also prints different metrics provided by RunInference.
+"""
+
+import argparse
+import logging
+
+import numpy as np
+
+import apache_beam as beam
+from apache_beam.ml.inference.base import RunInference
+from apache_beam.ml.inference.tensorrt_inference import TensorRTEngineHandlerNumPy
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.options.pipeline_options import SetupOptions
+from transformers import AutoTokenizer
+
+
+class Preprocess(beam.DoFn):
+  """Processes the input sentence to tokenize them.
+
+  The input sentences are tokenized as the
+  model is expecting tokens.
+  """
+  def __init__(self, tokenizer: AutoTokenizer):
+    self._tokenizer = tokenizer
+
+  def process(self, element):
+    inputs = self._tokenizer(
+        element, return_tensors="np", padding="max_length", max_length=128)
+    return inputs.input_ids
+
+
+class Postprocess(beam.DoFn):
+  """Processes the PredictionResult to get the predicted class.
+
+  The logits are the output of the TensorRT engine.
+  We can get the class label by getting the index of
+  maximum logit using argmax.
+  """
+  def __init__(self, tokenizer: AutoTokenizer):
+    self._tokenizer = tokenizer
+
+  def process(self, element):
+    decoded_input = self._tokenizer.decode(
+        element.example, skip_special_tokens=True)
+    logits = element.inference[0]
+    argmax = np.argmax(logits)
+    output = "Positive" if argmax == 1 else "Negative"
+    print(f"Input: {decoded_input}, \t Sentiment: {output}")
+
+
+def parse_known_args(argv):
+  """Parses args for the workflow."""
+  parser = argparse.ArgumentParser()
+  parser.add_argument(
+      '--trt-model-path',

Review Comment:
   I see that the documentation mentions `--trt_model_path` as the flag name. Can we normalize that here?



##########
website/www/site/content/en/documentation/ml/tensorrt-runinference.md:
##########
@@ -0,0 +1,149 @@
+---
+title: "TensorRT RunInference"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Using TensorRT with RunInference
+- [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt) is an SDK that facilitates high-performance machine learning inference. It is designed to work with deep learning frameworks such as TensorFlow, PyTorch, and MXNet. It focuses specifically on optimizing and running a trained neural network for inference efficiently on NVIDIA GPUs. TensorRT can maximize inference throughput with multiple optimizations while preserving model accuracy including model quantization, layer and tensor fusions, kernel auto-tuning, multi-stream executions, and efficient tensor memory usage.
+
+- In Apache Beam 2.43.0, Beam introduced the `TensorRTEngineHandler`, which lets you deploy a TensorRT engine in a Beam pipeline. The RunInference transform simplifies the ML pipeline creation process by allowing developers to use Sklearn, PyTorch, TensorFlow and now TensorRT models in production pipelines without needing lots of boilerplate code.

Review Comment:
   Can we add a link pointing to TensorRTEngineHandler?



##########
sdks/python/apache_beam/examples/inference/tensorrt_text_classification.py:
##########
@@ -0,0 +1,124 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""A pipeline to demonstrate usage of TensorRT with RunInference
+for a text classification model. This pipeline reads in-memory data,
+does some preprocessing, and then uses RunInference to get predictions
+from the text classification TensorRT engine. Afterwards, it post-processes
+the RunInference outputs to print the input and the predicted class label.
+It also prints different metrics provided by RunInference.
+"""
+
+import argparse
+import logging
+
+import numpy as np
+
+import apache_beam as beam
+from apache_beam.ml.inference.base import RunInference
+from apache_beam.ml.inference.tensorrt_inference import TensorRTEngineHandlerNumPy
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.options.pipeline_options import SetupOptions
+from transformers import AutoTokenizer
+
+
+class Preprocess(beam.DoFn):
+  """Processes the input sentence to tokenize them.
+
+  The input sentences are tokenized as the
+  model is expecting tokens.
+  """
+  def __init__(self, tokenizer: AutoTokenizer):
+    self._tokenizer = tokenizer
+
+  def process(self, element):
+    inputs = self._tokenizer(
+        element, return_tensors="np", padding="max_length", max_length=128)
+    return inputs.input_ids
+
+
+class Postprocess(beam.DoFn):
+  """Processes the PredictionResult to get the predicted class.
+
+  The logits are the output of the TensorRT engine.
+  We can get the class label by getting the index of
+  maximum logit using argmax.
+  """
+  def __init__(self, tokenizer: AutoTokenizer):
+    self._tokenizer = tokenizer
+
+  def process(self, element):
+    decoded_input = self._tokenizer.decode(
+        element.example, skip_special_tokens=True)
+    logits = element.inference[0]
+    argmax = np.argmax(logits)
+    output = "Positive" if argmax == 1 else "Negative"
+    print(f"Input: {decoded_input}, \t Sentiment: {output}")
+
+
+def parse_known_args(argv):
+  """Parses args for the workflow."""
+  parser = argparse.ArgumentParser()
+  parser.add_argument(
+      '--trt-model-path',
+      dest='trt_model_path',
+      required=True,
+      help='Path to the TensorRT engine.')
+  parser.add_argument(
+      '--model-id',

Review Comment:
   ```suggestion
         '--model_id',
   ```





[GitHub] [beam] pabloem commented on pull request #25226: Add TensorRT runinference example for Text Classification

Posted by "pabloem (via GitHub)" <gi...@apache.org>.
pabloem commented on PR #25226:
URL: https://github.com/apache/beam/pull/25226#issuecomment-1419631010

   Run PythonDocker PreCommit




[GitHub] [beam] damccorm commented on a diff in pull request #25226: Add TensorRT runinference example for Text Classification

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on code in PR #25226:
URL: https://github.com/apache/beam/pull/25226#discussion_r1097844820


##########
website/www/site/layouts/partials/section-menu/en/documentation.html:
##########
@@ -225,6 +225,7 @@
     <li><a href="/documentation/ml/anomaly-detection/">Anomaly Detection</a></li>
     <li><a href="/documentation/ml/large-language-modeling">Large Language Model Inference in Beam</a></li>
     <li><a href="/documentation/ml/per-entity-training">Per Entity Training in Beam</a></li>
+    <li><a href="/documentation/ml/tensorrt-runinference">TensorRT Text Classification Inference</a></li>

Review Comment:
   ```suggestion
       <li><a href="/documentation/ml/tensorrt-runinference">TensorRT Inference</a></li>
   ```
   
   This will help it render more naturally and illustrates the main point of the document.



##########
website/www/site/content/en/documentation/ml/tensorrt-runinference.md:
##########
@@ -0,0 +1,150 @@
+---
+title: "TensorRT RunInference"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Use TensorRT with RunInference
+- [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt) is an SDK that facilitates high-performance machine learning inference. It is designed to work with deep learning frameworks such as TensorFlow, PyTorch, and MXNet. It focuses specifically on optimizing and running a trained neural network to efficiently run inference on NVIDIA GPUs. TensorRT can maximize inference throughput with multiple optimizations while preserving model accuracy including model quantization, layer and tensor fusions, kernel auto-tuning, multi-stream executions, and efficient tensor memory usage.
+
+- In Apache Beam 2.43.0, Beam introduced the [TensorRTEngineHandler](https://beam.apache.org/releases/pydoc/2.43.0/apache_beam.ml.inference.tensorrt_inference.html#apache_beam.ml.inference.tensorrt_inference.TensorRTEngineHandlerNumPy), which lets you deploy a TensorRT engine in a Beam pipeline. The RunInference transform simplifies the ML inference pipeline creation process by allowing developers to use Sklearn, PyTorch, TensorFlow and now TensorRT models in production pipelines without needing lots of boilerplate code.
+
+The following example demonstrates how to use TensorRT with the RunInference API, using a BERT-based text classification model in a Beam pipeline.
+
+## Build a TensorRT engine for inference
+To use TensorRT with Apache Beam, you need a converted TensorRT engine file from a trained model. We take a trained BERT based text classification model that does sentiment analysis, that is, it classifies any text into two classes: positive or negative. The trained model is available [from HuggingFace](https://huggingface.co/textattack/bert-base-uncased-SST-2). To convert the PyTorch Model to TensorRT engine, you need to first convert the model to ONNX and then from ONNX to TensorRT.
+
+### Conversion to ONNX
+
+You can use the HuggingFace `transformers` library to convert a PyTorch model to ONNX. For details, see the blog post [Convert Transformers to ONNX with Hugging Face Optimum](https://huggingface.co/blog/convert-transformers-to-onnx). The blog post explains which required packages to install. The following code that is used for the conversion.

Review Comment:
   ```suggestion
   You can use the HuggingFace `transformers` library to convert a PyTorch model to ONNX. For details, see the blog post [Convert Transformers to ONNX with Hugging Face Optimum](https://huggingface.co/blog/convert-transformers-to-onnx). The blog post explains which required packages to install. The following code is used for the conversion.
   ```



##########
website/www/site/content/en/documentation/ml/tensorrt-runinference.md:
##########
@@ -0,0 +1,150 @@
+---
+title: "TensorRT RunInference"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Use TensorRT with RunInference
+- [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt) is an SDK that facilitates high-performance machine learning inference. It is designed to work with deep learning frameworks such as TensorFlow, PyTorch, and MXNet. It focuses specifically on optimizing and running a trained neural network to efficiently run inference on NVIDIA GPUs. TensorRT can maximize inference throughput with multiple optimizations while preserving model accuracy including model quantization, layer and tensor fusions, kernel auto-tuning, multi-stream executions, and efficient tensor memory usage.
+
+- In Apache Beam 2.43.0, Beam introduced the [TensorRTEngineHandler](https://beam.apache.org/releases/pydoc/2.43.0/apache_beam.ml.inference.tensorrt_inference.html#apache_beam.ml.inference.tensorrt_inference.TensorRTEngineHandlerNumPy), which lets you deploy a TensorRT engine in a Beam pipeline. The RunInference transform simplifies the ML inference pipeline creation process by allowing developers to use Sklearn, PyTorch, TensorFlow and now TensorRT models in production pipelines without needing lots of boilerplate code.
+
+The following example demonstrates how to use TensorRT with the RunInference API, using a BERT-based text classification model in a Beam pipeline.
+
+## Build a TensorRT engine for inference
+To use TensorRT with Apache Beam, you need a converted TensorRT engine file from a trained model. We take a trained BERT based text classification model that does sentiment analysis, that is, it classifies any text into two classes: positive or negative. The trained model is available [from HuggingFace](https://huggingface.co/textattack/bert-base-uncased-SST-2). To convert the PyTorch Model to TensorRT engine, you need to first convert the model to ONNX and then from ONNX to TensorRT.

Review Comment:
   ```suggestion
   To use TensorRT with Apache Beam, you need a converted TensorRT engine file from a trained model. We take a trained BERT based text classification model that does sentiment analysis and classifies any text into two classes: positive or negative. The trained model is available [from HuggingFace](https://huggingface.co/textattack/bert-base-uncased-SST-2). To convert the PyTorch Model to TensorRT engine, you need to first convert the model to ONNX and then from ONNX to TensorRT.
   ```





[GitHub] [beam] shub-kris commented on a diff in pull request #25226: Add TensorRT runinference example for Text Classification

Posted by "shub-kris (via GitHub)" <gi...@apache.org>.
shub-kris commented on code in PR #25226:
URL: https://github.com/apache/beam/pull/25226#discussion_r1092202510


##########
sdks/python/apache_beam/examples/inference/tensorrt_text_classification.py:
##########
@@ -0,0 +1,124 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""A pipeline to demonstrate usage of TensorRT with RunInference
+for a text classification model. This pipeline reads in-memory data,
+does some preprocessing, and then uses RunInference to get predictions
+from the text classification TensorRT engine. Afterwards, it post-processes
+the RunInference outputs to print the input and the predicted class label.
+It also prints different metrics provided by RunInference.
+"""
+
+import argparse
+import logging
+
+import numpy as np
+
+import apache_beam as beam
+from apache_beam.ml.inference.base import RunInference
+from apache_beam.ml.inference.tensorrt_inference import TensorRTEngineHandlerNumPy
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.options.pipeline_options import SetupOptions
+from transformers import AutoTokenizer
+
+
+class Preprocess(beam.DoFn):
+  """Processes the input sentence to tokenize them.
+
+  The input sentences are tokenized as the
+  model is expecting tokens.
+  """
+  def __init__(self, tokenizer: AutoTokenizer):
+    self._tokenizer = tokenizer
+
+  def process(self, element):
+    inputs = self._tokenizer(
+        element, return_tensors="np", padding="max_length", max_length=128)
+    return inputs.input_ids
+
+
+class Postprocess(beam.DoFn):
+  """Processes the PredictionResult to get the predicted class.
+
+  The logits are the output of the TensorRT engine.
+  We can get the class label by getting the index of
+  maximum logit using argmax.
+  """
+  def __init__(self, tokenizer: AutoTokenizer):
+    self._tokenizer = tokenizer
+
+  def process(self, element):
+    decoded_input = self._tokenizer.decode(
+        element.example, skip_special_tokens=True)
+    logits = element.inference[0]
+    argmax = np.argmax(logits)
+    output = "Positive" if argmax == 1 else "Negative"
+    print(f"Input: {decoded_input}, \t Sentiment: {output}")
+
+
+def parse_known_args(argv):
+  """Parses args for the workflow."""
+  parser = argparse.ArgumentParser()
+  parser.add_argument(
+      '--trt-model-path',
+      dest='trt_model_path',
+      required=True,
+      help='Path to the TensorRT engine.')
+  parser.add_argument(
+      '--model-id',
+      dest='model_id',
+      default="textattack/bert-base-uncased-SST-2",
+      help="name of model.")
+  return parser.parse_known_args(argv)
+
+
+def run(
+    argv=None,
+    save_main_session=True,
+):
+  known_args, pipeline_args = parse_known_args(argv)
+  pipeline_options = PipelineOptions(pipeline_args)
+  pipeline_options.view_as(SetupOptions).save_main_session = save_main_session
+
+  model_handler = TensorRTEngineHandlerNumPy(
+      min_batch_size=1,
+      max_batch_size=1,
+      engine_path=known_args.trt_model_path,
+  )
+
+  task_sentences = [

Review Comment:
   Sure, I can do that. 



##########
website/www/site/content/en/documentation/ml/tensorrt-runinference.md:
##########
@@ -0,0 +1,149 @@
+---
+title: "TensorRT RunInference"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Using TensorRT with RunInference
+- [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt) is an SDK that facilitates high-performance machine learning inference. It is designed to work with deep learning frameworks such as TensorFlow, PyTorch, and MXNet. It focuses specifically on optimizing and running a trained neural network for inference efficiently on NVIDIA GPUs. TensorRT can maximize inference throughput with multiple optimizations while preserving model accuracy including model quantization, layer and tensor fusions, kernel auto-tuning, multi-stream executions, and efficient tensor memory usage.
+
+- In Apache Beam 2.43.0, Beam introduced the `TensorRTEngineHandler`, which lets you deploy a TensorRT engine in a Beam pipeline. The RunInference transform simplifies the ML pipeline creation process by allowing developers to use Sklearn, PyTorch, TensorFlow and now TensorRT models in production pipelines without needing lots of boilerplate code.

Review Comment:
   Sure


