Posted to github@beam.apache.org by "AnandInguva (via GitHub)" <gi...@apache.org> on 2023/01/31 15:37:03 UTC

[GitHub] [beam] AnandInguva commented on a diff in pull request #25226: Add TensorRT runinference example for Text Classification

AnandInguva commented on code in PR #25226:
URL: https://github.com/apache/beam/pull/25226#discussion_r1092104874


##########
website/www/site/content/en/documentation/ml/tensorrt-runinference.md:
##########
@@ -0,0 +1,149 @@
+---
+title: "TensorRT RunInference"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Using TensorRT with RunInference
+- [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt) is an SDK that facilitates high-performance machine learning inference. It is designed to work with deep learning frameworks such as TensorFlow, PyTorch, and MXNet. It focuses specifically on optimizing and efficiently running a trained neural network for inference on NVIDIA GPUs. TensorRT can maximize inference throughput with multiple optimizations, such as model quantization, layer and tensor fusion, kernel auto-tuning, multi-stream execution, and efficient tensor memory usage, while preserving model accuracy.
+
+- In Apache Beam 2.43.0, Beam introduced the `TensorRTEngineHandler`, which lets you deploy a TensorRT engine in a Beam pipeline. The RunInference transform simplifies the ML pipeline creation process by allowing developers to use Sklearn, PyTorch, TensorFlow and now TensorRT models in production pipelines without needing lots of boilerplate code.

Review Comment:
   ```suggestion
   - In Apache Beam 2.43.0, Beam introduced the `TensorRTEngineHandler`, which lets you deploy a TensorRT engine in a Beam pipeline. The RunInference transform simplifies the ML inference pipeline creation process by allowing developers to use Sklearn, PyTorch, TensorFlow and now TensorRT models in production pipelines without needing lots of boilerplate code.
   ```
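
   For readers skimming this thread, a minimal sketch of how the handler described in the paragraph above is wired into RunInference. The engine path, batch sizes, and dummy input below are illustrative assumptions, not taken from the PR:

```python
# Minimal sketch (illustrative): running a prebuilt TensorRT engine with
# Beam's RunInference. The engine path and the dummy input are placeholders.
import numpy as np

import apache_beam as beam
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.tensorrt_inference import TensorRTEngineHandlerNumPy

model_handler = TensorRTEngineHandlerNumPy(
    min_batch_size=1,
    max_batch_size=1,
    engine_path="path/to/text_classification.trt",  # hypothetical engine path
)

with beam.Pipeline() as pipeline:
  _ = (
      pipeline
      | "CreateInputs" >> beam.Create(
          [np.zeros(128, dtype=np.int32)])  # placeholder token ids
      | "RunInference" >> RunInference(model_handler)
      | "PrintResults" >> beam.Map(print))
```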



##########
sdks/python/apache_beam/examples/inference/tensorrt_text_classification.py:
##########
@@ -0,0 +1,124 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""A pipeline to demonstrate usage of TensorRT with RunInference
+for a text classification model. This pipeline reads in memory data,
+does some preprocessing and then uses RunInference for getting prediction
+from the text classification TensorRT engine. Afterwards, it post process
+the RunInference outputs to print the input and the predicted class label.
+It also prints different metrics provided by RunInference.
+"""
+
+import argparse
+import logging
+
+import numpy as np
+
+import apache_beam as beam
+from apache_beam.ml.inference.base import RunInference
+from apache_beam.ml.inference.tensorrt_inference import TensorRTEngineHandlerNumPy
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.options.pipeline_options import SetupOptions
+from transformers import AutoTokenizer
+
+
+class Preprocess(beam.DoFn):
+  """Processes the input sentence to tokenize them.
+
+  The input sentences are tokenized as the
+  model is expecting tokens.
+  """
+  def __init__(self, tokenizer: AutoTokenizer):
+    self._tokenizer = tokenizer
+
+  def process(self, element):
+    inputs = self._tokenizer(
+        element, return_tensors="np", padding="max_length", max_length=128)
+    return inputs.input_ids
+
+
+class Postprocess(beam.DoFn):
+  """Processes the PredictionResult to get the predicted class.
+
+  The logits are the output of the TensorRT engine.
+  We can get the class label by taking the index of
+  the maximum logit using argmax.
+  """
+  def __init__(self, tokenizer: AutoTokenizer):
+    self._tokenizer = tokenizer
+
+  def process(self, element):
+    decoded_input = self._tokenizer.decode(
+        element.example, skip_special_tokens=True)
+    logits = element.inference[0]
+    argmax = np.argmax(logits)
+    output = "Positive" if argmax == 1 else "Negative"
+    print(f"Input: {decoded_input}, \t Sentiment: {output}")
+
+
+def parse_known_args(argv):
+  """Parses args for the workflow."""
+  parser = argparse.ArgumentParser()
+  parser.add_argument(
+      '--trt-model-path',
+      dest='trt_model_path',
+      required=True,
+      help='Path to the TensorRT engine.')
+  parser.add_argument(
+      '--model-id',
+      dest='model_id',
+      default="textattack/bert-base-uncased-SST-2",
+      help="name of model.")
+  return parser.parse_known_args(argv)
+
+
+def run(
+    argv=None,
+    save_main_session=True,
+):
+  known_args, pipeline_args = parse_known_args(argv)
+  pipeline_options = PipelineOptions(pipeline_args)
+  pipeline_options.view_as(SetupOptions).save_main_session = save_main_session
+
+  model_handler = TensorRTEngineHandlerNumPy(
+      min_batch_size=1,
+      max_batch_size=1,
+      engine_path=known_args.trt_model_path,
+  )
+
+  task_sentences = [

Review Comment:
   (just a suggestion) How long does it take to pickle this list? If it takes a while, we could create a text file and use the `ReadFromText` PTransform for the inputs, as sketched below.
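
   For illustration, a rough sketch of that alternative. The file path is a made-up example, assuming one sentence per line:

```python
# Sketch of the suggestion above: read the input sentences from a text file
# with ReadFromText instead of pickling an in-memory list.
# "gs://my-bucket/sentences.txt" is a hypothetical path, one sentence per line.
import apache_beam as beam
from apache_beam.io import ReadFromText

with beam.Pipeline() as pipeline:
  sentences = pipeline | "ReadSentences" >> ReadFromText(
      "gs://my-bucket/sentences.txt")
  # ... then continue with Preprocess, RunInference, and Postprocess
  # as in the example pipeline.
```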



##########
sdks/python/apache_beam/examples/inference/tensorrt_text_classification.py:
##########
@@ -0,0 +1,124 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""A pipeline to demonstrate usage of TensorRT with RunInference
+for a text classification model. This pipeline reads in memory data,
+does some preprocessing and then uses RunInference for getting prediction
+from the text classification TensorRT engine. Afterwards, it post process
+the RunInference outputs to print the input and the predicted class label.
+It also prints different metrics provided by RunInference.
+"""
+
+import argparse
+import logging
+
+import numpy as np
+
+import apache_beam as beam
+from apache_beam.ml.inference.base import RunInference
+from apache_beam.ml.inference.tensorrt_inference import TensorRTEngineHandlerNumPy
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.options.pipeline_options import SetupOptions
+from transformers import AutoTokenizer
+
+
+class Preprocess(beam.DoFn):
+  """Processes the input sentence to tokenize them.
+
+  The input sentences are tokenized as the
+  model is expecting tokens.
+  """
+  def __init__(self, tokenizer: AutoTokenizer):
+    self._tokenizer = tokenizer
+
+  def process(self, element):
+    inputs = self._tokenizer(
+        element, return_tensors="np", padding="max_length", max_length=128)
+    return inputs.input_ids
+
+
+class Postprocess(beam.DoFn):
+  """Processes the PredictionResult to get the predicted class.
+
+  The logits are the output of the TensorRT engine.
+  We can get the class label by taking the index of
+  the maximum logit using argmax.
+  """
+  def __init__(self, tokenizer: AutoTokenizer):
+    self._tokenizer = tokenizer
+
+  def process(self, element):
+    decoded_input = self._tokenizer.decode(
+        element.example, skip_special_tokens=True)
+    logits = element.inference[0]
+    argmax = np.argmax(logits)
+    output = "Positive" if argmax == 1 else "Negative"
+    print(f"Input: {decoded_input}, \t Sentiment: {output}")
+
+
+def parse_known_args(argv):
+  """Parses args for the workflow."""
+  parser = argparse.ArgumentParser()
+  parser.add_argument(
+      '--trt-model-path',

Review Comment:
   I see that the documentation mentions `--trt_model_path` as the flag name. Can we normalize it here? One possible version is sketched below.
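
   For illustration only, the flag normalized to match the documentation, keeping the same `dest` and help text as in the PR:

```python
# Illustrative only: the flag renamed to --trt_model_path to match the docs;
# dest and help are unchanged from the PR.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    '--trt_model_path',
    dest='trt_model_path',
    required=True,
    help='Path to the TensorRT engine.')
```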



##########
website/www/site/content/en/documentation/ml/tensorrt-runinference.md:
##########
@@ -0,0 +1,149 @@
+---
+title: "TensorRT RunInference"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Using TensorRT with RunInference
+- [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt) is an SDK that facilitates high-performance machine learning inference. It is designed to work with deep learning frameworks such as TensorFlow, PyTorch, and MXNet. It focuses specifically on optimizing and efficiently running a trained neural network for inference on NVIDIA GPUs. TensorRT can maximize inference throughput with multiple optimizations, such as model quantization, layer and tensor fusion, kernel auto-tuning, multi-stream execution, and efficient tensor memory usage, while preserving model accuracy.
+
+- In Apache Beam 2.43.0, Beam introduced the `TensorRTEngineHandler`, which lets you deploy a TensorRT engine in a Beam pipeline. The RunInference transform simplifies the ML pipeline creation process by allowing developers to use Sklearn, PyTorch, TensorFlow and now TensorRT models in production pipelines without needing lots of boilerplate code.

Review Comment:
   Can we add a link pointing to `TensorRTEngineHandler`?



##########
sdks/python/apache_beam/examples/inference/tensorrt_text_classification.py:
##########
@@ -0,0 +1,124 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""A pipeline to demonstrate usage of TensorRT with RunInference
+for a text classification model. This pipeline reads in memory data,
+does some preprocessing and then uses RunInference for getting prediction
+from the text classification TensorRT engine. Afterwards, it post process
+the RunInference outputs to print the input and the predicted class label.
+It also prints different metrics provided by RunInference.
+"""
+
+import argparse
+import logging
+
+import numpy as np
+
+import apache_beam as beam
+from apache_beam.ml.inference.base import RunInference
+from apache_beam.ml.inference.tensorrt_inference import TensorRTEngineHandlerNumPy
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.options.pipeline_options import SetupOptions
+from transformers import AutoTokenizer
+
+
+class Preprocess(beam.DoFn):
+  """Processes the input sentence to tokenize them.
+
+  The input sentences are tokenized as the
+  model is expecting tokens.
+  """
+  def __init__(self, tokenizer: AutoTokenizer):
+    self._tokenizer = tokenizer
+
+  def process(self, element):
+    inputs = self._tokenizer(
+        element, return_tensors="np", padding="max_length", max_length=128)
+    return inputs.input_ids
+
+
+class Postprocess(beam.DoFn):
+  """Processes the PredictionResult to get the predicted class.
+
+  The logits are the output of the TensorRT engine.
+  We can get the class label by taking the index of
+  the maximum logit using argmax.
+  """
+  def __init__(self, tokenizer: AutoTokenizer):
+    self._tokenizer = tokenizer
+
+  def process(self, element):
+    decoded_input = self._tokenizer.decode(
+        element.example, skip_special_tokens=True)
+    logits = element.inference[0]
+    argmax = np.argmax(logits)
+    output = "Positive" if argmax == 1 else "Negative"
+    print(f"Input: {decoded_input}, \t Sentiment: {output}")
+
+
+def parse_known_args(argv):
+  """Parses args for the workflow."""
+  parser = argparse.ArgumentParser()
+  parser.add_argument(
+      '--trt-model-path',
+      dest='trt_model_path',
+      required=True,
+      help='Path to the TensorRT engine.')
+  parser.add_argument(
+      '--model-id',

Review Comment:
   ```suggestion
         '--model_id',
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org