Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/11/28 20:44:16 UTC

[GitHub] [beam] damccorm commented on a diff in pull request #24350: Add Large Language Model RunInference Example

damccorm commented on code in PR #24350:
URL: https://github.com/apache/beam/pull/24350#discussion_r1034018891


##########
sdks/python/apache_beam/examples/inference/large_language_modeling/main.py:
##########
@@ -0,0 +1,139 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License
+
+""""A pipeline that uses RunInference to perform Translation
+with T5 language model.
+
+This pipeline takes a list of english sentences and then uses
+the T5ForConditionalGeneration from Hugging Face to translate the
+english sentence into german.
+"""
+import argparse
+import sys
+
+import apache_beam as beam
+from apache_beam.ml.inference.base import RunInference
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+from apache_beam.ml.inference.pytorch_inference import make_tensor_model_fn
+from apache_beam.options.pipeline_options import PipelineOptions
+from transformers import AutoConfig
+from transformers import AutoTokenizer
+from transformers import T5ForConditionalGeneration
+
+
+class Preprocess(beam.DoFn):
+  def __init__(self, tokenizer: AutoTokenizer):
+    self._tokenizer = tokenizer
+
+  def process(self, element):
+    """
+        Process the raw text input to a format suitable for
+        T5ForConditionalGeneration Model Inference
+
+        Args:
+          element: A Pcollection
+
+        Returns:
+          The input_ids are being returned.

Review Comment:
   ```suggestion
             element: a string of text
   
           Returns:
             A tokenized example that can be read by the T5ForConditionalGeneration model
   ```
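
For reference, a minimal sketch of how a `Preprocess` DoFn along these lines might look once the suggested docstring is applied. The hunk above is truncated, so the tokenizer arguments here are assumptions rather than the file's exact code:

```python
import apache_beam as beam
from transformers import AutoTokenizer


class Preprocess(beam.DoFn):
  def __init__(self, tokenizer: AutoTokenizer):
    self._tokenizer = tokenizer

  def process(self, element):
    """Tokenize one element for T5ForConditionalGeneration inference.

    Args:
      element: a string of text

    Returns:
      A tokenized example that can be read by the T5ForConditionalGeneration model
    """
    # return_tensors="pt" yields PyTorch tensors; the real example may also pad or
    # truncate so that elements can be batched by the model handler.
    input_ids = self._tokenizer(element, return_tensors="pt").input_ids
    yield input_ids
```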



##########
website/www/site/content/en/documentation/ml/large-language-modeling.md:
##########
@@ -0,0 +1,73 @@
+---
+title: "Large Language Model Inference in Beam"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# RunInference
+Staring from Apache Beam 2.40.0, Beam introduced the RunInference API that lets you deploy a machine learning model in a Beam pipeline. A `RunInference` transform performs inference on a `PCollection` of examples using a machine learning (ML) model. The transform outputs a PCollection that contains the input examples and output predictions. You can find more information about RunInference [here](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) and some examples [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference).
+
+
+## Using RunInference with very large models
+RunInference doesn't only help you deploying small sized models but also any kind of large scale models.
+
+We will demonstrate doing inference with  `T5` language model using `RunInference` in a pipeline. `T5` is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised and supervised tasks and for which each task is converted into a text-to-text format. We will be using `T5-11B` which contains `11 Billion` parameters and is `45 GB` in size. `T5` works well on a variety of tasks out-of-the-box by prepending a different prefix to the input corresponding to each task, e.g., for translation: translate English to German: …, for summarization: summarize: …. You can find more information about the `T5` [here](https://huggingface.co/docs/transformers/model_doc/t5).

Review Comment:
   ```suggestion
    We will demonstrate doing inference with  `T5` language model using `RunInference` in a pipeline. `T5` is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised and supervised tasks and for which each task is converted into a text-to-text format. We will be using `T5-11B` which contains `11 Billion` parameters and is `45 GB` in size. `T5` works well on a variety of tasks out-of-the-box by prepending a different prefix to the input corresponding to each task. For example, for translation the input would be: `translate English to German: …` and for summarization it would be: `summarize: …`. You can find more information about the `T5` [here](https://huggingface.co/docs/transformers/model_doc/t5).
   ```
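
To make the prefix convention concrete, here is an illustrative (assumed) set of inputs of the kind the example's `task_sentences` could contain; the prefix tells `T5` which task to perform on the rest of the string:

```python
# Hypothetical example inputs; the actual sentences used by main.py are not shown here.
task_sentences = [
    "translate English to German: The house is wonderful.",
    "summarize: Apache Beam is a unified model for defining batch and streaming data pipelines.",
]
```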



##########
website/www/site/content/en/documentation/ml/overview.md:
##########
@@ -89,4 +89,5 @@ You can find examples of end-to-end AI/ML pipelines for several use cases:
 * [ML Workflow Orchestration](/documentation/ml/orchestration): Illustrates how to orchestrate ML workflows consisting of multiple steps by using Kubeflow Pipelines and Tensorflow Extended.
 * [Multi model pipelines in Beam](/documentation/ml/multi-model-pipelines): Explains how multi-model pipelines work and gives an overview of what you need to know to build one using the RunInference API.
 * [Online Clustering in Beam](/documentation/ml/online-clustering): Demonstrates how to set up a real-time clustering pipeline that can read text from Pub/Sub, convert the text into an embedding using a transformer-based language model with the RunInference API, and cluster the text using BIRCH with stateful processing.
-* [Anomaly Detection in Beam](/documentation/ml/anomaly-detection): Demonstrates how to set up an anomaly detection pipeline that reads text from Pub/Sub in real time and then detects anomalies using a trained HDBSCAN clustering model with the RunInference API.
\ No newline at end of file
+* [Anomaly Detection in Beam](/documentation/ml/anomaly-detection): Demonstrates how to set up an anomaly detection pipeline that reads text from Pub/Sub in real time and then detects anomalies using a trained HDBSCAN clustering model with the RunInference API.
+* [Large Language Model Inference in Beam](/documentation/ml/large-language-modeling): Demonstrates a pipeline that uses RunInference to perform Translation with T5 language model which contains 11 Billion parameters.

Review Comment:
   ```suggestion
   * [Large Language Model Inference in Beam](/documentation/ml/large-language-modeling): Demonstrates a pipeline that uses RunInference to perform Translation with the T5 language model which contains 11 Billion parameters.
   ```



##########
website/www/site/content/en/documentation/ml/large-language-modeling.md:
##########
@@ -0,0 +1,73 @@
+---
+title: "Large Language Model Inference in Beam"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# RunInference
+Staring from Apache Beam 2.40.0, Beam introduced the RunInference API that lets you deploy a machine learning model in a Beam pipeline. A `RunInference` transform performs inference on a `PCollection` of examples using a machine learning (ML) model. The transform outputs a PCollection that contains the input examples and output predictions. You can find more information about RunInference [here](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) and some examples [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference).
+
+
+## Using RunInference with very large models
+RunInference doesn't only help you deploying small sized models but also any kind of large scale models.
+
+We will demonstrate doing inference with  `T5` language model using `RunInference` in a pipeline. `T5` is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised and supervised tasks and for which each task is converted into a text-to-text format. We will be using `T5-11B` which contains `11 Billion` parameters and is `45 GB` in size. `T5` works well on a variety of tasks out-of-the-box by prepending a different prefix to the input corresponding to each task, e.g., for translation: translate English to German: …, for summarization: summarize: …. You can find more information about the `T5` [here](https://huggingface.co/docs/transformers/model_doc/t5).
+
+### How to Run the Pipeline ?
+First, make sure you have installed the required packages and pass the required arguments.
+You can find the code [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference/large_language_modeling/main.py)
+
+1. Locally on your machine: `python main.py --runner DirectRunner`
+2. On GCP using Dataflow: `python main.py --runner DataflowRunner`
+
+### Explaining the Pipeline
+The pipeline can be broken down into few simple steps:
+1. Reading the inputs
+2. Encoding the text into transformer-readable token ID integers using a tokenizer
+3. Using RunInference to get the output
+4. Decoding the RunInference output and printing them

Review Comment:
   ```suggestion
   4. Decoding the RunInference output and printing it
   ```
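
As a point of reference for step 4, a minimal sketch of what a `Postprocess` DoFn could look like. The actual DoFn is not shown in this hunk, so the field access and decoding call below are assumptions based on how `RunInference` emits `PredictionResult` objects:

```python
import apache_beam as beam
from apache_beam.ml.inference.base import PredictionResult
from transformers import AutoTokenizer


class Postprocess(beam.DoFn):
  def __init__(self, tokenizer: AutoTokenizer):
    self._tokenizer = tokenizer

  def process(self, element: PredictionResult):
    # element.inference is assumed to hold the token IDs produced by the model's
    # generate() call; decode them back into readable text and print it.
    decoded = self._tokenizer.decode(element.inference, skip_special_tokens=True)
    print(decoded)
```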



##########
website/www/site/content/en/documentation/ml/large-language-modeling.md:
##########
@@ -0,0 +1,73 @@
+---
+title: "Large Language Model Inference in Beam"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# RunInference
+Staring from Apache Beam 2.40.0, Beam introduced the RunInference API that lets you deploy a machine learning model in a Beam pipeline. A `RunInference` transform performs inference on a `PCollection` of examples using a machine learning (ML) model. The transform outputs a PCollection that contains the input examples and output predictions. You can find more information about RunInference [here](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) and some examples [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference).
+
+
+## Using RunInference with very large models
+RunInference doesn't only help you deploying small sized models but also any kind of large scale models.
+
+We will demonstrate doing inference with  `T5` language model using `RunInference` in a pipeline. `T5` is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised and supervised tasks and for which each task is converted into a text-to-text format. We will be using `T5-11B` which contains `11 Billion` parameters and is `45 GB` in size. `T5` works well on a variety of tasks out-of-the-box by prepending a different prefix to the input corresponding to each task, e.g., for translation: translate English to German: …, for summarization: summarize: …. You can find more information about the `T5` [here](https://huggingface.co/docs/transformers/model_doc/t5).
+
+### How to Run the Pipeline ?
+First, make sure you have installed the required packages and pass the required arguments.
+You can find the code [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference/large_language_modeling/main.py)
+
+1. Locally on your machine: `python main.py --runner DirectRunner`
+2. On GCP using Dataflow: `python main.py --runner DataflowRunner`
+
+### Explaining the Pipeline
+The pipeline can be broken down into few simple steps:
+1. Reading the inputs
+2. Encoding the text into transformer-readable token ID integers using a tokenizer
+3. Using RunInference to get the output
+4. Decoding the RunInference output and printing them
+
+The code snippet for all the 4 steps can be found below:
+
+{{< highlight >}}
+    with beam.Pipeline(options=pipeline_options) as pipeline:
+        _ = (
+            pipeline
+            | "CreateInputs" >> beam.Create(task_sentences)
+            | "Preprocess" >> beam.ParDo(Preprocess(tokenizer=tokenizer))
+            | "RunInference" >> RunInference(model_handler=model_handler)
+            | "PostProcess" >> beam.ParDo(Postprocess(tokenizer=tokenizer))
+        )
+{{< /highlight >}}
+
+We now closely look at the 3rd step of pipeline where we use `RunInference`.
+In order to use it, one has to first define a `ModelHandler`. RunInference provides model handlers for `PyTorch`, `TensorFlow` and `Scikit-Learn`. As, we are using a `PyTorch` model, so we used a `PyTorchModelHandlerTensor`.

Review Comment:
   ```suggestion
    In the 3rd step of the pipeline we use `RunInference`.
   In order to use it, you must first define a `ModelHandler`. RunInference provides model handlers for `PyTorch`, `TensorFlow` and `Scikit-Learn`. Since the example uses a `PyTorch` model, it uses the `PyTorchModelHandlerTensor` model handler.
   ```



##########
website/www/site/content/en/documentation/ml/large-language-modeling.md:
##########
@@ -0,0 +1,73 @@
+---
+title: "Large Language Model Inference in Beam"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# RunInference
+Staring from Apache Beam 2.40.0, Beam introduced the RunInference API that lets you deploy a machine learning model in a Beam pipeline. A `RunInference` transform performs inference on a `PCollection` of examples using a machine learning (ML) model. The transform outputs a PCollection that contains the input examples and output predictions. You can find more information about RunInference [here](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) and some examples [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference).
+
+
+## Using RunInference with very large models
+RunInference doesn't only help you deploying small sized models but also any kind of large scale models.

Review Comment:
   ```suggestion
   RunInference works well on arbitrarily large models as long as they can fit on your hardware.
   ```



##########
website/www/site/content/en/documentation/ml/large-language-modeling.md:
##########
@@ -0,0 +1,73 @@
+---
+title: "Large Language Model Inference in Beam"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# RunInference
+Staring from Apache Beam 2.40.0, Beam introduced the RunInference API that lets you deploy a machine learning model in a Beam pipeline. A `RunInference` transform performs inference on a `PCollection` of examples using a machine learning (ML) model. The transform outputs a PCollection that contains the input examples and output predictions. You can find more information about RunInference [here](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) and some examples [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference).
+
+
+## Using RunInference with very large models
+RunInference doesn't only help you deploying small sized models but also any kind of large scale models.
+
+We will demonstrate doing inference with  `T5` language model using `RunInference` in a pipeline. `T5` is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised and supervised tasks and for which each task is converted into a text-to-text format. We will be using `T5-11B` which contains `11 Billion` parameters and is `45 GB` in size. `T5` works well on a variety of tasks out-of-the-box by prepending a different prefix to the input corresponding to each task, e.g., for translation: translate English to German: …, for summarization: summarize: …. You can find more information about the `T5` [here](https://huggingface.co/docs/transformers/model_doc/t5).
+
+### How to Run the Pipeline ?
+First, make sure you have installed the required packages and pass the required arguments.
+You can find the code [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference/large_language_modeling/main.py)
+
+1. Locally on your machine: `python main.py --runner DirectRunner`
+2. On GCP using Dataflow: `python main.py --runner DataflowRunner`
+
+### Explaining the Pipeline
+The pipeline can be broken down into few simple steps:
+1. Reading the inputs
+2. Encoding the text into transformer-readable token ID integers using a tokenizer
+3. Using RunInference to get the output
+4. Decoding the RunInference output and printing them
+
+The code snippet for all the 4 steps can be found below:
+
+{{< highlight >}}
+    with beam.Pipeline(options=pipeline_options) as pipeline:
+        _ = (
+            pipeline
+            | "CreateInputs" >> beam.Create(task_sentences)
+            | "Preprocess" >> beam.ParDo(Preprocess(tokenizer=tokenizer))
+            | "RunInference" >> RunInference(model_handler=model_handler)
+            | "PostProcess" >> beam.ParDo(Postprocess(tokenizer=tokenizer))
+        )
+{{< /highlight >}}
+
+We now closely look at the 3rd step of pipeline where we use `RunInference`.
+In order to use it, one has to first define a `ModelHandler`. RunInference provides model handlers for `PyTorch`, `TensorFlow` and `Scikit-Learn`. As, we are using a `PyTorch` model, so we used a `PyTorchModelHandlerTensor`.
+
+{{< highlight >}}
+  gen_fn = make_tensor_model_fn('generate')
+
+  model_handler = PytorchModelHandlerTensor(
+      state_dict_path=args.model_state_dict_path,
+      model_class=T5ForConditionalGeneration,
+      model_params={"config": AutoConfig.from_pretrained(args.model_name)},
+      device="cpu",
+      inference_fn=gen_fn)
+{{< /highlight >}}
+
+`ModelHandler` requires parameters like:
+* `state_dict_path` – path to the saved dictionary of the model state.
+* `model_class` – class of the Pytorch model that defines the model structure.
+* `model_params` – A dictionary of arguments required to instantiate the model class.
+* `device` – the device on which you wish to run the model. If device = GPU then a GPU device will be used if it is available. Otherwise, it will be CPU.
+* `inference_fn` -  the inference function to use during RunInference.

Review Comment:
   ```suggestion
   * `state_dict_path` – the path to the saved dictionary of the model state.
   * `model_class` – the class of the Pytorch model that defines the model structure.
   * `model_params` – the dictionary of arguments required to instantiate the model class.
   * `device` – the device on which you wish to run the model. If device = GPU then a GPU device will be used if it is available. Otherwise, it will be CPU.
   * `inference_fn` -  the inference function to use during RunInference.
   ```
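
Since `state_dict_path` expects a saved state dictionary on disk, a one-off preparation step along these lines could produce it (the model name and output path are placeholders, and loading `T5-11B` this way needs roughly 45 GB of memory and disk):

```python
import torch
from transformers import T5ForConditionalGeneration

# Download the pretrained weights once and save only the state dict for the pipeline to load.
model = T5ForConditionalGeneration.from_pretrained("t5-11b")
torch.save(model.state_dict(), "t5_state_dict.pth")
```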



##########
website/www/site/content/en/documentation/ml/large-language-modeling.md:
##########
@@ -0,0 +1,73 @@
+---
+title: "Large Language Model Inference in Beam"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# RunInference
+Staring from Apache Beam 2.40.0, Beam introduced the RunInference API that lets you deploy a machine learning model in a Beam pipeline. A `RunInference` transform performs inference on a `PCollection` of examples using a machine learning (ML) model. The transform outputs a PCollection that contains the input examples and output predictions. You can find more information about RunInference [here](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) and some examples [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference).
+
+
+## Using RunInference with very large models
+RunInference doesn't only help you deploying small sized models but also any kind of large scale models.
+
+We will demonstrate doing inference with  `T5` language model using `RunInference` in a pipeline. `T5` is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised and supervised tasks and for which each task is converted into a text-to-text format. We will be using `T5-11B` which contains `11 Billion` parameters and is `45 GB` in size. `T5` works well on a variety of tasks out-of-the-box by prepending a different prefix to the input corresponding to each task, e.g., for translation: translate English to German: …, for summarization: summarize: …. You can find more information about the `T5` [here](https://huggingface.co/docs/transformers/model_doc/t5).
+
+### How to Run the Pipeline ?
+First, make sure you have installed the required packages and pass the required arguments.
+You can find the code [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference/large_language_modeling/main.py)
+
+1. Locally on your machine: `python main.py --runner DirectRunner`

Review Comment:
   ```suggestion
   1. Locally on your machine: `python main.py --runner DirectRunner`. Note that you will need to have 45 GB of disk space available to run this example.
   ```
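
For context on how those commands are parsed, a sketch (assumed to mirror `main.py`'s argument handling, not a verbatim copy) of the usual Beam pattern that separates the example's own flags from runner flags such as `--runner DirectRunner` or `--runner DataflowRunner`:

```python
import argparse

from apache_beam.options.pipeline_options import PipelineOptions

parser = argparse.ArgumentParser()
parser.add_argument("--model_state_dict_path", required=True)  # path to the saved T5 state dict
parser.add_argument("--model_name", default="t5-11b")  # Hugging Face model name; default is an assumption
known_args, pipeline_args = parser.parse_known_args()
# Everything not consumed above (e.g. --runner, --project) becomes a pipeline option.
pipeline_options = PipelineOptions(pipeline_args)
```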



##########
website/www/site/content/en/documentation/ml/large-language-modeling.md:
##########
@@ -0,0 +1,73 @@
+---
+title: "Large Language Model Inference in Beam"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# RunInference
+Staring from Apache Beam 2.40.0, Beam introduced the RunInference API that lets you deploy a machine learning model in a Beam pipeline. A `RunInference` transform performs inference on a `PCollection` of examples using a machine learning (ML) model. The transform outputs a PCollection that contains the input examples and output predictions. You can find more information about RunInference [here](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) and some examples [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference).
+
+
+## Using RunInference with very large models
+RunInference doesn't only help you deploying small sized models but also any kind of large scale models.
+
+We will demonstrate doing inference with  `T5` language model using `RunInference` in a pipeline. `T5` is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised and supervised tasks and for which each task is converted into a text-to-text format. We will be using `T5-11B` which contains `11 Billion` parameters and is `45 GB` in size. `T5` works well on a variety of tasks out-of-the-box by prepending a different prefix to the input corresponding to each task, e.g., for translation: translate English to German: …, for summarization: summarize: …. You can find more information about the `T5` [here](https://huggingface.co/docs/transformers/model_doc/t5).
+
+### How to Run the Pipeline ?
+First, make sure you have installed the required packages and pass the required arguments.
+You can find the code [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference/large_language_modeling/main.py)
+
+1. Locally on your machine: `python main.py --runner DirectRunner`
+2. On GCP using Dataflow: `python main.py --runner DataflowRunner`
+
+### Explaining the Pipeline
+The pipeline can be broken down into few simple steps:
+1. Reading the inputs
+2. Encoding the text into transformer-readable token ID integers using a tokenizer
+3. Using RunInference to get the output
+4. Decoding the RunInference output and printing them
+
+The code snippet for all the 4 steps can be found below:
+
+{{< highlight >}}
+    with beam.Pipeline(options=pipeline_options) as pipeline:
+        _ = (
+            pipeline
+            | "CreateInputs" >> beam.Create(task_sentences)
+            | "Preprocess" >> beam.ParDo(Preprocess(tokenizer=tokenizer))
+            | "RunInference" >> RunInference(model_handler=model_handler)
+            | "PostProcess" >> beam.ParDo(Postprocess(tokenizer=tokenizer))
+        )
+{{< /highlight >}}
+
+We now closely look at the 3rd step of pipeline where we use `RunInference`.
+In order to use it, one has to first define a `ModelHandler`. RunInference provides model handlers for `PyTorch`, `TensorFlow` and `Scikit-Learn`. As, we are using a `PyTorch` model, so we used a `PyTorchModelHandlerTensor`.
+
+{{< highlight >}}
+  gen_fn = make_tensor_model_fn('generate')
+
+  model_handler = PytorchModelHandlerTensor(
+      state_dict_path=args.model_state_dict_path,
+      model_class=T5ForConditionalGeneration,
+      model_params={"config": AutoConfig.from_pretrained(args.model_name)},
+      device="cpu",
+      inference_fn=gen_fn)
+{{< /highlight >}}
+
+`ModelHandler` requires parameters like:

Review Comment:
   ```suggestion
   A `ModelHandler` allows you to specify parameters like:
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org