Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/11/28 21:13:39 UTC

[GitHub] [beam] rszper commented on a diff in pull request #24350: Add Large Language Model RunInference Example

rszper commented on code in PR #24350:
URL: https://github.com/apache/beam/pull/24350#discussion_r1034040573


##########
sdks/python/apache_beam/examples/inference/large_language_modeling/main.py:
##########
@@ -0,0 +1,139 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License
+
+""""A pipeline that uses RunInference to perform Translation
+with T5 language model.

Review Comment:
   A pipeline that uses RunInference to perform translation
   with a T5 language model.



##########
sdks/python/apache_beam/examples/inference/large_language_modeling/main.py:
##########
@@ -0,0 +1,139 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License
+
+""""A pipeline that uses RunInference to perform Translation
+with T5 language model.
+
+This pipeline takes a list of english sentences and then uses
+the T5ForConditionalGeneration from Hugging Face to translate the
+english sentence into german.
+"""
+import argparse
+import sys
+
+import apache_beam as beam
+from apache_beam.ml.inference.base import RunInference
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+from apache_beam.ml.inference.pytorch_inference import make_tensor_model_fn
+from apache_beam.options.pipeline_options import PipelineOptions
+from transformers import AutoConfig
+from transformers import AutoTokenizer
+from transformers import T5ForConditionalGeneration
+
+
+class Preprocess(beam.DoFn):
+  def __init__(self, tokenizer: AutoTokenizer):
+    self._tokenizer = tokenizer
+
+  def process(self, element):
+    """
+        Process the raw text input to a format suitable for
+        T5ForConditionalGeneration Model Inference

Review Comment:
   T5ForConditionalGeneration model inference.



##########
website/www/site/content/en/documentation/ml/large-language-modeling.md:
##########
@@ -0,0 +1,73 @@
+---
+title: "Large Language Model Inference in Beam"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# RunInference
+Staring from Apache Beam 2.40.0, Beam introduced the RunInference API that lets you deploy a machine learning model in a Beam pipeline. A `RunInference` transform performs inference on a `PCollection` of examples using a machine learning (ML) model. The transform outputs a PCollection that contains the input examples and output predictions. You can find more information about RunInference [here](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) and some examples [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference).
+
+
+## Using RunInference with very large models
+RunInference doesn't only help you deploying small sized models but also any kind of large scale models.
+
+We will demonstrate doing inference with  `T5` language model using `RunInference` in a pipeline. `T5` is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised and supervised tasks and for which each task is converted into a text-to-text format. We will be using `T5-11B` which contains `11 Billion` parameters and is `45 GB` in size. `T5` works well on a variety of tasks out-of-the-box by prepending a different prefix to the input corresponding to each task, e.g., for translation: translate English to German: …, for summarization: summarize: …. You can find more information about the `T5` [here](https://huggingface.co/docs/transformers/model_doc/t5).
+
+### How to Run the Pipeline ?
+First, make sure you have installed the required packages and pass the required arguments.

Review Comment:
   First, install the required packages and pass the required arguments.



##########
sdks/python/apache_beam/examples/inference/large_language_modeling/main.py:
##########
@@ -0,0 +1,139 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License
+
+""""A pipeline that uses RunInference to perform Translation
+with T5 language model.
+
+This pipeline takes a list of english sentences and then uses
+the T5ForConditionalGeneration from Hugging Face to translate the
+english sentence into german.
+"""
+import argparse
+import sys
+
+import apache_beam as beam
+from apache_beam.ml.inference.base import RunInference
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+from apache_beam.ml.inference.pytorch_inference import make_tensor_model_fn
+from apache_beam.options.pipeline_options import PipelineOptions
+from transformers import AutoConfig
+from transformers import AutoTokenizer
+from transformers import T5ForConditionalGeneration
+
+
+class Preprocess(beam.DoFn):
+  def __init__(self, tokenizer: AutoTokenizer):
+    self._tokenizer = tokenizer
+
+  def process(self, element):
+    """
+        Process the raw text input to a format suitable for
+        T5ForConditionalGeneration Model Inference
+
+        Args:
+          element: A Pcollection

Review Comment:
   Pcollection -> PCollection



##########
website/www/site/content/en/documentation/ml/large-language-modeling.md:
##########
@@ -0,0 +1,73 @@
+---
+title: "Large Language Model Inference in Beam"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# RunInference
+Staring from Apache Beam 2.40.0, Beam introduced the RunInference API that lets you deploy a machine learning model in a Beam pipeline. A `RunInference` transform performs inference on a `PCollection` of examples using a machine learning (ML) model. The transform outputs a PCollection that contains the input examples and output predictions. You can find more information about RunInference [here](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) and some examples [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference).
+
+
+## Using RunInference with very large models
+RunInference doesn't only help you deploying small sized models but also any kind of large scale models.
+
+We will demonstrate doing inference with  `T5` language model using `RunInference` in a pipeline. `T5` is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised and supervised tasks and for which each task is converted into a text-to-text format. We will be using `T5-11B` which contains `11 Billion` parameters and is `45 GB` in size. `T5` works well on a variety of tasks out-of-the-box by prepending a different prefix to the input corresponding to each task, e.g., for translation: translate English to German: …, for summarization: summarize: …. You can find more information about the `T5` [here](https://huggingface.co/docs/transformers/model_doc/t5).
+
+### How to Run the Pipeline ?
+First, make sure you have installed the required packages and pass the required arguments.
+You can find the code [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference/large_language_modeling/main.py)
+
+1. Locally on your machine: `python main.py --runner DirectRunner`
+2. On GCP using Dataflow: `python main.py --runner DataflowRunner`
+
+### Explaining the Pipeline
+The pipeline can be broken down into few simple steps:

Review Comment:
   The pipeline contains the following steps:



##########
website/www/site/content/en/documentation/ml/large-language-modeling.md:
##########
@@ -0,0 +1,73 @@
+---
+title: "Large Language Model Inference in Beam"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# RunInference
+Staring from Apache Beam 2.40.0, Beam introduced the RunInference API that lets you deploy a machine learning model in a Beam pipeline. A `RunInference` transform performs inference on a `PCollection` of examples using a machine learning (ML) model. The transform outputs a PCollection that contains the input examples and output predictions. You can find more information about RunInference [here](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) and some examples [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference).
+
+
+## Using RunInference with very large models
+RunInference doesn't only help you deploying small sized models but also any kind of large scale models.
+
+We will demonstrate doing inference with  `T5` language model using `RunInference` in a pipeline. `T5` is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised and supervised tasks and for which each task is converted into a text-to-text format. We will be using `T5-11B` which contains `11 Billion` parameters and is `45 GB` in size. `T5` works well on a variety of tasks out-of-the-box by prepending a different prefix to the input corresponding to each task, e.g., for translation: translate English to German: …, for summarization: summarize: …. You can find more information about the `T5` [here](https://huggingface.co/docs/transformers/model_doc/t5).

Review Comment:
   Suggested text:
   
   This example demonstrates running inference with a `T5` language model using `RunInference` in a pipeline. `T5` is an encoder-decoder model pretrained on a multi-task mixture of unsupervised and supervised tasks. Each task is converted into a text-to-text format. The example uses `T5-11B`, which contains 11 billion parameters and is 45 GB in size. In order to work well on a variety of tasks, `T5` prepends a different prefix to the input corresponding to each task. For example, for translation, the input would be: `translate English to German: …` and for summarization, it would be: `summarize: ….`. For more information about `T5`, see the [T5 overview](https://huggingface.co/docs/transformers/model_doc/t5) in the Hugging Face documentation.
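For context, here is a minimal sketch of the task-prefix convention described above, using the Hugging Face `transformers` API. It is illustrative only and not part of the PR; `t5-small` is used simply because it downloads quickly, unlike the `T5-11B` checkpoint the example targets.

```python
# Illustrative sketch of T5 task prefixes (not part of the PR).
# Assumes `transformers`, `torch`, and `sentencepiece` are installed.
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# The prefix tells T5 which task to perform on the rest of the input.
input_ids = tokenizer(
    "translate English to German: The house is wonderful.",
    return_tensors="pt").input_ids
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```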



##########
website/www/site/content/en/documentation/ml/overview.md:
##########
@@ -89,4 +89,5 @@ You can find examples of end-to-end AI/ML pipelines for several use cases:
 * [ML Workflow Orchestration](/documentation/ml/orchestration): Illustrates how to orchestrate ML workflows consisting of multiple steps by using Kubeflow Pipelines and Tensorflow Extended.
 * [Multi model pipelines in Beam](/documentation/ml/multi-model-pipelines): Explains how multi-model pipelines work and gives an overview of what you need to know to build one using the RunInference API.
 * [Online Clustering in Beam](/documentation/ml/online-clustering): Demonstrates how to set up a real-time clustering pipeline that can read text from Pub/Sub, convert the text into an embedding using a transformer-based language model with the RunInference API, and cluster the text using BIRCH with stateful processing.
-* [Anomaly Detection in Beam](/documentation/ml/anomaly-detection): Demonstrates how to set up an anomaly detection pipeline that reads text from Pub/Sub in real time and then detects anomalies using a trained HDBSCAN clustering model with the RunInference API.
\ No newline at end of file
+* [Anomaly Detection in Beam](/documentation/ml/anomaly-detection): Demonstrates how to set up an anomaly detection pipeline that reads text from Pub/Sub in real time and then detects anomalies using a trained HDBSCAN clustering model with the RunInference API.
+* [Large Language Model Inference in Beam](/documentation/ml/large-language-modeling): Demonstrates a pipeline that uses RunInference to perform Translation with T5 language model which contains 11 Billion parameters.

Review Comment:
   Translation -> translation
   
   I think "billion" should be lowercase, unless it's a proper noun.



##########
website/www/site/content/en/documentation/ml/large-language-modeling.md:
##########
@@ -0,0 +1,73 @@
+---
+title: "Large Language Model Inference in Beam"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# RunInference
+Staring from Apache Beam 2.40.0, Beam introduced the RunInference API that lets you deploy a machine learning model in a Beam pipeline. A `RunInference` transform performs inference on a `PCollection` of examples using a machine learning (ML) model. The transform outputs a PCollection that contains the input examples and output predictions. You can find more information about RunInference [here](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) and some examples [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference).

Review Comment:
   In Apache Beam version 2.40.0, Beam introduced the RunInference API, which lets you deploy a machine learning model in a Beam pipeline. A `RunInference` transform performs inference on a `PCollection` of examples using a machine learning (ML) model. The transform outputs a `PCollection` that contains the input examples and output predictions. For more information, see [RunInference](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/). You can also find [inference examples on GitHub](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference).
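As a companion to this description, a minimal sketch of the RunInference pattern (illustrative only, not part of the PR); the scikit-learn model handler and the model path are assumptions chosen to keep the snippet short.

```python
# Minimal RunInference sketch (illustrative, not part of the PR).
# The pickled scikit-learn model at this path is a hypothetical example.
import numpy
import apache_beam as beam
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy

model_handler = SklearnModelHandlerNumpy(model_uri="gs://my-bucket/model.pkl")

with beam.Pipeline() as pipeline:
    _ = (
        pipeline
        | "CreateExamples" >> beam.Create(
            [numpy.array([1.0, 2.0]), numpy.array([3.0, 4.0])])
        | "RunInference" >> RunInference(model_handler)
        # Each output is a PredictionResult holding the input example and its prediction.
        | "PrintResults" >> beam.Map(print))
```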



##########
website/www/site/content/en/documentation/ml/large-language-modeling.md:
##########
@@ -0,0 +1,73 @@
+---
+title: "Large Language Model Inference in Beam"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# RunInference
+Staring from Apache Beam 2.40.0, Beam introduced the RunInference API that lets you deploy a machine learning model in a Beam pipeline. A `RunInference` transform performs inference on a `PCollection` of examples using a machine learning (ML) model. The transform outputs a PCollection that contains the input examples and output predictions. You can find more information about RunInference [here](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) and some examples [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference).
+
+
+## Using RunInference with very large models
+RunInference doesn't only help you deploying small sized models but also any kind of large scale models.
+
+We will demonstrate doing inference with  `T5` language model using `RunInference` in a pipeline. `T5` is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised and supervised tasks and for which each task is converted into a text-to-text format. We will be using `T5-11B` which contains `11 Billion` parameters and is `45 GB` in size. `T5` works well on a variety of tasks out-of-the-box by prepending a different prefix to the input corresponding to each task, e.g., for translation: translate English to German: …, for summarization: summarize: …. You can find more information about the `T5` [here](https://huggingface.co/docs/transformers/model_doc/t5).
+
+### How to Run the Pipeline ?

Review Comment:
   ### Run the pipeline



##########
website/www/site/content/en/documentation/ml/large-language-modeling.md:
##########
@@ -0,0 +1,73 @@
+---
+title: "Large Language Model Inference in Beam"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# RunInference
+Staring from Apache Beam 2.40.0, Beam introduced the RunInference API that lets you deploy a machine learning model in a Beam pipeline. A `RunInference` transform performs inference on a `PCollection` of examples using a machine learning (ML) model. The transform outputs a PCollection that contains the input examples and output predictions. You can find more information about RunInference [here](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) and some examples [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference).
+
+
+## Using RunInference with very large models
+RunInference doesn't only help you deploying small sized models but also any kind of large scale models.
+
+We will demonstrate doing inference with  `T5` language model using `RunInference` in a pipeline. `T5` is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised and supervised tasks and for which each task is converted into a text-to-text format. We will be using `T5-11B` which contains `11 Billion` parameters and is `45 GB` in size. `T5` works well on a variety of tasks out-of-the-box by prepending a different prefix to the input corresponding to each task, e.g., for translation: translate English to German: …, for summarization: summarize: …. You can find more information about the `T5` [here](https://huggingface.co/docs/transformers/model_doc/t5).
+
+### How to Run the Pipeline ?
+First, make sure you have installed the required packages and pass the required arguments.
+You can find the code [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference/large_language_modeling/main.py)
+
+1. Locally on your machine: `python main.py --runner DirectRunner`
+2. On GCP using Dataflow: `python main.py --runner DataflowRunner`

Review Comment:
   2. On Google Cloud using Dataflow: `python main.py --runner DataflowRunner`



##########
sdks/python/apache_beam/examples/inference/large_language_modeling/main.py:
##########
@@ -0,0 +1,139 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License
+
+""""A pipeline that uses RunInference to perform Translation
+with T5 language model.
+
+This pipeline takes a list of english sentences and then uses
+the T5ForConditionalGeneration from Hugging Face to translate the
+english sentence into german.
+"""
+import argparse
+import sys
+
+import apache_beam as beam
+from apache_beam.ml.inference.base import RunInference
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+from apache_beam.ml.inference.pytorch_inference import make_tensor_model_fn
+from apache_beam.options.pipeline_options import PipelineOptions
+from transformers import AutoConfig
+from transformers import AutoTokenizer
+from transformers import T5ForConditionalGeneration
+
+
+class Preprocess(beam.DoFn):
+  def __init__(self, tokenizer: AutoTokenizer):
+    self._tokenizer = tokenizer
+
+  def process(self, element):
+    """
+        Process the raw text input to a format suitable for
+        T5ForConditionalGeneration Model Inference
+
+        Args:
+          element: A Pcollection
+
+        Returns:
+          The input_ids are being returned.
+        """
+    input_ids = self._tokenizer(
+        element, return_tensors="pt", padding="max_length",
+        max_length=512).input_ids
+    return input_ids
+
+
+class Postprocess(beam.DoFn):
+  def __init__(self, tokenizer: AutoTokenizer):
+    self._tokenizer = tokenizer
+
+  def process(self, element):
+    """
+        Process the PredictionResult to print the translated texts
+
+        Args:
+          element: The RunInference output to be processed.
+        """
+    decoded_inputs = self._tokenizer.decode(
+        element.example, skip_special_tokens=True)
+    decoded_outputs = self._tokenizer.decode(
+        element.inference, skip_special_tokens=True)
+    print(f"{decoded_inputs} \t Output: {decoded_outputs}")
+
+
+def parse_args(argv):
+  """Parses args for the workflow."""
+  parser = argparse.ArgumentParser()
+  parser.add_argument(
+      "--model_state_dict_path",
+      dest="model_state_dict_path",
+      required=True,
+      help="Path to the model's state_dict.",
+  )
+  parser.add_argument(
+      "--model_name",
+      dest="model_name",
+      required=True,
+      help="Path to the model's state_dict.",
+      default="t5-small",
+  )
+
+  return parser.parse_known_args(args=argv)
+
+
+def run():
+  """
+    Runs the interjector pipeline which translates english sentences
+    into german using the RunInference API. """

Review Comment:
   Runs the interjector pipeline, which translates English sentences
       into German using the RunInference API.



##########
sdks/python/apache_beam/examples/inference/large_language_modeling/main.py:
##########
@@ -0,0 +1,139 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License
+
+""""A pipeline that uses RunInference to perform Translation
+with T5 language model.
+
+This pipeline takes a list of english sentences and then uses
+the T5ForConditionalGeneration from Hugging Face to translate the
+english sentence into german.

Review Comment:
   This pipeline takes a list of English sentences and uses
   the `T5ForConditionalGeneration` from Hugging Face to translate the
   English sentences into German.



##########
website/www/site/content/en/documentation/ml/large-language-modeling.md:
##########
@@ -0,0 +1,73 @@
+---
+title: "Large Language Model Inference in Beam"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# RunInference
+Staring from Apache Beam 2.40.0, Beam introduced the RunInference API that lets you deploy a machine learning model in a Beam pipeline. A `RunInference` transform performs inference on a `PCollection` of examples using a machine learning (ML) model. The transform outputs a PCollection that contains the input examples and output predictions. You can find more information about RunInference [here](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) and some examples [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference).
+
+
+## Using RunInference with very large models
+RunInference doesn't only help you deploying small sized models but also any kind of large scale models.
+
+We will demonstrate doing inference with  `T5` language model using `RunInference` in a pipeline. `T5` is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised and supervised tasks and for which each task is converted into a text-to-text format. We will be using `T5-11B` which contains `11 Billion` parameters and is `45 GB` in size. `T5` works well on a variety of tasks out-of-the-box by prepending a different prefix to the input corresponding to each task, e.g., for translation: translate English to German: …, for summarization: summarize: …. You can find more information about the `T5` [here](https://huggingface.co/docs/transformers/model_doc/t5).
+
+### How to Run the Pipeline ?
+First, make sure you have installed the required packages and pass the required arguments.
+You can find the code [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference/large_language_modeling/main.py)
+
+1. Locally on your machine: `python main.py --runner DirectRunner`
+2. On GCP using Dataflow: `python main.py --runner DataflowRunner`
+
+### Explaining the Pipeline
+The pipeline can be broken down into few simple steps:
+1. Reading the inputs

Review Comment:
   1. Read the inputs.
   2. Encode the text into transformer-readable token ID integers using a tokenizer.
   3. Use RunInference to get the output.
   4. Decode the RunInference output and print it.



##########
website/www/site/content/en/documentation/ml/large-language-modeling.md:
##########
@@ -0,0 +1,73 @@
+---
+title: "Large Language Model Inference in Beam"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# RunInference
+Staring from Apache Beam 2.40.0, Beam introduced the RunInference API that lets you deploy a machine learning model in a Beam pipeline. A `RunInference` transform performs inference on a `PCollection` of examples using a machine learning (ML) model. The transform outputs a PCollection that contains the input examples and output predictions. You can find more information about RunInference [here](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) and some examples [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference).
+
+
+## Using RunInference with very large models
+RunInference doesn't only help you deploying small sized models but also any kind of large scale models.
+
+We will demonstrate doing inference with  `T5` language model using `RunInference` in a pipeline. `T5` is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised and supervised tasks and for which each task is converted into a text-to-text format. We will be using `T5-11B` which contains `11 Billion` parameters and is `45 GB` in size. `T5` works well on a variety of tasks out-of-the-box by prepending a different prefix to the input corresponding to each task, e.g., for translation: translate English to German: …, for summarization: summarize: …. You can find more information about the `T5` [here](https://huggingface.co/docs/transformers/model_doc/t5).
+
+### How to Run the Pipeline ?
+First, make sure you have installed the required packages and pass the required arguments.
+You can find the code [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference/large_language_modeling/main.py)

Review Comment:
   You can view the code on [GitHub](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference/large_language_modeling/main.py)



##########
website/www/site/content/en/documentation/ml/large-language-modeling.md:
##########
@@ -0,0 +1,73 @@
+---
+title: "Large Language Model Inference in Beam"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# RunInference
+Staring from Apache Beam 2.40.0, Beam introduced the RunInference API that lets you deploy a machine learning model in a Beam pipeline. A `RunInference` transform performs inference on a `PCollection` of examples using a machine learning (ML) model. The transform outputs a PCollection that contains the input examples and output predictions. You can find more information about RunInference [here](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) and some examples [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference).
+
+
+## Using RunInference with very large models
+RunInference doesn't only help you deploying small sized models but also any kind of large scale models.
+
+We will demonstrate doing inference with  `T5` language model using `RunInference` in a pipeline. `T5` is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised and supervised tasks and for which each task is converted into a text-to-text format. We will be using `T5-11B` which contains `11 Billion` parameters and is `45 GB` in size. `T5` works well on a variety of tasks out-of-the-box by prepending a different prefix to the input corresponding to each task, e.g., for translation: translate English to German: …, for summarization: summarize: …. You can find more information about the `T5` [here](https://huggingface.co/docs/transformers/model_doc/t5).
+
+### How to Run the Pipeline ?
+First, make sure you have installed the required packages and pass the required arguments.
+You can find the code [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference/large_language_modeling/main.py)
+
+1. Locally on your machine: `python main.py --runner DirectRunner`

Review Comment:
   I would simplify a bit: 
   
   Note that you will need to have 45 GB of disk space available to run this example. --> You need to have 45 GB of disk space available to run this example.



##########
website/www/site/content/en/documentation/ml/large-language-modeling.md:
##########
@@ -0,0 +1,73 @@
+---
+title: "Large Language Model Inference in Beam"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# RunInference
+Staring from Apache Beam 2.40.0, Beam introduced the RunInference API that lets you deploy a machine learning model in a Beam pipeline. A `RunInference` transform performs inference on a `PCollection` of examples using a machine learning (ML) model. The transform outputs a PCollection that contains the input examples and output predictions. You can find more information about RunInference [here](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) and some examples [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference).
+
+
+## Using RunInference with very large models
+RunInference doesn't only help you deploying small sized models but also any kind of large scale models.
+
+We will demonstrate doing inference with  `T5` language model using `RunInference` in a pipeline. `T5` is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised and supervised tasks and for which each task is converted into a text-to-text format. We will be using `T5-11B` which contains `11 Billion` parameters and is `45 GB` in size. `T5` works well on a variety of tasks out-of-the-box by prepending a different prefix to the input corresponding to each task, e.g., for translation: translate English to German: …, for summarization: summarize: …. You can find more information about the `T5` [here](https://huggingface.co/docs/transformers/model_doc/t5).
+
+### How to Run the Pipeline ?
+First, make sure you have installed the required packages and pass the required arguments.
+You can find the code [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference/large_language_modeling/main.py)
+
+1. Locally on your machine: `python main.py --runner DirectRunner`
+2. On GCP using Dataflow: `python main.py --runner DataflowRunner`
+
+### Explaining the Pipeline
+The pipeline can be broken down into few simple steps:
+1. Reading the inputs
+2. Encoding the text into transformer-readable token ID integers using a tokenizer
+3. Using RunInference to get the output
+4. Decoding the RunInference output and printing them
+
+The code snippet for all the 4 steps can be found below:
+
+{{< highlight >}}
+    with beam.Pipeline(options=pipeline_options) as pipeline:
+        _ = (
+            pipeline
+            | "CreateInputs" >> beam.Create(task_sentences)
+            | "Preprocess" >> beam.ParDo(Preprocess(tokenizer=tokenizer))
+            | "RunInference" >> RunInference(model_handler=model_handler)
+            | "PostProcess" >> beam.ParDo(Postprocess(tokenizer=tokenizer))
+        )
+{{< /highlight >}}
+
+We now closely look at the 3rd step of pipeline where we use `RunInference`.
+In order to use it, one has to first define a `ModelHandler`. RunInference provides model handlers for `PyTorch`, `TensorFlow` and `Scikit-Learn`. As, we are using a `PyTorch` model, so we used a `PyTorchModelHandlerTensor`.

Review Comment:
   RunInference provides model handlers for `PyTorch`, `TensorFlow`, and `Scikit-Learn`. Because the example uses a `PyTorch` model, it uses the `PyTorchModelHandlerTensor` model handler.



##########
website/www/site/content/en/documentation/ml/large-language-modeling.md:
##########
@@ -0,0 +1,73 @@
+---
+title: "Large Language Model Inference in Beam"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# RunInference
+Staring from Apache Beam 2.40.0, Beam introduced the RunInference API that lets you deploy a machine learning model in a Beam pipeline. A `RunInference` transform performs inference on a `PCollection` of examples using a machine learning (ML) model. The transform outputs a PCollection that contains the input examples and output predictions. You can find more information about RunInference [here](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) and some examples [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference).
+
+
+## Using RunInference with very large models
+RunInference doesn't only help you deploying small sized models but also any kind of large scale models.
+
+We will demonstrate doing inference with  `T5` language model using `RunInference` in a pipeline. `T5` is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised and supervised tasks and for which each task is converted into a text-to-text format. We will be using `T5-11B` which contains `11 Billion` parameters and is `45 GB` in size. `T5` works well on a variety of tasks out-of-the-box by prepending a different prefix to the input corresponding to each task, e.g., for translation: translate English to German: …, for summarization: summarize: …. You can find more information about the `T5` [here](https://huggingface.co/docs/transformers/model_doc/t5).
+
+### How to Run the Pipeline ?
+First, make sure you have installed the required packages and pass the required arguments.
+You can find the code [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference/large_language_modeling/main.py)
+
+1. Locally on your machine: `python main.py --runner DirectRunner`
+2. On GCP using Dataflow: `python main.py --runner DataflowRunner`
+
+### Explaining the Pipeline
+The pipeline can be broken down into few simple steps:
+1. Reading the inputs
+2. Encoding the text into transformer-readable token ID integers using a tokenizer
+3. Using RunInference to get the output
+4. Decoding the RunInference output and printing them
+
+The code snippet for all the 4 steps can be found below:
+
+{{< highlight >}}
+    with beam.Pipeline(options=pipeline_options) as pipeline:
+        _ = (
+            pipeline
+            | "CreateInputs" >> beam.Create(task_sentences)
+            | "Preprocess" >> beam.ParDo(Preprocess(tokenizer=tokenizer))
+            | "RunInference" >> RunInference(model_handler=model_handler)
+            | "PostProcess" >> beam.ParDo(Postprocess(tokenizer=tokenizer))
+        )
+{{< /highlight >}}
+
+We now closely look at the 3rd step of pipeline where we use `RunInference`.
+In order to use it, one has to first define a `ModelHandler`. RunInference provides model handlers for `PyTorch`, `TensorFlow` and `Scikit-Learn`. As, we are using a `PyTorch` model, so we used a `PyTorchModelHandlerTensor`.
+
+{{< highlight >}}
+  gen_fn = make_tensor_model_fn('generate')
+
+  model_handler = PytorchModelHandlerTensor(
+      state_dict_path=args.model_state_dict_path,
+      model_class=T5ForConditionalGeneration,
+      model_params={"config": AutoConfig.from_pretrained(args.model_name)},
+      device="cpu",
+      inference_fn=gen_fn)
+{{< /highlight >}}
+
+`ModelHandler` requires parameters like:
+* `state_dict_path` – path to the saved dictionary of the model state.

Review Comment:
   * `state_dict_path` – The path to the saved dictionary of the model state.
   * `model_class` – The class of the Pytorch model that defines the model structure.
   * `model_params` – A dictionary of arguments required to instantiate the model class.
   * `device` – The device to run the model on. If device is set to GPU, a GPU device is used if it is available. Otherwise, a CPU device is used.
   * `inference_fn` – The inference function to use during RunInference.
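For illustration, a hypothetical variant of the handler from the snippet above with `device` set to GPU; the state_dict path and model name shown are assumptions, and the call otherwise mirrors the example's code.

```python
# Hypothetical GPU configuration (not part of the PR); the path is an assumption.
gen_fn = make_tensor_model_fn('generate')

model_handler = PytorchModelHandlerTensor(
    state_dict_path="gs://my-bucket/t5-11b/model.pt",  # assumed location of the saved state_dict
    model_class=T5ForConditionalGeneration,
    model_params={"config": AutoConfig.from_pretrained("t5-11b")},
    device="GPU",  # uses a GPU if one is available, otherwise falls back to CPU
    inference_fn=gen_fn)
```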



##########
website/www/site/content/en/documentation/ml/large-language-modeling.md:
##########
@@ -0,0 +1,73 @@
+---
+title: "Large Language Model Inference in Beam"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# RunInference
+Staring from Apache Beam 2.40.0, Beam introduced the RunInference API that lets you deploy a machine learning model in a Beam pipeline. A `RunInference` transform performs inference on a `PCollection` of examples using a machine learning (ML) model. The transform outputs a PCollection that contains the input examples and output predictions. You can find more information about RunInference [here](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) and some examples [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference).
+
+
+## Using RunInference with very large models
+RunInference doesn't only help you deploying small sized models but also any kind of large scale models.
+
+We will demonstrate doing inference with  `T5` language model using `RunInference` in a pipeline. `T5` is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised and supervised tasks and for which each task is converted into a text-to-text format. We will be using `T5-11B` which contains `11 Billion` parameters and is `45 GB` in size. `T5` works well on a variety of tasks out-of-the-box by prepending a different prefix to the input corresponding to each task, e.g., for translation: translate English to German: …, for summarization: summarize: …. You can find more information about the `T5` [here](https://huggingface.co/docs/transformers/model_doc/t5).
+
+### How to Run the Pipeline ?
+First, make sure you have installed the required packages and pass the required arguments.
+You can find the code [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference/large_language_modeling/main.py)
+
+1. Locally on your machine: `python main.py --runner DirectRunner`
+2. On GCP using Dataflow: `python main.py --runner DataflowRunner`
+
+### Explaining the Pipeline

Review Comment:
   ### Pipeline steps



##########
website/www/site/content/en/documentation/ml/large-language-modeling.md:
##########
@@ -0,0 +1,73 @@
+---
+title: "Large Language Model Inference in Beam"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# RunInference
+Staring from Apache Beam 2.40.0, Beam introduced the RunInference API that lets you deploy a machine learning model in a Beam pipeline. A `RunInference` transform performs inference on a `PCollection` of examples using a machine learning (ML) model. The transform outputs a PCollection that contains the input examples and output predictions. You can find more information about RunInference [here](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) and some examples [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference).
+
+
+## Using RunInference with very large models
+RunInference doesn't only help you deploying small sized models but also any kind of large scale models.
+
+We will demonstrate doing inference with  `T5` language model using `RunInference` in a pipeline. `T5` is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised and supervised tasks and for which each task is converted into a text-to-text format. We will be using `T5-11B` which contains `11 Billion` parameters and is `45 GB` in size. `T5` works well on a variety of tasks out-of-the-box by prepending a different prefix to the input corresponding to each task, e.g., for translation: translate English to German: …, for summarization: summarize: …. You can find more information about the `T5` [here](https://huggingface.co/docs/transformers/model_doc/t5).
+
+### How to Run the Pipeline ?
+First, make sure you have installed the required packages and pass the required arguments.
+You can find the code [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference/large_language_modeling/main.py)
+
+1. Locally on your machine: `python main.py --runner DirectRunner`
+2. On GCP using Dataflow: `python main.py --runner DataflowRunner`
+
+### Explaining the Pipeline
+The pipeline can be broken down into few simple steps:
+1. Reading the inputs
+2. Encoding the text into transformer-readable token ID integers using a tokenizer
+3. Using RunInference to get the output
+4. Decoding the RunInference output and printing them
+
+The code snippet for all the 4 steps can be found below:

Review Comment:
   The following code snippet contains the four steps:



##########
website/www/site/content/en/documentation/ml/large-language-modeling.md:
##########
@@ -0,0 +1,73 @@
+---
+title: "Large Language Model Inference in Beam"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# RunInference
+Staring from Apache Beam 2.40.0, Beam introduced the RunInference API that lets you deploy a machine learning model in a Beam pipeline. A `RunInference` transform performs inference on a `PCollection` of examples using a machine learning (ML) model. The transform outputs a PCollection that contains the input examples and output predictions. You can find more information about RunInference [here](https://beam.apache.org/documentation/transforms/python/elementwise/runinference/) and some examples [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference).
+
+
+## Using RunInference with very large models
+RunInference doesn't only help you deploying small sized models but also any kind of large scale models.
+
+We will demonstrate doing inference with  `T5` language model using `RunInference` in a pipeline. `T5` is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised and supervised tasks and for which each task is converted into a text-to-text format. We will be using `T5-11B` which contains `11 Billion` parameters and is `45 GB` in size. `T5` works well on a variety of tasks out-of-the-box by prepending a different prefix to the input corresponding to each task, e.g., for translation: translate English to German: …, for summarization: summarize: …. You can find more information about the `T5` [here](https://huggingface.co/docs/transformers/model_doc/t5).
+
+### How to Run the Pipeline ?
+First, make sure you have installed the required packages and pass the required arguments.
+You can find the code [here](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference/large_language_modeling/main.py)
+
+1. Locally on your machine: `python main.py --runner DirectRunner`
+2. On GCP using Dataflow: `python main.py --runner DataflowRunner`
+
+### Explaining the Pipeline
+The pipeline can be broken down into few simple steps:
+1. Reading the inputs
+2. Encoding the text into transformer-readable token ID integers using a tokenizer
+3. Using RunInference to get the output
+4. Decoding the RunInference output and printing them
+
+The code snippet for all the 4 steps can be found below:
+
+{{< highlight >}}
+    with beam.Pipeline(options=pipeline_options) as pipeline:
+        _ = (
+            pipeline
+            | "CreateInputs" >> beam.Create(task_sentences)
+            | "Preprocess" >> beam.ParDo(Preprocess(tokenizer=tokenizer))
+            | "RunInference" >> RunInference(model_handler=model_handler)
+            | "PostProcess" >> beam.ParDo(Postprocess(tokenizer=tokenizer))
+        )
+{{< /highlight >}}
+
+We now closely look at the 3rd step of pipeline where we use `RunInference`.
+In order to use it, one has to first define a `ModelHandler`. RunInference provides model handlers for `PyTorch`, `TensorFlow` and `Scikit-Learn`. As, we are using a `PyTorch` model, so we used a `PyTorchModelHandlerTensor`.

Review Comment:
   In the third step...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org