You are viewing a plain text version of this content. The canonical link for it is here.

Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/07/12 20:40:48 UTC

[GitHub] [beam] rszper opened a new pull request, #22250: Rszper run inference docs

rszper opened a new pull request, #22250:
URL: https://github.com/apache/beam/pull/22250

   Adding documentation for the RunInference API.
   
   - Added a ML page in the Python section, linked to from the Python SDK page.
   - Added a RunInference transform page, linked to from the Python Transforms page.
   - Added snippets and output for the RunInference transforms page examples.
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] rszper commented on a diff in pull request #22250: Update RunInference documentation

Posted by GitBox <gi...@apache.org>.

rszper commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r922553248


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -171,7 +171,7 @@ In some cases, the `PredictionResults` output might not include the correct pred
 
 The RunInference API currently expects outputs to be an `Iterable[Any]`. Example return types are `Iterable[Tensor]` or `Iterable[Dict[str, Tensor]]`. When RunInference zips the inputs with the predictions, the predictions iterate over the dictionary keys instead of the batch elements. The result is that the key name is preserved but the prediction tensors are discarded. For more information, see the [Pytorch RunInference PredictionResult is a Dict](https://github.com/apache/beam/issues/22240) issue in the Apache Beam GitHub project.
 
-To work with the current RunInference implementation, you can create a wrapper class that overrides the `model(input)` call. In PyTorch, for example, your wrapper would override the `forward()` function and return an output with the appropriate format of `List[Dict[str, torch.Tensor]]`. For more information, see our [HuggingFace language modeling example](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py#L49).
+To work with the current RunInference implementation, you can create a wrapper class that overrides the `model(input)` call. In PyTorch, for example, your wrapper would override the `forward()` function and return an output with the appropriate format of `List[Dict[str, torch.Tensor]]`. For more information, see our [HuggingFace language modeling example](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py#L49) and our [Bert language modeling example](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py).

Review Comment:
   Oops. This should be fixed now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] tvalentyn merged pull request #22250: Update RunInference documentation

Posted by GitBox <gi...@apache.org>.

tvalentyn merged PR #22250:
URL: https://github.com/apache/beam/pull/22250


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] AnandInguva commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

AnandInguva commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r921526878


##########
sdks/python/apache_beam/examples/snippets/transforms/elementwise/runinference.py:
##########
@@ -0,0 +1,157 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# pytype: skip-file
+
+def torch_unkeyed_model_handler(test=None):
+    # [START torch_unkeyed_model_handler]
+  import apache_beam as beam
+  import numpy
+  import torch
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+
+  class LinearRegression(torch.nn.Module):
+    def __init__(self, input_dim=1, output_dim=1):
+      super().__init__()
+      self.linear = torch.nn.Linear(input_dim, output_dim)
+
+    def forward(self, x):
+      out = self.linear(x)
+      return out
+
+  model_state_dict_path = 'gs://apache-beam-samples/run_inference/five_times_table_torch.pt'  # pylint: disable=line-too-long
+  model_class = LinearRegression
+  model_params = {'input_dim': 1, 'output_dim': 1}
+  model_handler = PytorchModelHandlerTensor(
+      model_class=model_class,
+      model_params=model_params,
+      state_dict_path=model_state_dict_path)
+
+  unkeyed_data = numpy.array([10, 40, 60, 90],
+                             dtype=numpy.float32).reshape(-1, 1)
+
+  with beam.Pipeline() as p:
+    predictions = (
+        p
+        | 'InputData' >> beam.Create(unkeyed_data)
+        | 'ConvertNumpyToTensor' >> beam.Map(torch.Tensor)
+        | 'PytorchRunInference' >> RunInference(model_handler=model_handler)
+        | beam.Map(print))
+    # [END torch_unkeyed_model_handler]
+    if test:
+      test(predictions)
+
+
+def torch_keyed_model_handler(test=None):
+    # [START torch_keyed_model_handler]
+  import apache_beam as beam
+  import torch
+  from apache_beam.ml.inference.base import KeyedModelHandler
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+
+  class LinearRegression(torch.nn.Module):
+    def __init__(self, input_dim=1, output_dim=1):
+      super().__init__()
+      self.linear = torch.nn.Linear(input_dim, output_dim)
+
+    def forward(self, x):
+      out = self.linear(x)
+      return out
+
+  model_state_dict_path = 'gs://apache-beam-samples/run_inference/five_times_table_torch.pt'  # pylint: disable=line-too-long
+  model_class = LinearRegression
+  model_params = {'input_dim': 1, 'output_dim': 1}
+  keyed_model_handler = KeyedModelHandler(
+      PytorchModelHandlerTensor(
+          model_class=model_class,
+          model_params=model_params,
+          state_dict_path=model_state_dict_path))
+
+  keyed_data = [("first_question", 105.00), ("second_question", 108.00),
+                ("third_question", 1000.00), ("fourth_question", 1013.00)]
+
+  with beam.Pipeline() as p:
+    predictions = (
+        p
+        | 'KeyedInputData' >> beam.Create(keyed_data)
+        | "ConvertIntToTensor" >>
+        beam.Map(lambda x: (x[0], torch.Tensor([x[1]])))
+        | 'PytorchRunInference' >>
+        RunInference(model_handler=keyed_model_handler)
+        | beam.Map(print))
+    # [END torch_keyed_model_handler]
+    if test:
+      test(predictions)
+
+
+def sklearn_unkeyed_model_handler(test=None):
+    # [START sklearn_unkeyed_model_handler]
+  import apache_beam as beam
+  import numpy
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.sklearn_inference import ModelFileType
+  from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
+
+  sklearn_model_filename = 'gs://apache-beam-samples/run_inference/five_times_table_sklearn.pkl'  # pylint: disable=line-too-long

Review Comment:
   Do we need `FileSystems` here? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] yeandy commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

yeandy commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r919980966


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions without returning the full input data.
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, you add a single line of code in your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p | beam.ReadFromSource('a_source')   
+    | RunInference(configuration)))
+```
+
+To import models, you need to wrap them around a `ModelHandler object`. Add one or more of the following lines of code, depending on the framework and type of data structure that holds the data:
+
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerNumpy
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+```
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerPandas
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+You need to provide a path to the model weights that's accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
+
+1. Download the pre-trained weights and host them in a location that the pipeline can access.
+2. Pass the hosted path of the model to the PyTorch `model_handler` by using the following code: `state_dict_path=<path_to_weights>`.
+
+### Use multiple inference models
+
+You can also use the RunInference transform to add multiple inference models to your pipeline.
+
+#### A/B Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(ModelHandlerA)
+   model_b_predictions = data | RunInference(ModelHandlerB)
+```
+
+#### Ensemble Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(ModelHandlerA)
+   model_b_predictions = model_a_predictions | beam.Map(some_post_processing) | RunInference(ModelHandlerB)
+```
+
+### Use a key handler
+
+If a key is attached to the examples, use the `KeyedModelHandler`:
+
+```
+from apache_beam.ml.inference.base import KeyedModelHandler
+ 
+keyed_model_handler = KeyedModelHandler(PytorchModelHandlerTensor(...))
+ 
+with pipeline as p:
+   data = p | beam.Create([
+      ('img1', np.array[[1,2,3],[4,5,6],...]),
+      ('img2', np.array[[1,2,3],[4,5,6],...]),
+      ('img3', np.array[[1,2,3],[4,5,6],...]),
+   ])
+   predictions = data | RunInference(keyed_model_handler)
+```
+
+### Use the prediction results object
+
+The `PredictionResult` is a `NamedTuple` object that contains both the input and the inferences, named  `example` and  `inference`, respectively. Your pipeline interacts with a `PredictionResult` object in steps after the RunInference transform.
+
+```
+class PostProcessor(beam.DoFn):
+    def process(self, element: Tuple[str, PredictionResult]):
+       key, prediction_result = element
+       inputs = prediction_result.example
+       predictions = prediction_result.inference
+
+       # Post-processing logic
+       result = ...
+
+       yield (key, result)
+
+with pipeline as p:
+    output = (
+        p | 'Read' >> beam.ReadFromSource('a_source') 
+                | 'PyTorchRunInference' >> RunInference(KeyedModelHandler)
+                | 'ProcessOutput' >> beam.ParDo(PostProcessor()))
+```
+
+If you need to use this object explicitly, include the following line in your pipeline to import the object:
+
+```
+from apache_beam.ml.inference.base import PredictionResult
+```
+
+## Run a machine learning pipeline
+
+For detailed instructions explaining how to build and run a pipeline that uses ML models, see the
+[Example RunInference API pipelines](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference) on GitHub.
+
+## Troubleshooting
+
+If you run into problems with your pipeline or job, this section lists issues that you might encounter and provides suggestions for how to fix them.
+
+### Prediction results missing
+
+When you use a dictionary of tensors, the output might not include the prediction results. This issue occurs because the RunInference API supports tensors but not dictionaries of tensors. 
+
+Many model inferences return a dictionary with the predictions and additional metadata, for example, `Dict[str, Tensor]`. The RunInference API currently expects outputs to be an `Iterable[Any]`, for example, `Iterable[Tensor]` or `Iterable[Dict[str, Tensor]]`.
+
+When RunInference zips the inputs with the predictions, the predictions iterate over the dictionary keys instead of the batch elements. The result is that the key name is preserved but the prediction tensors are discarded. For more information, see the [Pytorch RunInference PredictionResult is a Dict](https://github.com/apache/beam/issues/22240) issue in the Apache Beam GitHub project.
+
+To work with current RunInference implementation, override the `forward()` function and convert the standard Hugging Face forward output into the appropriate format of `List[Dict[str, torch.Tensor]]`. For more information, see an [example with the batching flag added](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py#L49).
+
+### Unable to batch tensor elements
+
+RunInference uses dynamic batching. However, the RunInference API cannot batch tensor elements of different sizes, because `torch.stack()` expects tensors of the same length. If you provide images of different sizes or word embeddings of different lengths, errors might occur.
+
+To avoid this issue:
+
+1. Either use elements that have the same size, or resize image inputs and word embeddings to make them 
+the same size. Depending on the language model and encoding technique, this option might not be available. 

Review Comment:
   ```suggestion
   the same size. For NLP use cases, this might not be possible to do with text of varying lengths.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] tvalentyn commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

tvalentyn commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r921705587


##########
sdks/python/apache_beam/examples/snippets/transforms/elementwise/runinference.py:
##########
@@ -0,0 +1,157 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# pytype: skip-file
+
+def torch_unkeyed_model_handler(test=None):
+    # [START torch_unkeyed_model_handler]
+  import apache_beam as beam
+  import numpy
+  import torch
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+
+  class LinearRegression(torch.nn.Module):
+    def __init__(self, input_dim=1, output_dim=1):
+      super().__init__()
+      self.linear = torch.nn.Linear(input_dim, output_dim)
+
+    def forward(self, x):
+      out = self.linear(x)
+      return out
+
+  model_state_dict_path = 'gs://apache-beam-samples/run_inference/five_times_table_torch.pt'  # pylint: disable=line-too-long
+  model_class = LinearRegression
+  model_params = {'input_dim': 1, 'output_dim': 1}
+  model_handler = PytorchModelHandlerTensor(
+      model_class=model_class,
+      model_params=model_params,
+      state_dict_path=model_state_dict_path)
+
+  unkeyed_data = numpy.array([10, 40, 60, 90],
+                             dtype=numpy.float32).reshape(-1, 1)
+
+  with beam.Pipeline() as p:
+    predictions = (
+        p
+        | 'InputData' >> beam.Create(unkeyed_data)
+        | 'ConvertNumpyToTensor' >> beam.Map(torch.Tensor)
+        | 'PytorchRunInference' >> RunInference(model_handler=model_handler)
+        | beam.Map(print))
+    # [END torch_unkeyed_model_handler]
+    if test:
+      test(predictions)
+
+
+def torch_keyed_model_handler(test=None):
+    # [START torch_keyed_model_handler]
+  import apache_beam as beam
+  import torch

Review Comment:
   In which environment do you run the snippet examples? 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] tvalentyn commented on a diff in pull request #22250: Update RunInference documentation

Posted by GitBox <gi...@apache.org>.

tvalentyn commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r923676270


##########
website/www/site/layouts/partials/section-menu/en/sdks.html:
##########
@@ -39,6 +39,7 @@
     <li><a href="/documentation/sdks/python-dependencies/">Python SDK dependencies</a></li>
     <li><a href="/documentation/sdks/python-streaming/">Python streaming pipelines</a></li>
     <li><a href="/documentation/sdks/python-type-safety/">Ensuring Python type safety</a></li>
+    <li><a href="/documentation/sdks/python-machine-learning/">Machine Learning</a></li>

Review Comment:
   should we also link this content at https://beam.apache.org/documentation/resources/learning-resources/#machine-learning ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] rezarokni commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

rezarokni commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r919424219


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,167 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API allows you to find the input that determined the prediction without returning to the full input data.

Review Comment:
   Change : line ...find the input that
   To: Built in capabilities for dealing with [keyed values](link)



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,167 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.

Review Comment:
   Can we add links to the pydocs for the frameworks please.



##########
website/www/site/content/en/documentation/sdks/python.md:
##########
@@ -48,6 +48,11 @@ language-specific implementation guidance.
 
 ## Using Beam Python SDK in your ML pipelines
 
+To use the Beam Python SDK with your machine learning pipelines, you can either use the RunInference API or TensorFlow.

Review Comment:
   change: use the RunInference API or TensorFlow
   to: use the RunInference API for PyTorch and Sklearn models. If using Tensorflow model you can make use of the library from tfx_bsl. Further integrations for tensorflow are planned. 
   



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,167 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API allows you to find the input that determined the prediction without returning to the full input data.
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in a worker, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API allows you to build complex multi-model pipelines with minimum effort. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.

Review Comment:
   The RunInference API can be composed into multi-model pipelines 



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,167 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API allows you to find the input that determined the prediction without returning to the full input data.
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in a worker, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the

Review Comment:
   change: each thread in the worker
   to: each thread in the process



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,167 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API allows you to find the input that determined the prediction without returning to the full input data.
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in a worker, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API allows you to build complex multi-model pipelines with minimum effort. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions without returning the full input data.
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, you add a single line of code in your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p | beam.ReadFromSource('a_source')   
+    | RunInference(configuration)))
+```
+
+To import models, you need to wrap them around a `ModelHandler object`. Add one or more of the following lines of code, depending on the framework and type of data structure that holds the data:
+
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerNumpy
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+```
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerPandas
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+You need to provide a path to the model weights that's accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
+
+1. Download the pre-trained weights and host them in a location that the pipeline can access.
+2. Pass the hosted path of the model to the PyTorch `model_handler` by using the following code: `state_dict_path=<path_to_weights>`.
+
+### Use multiple inference models
+
+You can also use the RunInference transform to add multiple inference models to your pipeline.
+
+#### A/B Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(ModelHandlerA)
+   model_b_predictions = data | RunInference(ModelHandlerB)
+```
+
+#### Ensemble Pattern

Review Comment:
   Models in Sequence pattern .
   
   Code should be (note 2nd line is different):
   
      data = p | 'Read' >> beam.ReadFromSource('a_source') 
      model_a_predictions = data | RunInference(ModelHandlerA)
      model_b_predictions = model_a_predictions | beam.Map(some_post_processing) | RunInference(ModelHandlerB)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] AnandInguva commented on pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

AnandInguva commented on PR #22250:
URL: https://github.com/apache/beam/pull/22250#issuecomment-1183103715

   Added the RunInference examples -> https://github.com/apache/beam/pull/22254/


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] yeandy commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

yeandy commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r920329126


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.

Review Comment:
   we apply a transformation to the batched elements. (the batched elements are applied with the transformation)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] rszper commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

rszper commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r919468671


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,167 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API allows you to find the input that determined the prediction without returning to the full input data.

Review Comment:
   Updated



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] yeandy commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

yeandy commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r920468376


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions without returning the full input data.
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, you add a single line of code in your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p | beam.ReadFromSource('a_source')   
+    | RunInference(configuration)))
+```
+
+To import models, you need to wrap them around a `ModelHandler object`. Add one or more of the following lines of code, depending on the framework and type of data structure that holds the data:
+
+```

Review Comment:
   Would it make sense to reword it to something like this, and keep the (refactored) code block?
   
   To import models, you need to wrap them around a `ModelHandler` object. The `ModelHandler` you import will depend on the framework and type of data structure that contains the inputs. See the following examples on which ones you may want to import. 
   ```
   from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
   from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerPandas
   from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
   from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] tvalentyn commented on a diff in pull request #22250: Update RunInference documentation

Posted by GitBox <gi...@apache.org>.

tvalentyn commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r922548662


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -171,7 +171,7 @@ In some cases, the `PredictionResults` output might not include the correct pred
 
 The RunInference API currently expects outputs to be an `Iterable[Any]`. Example return types are `Iterable[Tensor]` or `Iterable[Dict[str, Tensor]]`. When RunInference zips the inputs with the predictions, the predictions iterate over the dictionary keys instead of the batch elements. The result is that the key name is preserved but the prediction tensors are discarded. For more information, see the [Pytorch RunInference PredictionResult is a Dict](https://github.com/apache/beam/issues/22240) issue in the Apache Beam GitHub project.
 
-To work with the current RunInference implementation, you can create a wrapper class that overrides the `model(input)` call. In PyTorch, for example, your wrapper would override the `forward()` function and return an output with the appropriate format of `List[Dict[str, torch.Tensor]]`. For more information, see our [HuggingFace language modeling example](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py#L49).
+To work with the current RunInference implementation, you can create a wrapper class that overrides the `model(input)` call. In PyTorch, for example, your wrapper would override the `forward()` function and return an output with the appropriate format of `List[Dict[str, torch.Tensor]]`. For more information, see our [HuggingFace language modeling example](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py#L49) and our [Bert language modeling example](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py).

Review Comment:
   these are the same links, looks like not the change we intended to make?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] rszper commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

rszper commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r919469200


##########
website/www/site/content/en/documentation/sdks/python.md:
##########
@@ -48,6 +48,11 @@ language-specific implementation guidance.
 
 ## Using Beam Python SDK in your ML pipelines
 
+To use the Beam Python SDK with your machine learning pipelines, you can either use the RunInference API or TensorFlow.

Review Comment:
   Updated, but I'm not sure if we'll need to update this again based on the email thread?



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,167 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API allows you to find the input that determined the prediction without returning to the full input data.
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in a worker, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the

Review Comment:
   Updated



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] AnandInguva commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

AnandInguva commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r921544869


##########
sdks/python/apache_beam/examples/snippets/transforms/elementwise/runinference.py:
##########
@@ -0,0 +1,157 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# pytype: skip-file
+
+def torch_unkeyed_model_handler(test=None):
+    # [START torch_unkeyed_model_handler]
+  import apache_beam as beam
+  import numpy
+  import torch
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+
+  class LinearRegression(torch.nn.Module):
+    def __init__(self, input_dim=1, output_dim=1):
+      super().__init__()
+      self.linear = torch.nn.Linear(input_dim, output_dim)
+
+    def forward(self, x):
+      out = self.linear(x)
+      return out
+
+  model_state_dict_path = 'gs://apache-beam-samples/run_inference/five_times_table_torch.pt'  # pylint: disable=line-too-long
+  model_class = LinearRegression
+  model_params = {'input_dim': 1, 'output_dim': 1}
+  model_handler = PytorchModelHandlerTensor(
+      model_class=model_class,
+      model_params=model_params,
+      state_dict_path=model_state_dict_path)
+
+  unkeyed_data = numpy.array([10, 40, 60, 90],
+                             dtype=numpy.float32).reshape(-1, 1)
+
+  with beam.Pipeline() as p:
+    predictions = (
+        p
+        | 'InputData' >> beam.Create(unkeyed_data)
+        | 'ConvertNumpyToTensor' >> beam.Map(torch.Tensor)
+        | 'PytorchRunInference' >> RunInference(model_handler=model_handler)
+        | beam.Map(print))
+    # [END torch_unkeyed_model_handler]
+    if test:
+      test(predictions)
+
+
+def torch_keyed_model_handler(test=None):
+    # [START torch_keyed_model_handler]
+  import apache_beam as beam
+  import torch
+  from apache_beam.ml.inference.base import KeyedModelHandler
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+
+  class LinearRegression(torch.nn.Module):
+    def __init__(self, input_dim=1, output_dim=1):
+      super().__init__()
+      self.linear = torch.nn.Linear(input_dim, output_dim)
+
+    def forward(self, x):
+      out = self.linear(x)
+      return out
+
+  model_state_dict_path = 'gs://apache-beam-samples/run_inference/five_times_table_torch.pt'  # pylint: disable=line-too-long
+  model_class = LinearRegression
+  model_params = {'input_dim': 1, 'output_dim': 1}
+  keyed_model_handler = KeyedModelHandler(
+      PytorchModelHandlerTensor(
+          model_class=model_class,
+          model_params=model_params,
+          state_dict_path=model_state_dict_path))
+
+  keyed_data = [("first_question", 105.00), ("second_question", 108.00),
+                ("third_question", 1000.00), ("fourth_question", 1013.00)]
+
+  with beam.Pipeline() as p:
+    predictions = (
+        p
+        | 'KeyedInputData' >> beam.Create(keyed_data)
+        | "ConvertIntToTensor" >>
+        beam.Map(lambda x: (x[0], torch.Tensor([x[1]])))
+        | 'PytorchRunInference' >>
+        RunInference(model_handler=keyed_model_handler)
+        | beam.Map(print))
+    # [END torch_keyed_model_handler]
+    if test:
+      test(predictions)
+
+
+def sklearn_unkeyed_model_handler(test=None):
+    # [START sklearn_unkeyed_model_handler]
+  import apache_beam as beam
+  import numpy
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.sklearn_inference import ModelFileType
+  from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
+
+  sklearn_model_filename = 'gs://apache-beam-samples/run_inference/five_times_table_sklearn.pkl'  # pylint: disable=line-too-long

Review Comment:
   Yep 
   ```
   try:
     from apache_beam.io.gcp.gcsfilesystem import GCSFileSystem
   except ImportError:
     GCSFileSystem = None  # type: ignore
   ```
   
   Something like this?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] yeandy commented on pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

yeandy commented on PR #22250:
URL: https://github.com/apache/beam/pull/22250#issuecomment-1185750976

   @tvalentyn @pabloem This one too please https://github.com/apache/beam/pull/22069. The tests are queued, but it should be fine since it's only the .md file that was modified.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] yeandy commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

yeandy commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r919947632


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.

Review Comment:
   ```suggestion
   The RunInference API can be composed into multi-model pipelines. Multi-model pipelines can be useful for A/B testing, or for building out ensembles that are comprised of models that perform tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
   ```



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions without returning the full input data.
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, you add a single line of code in your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p | beam.ReadFromSource('a_source')   
+    | RunInference(configuration)))
+```
+
+To import models, you need to wrap them around a `ModelHandler object`. Add one or more of the following lines of code, depending on the framework and type of data structure that holds the data:
+
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerNumpy
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+```
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerPandas
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+You need to provide a path to the model weights that's accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
+
+1. Download the pre-trained weights and host them in a location that the pipeline can access.
+2. Pass the hosted path of the model to the PyTorch `model_handler` by using the following code: `state_dict_path=<path_to_weights>`.
+
+### Use multiple inference models
+
+You can also use the RunInference transform to add multiple inference models to your pipeline.
+
+#### A/B Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(ModelHandlerA)
+   model_b_predictions = data | RunInference(ModelHandlerB)
+```
+
+#### Ensemble Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(ModelHandlerA)
+   model_b_predictions = model_a_predictions | beam.Map(some_post_processing) | RunInference(ModelHandlerB)
+```
+
+### Use a key handler

Review Comment:
   ```suggestion
   ### Use a keyed ModelHandler
   ```



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.

Review Comment:
   ```suggestion
   To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together, which are then applied with a transformation appropriate for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
   ```



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).

Review Comment:
   Not sure if we want to mentioned that we can deal with keyed values in this section as to why to use RunInference.



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions without returning the full input data.
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, you add a single line of code in your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p | beam.ReadFromSource('a_source')   
+    | RunInference(configuration)))
+```
+
+To import models, you need to wrap them around a `ModelHandler object`. Add one or more of the following lines of code, depending on the framework and type of data structure that holds the data:
+
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerNumpy
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+```
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerPandas
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+You need to provide a path to the model weights that's accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
+
+1. Download the pre-trained weights and host them in a location that the pipeline can access.
+2. Pass the hosted path of the model to the PyTorch `model_handler` by using the following code: `state_dict_path=<path_to_weights>`.
+

Review Comment:
   ```suggestion
   
   ####  Scikit-learn
   
   You need to provide a path, accessible by the pipeline, to a file that contains the pickled Scikit-learn model. To use pre-trained models with the RunInference API and the Scikit-learn framework, complete the following steps:
   
   1. Download the pickled model class and host them in a location that the pipeline can access.
   2. Pass the path of the model to the Sklearn `model_handler` by using the following code: `model_uri=<path_to_pickled_file>` and `model_file_type: <ModelFileType>` (where you can specify ModelFileType.PICKLE or ModelFileType.JOBLIB, depending on how the model was serialized).
   ```



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions without returning the full input data.
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, you add a single line of code in your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p | beam.ReadFromSource('a_source')   
+    | RunInference(configuration)))
+```
+
+To import models, you need to wrap them around a `ModelHandler object`. Add one or more of the following lines of code, depending on the framework and type of data structure that holds the data:
+
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerNumpy
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+```
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerPandas
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+You need to provide a path to the model weights that's accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
+
+1. Download the pre-trained weights and host them in a location that the pipeline can access.
+2. Pass the hosted path of the model to the PyTorch `model_handler` by using the following code: `state_dict_path=<path_to_weights>`.
+
+### Use multiple inference models
+
+You can also use the RunInference transform to add multiple inference models to your pipeline.
+
+#### A/B Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(ModelHandlerA)
+   model_b_predictions = data | RunInference(ModelHandlerB)
+```
+
+#### Ensemble Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(ModelHandlerA)
+   model_b_predictions = model_a_predictions | beam.Map(some_post_processing) | RunInference(ModelHandlerB)
+```
+
+### Use a key handler
+
+If a key is attached to the examples, use the `KeyedModelHandler`:
+
+```
+from apache_beam.ml.inference.base import KeyedModelHandler
+ 
+keyed_model_handler = KeyedModelHandler(PytorchModelHandlerTensor(...))
+ 
+with pipeline as p:
+   data = p | beam.Create([
+      ('img1', np.array[[1,2,3],[4,5,6],...]),
+      ('img2', np.array[[1,2,3],[4,5,6],...]),
+      ('img3', np.array[[1,2,3],[4,5,6],...]),
+   ])
+   predictions = data | RunInference(keyed_model_handler)
+```
+
+### Use the prediction results object
+
+The `PredictionResult` is a `NamedTuple` object that contains both the input and the inferences, named  `example` and  `inference`, respectively. Your pipeline interacts with a `PredictionResult` object in steps after the RunInference transform.
+
+```
+class PostProcessor(beam.DoFn):
+    def process(self, element: Tuple[str, PredictionResult]):
+       key, prediction_result = element
+       inputs = prediction_result.example
+       predictions = prediction_result.inference
+
+       # Post-processing logic
+       result = ...
+
+       yield (key, result)
+
+with pipeline as p:
+    output = (
+        p | 'Read' >> beam.ReadFromSource('a_source') 
+                | 'PyTorchRunInference' >> RunInference(KeyedModelHandler)
+                | 'ProcessOutput' >> beam.ParDo(PostProcessor()))
+```
+
+If you need to use this object explicitly, include the following line in your pipeline to import the object:
+
+```
+from apache_beam.ml.inference.base import PredictionResult
+```
+
+## Run a machine learning pipeline
+
+For detailed instructions explaining how to build and run a pipeline that uses ML models, see the
+[Example RunInference API pipelines](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference) on GitHub.
+
+## Troubleshooting
+
+If you run into problems with your pipeline or job, this section lists issues that you might encounter and provides suggestions for how to fix them.
+
+### Prediction results missing
+
+When you use a dictionary of tensors, the output might not include the prediction results. This issue occurs because the RunInference API supports tensors but not dictionaries of tensors. 
+
+Many model inferences return a dictionary with the predictions and additional metadata, for example, `Dict[str, Tensor]`. The RunInference API currently expects outputs to be an `Iterable[Any]`, for example, `Iterable[Tensor]` or `Iterable[Dict[str, Tensor]]`.
+
+When RunInference zips the inputs with the predictions, the predictions iterate over the dictionary keys instead of the batch elements. The result is that the key name is preserved but the prediction tensors are discarded. For more information, see the [Pytorch RunInference PredictionResult is a Dict](https://github.com/apache/beam/issues/22240) issue in the Apache Beam GitHub project.
+
+To work with current RunInference implementation, override the `forward()` function and convert the standard Hugging Face forward output into the appropriate format of `List[Dict[str, torch.Tensor]]`. For more information, see an [example with the batching flag added](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py#L49).

Review Comment:
   ```suggestion
   You might encounter thy the `PredictionResults` output might not include the correct predictions in the `inferences` field. This issue occurs when you use a model whose inferences return a dictionary that maps key to predictions and additional metadata. An example return type is `Dict[str, Tensor]`.
   
   The RunInference API currently expects outputs to be an `Iterable[Any]`. Example types are `Iterable[Tensor]` or `Iterable[Dict[str, Tensor]]`. When RunInference zips the inputs with the predictions, the predictions iterate over the dictionary keys instead of the batch elements. The result is that the key name is preserved but the prediction tensors are discarded. For more information, see the [Pytorch RunInference PredictionResult is a Dict](https://github.com/apache/beam/issues/22240) issue in the Apache Beam GitHub project.
   
   To work with current RunInference implementation, override the `forward()` function and convert the standard Hugging Face forward output into the appropriate format of `List[Dict[str, torch.Tensor]]`. For more information, see our [HuggingFace language modeling example](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py#L49).
   ```



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions without returning the full input data.
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, you add a single line of code in your pipeline:

Review Comment:
   Does "single line" refer to another line in the pipeline? Because it will certainly take more than 1 line to do the imports, set up the handlers, and do other model-related setup.



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).

Review Comment:
   Link should be `# Use a key handler`. Though if we apply my suggestion `# Use a keyed ModelHandler`



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions without returning the full input data.
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, you add a single line of code in your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p | beam.ReadFromSource('a_source')   
+    | RunInference(configuration)))
+```
+
+To import models, you need to wrap them around a `ModelHandler object`. Add one or more of the following lines of code, depending on the framework and type of data structure that holds the data:

Review Comment:
   ```suggestion
   To import models, you need to wrap them around a `ModelHandler` object. Add one or more of the following lines of code, depending on the framework and type of data structure that contains the inputs:
   ```



##########
website/www/site/content/en/documentation/sdks/python.md:
##########
@@ -48,6 +48,11 @@ language-specific implementation guidance.
 
 ## Using Beam Python SDK in your ML pipelines
 
+To use the Beam Python SDK with your machine learning pipelines, use the RunInference API for PyTorch and Sklearn models. If using Tensorflow model, you can make use of the library from `tfx_bsl`. Further integrations for TensorFlow are planned.

Review Comment:
   ```suggestion
   To integrate machine learning models into your pipelines for making inferences, use the RunInference API for PyTorch and Scikit-learn models. If you are using TensorFlow models, you can make use of the library from `tfx_bsl`. Further integrations for TensorFlow are planned.
   ```



##########
website/www/site/content/en/documentation/sdks/python.md:
##########
@@ -48,6 +48,11 @@ language-specific implementation guidance.
 
 ## Using Beam Python SDK in your ML pipelines

Review Comment:
   Not sure the best wording here..
   ```suggestion
   ## Making machine learning inferences with Python
   ```



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results

Review Comment:
   I don't think we need this `Prediction results` section under `Why use the RunInference API?` since it doesn't explain why one should use RunInference. We talk about it in the next section when explaining how to use the API.



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions without returning the full input data.
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, you add a single line of code in your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p | beam.ReadFromSource('a_source')   
+    | RunInference(configuration)))
+```
+
+To import models, you need to wrap them around a `ModelHandler object`. Add one or more of the following lines of code, depending on the framework and type of data structure that holds the data:
+
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerNumpy
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+```
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerPandas
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+You need to provide a path to the model weights that's accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
+
+1. Download the pre-trained weights and host them in a location that the pipeline can access.
+2. Pass the hosted path of the model to the PyTorch `model_handler` by using the following code: `state_dict_path=<path_to_weights>`.
+
+### Use multiple inference models
+
+You can also use the RunInference transform to add multiple inference models to your pipeline.
+
+#### A/B Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(ModelHandlerA)
+   model_b_predictions = data | RunInference(ModelHandlerB)
+```
+
+#### Ensemble Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(ModelHandlerA)
+   model_b_predictions = model_a_predictions | beam.Map(some_post_processing) | RunInference(ModelHandlerB)
+```
+
+### Use a key handler
+
+If a key is attached to the examples, use the `KeyedModelHandler`:
+
+```
+from apache_beam.ml.inference.base import KeyedModelHandler
+ 
+keyed_model_handler = KeyedModelHandler(PytorchModelHandlerTensor(...))
+ 
+with pipeline as p:
+   data = p | beam.Create([
+      ('img1', np.array[[1,2,3],[4,5,6],...]),
+      ('img2', np.array[[1,2,3],[4,5,6],...]),
+      ('img3', np.array[[1,2,3],[4,5,6],...]),
+   ])
+   predictions = data | RunInference(keyed_model_handler)
+```
+
+### Use the prediction results object

Review Comment:
   ```suggestion
   ### Use the PredictionResults object
   ```



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions without returning the full input data.
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, you add a single line of code in your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p | beam.ReadFromSource('a_source')   
+    | RunInference(configuration)))
+```
+
+To import models, you need to wrap them around a `ModelHandler object`. Add one or more of the following lines of code, depending on the framework and type of data structure that holds the data:
+
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerNumpy
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+```
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerPandas
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+You need to provide a path to the model weights that's accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:

Review Comment:
   ```suggestion
   You need to provide a path, accessible by the pipeline, to a file that contains the model saved weights. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
   ```



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions without returning the full input data.
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, you add a single line of code in your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p | beam.ReadFromSource('a_source')   
+    | RunInference(configuration)))
+```
+
+To import models, you need to wrap them around a `ModelHandler object`. Add one or more of the following lines of code, depending on the framework and type of data structure that holds the data:
+
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerNumpy
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+```
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerPandas
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+You need to provide a path to the model weights that's accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
+
+1. Download the pre-trained weights and host them in a location that the pipeline can access.
+2. Pass the hosted path of the model to the PyTorch `model_handler` by using the following code: `state_dict_path=<path_to_weights>`.
+
+### Use multiple inference models

Review Comment:
   ```suggestion
   ### Use multiple models
   ```



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).

Review Comment:
   ```suggestion
   RunInference takes advantage of existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, to enable users to easily plug in models into their pipelines and enjoy a transform optimized for machine learning inferences.
   
   By leveraging how users can create arbitrarily complex workflow graphs, users can then easily build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
   ```



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions without returning the full input data.
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, you add a single line of code in your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p | beam.ReadFromSource('a_source')   
+    | RunInference(configuration)))
+```
+
+To import models, you need to wrap them around a `ModelHandler object`. Add one or more of the following lines of code, depending on the framework and type of data structure that holds the data:
+
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerNumpy
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+```
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerPandas
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+You need to provide a path to the model weights that's accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
+
+1. Download the pre-trained weights and host them in a location that the pipeline can access.
+2. Pass the hosted path of the model to the PyTorch `model_handler` by using the following code: `state_dict_path=<path_to_weights>`.

Review Comment:
   ```suggestion
   2. Pass the path of the model to the PyTorch `model_handler` by using the following code: `state_dict_path=<path_to_weights>`.
   ```



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions without returning the full input data.
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, you add a single line of code in your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p | beam.ReadFromSource('a_source')   
+    | RunInference(configuration)))
+```
+
+To import models, you need to wrap them around a `ModelHandler object`. Add one or more of the following lines of code, depending on the framework and type of data structure that holds the data:
+
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerNumpy
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+```
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerPandas
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+

Review Comment:
   ```suggestion
   #### PyTorch
   ```



##########
website/www/site/content/en/documentation/sdks/python.md:
##########
@@ -48,6 +48,11 @@ language-specific implementation guidance.
 
 ## Using Beam Python SDK in your ML pipelines
 
+To use the Beam Python SDK with your machine learning pipelines, use the RunInference API for PyTorch and Sklearn models. If using Tensorflow model, you can make use of the library from `tfx_bsl`. Further integrations for TensorFlow are planned.

Review Comment:
   Here's the link to the tfx_bsl repo: https://github.com/tensorflow/tfx-bsl/tree/master/tfx_bsl/beam. Should we add a link?
   
   Here's the issue for the TF work https://github.com/apache/beam/issues/21442. Should we add this link?
   
   



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions without returning the full input data.
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, you add a single line of code in your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p | beam.ReadFromSource('a_source')   
+    | RunInference(configuration)))
+```
+
+To import models, you need to wrap them around a `ModelHandler object`. Add one or more of the following lines of code, depending on the framework and type of data structure that holds the data:
+
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerNumpy
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+```
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerPandas
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+You need to provide a path to the model weights that's accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
+
+1. Download the pre-trained weights and host them in a location that the pipeline can access.
+2. Pass the hosted path of the model to the PyTorch `model_handler` by using the following code: `state_dict_path=<path_to_weights>`.
+
+### Use multiple inference models
+
+You can also use the RunInference transform to add multiple inference models to your pipeline.
+
+#### A/B Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(ModelHandlerA)
+   model_b_predictions = data | RunInference(ModelHandlerB)
+```
+
+#### Ensemble Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(ModelHandlerA)
+   model_b_predictions = model_a_predictions | beam.Map(some_post_processing) | RunInference(ModelHandlerB)
+```
+
+### Use a key handler
+
+If a key is attached to the examples, use the `KeyedModelHandler`:
+
+```
+from apache_beam.ml.inference.base import KeyedModelHandler
+ 
+keyed_model_handler = KeyedModelHandler(PytorchModelHandlerTensor(...))
+ 
+with pipeline as p:
+   data = p | beam.Create([
+      ('img1', np.array[[1,2,3],[4,5,6],...]),
+      ('img2', np.array[[1,2,3],[4,5,6],...]),
+      ('img3', np.array[[1,2,3],[4,5,6],...]),
+   ])
+   predictions = data | RunInference(keyed_model_handler)
+```
+
+### Use the prediction results object
+
+The `PredictionResult` is a `NamedTuple` object that contains both the input and the inferences, named  `example` and  `inference`, respectively. Your pipeline interacts with a `PredictionResult` object in steps after the RunInference transform.
+
+```
+class PostProcessor(beam.DoFn):
+    def process(self, element: Tuple[str, PredictionResult]):
+       key, prediction_result = element
+       inputs = prediction_result.example
+       predictions = prediction_result.inference
+
+       # Post-processing logic
+       result = ...
+
+       yield (key, result)
+
+with pipeline as p:
+    output = (
+        p | 'Read' >> beam.ReadFromSource('a_source') 
+                | 'PyTorchRunInference' >> RunInference(KeyedModelHandler)
+                | 'ProcessOutput' >> beam.ParDo(PostProcessor()))
+```
+
+If you need to use this object explicitly, include the following line in your pipeline to import the object:
+
+```
+from apache_beam.ml.inference.base import PredictionResult
+```
+
+## Run a machine learning pipeline
+
+For detailed instructions explaining how to build and run a pipeline that uses ML models, see the
+[Example RunInference API pipelines](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference) on GitHub.
+
+## Troubleshooting
+
+If you run into problems with your pipeline or job, this section lists issues that you might encounter and provides suggestions for how to fix them.
+
+### Prediction results missing
+
+When you use a dictionary of tensors, the output might not include the prediction results. This issue occurs because the RunInference API supports tensors but not dictionaries of tensors. 
+
+Many model inferences return a dictionary with the predictions and additional metadata, for example, `Dict[str, Tensor]`. The RunInference API currently expects outputs to be an `Iterable[Any]`, for example, `Iterable[Tensor]` or `Iterable[Dict[str, Tensor]]`.
+
+When RunInference zips the inputs with the predictions, the predictions iterate over the dictionary keys instead of the batch elements. The result is that the key name is preserved but the prediction tensors are discarded. For more information, see the [Pytorch RunInference PredictionResult is a Dict](https://github.com/apache/beam/issues/22240) issue in the Apache Beam GitHub project.
+
+To work with current RunInference implementation, override the `forward()` function and convert the standard Hugging Face forward output into the appropriate format of `List[Dict[str, torch.Tensor]]`. For more information, see an [example with the batching flag added](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py#L49).
+
+### Unable to batch tensor elements
+
+RunInference uses dynamic batching. However, the RunInference API cannot batch tensor elements of different sizes, because `torch.stack()` expects tensors of the same length. If you provide images of different sizes or word embeddings of different lengths, errors might occur.
+
+To avoid this issue:
+
+1. Either use elements that have the same size, or resize image inputs and word embeddings to make them 
+the same size. Depending on the language model and encoding technique, this option might not be available. 

Review Comment:
   ```suggestion
   1. Use elements that have the same size. For CV applications, you would need to resize the images to the same dimensions. For NLP applications, you would need to resize text or word embeddings to the same length. This, however, might not be feasible with texts of varying lengths.
   ```



##########
website/www/site/content/en/documentation/transforms/python/elementwise/runinference.md:
##########
@@ -0,0 +1,94 @@
+---
+title: "RunInference"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# RunInference
+
+{{< localstorage language language-py >}}
+
+{{< button-pydoc path="apache_beam.ml.inference" class="RunInference" >}}
+
+Uses models to do local and remote inference. A `RunInference` transform uses a `PCollection` of examples to create a machine learning (ML) model. The transform outputs a `PCollection` that contains the input examples and output predictions.
+
+You must have Apache Beam 2.40.0 or later installed to run these pipelines.
+
+See more [RunInference API pipeline examples](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference).
+
+## PyTorch dependencies
+
+The RunInference API supports the PyTorch framework. To use PyTorch locally, first install `torch`. To install `torch`, in your terminal, run the following command:
+
+`pip install torch==1.10.0`
+
+If you are using pretrained models from Pytorch's `torchvision.models` [subpackage](https://pytorch.org/vision/0.12/models.html#models-and-pre-trained-weights), you also need to install `torchvision`. To install `torchvision`, in your terminal, run the following command:
+
+`pip install torchvision`
+
+If you are using pretrained models from Hugging Face's [`transformers` package](https://huggingface.co/docs/transformers/index), you need to install `transformers`. To install `transformers`, in your terminal, run the following command:
+
+`pip install transformers`
+
+For information about installing the `torch` dependency on a distributed runner such as Dataflow, see the [PyPI dependency instructions](/documentation/sdks/python-pipeline-dependencies/#pypi-dependencies).

Review Comment:
   Can probably be removed if we replace examples with the simpler ones from Anand's notebook



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions without returning the full input data.
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, you add a single line of code in your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p | beam.ReadFromSource('a_source')   
+    | RunInference(configuration)))
+```
+
+To import models, you need to wrap them around a `ModelHandler object`. Add one or more of the following lines of code, depending on the framework and type of data structure that holds the data:
+
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerNumpy
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+```
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerPandas
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+You need to provide a path to the model weights that's accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
+
+1. Download the pre-trained weights and host them in a location that the pipeline can access.
+2. Pass the hosted path of the model to the PyTorch `model_handler` by using the following code: `state_dict_path=<path_to_weights>`.
+
+### Use multiple inference models
+
+You can also use the RunInference transform to add multiple inference models to your pipeline.
+
+#### A/B Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(ModelHandlerA)
+   model_b_predictions = data | RunInference(ModelHandlerB)
+```
+
+#### Ensemble Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(ModelHandlerA)
+   model_b_predictions = model_a_predictions | beam.Map(some_post_processing) | RunInference(ModelHandlerB)
+```
+
+### Use a key handler
+
+If a key is attached to the examples, use the `KeyedModelHandler`:
+
+```
+from apache_beam.ml.inference.base import KeyedModelHandler
+ 
+keyed_model_handler = KeyedModelHandler(PytorchModelHandlerTensor(...))
+ 
+with pipeline as p:
+   data = p | beam.Create([
+      ('img1', np.array[[1,2,3],[4,5,6],...]),
+      ('img2', np.array[[1,2,3],[4,5,6],...]),
+      ('img3', np.array[[1,2,3],[4,5,6],...]),
+   ])
+   predictions = data | RunInference(keyed_model_handler)
+```
+
+### Use the prediction results object
+
+The `PredictionResult` is a `NamedTuple` object that contains both the input and the inferences, named  `example` and  `inference`, respectively. Your pipeline interacts with a `PredictionResult` object in steps after the RunInference transform.
+
+```
+class PostProcessor(beam.DoFn):
+    def process(self, element: Tuple[str, PredictionResult]):
+       key, prediction_result = element
+       inputs = prediction_result.example
+       predictions = prediction_result.inference
+
+       # Post-processing logic
+       result = ...
+
+       yield (key, result)
+
+with pipeline as p:
+    output = (
+        p | 'Read' >> beam.ReadFromSource('a_source') 
+                | 'PyTorchRunInference' >> RunInference(KeyedModelHandler)
+                | 'ProcessOutput' >> beam.ParDo(PostProcessor()))
+```
+
+If you need to use this object explicitly, include the following line in your pipeline to import the object:
+
+```
+from apache_beam.ml.inference.base import PredictionResult
+```
+
+## Run a machine learning pipeline
+
+For detailed instructions explaining how to build and run a pipeline that uses ML models, see the
+[Example RunInference API pipelines](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference) on GitHub.
+
+## Troubleshooting
+
+If you run into problems with your pipeline or job, this section lists issues that you might encounter and provides suggestions for how to fix them.
+
+### Prediction results missing
+
+When you use a dictionary of tensors, the output might not include the prediction results. This issue occurs because the RunInference API supports tensors but not dictionaries of tensors. 
+
+Many model inferences return a dictionary with the predictions and additional metadata, for example, `Dict[str, Tensor]`. The RunInference API currently expects outputs to be an `Iterable[Any]`, for example, `Iterable[Tensor]` or `Iterable[Dict[str, Tensor]]`.
+
+When RunInference zips the inputs with the predictions, the predictions iterate over the dictionary keys instead of the batch elements. The result is that the key name is preserved but the prediction tensors are discarded. For more information, see the [Pytorch RunInference PredictionResult is a Dict](https://github.com/apache/beam/issues/22240) issue in the Apache Beam GitHub project.
+
+To work with current RunInference implementation, override the `forward()` function and convert the standard Hugging Face forward output into the appropriate format of `List[Dict[str, torch.Tensor]]`. For more information, see an [example with the batching flag added](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py#L49).
+
+### Unable to batch tensor elements
+
+RunInference uses dynamic batching. However, the RunInference API cannot batch tensor elements of different sizes, because `torch.stack()` expects tensors of the same length. If you provide images of different sizes or word embeddings of different lengths, errors might occur.
+
+To avoid this issue:

Review Comment:
   ```suggestion
   There are two solutions to avoid this issue:
   ```



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions without returning the full input data.
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, you add a single line of code in your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p | beam.ReadFromSource('a_source')   
+    | RunInference(configuration)))
+```
+
+To import models, you need to wrap them around a `ModelHandler object`. Add one or more of the following lines of code, depending on the framework and type of data structure that holds the data:
+
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerNumpy
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+```
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerPandas
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+You need to provide a path to the model weights that's accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
+
+1. Download the pre-trained weights and host them in a location that the pipeline can access.
+2. Pass the hosted path of the model to the PyTorch `model_handler` by using the following code: `state_dict_path=<path_to_weights>`.
+
+### Use multiple inference models
+
+You can also use the RunInference transform to add multiple inference models to your pipeline.
+
+#### A/B Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(ModelHandlerA)
+   model_b_predictions = data | RunInference(ModelHandlerB)
+```
+
+#### Ensemble Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(ModelHandlerA)
+   model_b_predictions = model_a_predictions | beam.Map(some_post_processing) | RunInference(ModelHandlerB)
+```
+
+### Use a key handler
+
+If a key is attached to the examples, use the `KeyedModelHandler`:
+
+```
+from apache_beam.ml.inference.base import KeyedModelHandler
+ 
+keyed_model_handler = KeyedModelHandler(PytorchModelHandlerTensor(...))
+ 
+with pipeline as p:
+   data = p | beam.Create([
+      ('img1', np.array[[1,2,3],[4,5,6],...]),
+      ('img2', np.array[[1,2,3],[4,5,6],...]),
+      ('img3', np.array[[1,2,3],[4,5,6],...]),
+   ])
+   predictions = data | RunInference(keyed_model_handler)
+```
+
+### Use the prediction results object
+
+The `PredictionResult` is a `NamedTuple` object that contains both the input and the inferences, named  `example` and  `inference`, respectively. Your pipeline interacts with a `PredictionResult` object in steps after the RunInference transform.
+
+```
+class PostProcessor(beam.DoFn):
+    def process(self, element: Tuple[str, PredictionResult]):
+       key, prediction_result = element
+       inputs = prediction_result.example
+       predictions = prediction_result.inference
+
+       # Post-processing logic
+       result = ...
+
+       yield (key, result)
+
+with pipeline as p:
+    output = (
+        p | 'Read' >> beam.ReadFromSource('a_source') 
+                | 'PyTorchRunInference' >> RunInference(KeyedModelHandler)
+                | 'ProcessOutput' >> beam.ParDo(PostProcessor()))
+```
+
+If you need to use this object explicitly, include the following line in your pipeline to import the object:
+
+```
+from apache_beam.ml.inference.base import PredictionResult
+```
+
+## Run a machine learning pipeline
+
+For detailed instructions explaining how to build and run a pipeline that uses ML models, see the
+[Example RunInference API pipelines](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference) on GitHub.
+
+## Troubleshooting
+
+If you run into problems with your pipeline or job, this section lists issues that you might encounter and provides suggestions for how to fix them.
+
+### Prediction results missing

Review Comment:
   ```suggestion
   ### Incorrect inferences in the PredictionResult object
   ```



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions without returning the full input data.
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, you add a single line of code in your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p | beam.ReadFromSource('a_source')   
+    | RunInference(configuration)))
+```
+
+To import models, you need to wrap them around a `ModelHandler object`. Add one or more of the following lines of code, depending on the framework and type of data structure that holds the data:
+
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerNumpy
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+```
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerPandas
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+You need to provide a path to the model weights that's accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
+
+1. Download the pre-trained weights and host them in a location that the pipeline can access.
+2. Pass the hosted path of the model to the PyTorch `model_handler` by using the following code: `state_dict_path=<path_to_weights>`.
+
+### Use multiple inference models
+
+You can also use the RunInference transform to add multiple inference models to your pipeline.
+
+#### A/B Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(ModelHandlerA)
+   model_b_predictions = data | RunInference(ModelHandlerB)
+```
+
+#### Ensemble Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(ModelHandlerA)
+   model_b_predictions = model_a_predictions | beam.Map(some_post_processing) | RunInference(ModelHandlerB)
+```
+
+### Use a key handler
+
+If a key is attached to the examples, use the `KeyedModelHandler`:
+
+```
+from apache_beam.ml.inference.base import KeyedModelHandler
+ 
+keyed_model_handler = KeyedModelHandler(PytorchModelHandlerTensor(...))
+ 
+with pipeline as p:
+   data = p | beam.Create([
+      ('img1', np.array[[1,2,3],[4,5,6],...]),
+      ('img2', np.array[[1,2,3],[4,5,6],...]),
+      ('img3', np.array[[1,2,3],[4,5,6],...]),
+   ])
+   predictions = data | RunInference(keyed_model_handler)
+```
+
+### Use the prediction results object
+
+The `PredictionResult` is a `NamedTuple` object that contains both the input and the inferences, named  `example` and  `inference`, respectively. Your pipeline interacts with a `PredictionResult` object in steps after the RunInference transform.
+
+```
+class PostProcessor(beam.DoFn):
+    def process(self, element: Tuple[str, PredictionResult]):
+       key, prediction_result = element
+       inputs = prediction_result.example
+       predictions = prediction_result.inference
+
+       # Post-processing logic
+       result = ...
+
+       yield (key, result)
+
+with pipeline as p:
+    output = (
+        p | 'Read' >> beam.ReadFromSource('a_source') 
+                | 'PyTorchRunInference' >> RunInference(KeyedModelHandler)
+                | 'ProcessOutput' >> beam.ParDo(PostProcessor()))
+```
+
+If you need to use this object explicitly, include the following line in your pipeline to import the object:
+
+```
+from apache_beam.ml.inference.base import PredictionResult
+```
+
+## Run a machine learning pipeline
+
+For detailed instructions explaining how to build and run a pipeline that uses ML models, see the
+[Example RunInference API pipelines](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference) on GitHub.
+
+## Troubleshooting
+
+If you run into problems with your pipeline or job, this section lists issues that you might encounter and provides suggestions for how to fix them.
+
+### Prediction results missing
+
+When you use a dictionary of tensors, the output might not include the prediction results. This issue occurs because the RunInference API supports tensors but not dictionaries of tensors. 
+
+Many model inferences return a dictionary with the predictions and additional metadata, for example, `Dict[str, Tensor]`. The RunInference API currently expects outputs to be an `Iterable[Any]`, for example, `Iterable[Tensor]` or `Iterable[Dict[str, Tensor]]`.
+
+When RunInference zips the inputs with the predictions, the predictions iterate over the dictionary keys instead of the batch elements. The result is that the key name is preserved but the prediction tensors are discarded. For more information, see the [Pytorch RunInference PredictionResult is a Dict](https://github.com/apache/beam/issues/22240) issue in the Apache Beam GitHub project.
+
+To work with current RunInference implementation, override the `forward()` function and convert the standard Hugging Face forward output into the appropriate format of `List[Dict[str, torch.Tensor]]`. For more information, see an [example with the batching flag added](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py#L49).
+
+### Unable to batch tensor elements
+
+RunInference uses dynamic batching. However, the RunInference API cannot batch tensor elements of different sizes, because `torch.stack()` expects tensors of the same length. If you provide images of different sizes or word embeddings of different lengths, errors might occur.
+
+To avoid this issue:
+
+1. Either use elements that have the same size, or resize image inputs and word embeddings to make them 
+the same size. Depending on the language model and encoding technique, this option might not be available. 

Review Comment:
   ```suggestion
   the same size. For NLP use cases, this might not be possible to do with text of varying lengths.
   ```



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.

Review Comment:
   The examples here `tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution` are highly NLP specific. Would it be better to generalize?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] rszper commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

rszper commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r920490038


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions without returning the full input data.
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, you add a single line of code in your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p | beam.ReadFromSource('a_source')   

Review Comment:
   I added a line under the first place this appears saying "Where model_handler is the model handler setup code." I can add this in every place we use a placeholder, but I don't *think* it's necessary.
   
   I also added brackets around all the model handler placeholders in the code and made the styling consistent.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] rszper commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

rszper commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r920521501


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions without returning the full input data.
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, you add a single line of code in your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p | beam.ReadFromSource('a_source')   

Review Comment:
   I decided to add the line under all the cases; I think there are only three.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] tvalentyn commented on a diff in pull request #22250: Update RunInference documentation

Posted by GitBox <gi...@apache.org>.

tvalentyn commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r922548975


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -171,7 +171,7 @@ In some cases, the `PredictionResults` output might not include the correct pred
 
 The RunInference API currently expects outputs to be an `Iterable[Any]`. Example return types are `Iterable[Tensor]` or `Iterable[Dict[str, Tensor]]`. When RunInference zips the inputs with the predictions, the predictions iterate over the dictionary keys instead of the batch elements. The result is that the key name is preserved but the prediction tensors are discarded. For more information, see the [Pytorch RunInference PredictionResult is a Dict](https://github.com/apache/beam/issues/22240) issue in the Apache Beam GitHub project.
 
-To work with the current RunInference implementation, you can create a wrapper class that overrides the `model(input)` call. In PyTorch, for example, your wrapper would override the `forward()` function and return an output with the appropriate format of `List[Dict[str, torch.Tensor]]`. For more information, see our [HuggingFace language modeling example](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py#L49).
+To work with the current RunInference implementation, you can create a wrapper class that overrides the `model(input)` call. In PyTorch, for example, your wrapper would override the `forward()` function and return an output with the appropriate format of `List[Dict[str, torch.Tensor]]`. For more information, see our [HuggingFace language modeling example](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py#L49) and our [Bert language modeling example](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py).

Review Comment:
   my last comment referred to disable batching section



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] yeandy commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

yeandy commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r921113109


##########
sdks/python/apache_beam/examples/snippets/transforms/elementwise/runinference_test.py:
##########
@@ -0,0 +1,93 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# pytype: skip-file
+
+import unittest
+from io import StringIO
+
+import mock
+
+from apache_beam.examples.snippets.util import assert_matches_stdout
+from apache_beam.testing.test_pipeline import TestPipeline
+
+from . import runinference
+
+def check_torch_keyed_model_handler(actual):
+  expected = '''[START torch_keyed_model_handler]
+('first_question', PredictionResult(example=tensor([105.]), inference=tensor([523.6982], grad_fn=<UnbindBackward0>)))
+('second_question', PredictionResult(example=tensor([108.]), inference=tensor([538.5867], grad_fn=<UnbindBackward0>)))
+('third_question', PredictionResult(example=tensor([1000.]), inference=tensor([4965.4019], grad_fn=<UnbindBackward0>)))
+('fourth_question', PredictionResult(example=tensor([1013.]), inference=tensor([5029.9180], grad_fn=<UnbindBackward0>)))
+[END torch_keyed_model_handler]'''.splitlines()[1:-1]
+  assert_matches_stdout(actual, expected)
+
+
+def check_sklearn_keyed_model_handler(actual):
+  expected = '''[START sklearn_keyed_model_handler]
+('first_question', PredictionResult(example=[105.0], inference=array([525.])))
+('second_question', PredictionResult(example=[108.0], inference=array([540.])))
+('third_question', PredictionResult(example=[1000.0], inference=array([5000.])))
+('fourth_question', PredictionResult(example=[1013.0], inference=array([5065.])))
+[END sklearn_keyed_model_handler] '''.splitlines()[1:-1]
+  assert_matches_stdout(actual, expected)
+
+
+def check_torch_unkeyed_model_handler(actual):
+  expected = '''[START torch_unkeyed_model_handler]
+PredictionResult(example=tensor([10.]), inference=tensor([52.2325], grad_fn=<UnbindBackward0>))
+PredictionResult(example=tensor([40.]), inference=tensor([201.1165], grad_fn=<UnbindBackward0>))
+PredictionResult(example=tensor([60.]), inference=tensor([300.3724], grad_fn=<UnbindBackward0>))
+PredictionResult(example=tensor([90.]), inference=tensor([449.2563], grad_fn=<UnbindBackward0>))
+[END torch_unkeyed_model_handler] '''.splitlines()[1:-1]
+  assert_matches_stdout(actual, expected)
+
+
+def check_sklearn_unkeyed_model_handler(actual):
+  expected = '''[START sklearn_unkeyed_model_handler]
+PredictionResult(example=array([20.], dtype=float32), inference=array([100.], dtype=float32))
+PredictionResult(example=array([40.], dtype=float32), inference=array([200.], dtype=float32))
+PredictionResult(example=array([60.], dtype=float32), inference=array([300.], dtype=float32))
+PredictionResult(example=array([90.], dtype=float32), inference=array([450.], dtype=float32))
+[END sklearn_unkeyed_model_handler]  '''.splitlines()[1:-1]
+  assert_matches_stdout(actual, expected)
+
+@mock.patch('apache_beam.Pipeline', TestPipeline)
+@mock.patch(
+    'apache_beam.examples.snippets.transforms.elementwise.runinference.print', str)
+class RunInferenceTest(unittest.TestCase):
+  def test_torch_unkeyed_model_handler(self):
+    runinference.torch_unkeyed_model_handler(check_torch_unkeyed_model_handler)
+
+  def test_torch_keyed_model_handler(self):
+    runinference.torch_keyed_model_handler(check_torch_keyed_model_handler)
+
+  def test_sklearn_unkeyed_model_handler(self):
+    runinference.sklearn_unkeyed_model_handler(check_sklearn_unkeyed_model_handler)
+
+  def test_sklearn_keyed_model_handler(self):
+    runinference.sklearn_keyed_model_handler(check_sklearn_keyed_model_handler)
+
+  def test_images(self):
+    runinference.images(check_images)
+
+  def test_digits(self):
+    runinference.digits(check_digits)
+

Review Comment:
   This can be removed now?



##########
sdks/python/apache_beam/examples/snippets/transforms/elementwise/runinference.py:
##########
@@ -0,0 +1,157 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# pytype: skip-file
+
+def torch_unkeyed_model_handler(test=None):
+    # [START torch_unkeyed_model_handler]
+  import apache_beam as beam
+  import numpy
+  import torch
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+
+  class LinearRegression(torch.nn.Module):
+    def __init__(self, input_dim=1, output_dim=1):
+      super().__init__()
+      self.linear = torch.nn.Linear(input_dim, output_dim)
+
+    def forward(self, x):
+      out = self.linear(x)
+      return out
+
+  model_state_dict_path = 'gs://apache-beam-samples/run_inference/five_times_table_torch.pt'  # pylint: disable=line-too-long
+  model_class = LinearRegression
+  model_params = {'input_dim': 1, 'output_dim': 1}
+  model_handler = PytorchModelHandlerTensor(
+      model_class=model_class,
+      model_params=model_params,
+      state_dict_path=model_state_dict_path)
+
+  unkeyed_data = numpy.array([10, 40, 60, 90],
+                             dtype=numpy.float32).reshape(-1, 1)
+
+  with beam.Pipeline() as p:
+    predictions = (
+        p
+        | 'InputData' >> beam.Create(unkeyed_data)
+        | 'ConvertNumpyToTensor' >> beam.Map(torch.Tensor)
+        | 'PytorchRunInference' >> RunInference(model_handler=model_handler)
+        | beam.Map(print))
+    # [END torch_unkeyed_model_handler]
+    if test:
+      test(predictions)
+
+
+def torch_keyed_model_handler(test=None):
+    # [START torch_keyed_model_handler]
+  import apache_beam as beam
+  import torch
+  from apache_beam.ml.inference.base import KeyedModelHandler
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+
+  class LinearRegression(torch.nn.Module):
+    def __init__(self, input_dim=1, output_dim=1):
+      super().__init__()
+      self.linear = torch.nn.Linear(input_dim, output_dim)
+
+    def forward(self, x):
+      out = self.linear(x)
+      return out
+
+  model_state_dict_path = 'gs://apache-beam-samples/run_inference/five_times_table_torch.pt'  # pylint: disable=line-too-long
+  model_class = LinearRegression
+  model_params = {'input_dim': 1, 'output_dim': 1}
+  keyed_model_handler = KeyedModelHandler(
+      PytorchModelHandlerTensor(
+          model_class=model_class,
+          model_params=model_params,
+          state_dict_path=model_state_dict_path))
+
+  keyed_data = [("first_question", 105.00), ("second_question", 108.00),
+                ("third_question", 1000.00), ("fourth_question", 1013.00)]
+
+  with beam.Pipeline() as p:
+    predictions = (
+        p
+        | 'KeyedInputData' >> beam.Create(keyed_data)
+        | "ConvertIntToTensor" >>
+        beam.Map(lambda x: (x[0], torch.Tensor([x[1]])))
+        | 'PytorchRunInference' >>
+        RunInference(model_handler=keyed_model_handler)
+        | beam.Map(print))
+    # [END torch_keyed_model_handler]
+    if test:
+      test(predictions)
+
+
+def sklearn_unkeyed_model_handler(test=None):
+    # [START sklearn_unkeyed_model_handler]
+  import apache_beam as beam
+  import numpy
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.sklearn_inference import ModelFileType
+  from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
+
+  sklearn_model_filename = 'gs://apache-beam-samples/run_inference/five_times_table_sklearn.pkl'  # pylint: disable=line-too-long

Review Comment:
   Can't find this filesystem in the unit tests. Can you double check this @AnandInguva 



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,198 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference takes advantage of existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, to enable you to use models in your pipelines to create transforms optimized for machine learning inferences. The ability to create arbitrarily complex workflow graphs also allows you to build multi-model pipelines.
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The batched elements are then applied with a transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines can be useful for A/B testing or for building out ensembles that are comprised of models that perform tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, add the following code to your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p |  'Read' >> beam.ReadFromSource('a_source')   
+                     | 'RunInference' >> RunInference(<model_handler>)
+```
+Where `model_handler` is the model handler setup code.
+
+To import models, you need to wrap them around a `ModelHandler` object. Which `ModelHandler` you import depends on the framework and type of data structure that contains the inputs. The following examples show some ModelHandlers that you might want to import.
+
+```
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerPandas
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+The section provides requirements for using pre-trained models with PyTorch and Scikit-learn
+
+#### PyTorch
+
+You need to provide a path to a file that contains the model saved weights. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
+
+1. Download the pre-trained weights and host them in a location that the pipeline can access.
+2. Pass the path of the model to the PyTorch `model_handler` by using the following code: `state_dict_path=<path_to_weights>`.
+
+#### Scikit-learn
+
+You need to provide a path to a file that contains the pickled Scikit-learn model. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the Scikit-learn framework, complete the following steps:
+
+1. Download the pickled model class and host it in a location that the pipeline can access.
+2. Pass the path of the model to the Sklearn `model_handler` by using the following code:
+   `model_uri=<path_to_pickled_file>` and `model_file_type: <ModelFileType>`, where you can specify
+   `ModelFileType.PICKLE` or `ModelFileType.JOBLIB`, depending on how the model was serialized.
+
+### Use multiple models
+
+You can also use the RunInference transform to add multiple inference models to your pipeline.
+
+#### A/B Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(<model_handler_A>)
+   model_b_predictions = data | RunInference(<model_handler_B>)
+```
+
+Where `model_handler_A` and `model_handler_B` are the model handler setup code.
+
+#### Ensemble Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 

Review Comment:
   ```suggestion
      data = p | 'Read' >> beam.ReadFromSource('a_source')
   ```



##########
sdks/python/apache_beam/examples/snippets/transforms/elementwise/runinference.py:
##########
@@ -0,0 +1,157 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# pytype: skip-file
+
+def torch_unkeyed_model_handler(test=None):
+    # [START torch_unkeyed_model_handler]
+  import apache_beam as beam
+  import numpy
+  import torch
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+
+  class LinearRegression(torch.nn.Module):
+    def __init__(self, input_dim=1, output_dim=1):
+      super().__init__()
+      self.linear = torch.nn.Linear(input_dim, output_dim)
+
+    def forward(self, x):
+      out = self.linear(x)
+      return out
+
+  model_state_dict_path = 'gs://apache-beam-samples/run_inference/five_times_table_torch.pt'  # pylint: disable=line-too-long
+  model_class = LinearRegression
+  model_params = {'input_dim': 1, 'output_dim': 1}
+  model_handler = PytorchModelHandlerTensor(
+      model_class=model_class,
+      model_params=model_params,
+      state_dict_path=model_state_dict_path)
+
+  unkeyed_data = numpy.array([10, 40, 60, 90],
+                             dtype=numpy.float32).reshape(-1, 1)
+
+  with beam.Pipeline() as p:
+    predictions = (
+        p
+        | 'InputData' >> beam.Create(unkeyed_data)
+        | 'ConvertNumpyToTensor' >> beam.Map(torch.Tensor)
+        | 'PytorchRunInference' >> RunInference(model_handler=model_handler)
+        | beam.Map(print))
+    # [END torch_unkeyed_model_handler]
+    if test:
+      test(predictions)
+
+
+def torch_keyed_model_handler(test=None):
+    # [START torch_keyed_model_handler]
+  import apache_beam as beam
+  import torch

Review Comment:
   @AnandInguva I see `ModuleNotFoundError: No module named 'torch'`. Do you know where we can install extra deps for snippet examples? 



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,198 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference takes advantage of existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, to enable you to use models in your pipelines to create transforms optimized for machine learning inferences. The ability to create arbitrarily complex workflow graphs also allows you to build multi-model pipelines.
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The batched elements are then applied with a transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines can be useful for A/B testing or for building out ensembles that are comprised of models that perform tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, add the following code to your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p |  'Read' >> beam.ReadFromSource('a_source')   

Review Comment:
   ```suggestion
      predictions = ( p |  'Read' >> beam.ReadFromSource('a_source')
   ```



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,198 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference takes advantage of existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, to enable you to use models in your pipelines to create transforms optimized for machine learning inferences. The ability to create arbitrarily complex workflow graphs also allows you to build multi-model pipelines.
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The batched elements are then applied with a transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines can be useful for A/B testing or for building out ensembles that are comprised of models that perform tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, add the following code to your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 

Review Comment:
   ```suggestion
   
   ```



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,198 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference takes advantage of existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, to enable you to use models in your pipelines to create transforms optimized for machine learning inferences. The ability to create arbitrarily complex workflow graphs also allows you to build multi-model pipelines.
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The batched elements are then applied with a transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines can be useful for A/B testing or for building out ensembles that are comprised of models that perform tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, add the following code to your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p |  'Read' >> beam.ReadFromSource('a_source')   
+                     | 'RunInference' >> RunInference(<model_handler>)
+```
+Where `model_handler` is the model handler setup code.
+
+To import models, you need to wrap them around a `ModelHandler` object. Which `ModelHandler` you import depends on the framework and type of data structure that contains the inputs. The following examples show some ModelHandlers that you might want to import.
+
+```
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerPandas
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+The section provides requirements for using pre-trained models with PyTorch and Scikit-learn
+
+#### PyTorch
+
+You need to provide a path to a file that contains the model saved weights. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
+
+1. Download the pre-trained weights and host them in a location that the pipeline can access.
+2. Pass the path of the model to the PyTorch `model_handler` by using the following code: `state_dict_path=<path_to_weights>`.
+
+#### Scikit-learn
+
+You need to provide a path to a file that contains the pickled Scikit-learn model. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the Scikit-learn framework, complete the following steps:
+
+1. Download the pickled model class and host it in a location that the pipeline can access.
+2. Pass the path of the model to the Sklearn `model_handler` by using the following code:
+   `model_uri=<path_to_pickled_file>` and `model_file_type: <ModelFileType>`, where you can specify
+   `ModelFileType.PICKLE` or `ModelFileType.JOBLIB`, depending on how the model was serialized.
+
+### Use multiple models
+
+You can also use the RunInference transform to add multiple inference models to your pipeline.
+
+#### A/B Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(<model_handler_A>)
+   model_b_predictions = data | RunInference(<model_handler_B>)
+```
+
+Where `model_handler_A` and `model_handler_B` are the model handler setup code.
+
+#### Ensemble Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(<model_handler_A>)
+   model_b_predictions = model_a_predictions | beam.Map(some_post_processing) | RunInference(<model_handler_B>)
+```
+
+Where `model_handler_A` and `model_handler_B` are the model handler setup code.
+
+### Use a keyed ModelHandler
+
+If a key is attached to the examples, wrap the `KeyedModelHandler` around the `ModelHandler` object:
+
+```
+from apache_beam.ml.inference.base import KeyedModelHandler
+ 
+keyed_model_handler = KeyedModelHandler(PytorchModelHandlerTensor(...))
+ 

Review Comment:
   ```suggestion
   
   ```



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,198 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference takes advantage of existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, to enable you to use models in your pipelines to create transforms optimized for machine learning inferences. The ability to create arbitrarily complex workflow graphs also allows you to build multi-model pipelines.
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The batched elements are then applied with a transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines can be useful for A/B testing or for building out ensembles that are comprised of models that perform tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, add the following code to your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p |  'Read' >> beam.ReadFromSource('a_source')   
+                     | 'RunInference' >> RunInference(<model_handler>)
+```
+Where `model_handler` is the model handler setup code.
+
+To import models, you need to wrap them around a `ModelHandler` object. Which `ModelHandler` you import depends on the framework and type of data structure that contains the inputs. The following examples show some ModelHandlers that you might want to import.
+
+```
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerPandas
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+The section provides requirements for using pre-trained models with PyTorch and Scikit-learn
+
+#### PyTorch
+
+You need to provide a path to a file that contains the model saved weights. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
+
+1. Download the pre-trained weights and host them in a location that the pipeline can access.
+2. Pass the path of the model to the PyTorch `model_handler` by using the following code: `state_dict_path=<path_to_weights>`.
+
+#### Scikit-learn
+
+You need to provide a path to a file that contains the pickled Scikit-learn model. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the Scikit-learn framework, complete the following steps:
+
+1. Download the pickled model class and host it in a location that the pipeline can access.
+2. Pass the path of the model to the Sklearn `model_handler` by using the following code:
+   `model_uri=<path_to_pickled_file>` and `model_file_type: <ModelFileType>`, where you can specify
+   `ModelFileType.PICKLE` or `ModelFileType.JOBLIB`, depending on how the model was serialized.
+
+### Use multiple models
+
+You can also use the RunInference transform to add multiple inference models to your pipeline.
+
+#### A/B Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(<model_handler_A>)
+   model_b_predictions = data | RunInference(<model_handler_B>)
+```
+
+Where `model_handler_A` and `model_handler_B` are the model handler setup code.
+
+#### Ensemble Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(<model_handler_A>)
+   model_b_predictions = model_a_predictions | beam.Map(some_post_processing) | RunInference(<model_handler_B>)
+```
+
+Where `model_handler_A` and `model_handler_B` are the model handler setup code.
+
+### Use a keyed ModelHandler
+
+If a key is attached to the examples, wrap the `KeyedModelHandler` around the `ModelHandler` object:
+
+```
+from apache_beam.ml.inference.base import KeyedModelHandler
+ 

Review Comment:
   ```suggestion
   
   ```



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,198 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference takes advantage of existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, to enable you to use models in your pipelines to create transforms optimized for machine learning inferences. The ability to create arbitrarily complex workflow graphs also allows you to build multi-model pipelines.
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The batched elements are then applied with a transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines can be useful for A/B testing or for building out ensembles that are comprised of models that perform tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, add the following code to your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p |  'Read' >> beam.ReadFromSource('a_source')   
+                     | 'RunInference' >> RunInference(<model_handler>)
+```
+Where `model_handler` is the model handler setup code.
+
+To import models, you need to wrap them around a `ModelHandler` object. Which `ModelHandler` you import depends on the framework and type of data structure that contains the inputs. The following examples show some ModelHandlers that you might want to import.
+
+```
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerPandas
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+The section provides requirements for using pre-trained models with PyTorch and Scikit-learn
+
+#### PyTorch
+
+You need to provide a path to a file that contains the model saved weights. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
+
+1. Download the pre-trained weights and host them in a location that the pipeline can access.
+2. Pass the path of the model to the PyTorch `model_handler` by using the following code: `state_dict_path=<path_to_weights>`.
+
+#### Scikit-learn
+
+You need to provide a path to a file that contains the pickled Scikit-learn model. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the Scikit-learn framework, complete the following steps:
+
+1. Download the pickled model class and host it in a location that the pipeline can access.
+2. Pass the path of the model to the Sklearn `model_handler` by using the following code:
+   `model_uri=<path_to_pickled_file>` and `model_file_type: <ModelFileType>`, where you can specify
+   `ModelFileType.PICKLE` or `ModelFileType.JOBLIB`, depending on how the model was serialized.
+
+### Use multiple models
+
+You can also use the RunInference transform to add multiple inference models to your pipeline.
+
+#### A/B Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 

Review Comment:
   ```suggestion
      data = p | 'Read' >> beam.ReadFromSource('a_source')
   ```



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,198 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference takes advantage of existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, to enable you to use models in your pipelines to create transforms optimized for machine learning inferences. The ability to create arbitrarily complex workflow graphs also allows you to build multi-model pipelines.
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The batched elements are then applied with a transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines can be useful for A/B testing or for building out ensembles that are comprised of models that perform tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, add the following code to your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p |  'Read' >> beam.ReadFromSource('a_source')   
+                     | 'RunInference' >> RunInference(<model_handler>)
+```
+Where `model_handler` is the model handler setup code.
+
+To import models, you need to wrap them around a `ModelHandler` object. Which `ModelHandler` you import depends on the framework and type of data structure that contains the inputs. The following examples show some ModelHandlers that you might want to import.
+
+```
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerPandas
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+The section provides requirements for using pre-trained models with PyTorch and Scikit-learn
+
+#### PyTorch
+
+You need to provide a path to a file that contains the model saved weights. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
+
+1. Download the pre-trained weights and host them in a location that the pipeline can access.
+2. Pass the path of the model to the PyTorch `model_handler` by using the following code: `state_dict_path=<path_to_weights>`.
+
+#### Scikit-learn
+
+You need to provide a path to a file that contains the pickled Scikit-learn model. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the Scikit-learn framework, complete the following steps:
+
+1. Download the pickled model class and host it in a location that the pipeline can access.
+2. Pass the path of the model to the Sklearn `model_handler` by using the following code:
+   `model_uri=<path_to_pickled_file>` and `model_file_type: <ModelFileType>`, where you can specify
+   `ModelFileType.PICKLE` or `ModelFileType.JOBLIB`, depending on how the model was serialized.
+
+### Use multiple models
+
+You can also use the RunInference transform to add multiple inference models to your pipeline.
+
+#### A/B Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(<model_handler_A>)
+   model_b_predictions = data | RunInference(<model_handler_B>)
+```
+
+Where `model_handler_A` and `model_handler_B` are the model handler setup code.
+
+#### Ensemble Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(<model_handler_A>)
+   model_b_predictions = model_a_predictions | beam.Map(some_post_processing) | RunInference(<model_handler_B>)
+```
+
+Where `model_handler_A` and `model_handler_B` are the model handler setup code.
+
+### Use a keyed ModelHandler
+
+If a key is attached to the examples, wrap the `KeyedModelHandler` around the `ModelHandler` object:
+
+```
+from apache_beam.ml.inference.base import KeyedModelHandler
+ 
+keyed_model_handler = KeyedModelHandler(PytorchModelHandlerTensor(...))
+ 
+with pipeline as p:
+   data = p | beam.Create([
+      ('img1', torch.tensor([[1,2,3],[4,5,6],...])),
+      ('img2', torch.tensor([[1,2,3],[4,5,6],...])),
+      ('img3', torch.tensor([[1,2,3],[4,5,6],...])),
+   ])
+   predictions = data | RunInference(KeyedModelHandler)
+```
+
+### Use the PredictionResults object
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions.
+
+The `PredictionResult` is a `NamedTuple` object that contains both the input and the inferences, named  `example` and  `inference`, respectively. When keys are passed with the input data to the RunInference transform, the output `PCollection` returns a `Tuple[str, PredictionResult]`, which is the key and the `PredictionResult` object. Your pipeline interacts with a `PredictionResult` object in steps after the RunInference transform.
+
+```
+class PostProcessor(beam.DoFn):
+    def process(self, element: Tuple[str, PredictionResult]):
+       key, prediction_result = element
+       inputs = prediction_result.example
+       predictions = prediction_result.inference
+
+       # Post-processing logic
+       result = ...
+
+       yield (key, result)
+
+with pipeline as p:
+    output = (
+        p | 'Read' >> beam.ReadFromSource('a_source') 
+                | 'PyTorchRunInference' >> RunInference(<keyed_model_handler>)
+                | 'ProcessOutput' >> beam.ParDo(PostProcessor()))
+```
+
+If you need to use this object explicitly, include the following line in your pipeline to import the object:
+
+```
+from apache_beam.ml.inference.base import PredictionResult
+```
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 

Review Comment:
   ```suggestion
   For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65).
   ```



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,198 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference takes advantage of existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, to enable you to use models in your pipelines to create transforms optimized for machine learning inferences. The ability to create arbitrarily complex workflow graphs also allows you to build multi-model pipelines.
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The batched elements are then applied with a transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines can be useful for A/B testing or for building out ensembles that are comprised of models that perform tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, add the following code to your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p |  'Read' >> beam.ReadFromSource('a_source')   
+                     | 'RunInference' >> RunInference(<model_handler>)
+```
+Where `model_handler` is the model handler setup code.
+
+To import models, you need to wrap them around a `ModelHandler` object. Which `ModelHandler` you import depends on the framework and type of data structure that contains the inputs. The following examples show some ModelHandlers that you might want to import.
+
+```
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerPandas
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+The section provides requirements for using pre-trained models with PyTorch and Scikit-learn
+
+#### PyTorch
+
+You need to provide a path to a file that contains the model saved weights. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
+
+1. Download the pre-trained weights and host them in a location that the pipeline can access.
+2. Pass the path of the model to the PyTorch `model_handler` by using the following code: `state_dict_path=<path_to_weights>`.
+
+#### Scikit-learn
+
+You need to provide a path to a file that contains the pickled Scikit-learn model. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the Scikit-learn framework, complete the following steps:
+
+1. Download the pickled model class and host it in a location that the pipeline can access.
+2. Pass the path of the model to the Sklearn `model_handler` by using the following code:
+   `model_uri=<path_to_pickled_file>` and `model_file_type: <ModelFileType>`, where you can specify
+   `ModelFileType.PICKLE` or `ModelFileType.JOBLIB`, depending on how the model was serialized.
+
+### Use multiple models
+
+You can also use the RunInference transform to add multiple inference models to your pipeline.
+
+#### A/B Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(<model_handler_A>)
+   model_b_predictions = data | RunInference(<model_handler_B>)
+```
+
+Where `model_handler_A` and `model_handler_B` are the model handler setup code.
+
+#### Ensemble Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(<model_handler_A>)
+   model_b_predictions = model_a_predictions | beam.Map(some_post_processing) | RunInference(<model_handler_B>)
+```
+
+Where `model_handler_A` and `model_handler_B` are the model handler setup code.
+
+### Use a keyed ModelHandler
+
+If a key is attached to the examples, wrap the `KeyedModelHandler` around the `ModelHandler` object:
+
+```
+from apache_beam.ml.inference.base import KeyedModelHandler
+ 
+keyed_model_handler = KeyedModelHandler(PytorchModelHandlerTensor(...))
+ 
+with pipeline as p:
+   data = p | beam.Create([
+      ('img1', torch.tensor([[1,2,3],[4,5,6],...])),
+      ('img2', torch.tensor([[1,2,3],[4,5,6],...])),
+      ('img3', torch.tensor([[1,2,3],[4,5,6],...])),
+   ])
+   predictions = data | RunInference(KeyedModelHandler)
+```
+
+### Use the PredictionResults object
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions.
+
+The `PredictionResult` is a `NamedTuple` object that contains both the input and the inferences, named  `example` and  `inference`, respectively. When keys are passed with the input data to the RunInference transform, the output `PCollection` returns a `Tuple[str, PredictionResult]`, which is the key and the `PredictionResult` object. Your pipeline interacts with a `PredictionResult` object in steps after the RunInference transform.
+
+```
+class PostProcessor(beam.DoFn):
+    def process(self, element: Tuple[str, PredictionResult]):
+       key, prediction_result = element
+       inputs = prediction_result.example
+       predictions = prediction_result.inference
+
+       # Post-processing logic
+       result = ...
+
+       yield (key, result)
+
+with pipeline as p:
+    output = (
+        p | 'Read' >> beam.ReadFromSource('a_source') 

Review Comment:
   ```suggestion
           p | 'Read' >> beam.ReadFromSource('a_source')
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] rszper commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

rszper commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r921344252


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,198 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference takes advantage of existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, to enable you to use models in your pipelines to create transforms optimized for machine learning inferences. The ability to create arbitrarily complex workflow graphs also allows you to build multi-model pipelines.
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The batched elements are then applied with a transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines can be useful for A/B testing or for building out ensembles that are comprised of models that perform tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, add the following code to your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p |  'Read' >> beam.ReadFromSource('a_source')   
+                     | 'RunInference' >> RunInference(<model_handler>)
+```
+Where `model_handler` is the model handler setup code.
+
+To import models, you need to wrap them around a `ModelHandler` object. Which `ModelHandler` you import depends on the framework and type of data structure that contains the inputs. The following examples show some ModelHandlers that you might want to import.
+
+```
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerPandas
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+The section provides requirements for using pre-trained models with PyTorch and Scikit-learn
+
+#### PyTorch
+
+You need to provide a path to a file that contains the model saved weights. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
+
+1. Download the pre-trained weights and host them in a location that the pipeline can access.
+2. Pass the path of the model to the PyTorch `model_handler` by using the following code: `state_dict_path=<path_to_weights>`.
+
+#### Scikit-learn
+
+You need to provide a path to a file that contains the pickled Scikit-learn model. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the Scikit-learn framework, complete the following steps:
+
+1. Download the pickled model class and host it in a location that the pipeline can access.
+2. Pass the path of the model to the Sklearn `model_handler` by using the following code:
+   `model_uri=<path_to_pickled_file>` and `model_file_type: <ModelFileType>`, where you can specify
+   `ModelFileType.PICKLE` or `ModelFileType.JOBLIB`, depending on how the model was serialized.
+
+### Use multiple models
+
+You can also use the RunInference transform to add multiple inference models to your pipeline.
+
+#### A/B Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(<model_handler_A>)
+   model_b_predictions = data | RunInference(<model_handler_B>)
+```
+
+Where `model_handler_A` and `model_handler_B` are the model handler setup code.
+
+#### Ensemble Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(<model_handler_A>)
+   model_b_predictions = model_a_predictions | beam.Map(some_post_processing) | RunInference(<model_handler_B>)
+```
+
+Where `model_handler_A` and `model_handler_B` are the model handler setup code.
+
+### Use a keyed ModelHandler
+
+If a key is attached to the examples, wrap the `KeyedModelHandler` around the `ModelHandler` object:
+
+```
+from apache_beam.ml.inference.base import KeyedModelHandler
+ 
+keyed_model_handler = KeyedModelHandler(PytorchModelHandlerTensor(...))
+ 
+with pipeline as p:
+   data = p | beam.Create([
+      ('img1', torch.tensor([[1,2,3],[4,5,6],...])),
+      ('img2', torch.tensor([[1,2,3],[4,5,6],...])),
+      ('img3', torch.tensor([[1,2,3],[4,5,6],...])),
+   ])
+   predictions = data | RunInference(KeyedModelHandler)
+```
+
+### Use the PredictionResults object
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions.
+
+The `PredictionResult` is a `NamedTuple` object that contains both the input and the inferences, named  `example` and  `inference`, respectively. When keys are passed with the input data to the RunInference transform, the output `PCollection` returns a `Tuple[str, PredictionResult]`, which is the key and the `PredictionResult` object. Your pipeline interacts with a `PredictionResult` object in steps after the RunInference transform.
+
+```
+class PostProcessor(beam.DoFn):
+    def process(self, element: Tuple[str, PredictionResult]):
+       key, prediction_result = element
+       inputs = prediction_result.example
+       predictions = prediction_result.inference
+
+       # Post-processing logic
+       result = ...
+
+       yield (key, result)
+
+with pipeline as p:
+    output = (
+        p | 'Read' >> beam.ReadFromSource('a_source') 
+                | 'PyTorchRunInference' >> RunInference(<keyed_model_handler>)
+                | 'ProcessOutput' >> beam.ParDo(PostProcessor()))
+```
+
+If you need to use this object explicitly, include the following line in your pipeline to import the object:
+
+```
+from apache_beam.ml.inference.base import PredictionResult
+```
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Run a machine learning pipeline
+
+For detailed instructions explaining how to build and run a pipeline that uses ML models, see the
+[Example RunInference API pipelines](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference) on GitHub.
+
+## Troubleshooting
+
+If you run into problems with your pipeline or job, this section lists issues that you might encounter and provides suggestions for how to fix them.
+
+### Incorrect inferences in the PredictionResult object
+
+In some cases, the `PredictionResults` output might not include the correct predictions in the `inferences` field. This issue occurs when you use a model whose inferences return a dictionary that maps keys to predictions and other metadata. An example return type is `Dict[str, Tensor]`.
+
+The RunInference API currently expects outputs to be an `Iterable[Any]`. Example return types are `Iterable[Tensor]` or `Iterable[Dict[str, Tensor]]`. When RunInference zips the inputs with the predictions, the predictions iterate over the dictionary keys instead of the batch elements. The result is that the key name is preserved but the prediction tensors are discarded. For more information, see the [Pytorch RunInference PredictionResult is a Dict](https://github.com/apache/beam/issues/22240) issue in the Apache Beam GitHub project.
+
+To work with the current RunInference implementation, you can create a wrapper class that overrides the `model(input)` call. In PyTorch, for example, your wrapper would override the `forward()` function and return an output with the appropriate format of `List[Dict[str, torch.Tensor]]`. For more information, see our [HuggingFace language modeling example](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py#L49).
+
+### Unable to batch tensor elements
+
+RunInference uses dynamic batching. However, the RunInference API cannot batch tensor elements of different sizes, because `torch.stack()` expects tensors of the same length. If you provide images of different sizes or word embeddings of different lengths, errors might occur.

Review Comment:
   Updated



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] AnandInguva commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

AnandInguva commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r921548816


##########
sdks/python/apache_beam/examples/snippets/transforms/elementwise/runinference.py:
##########
@@ -0,0 +1,157 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# pytype: skip-file
+
+def torch_unkeyed_model_handler(test=None):
+    # [START torch_unkeyed_model_handler]
+  import apache_beam as beam
+  import numpy
+  import torch
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+
+  class LinearRegression(torch.nn.Module):
+    def __init__(self, input_dim=1, output_dim=1):
+      super().__init__()
+      self.linear = torch.nn.Linear(input_dim, output_dim)
+
+    def forward(self, x):
+      out = self.linear(x)
+      return out
+
+  model_state_dict_path = 'gs://apache-beam-samples/run_inference/five_times_table_torch.pt'  # pylint: disable=line-too-long
+  model_class = LinearRegression
+  model_params = {'input_dim': 1, 'output_dim': 1}
+  model_handler = PytorchModelHandlerTensor(
+      model_class=model_class,
+      model_params=model_params,
+      state_dict_path=model_state_dict_path)
+
+  unkeyed_data = numpy.array([10, 40, 60, 90],
+                             dtype=numpy.float32).reshape(-1, 1)
+
+  with beam.Pipeline() as p:
+    predictions = (
+        p
+        | 'InputData' >> beam.Create(unkeyed_data)
+        | 'ConvertNumpyToTensor' >> beam.Map(torch.Tensor)
+        | 'PytorchRunInference' >> RunInference(model_handler=model_handler)
+        | beam.Map(print))
+    # [END torch_unkeyed_model_handler]
+    if test:
+      test(predictions)
+
+
+def torch_keyed_model_handler(test=None):
+    # [START torch_keyed_model_handler]
+  import apache_beam as beam
+  import torch
+  from apache_beam.ml.inference.base import KeyedModelHandler
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+
+  class LinearRegression(torch.nn.Module):
+    def __init__(self, input_dim=1, output_dim=1):
+      super().__init__()
+      self.linear = torch.nn.Linear(input_dim, output_dim)
+
+    def forward(self, x):
+      out = self.linear(x)
+      return out
+
+  model_state_dict_path = 'gs://apache-beam-samples/run_inference/five_times_table_torch.pt'  # pylint: disable=line-too-long
+  model_class = LinearRegression
+  model_params = {'input_dim': 1, 'output_dim': 1}
+  keyed_model_handler = KeyedModelHandler(
+      PytorchModelHandlerTensor(
+          model_class=model_class,
+          model_params=model_params,
+          state_dict_path=model_state_dict_path))
+
+  keyed_data = [("first_question", 105.00), ("second_question", 108.00),
+                ("third_question", 1000.00), ("fourth_question", 1013.00)]
+
+  with beam.Pipeline() as p:
+    predictions = (
+        p
+        | 'KeyedInputData' >> beam.Create(keyed_data)
+        | "ConvertIntToTensor" >>
+        beam.Map(lambda x: (x[0], torch.Tensor([x[1]])))
+        | 'PytorchRunInference' >>
+        RunInference(model_handler=keyed_model_handler)
+        | beam.Map(print))
+    # [END torch_keyed_model_handler]
+    if test:
+      test(predictions)
+
+
+def sklearn_unkeyed_model_handler(test=None):
+    # [START sklearn_unkeyed_model_handler]
+  import apache_beam as beam
+  import numpy
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.sklearn_inference import ModelFileType
+  from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
+
+  sklearn_model_filename = 'gs://apache-beam-samples/run_inference/five_times_table_sklearn.pkl'  # pylint: disable=line-too-long

Review Comment:
   How do we skip the test if apache_beam[gcp] is not installed? @tvalentyn @yeandy . My test fetch the file from the GCS bucket. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] tvalentyn commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

tvalentyn commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r921707651


##########
sdks/python/apache_beam/examples/snippets/transforms/elementwise/runinference.py:
##########
@@ -0,0 +1,157 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# pytype: skip-file
+
+def torch_unkeyed_model_handler(test=None):
+    # [START torch_unkeyed_model_handler]
+  import apache_beam as beam
+  import numpy
+  import torch
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+
+  class LinearRegression(torch.nn.Module):
+    def __init__(self, input_dim=1, output_dim=1):
+      super().__init__()
+      self.linear = torch.nn.Linear(input_dim, output_dim)
+
+    def forward(self, x):
+      out = self.linear(x)
+      return out
+
+  model_state_dict_path = 'gs://apache-beam-samples/run_inference/five_times_table_torch.pt'  # pylint: disable=line-too-long
+  model_class = LinearRegression
+  model_params = {'input_dim': 1, 'output_dim': 1}
+  model_handler = PytorchModelHandlerTensor(
+      model_class=model_class,
+      model_params=model_params,
+      state_dict_path=model_state_dict_path)
+
+  unkeyed_data = numpy.array([10, 40, 60, 90],
+                             dtype=numpy.float32).reshape(-1, 1)
+
+  with beam.Pipeline() as p:
+    predictions = (
+        p
+        | 'InputData' >> beam.Create(unkeyed_data)
+        | 'ConvertNumpyToTensor' >> beam.Map(torch.Tensor)
+        | 'PytorchRunInference' >> RunInference(model_handler=model_handler)
+        | beam.Map(print))
+    # [END torch_unkeyed_model_handler]
+    if test:
+      test(predictions)
+
+
+def torch_keyed_model_handler(test=None):
+    # [START torch_keyed_model_handler]
+  import apache_beam as beam
+  import torch
+  from apache_beam.ml.inference.base import KeyedModelHandler
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+
+  class LinearRegression(torch.nn.Module):
+    def __init__(self, input_dim=1, output_dim=1):
+      super().__init__()
+      self.linear = torch.nn.Linear(input_dim, output_dim)
+
+    def forward(self, x):
+      out = self.linear(x)
+      return out
+
+  model_state_dict_path = 'gs://apache-beam-samples/run_inference/five_times_table_torch.pt'  # pylint: disable=line-too-long
+  model_class = LinearRegression
+  model_params = {'input_dim': 1, 'output_dim': 1}
+  keyed_model_handler = KeyedModelHandler(
+      PytorchModelHandlerTensor(
+          model_class=model_class,
+          model_params=model_params,
+          state_dict_path=model_state_dict_path))
+
+  keyed_data = [("first_question", 105.00), ("second_question", 108.00),
+                ("third_question", 1000.00), ("fourth_question", 1013.00)]
+
+  with beam.Pipeline() as p:
+    predictions = (
+        p
+        | 'KeyedInputData' >> beam.Create(keyed_data)
+        | "ConvertIntToTensor" >>
+        beam.Map(lambda x: (x[0], torch.Tensor([x[1]])))
+        | 'PytorchRunInference' >>
+        RunInference(model_handler=keyed_model_handler)
+        | beam.Map(print))
+    # [END torch_keyed_model_handler]
+    if test:
+      test(predictions)
+
+
+def sklearn_unkeyed_model_handler(test=None):
+    # [START sklearn_unkeyed_model_handler]
+  import apache_beam as beam
+  import numpy
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.sklearn_inference import ModelFileType
+  from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
+
+  sklearn_model_filename = 'gs://apache-beam-samples/run_inference/five_times_table_sklearn.pkl'  # pylint: disable=line-too-long

Review Comment:
   Typically our tests do sth like:
   
   ```
   try: 
     from apache_beam.io.gcp.gcsfilesystem import GCSFileSystem
   except ImportError: 
     GCSFileSystem = None # type: ignore
   ...
   @unittest.skipIf(gcsfilesystem is None, 'GCP dependencies are not installed')
   ```
   
   does this approach work here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] yeandy commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

yeandy commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r920327264


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions without returning the full input data.
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, you add a single line of code in your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p | beam.ReadFromSource('a_source')   
+    | RunInference(configuration)))
+```
+
+To import models, you need to wrap them around a `ModelHandler object`. Add one or more of the following lines of code, depending on the framework and type of data structure that holds the data:
+
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerNumpy
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+```
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerPandas
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+You need to provide a path to the model weights that's accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
+
+1. Download the pre-trained weights and host them in a location that the pipeline can access.
+2. Pass the hosted path of the model to the PyTorch `model_handler` by using the following code: `state_dict_path=<path_to_weights>`.
+
+### Use multiple inference models
+
+You can also use the RunInference transform to add multiple inference models to your pipeline.
+
+#### A/B Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(ModelHandlerA)
+   model_b_predictions = data | RunInference(ModelHandlerB)
+```
+
+#### Ensemble Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(ModelHandlerA)
+   model_b_predictions = model_a_predictions | beam.Map(some_post_processing) | RunInference(ModelHandlerB)
+```
+
+### Use a key handler
+
+If a key is attached to the examples, use the `KeyedModelHandler`:
+
+```
+from apache_beam.ml.inference.base import KeyedModelHandler
+ 
+keyed_model_handler = KeyedModelHandler(PytorchModelHandlerTensor(...))
+ 
+with pipeline as p:
+   data = p | beam.Create([
+      ('img1', np.array[[1,2,3],[4,5,6],...]),
+      ('img2', np.array[[1,2,3],[4,5,6],...]),
+      ('img3', np.array[[1,2,3],[4,5,6],...]),
+   ])
+   predictions = data | RunInference(keyed_model_handler)
+```
+
+### Use the prediction results object
+
+The `PredictionResult` is a `NamedTuple` object that contains both the input and the inferences, named  `example` and  `inference`, respectively. Your pipeline interacts with a `PredictionResult` object in steps after the RunInference transform.
+
+```
+class PostProcessor(beam.DoFn):
+    def process(self, element: Tuple[str, PredictionResult]):
+       key, prediction_result = element
+       inputs = prediction_result.example
+       predictions = prediction_result.inference
+
+       # Post-processing logic
+       result = ...
+
+       yield (key, result)
+
+with pipeline as p:
+    output = (
+        p | 'Read' >> beam.ReadFromSource('a_source') 
+                | 'PyTorchRunInference' >> RunInference(KeyedModelHandler)
+                | 'ProcessOutput' >> beam.ParDo(PostProcessor()))
+```
+
+If you need to use this object explicitly, include the following line in your pipeline to import the object:
+
+```
+from apache_beam.ml.inference.base import PredictionResult
+```
+
+## Run a machine learning pipeline
+
+For detailed instructions explaining how to build and run a pipeline that uses ML models, see the
+[Example RunInference API pipelines](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference) on GitHub.
+
+## Troubleshooting
+
+If you run into problems with your pipeline or job, this section lists issues that you might encounter and provides suggestions for how to fix them.
+
+### Prediction results missing
+
+When you use a dictionary of tensors, the output might not include the prediction results. This issue occurs because the RunInference API supports tensors but not dictionaries of tensors. 
+
+Many model inferences return a dictionary with the predictions and additional metadata, for example, `Dict[str, Tensor]`. The RunInference API currently expects outputs to be an `Iterable[Any]`, for example, `Iterable[Tensor]` or `Iterable[Dict[str, Tensor]]`.
+
+When RunInference zips the inputs with the predictions, the predictions iterate over the dictionary keys instead of the batch elements. The result is that the key name is preserved but the prediction tensors are discarded. For more information, see the [Pytorch RunInference PredictionResult is a Dict](https://github.com/apache/beam/issues/22240) issue in the Apache Beam GitHub project.
+
+To work with current RunInference implementation, override the `forward()` function and convert the standard Hugging Face forward output into the appropriate format of `List[Dict[str, torch.Tensor]]`. For more information, see an [example with the batching flag added](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py#L49).
+
+### Unable to batch tensor elements
+
+RunInference uses dynamic batching. However, the RunInference API cannot batch tensor elements of different sizes, because `torch.stack()` expects tensors of the same length. If you provide images of different sizes or word embeddings of different lengths, errors might occur.
+
+To avoid this issue:
+
+1. Either use elements that have the same size, or resize image inputs and word embeddings to make them 
+the same size. Depending on the language model and encoding technique, this option might not be available. 

Review Comment:
   computer vision



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] rszper commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

rszper commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r920450377


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions without returning the full input data.
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, you add a single line of code in your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p | beam.ReadFromSource('a_source')   

Review Comment:
   Should we try to style model_handler as a variable, for example <model_handler>?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] yeandy commented on pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

yeandy commented on PR #22250:
URL: https://github.com/apache/beam/pull/22250#issuecomment-1183714183

   Before I forget, we should probably add somewhere here about supported pytorch versions. https://github.com/apache/beam/issues/22206. TODO: @yeandy yeandy


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] github-actions[bot] commented on pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

github-actions[bot] commented on PR #22250:
URL: https://github.com/apache/beam/pull/22250#issuecomment-1182490114

   Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] tvalentyn commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

tvalentyn commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r922488644


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,201 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference takes advantage of existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, to enable you to use models in your pipelines to create transforms optimized for machine learning inferences. The ability to create arbitrarily complex workflow graphs also allows you to build multi-model pipelines.
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The batched elements are then applied with a transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines can be useful for A/B testing or for building out ensembles that are comprised of models that perform tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, add the following code to your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+with pipeline as p:
+   predictions = ( p |  'Read' >> beam.ReadFromSource('a_source')
+                     | 'RunInference' >> RunInference(<model_handler>)
+```
+Where `model_handler` is the model handler setup code.
+
+To import models, you need to wrap them around a `ModelHandler` object. Which `ModelHandler` you import depends on the framework and type of data structure that contains the inputs. The following examples show some ModelHandlers that you might want to import.

Review Comment:
   > To import models, you need to wrap them around a `ModelHandler` object
   
   Consider instead: 
   `To import models, you need to configure a `ModelHandler` object that will wrap the underlying model` 



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,201 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference takes advantage of existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, to enable you to use models in your pipelines to create transforms optimized for machine learning inferences. The ability to create arbitrarily complex workflow graphs also allows you to build multi-model pipelines.
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The batched elements are then applied with a transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the

Review Comment:
   Using the `Shared` class within RunInference implementation allows us to load the model  only once per process and share it with all DoFn instances created in that process. This reduces the memory consumption and model loading time. 



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,201 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference takes advantage of existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, to enable you to use models in your pipelines to create transforms optimized for machine learning inferences. The ability to create arbitrarily complex workflow graphs also allows you to build multi-model pipelines.
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The batched elements are then applied with a transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines can be useful for A/B testing or for building out ensembles that are comprised of models that perform tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, add the following code to your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+with pipeline as p:
+   predictions = ( p |  'Read' >> beam.ReadFromSource('a_source')
+                     | 'RunInference' >> RunInference(<model_handler>)
+```
+Where `model_handler` is the model handler setup code.
+
+To import models, you need to wrap them around a `ModelHandler` object. Which `ModelHandler` you import depends on the framework and type of data structure that contains the inputs. The following examples show some ModelHandlers that you might want to import.
+
+```
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerPandas
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+The section provides requirements for using pre-trained models with PyTorch and Scikit-learn
+
+#### PyTorch
+
+You need to provide a path to a file that contains the model saved weights. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
+
+1. Download the pre-trained weights and host them in a location that the pipeline can access.
+2. Pass the path of the model weights to the PyTorch `ModelHandler` by using the following code: `state_dict_path=<path_to_weights>`.
+
+#### Scikit-learn
+
+You need to provide a path to a file that contains the pickled Scikit-learn model. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the Scikit-learn framework, complete the following steps:
+
+1. Download the pickled model class and host it in a location that the pipeline can access.
+2. Pass the path of the model to the Sklearn `ModelHandler` by using the following code:
+   `model_uri=<path_to_pickled_file>` and `model_file_type: <ModelFileType>`, where you can specify
+   `ModelFileType.PICKLE` or `ModelFileType.JOBLIB`, depending on how the model was serialized.
+
+### Use multiple models
+
+You can also use the RunInference transform to add multiple inference models to your pipeline.
+
+#### A/B Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source')
+   model_a_predictions = data | RunInference(<model_handler_A>)
+   model_b_predictions = data | RunInference(<model_handler_B>)
+```
+
+Where `model_handler_A` and `model_handler_B` are the model handler setup code.
+
+#### Ensemble Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source')
+   model_a_predictions = data | RunInference(<model_handler_A>)
+   model_b_predictions = model_a_predictions | beam.Map(some_post_processing) | RunInference(<model_handler_B>)
+```
+
+Where `model_handler_A` and `model_handler_B` are the model handler setup code.
+
+### Use a keyed ModelHandler
+
+If a key is attached to the examples, wrap the `KeyedModelHandler` around the `ModelHandler` object:
+
+```
+from apache_beam.ml.inference.base import KeyedModelHandler
+keyed_model_handler = KeyedModelHandler(PytorchModelHandlerTensor(...))
+with pipeline as p:
+   data = p | beam.Create([
+      ('img1', torch.tensor([[1,2,3],[4,5,6],...])),
+      ('img2', torch.tensor([[1,2,3],[4,5,6],...])),
+      ('img3', torch.tensor([[1,2,3],[4,5,6],...])),
+   ])
+   predictions = data | RunInference(KeyedModelHandler)
+```
+
+### Use the PredictionResults object
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions.
+
+The `PredictionResult` is a `NamedTuple` object that contains both the input and the inferences, named  `example` and  `inference`, respectively. When keys are passed with the input data to the RunInference transform, the output `PCollection` returns a `Tuple[str, PredictionResult]`, which is the key and the `PredictionResult` object. Your pipeline interacts with a `PredictionResult` object in steps after the RunInference transform.
+
+```
+class PostProcessor(beam.DoFn):
+    def process(self, element: Tuple[str, PredictionResult]):
+       key, prediction_result = element
+       inputs = prediction_result.example
+       predictions = prediction_result.inference
+
+       # Post-processing logic
+       result = ...
+
+       yield (key, result)
+
+with pipeline as p:
+    output = (
+        p | 'Read' >> beam.ReadFromSource('a_source')
+                | 'PyTorchRunInference' >> RunInference(<keyed_model_handler>)
+                | 'ProcessOutput' >> beam.ParDo(PostProcessor()))
+```
+
+If you need to use this object explicitly, include the following line in your pipeline to import the object:
+
+```
+from apache_beam.ml.inference.base import PredictionResult
+```
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65).
+
+## Run a machine learning pipeline
+
+For detailed instructions explaining how to build and run a pipeline that uses ML models, see the
+[Example RunInference API pipelines](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference) on GitHub.
+
+## Troubleshooting
+
+If you run into problems with your pipeline or job, this section lists issues that you might encounter and provides suggestions for how to fix them.
+
+### Incorrect inferences in the PredictionResult object
+
+In some cases, the `PredictionResults` output might not include the correct predictions in the `inferences` field. This issue occurs when you use a model whose inferences return a dictionary that maps keys to predictions and other metadata. An example return type is `Dict[str, Tensor]`.
+
+The RunInference API currently expects outputs to be an `Iterable[Any]`. Example return types are `Iterable[Tensor]` or `Iterable[Dict[str, Tensor]]`. When RunInference zips the inputs with the predictions, the predictions iterate over the dictionary keys instead of the batch elements. The result is that the key name is preserved but the prediction tensors are discarded. For more information, see the [Pytorch RunInference PredictionResult is a Dict](https://github.com/apache/beam/issues/22240) issue in the Apache Beam GitHub project.
+
+To work with the current RunInference implementation, you can create a wrapper class that overrides the `model(input)` call. In PyTorch, for example, your wrapper would override the `forward()` function and return an output with the appropriate format of `List[Dict[str, torch.Tensor]]`. For more information, see our [HuggingFace language modeling example](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py#L49).
+
+### Unable to batch tensor elements
+
+RunInference uses dynamic batching. However, the RunInference API cannot batch tensor elements of different sizes, so samples passed to the RunInferene transform must be the same dimension or length. If you provide images of different sizes or word embeddings of different lengths, the following error might occur:
+
+`
+File "/beam/sdks/python/apache_beam/ml/inference/pytorch_inference.py", line 232, in run_inference
+batched_tensors = torch.stack(key_to_tensor_list[key])
+RuntimeError: stack expects each tensor to be equal size, but got [12] at entry 0 and [10] at entry 1 [while running 'PyTorchRunInference/ParDo(_RunInferenceDoFn)']
+`
+
+To avoid this issue, either use elements of the same size, or disable batching.
+
+**Option 1: Use elements of the same size**
+
+Use elements of the same size or resize the inputs. For computer vision applications, resize image inputs so that they have the same dimensions. For natural language processing (NLP) applications that have text of varying length, resize the text or word embeddings to make them the same length. When working with texts of varying length, resizing might not be possible. In this scenario, you could disable batching (see option 2).
+
+**Option 2: Disable batching**
+
+Disable batching by overriding the `batch_elements_kwargs` function in your ModelHandler and setting the maximum batch size (`max_batch_size`) to one: `max_batch_size=1`. For more information, see
+[BatchElements PTransforms](/documentation/sdks/python-machine-learning/#batchelements-ptransform).
+

Review Comment:
   How about we link apache_beam/examples/inference/pytorch_language_modeling.py as an example that does this?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] rszper commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

rszper commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r920293595


##########
website/www/site/content/en/documentation/sdks/python.md:
##########
@@ -48,6 +48,11 @@ language-specific implementation guidance.
 
 ## Using Beam Python SDK in your ML pipelines

Review Comment:
   I think what you have is good.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] yeandy commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

yeandy commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r920508188


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,194 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference takes advantage of existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, to enable you to use models in your pipelines to create transforms optimized for machine learning inferences. The ability to create arbitrarily complex workflow graphs also allows you to build multi-model pipelines.
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The batched elements are then applied with a transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines can be useful for A/B testing or for building out ensembles that are comprised of models that perform tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, add the following code to your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p |  'Read' >> beam.ReadFromSource('a_source')   
+                     | 'RunInference' >> RunInference(<model_handler>)
+```
+Where `model_handler` is the model handler setup code.
+
+To import models, you need to wrap them around a `ModelHandler` object. Which `ModelHandler` you import depends on the framework and type of data structure that contains the inputs. The following examples show some ModelHandlers that you might want to import.
+
+```
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerPandas
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+The section provides requirements for using pre-trained models with PyTorch and Scikit-learn
+
+#### PyTorch
+
+You need to provide a path to a file that contains the model saved weights. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
+
+1. Download the pre-trained weights and host them in a location that the pipeline can access.
+2. Pass the path of the model to the PyTorch `model_handler` by using the following code: `state_dict_path=<path_to_weights>`.
+
+#### Scikit-learn
+
+You need to provide a path to a file that contains the pickled Scikit-learn model. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the Scikit-learn framework, complete the following steps:
+
+1. Download the pickled model class and host it in a location that the pipeline can access.
+2. Pass the path of the model to the Sklearn `model_handler` by using the following code:
+   `model_uri=<path_to_pickled_file>` and `model_file_type: <ModelFileType>`, where you can specify
+   `ModelFileType.PICKLE` or `ModelFileType.JOBLIB`, depending on how the model was serialized.
+
+### Use multiple models
+
+You can also use the RunInference transform to add multiple inference models to your pipeline.
+
+#### A/B Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(<model_handler_A>)
+   model_b_predictions = data | RunInference(<model_handler_B>)
+```
+
+#### Ensemble Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(<model_handler_A>)
+   model_b_predictions = model_a_predictions | beam.Map(some_post_processing) | RunInference(<model_handler_B>)
+```
+
+### Use a keyed ModelHandler
+
+If a key is attached to the examples, wrap the `KeyedModelHandler` around the `ModelHandler` object:
+
+```
+from apache_beam.ml.inference.base import KeyedModelHandler
+ 
+keyed_model_handler = KeyedModelHandler(PytorchModelHandlerTensor(...))
+ 
+with pipeline as p:
+   data = p | beam.Create([
+      ('img1', np.array[[1,2,3],[4,5,6],...]),
+      ('img2', np.array[[1,2,3],[4,5,6],...]),
+      ('img3', np.array[[1,2,3],[4,5,6],...]),
+   ])
+   predictions = data | RunInference(<keyed_model_handler>)

Review Comment:
   technically, we've already defined the `keyed_model_handler` variable above for this snippet, so should it be wrapped with <>?



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,194 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference takes advantage of existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, to enable you to use models in your pipelines to create transforms optimized for machine learning inferences. The ability to create arbitrarily complex workflow graphs also allows you to build multi-model pipelines.
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The batched elements are then applied with a transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines can be useful for A/B testing or for building out ensembles that are comprised of models that perform tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, add the following code to your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p |  'Read' >> beam.ReadFromSource('a_source')   
+                     | 'RunInference' >> RunInference(<model_handler>)
+```
+Where `model_handler` is the model handler setup code.
+
+To import models, you need to wrap them around a `ModelHandler` object. Which `ModelHandler` you import depends on the framework and type of data structure that contains the inputs. The following examples show some ModelHandlers that you might want to import.
+
+```
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerPandas
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+The section provides requirements for using pre-trained models with PyTorch and Scikit-learn
+
+#### PyTorch
+
+You need to provide a path to a file that contains the model saved weights. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
+
+1. Download the pre-trained weights and host them in a location that the pipeline can access.
+2. Pass the path of the model to the PyTorch `model_handler` by using the following code: `state_dict_path=<path_to_weights>`.
+
+#### Scikit-learn
+
+You need to provide a path to a file that contains the pickled Scikit-learn model. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the Scikit-learn framework, complete the following steps:
+
+1. Download the pickled model class and host it in a location that the pipeline can access.
+2. Pass the path of the model to the Sklearn `model_handler` by using the following code:
+   `model_uri=<path_to_pickled_file>` and `model_file_type: <ModelFileType>`, where you can specify
+   `ModelFileType.PICKLE` or `ModelFileType.JOBLIB`, depending on how the model was serialized.
+
+### Use multiple models
+
+You can also use the RunInference transform to add multiple inference models to your pipeline.
+
+#### A/B Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(<model_handler_A>)
+   model_b_predictions = data | RunInference(<model_handler_B>)
+```
+
+#### Ensemble Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(<model_handler_A>)
+   model_b_predictions = model_a_predictions | beam.Map(some_post_processing) | RunInference(<model_handler_B>)
+```
+
+### Use a keyed ModelHandler
+
+If a key is attached to the examples, wrap the `KeyedModelHandler` around the `ModelHandler` object:
+
+```
+from apache_beam.ml.inference.base import KeyedModelHandler
+ 
+keyed_model_handler = KeyedModelHandler(PytorchModelHandlerTensor(...))
+ 
+with pipeline as p:
+   data = p | beam.Create([
+      ('img1', np.array[[1,2,3],[4,5,6],...]),
+      ('img2', np.array[[1,2,3],[4,5,6],...]),
+      ('img3', np.array[[1,2,3],[4,5,6],...]),

Review Comment:
   ```suggestion
         ('img1', torch.tensor([[1,2,3],[4,5,6],...])),
         ('img2', torch.tensor([[1,2,3],[4,5,6],...])),
         ('img3', torch.tensor([[1,2,3],[4,5,6],...])),
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] yeandy commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

yeandy commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r921544457


##########
sdks/python/apache_beam/examples/snippets/transforms/elementwise/runinference.py:
##########
@@ -0,0 +1,157 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# pytype: skip-file
+
+def torch_unkeyed_model_handler(test=None):
+    # [START torch_unkeyed_model_handler]
+  import apache_beam as beam
+  import numpy
+  import torch
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+
+  class LinearRegression(torch.nn.Module):
+    def __init__(self, input_dim=1, output_dim=1):
+      super().__init__()
+      self.linear = torch.nn.Linear(input_dim, output_dim)
+
+    def forward(self, x):
+      out = self.linear(x)
+      return out
+
+  model_state_dict_path = 'gs://apache-beam-samples/run_inference/five_times_table_torch.pt'  # pylint: disable=line-too-long
+  model_class = LinearRegression
+  model_params = {'input_dim': 1, 'output_dim': 1}
+  model_handler = PytorchModelHandlerTensor(
+      model_class=model_class,
+      model_params=model_params,
+      state_dict_path=model_state_dict_path)
+
+  unkeyed_data = numpy.array([10, 40, 60, 90],
+                             dtype=numpy.float32).reshape(-1, 1)
+
+  with beam.Pipeline() as p:
+    predictions = (
+        p
+        | 'InputData' >> beam.Create(unkeyed_data)
+        | 'ConvertNumpyToTensor' >> beam.Map(torch.Tensor)
+        | 'PytorchRunInference' >> RunInference(model_handler=model_handler)
+        | beam.Map(print))
+    # [END torch_unkeyed_model_handler]
+    if test:
+      test(predictions)
+
+
+def torch_keyed_model_handler(test=None):
+    # [START torch_keyed_model_handler]
+  import apache_beam as beam
+  import torch
+  from apache_beam.ml.inference.base import KeyedModelHandler
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+
+  class LinearRegression(torch.nn.Module):
+    def __init__(self, input_dim=1, output_dim=1):
+      super().__init__()
+      self.linear = torch.nn.Linear(input_dim, output_dim)
+
+    def forward(self, x):
+      out = self.linear(x)
+      return out
+
+  model_state_dict_path = 'gs://apache-beam-samples/run_inference/five_times_table_torch.pt'  # pylint: disable=line-too-long
+  model_class = LinearRegression
+  model_params = {'input_dim': 1, 'output_dim': 1}
+  keyed_model_handler = KeyedModelHandler(
+      PytorchModelHandlerTensor(
+          model_class=model_class,
+          model_params=model_params,
+          state_dict_path=model_state_dict_path))
+
+  keyed_data = [("first_question", 105.00), ("second_question", 108.00),
+                ("third_question", 1000.00), ("fourth_question", 1013.00)]
+
+  with beam.Pipeline() as p:
+    predictions = (
+        p
+        | 'KeyedInputData' >> beam.Create(keyed_data)
+        | "ConvertIntToTensor" >>
+        beam.Map(lambda x: (x[0], torch.Tensor([x[1]])))
+        | 'PytorchRunInference' >>
+        RunInference(model_handler=keyed_model_handler)
+        | beam.Map(print))
+    # [END torch_keyed_model_handler]
+    if test:
+      test(predictions)
+
+
+def sklearn_unkeyed_model_handler(test=None):
+    # [START sklearn_unkeyed_model_handler]
+  import apache_beam as beam
+  import numpy
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.sklearn_inference import ModelFileType
+  from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
+
+  sklearn_model_filename = 'gs://apache-beam-samples/run_inference/five_times_table_sklearn.pkl'  # pylint: disable=line-too-long

Review Comment:
   We can add a check to skip if GCP is not detected?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] rszper commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

rszper commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r920518772


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,194 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference takes advantage of existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, to enable you to use models in your pipelines to create transforms optimized for machine learning inferences. The ability to create arbitrarily complex workflow graphs also allows you to build multi-model pipelines.
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The batched elements are then applied with a transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines can be useful for A/B testing or for building out ensembles that are comprised of models that perform tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, add the following code to your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p |  'Read' >> beam.ReadFromSource('a_source')   
+                     | 'RunInference' >> RunInference(<model_handler>)
+```
+Where `model_handler` is the model handler setup code.
+
+To import models, you need to wrap them around a `ModelHandler` object. Which `ModelHandler` you import depends on the framework and type of data structure that contains the inputs. The following examples show some ModelHandlers that you might want to import.
+
+```
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerPandas
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+The section provides requirements for using pre-trained models with PyTorch and Scikit-learn
+
+#### PyTorch
+
+You need to provide a path to a file that contains the model saved weights. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
+
+1. Download the pre-trained weights and host them in a location that the pipeline can access.
+2. Pass the path of the model to the PyTorch `model_handler` by using the following code: `state_dict_path=<path_to_weights>`.
+
+#### Scikit-learn
+
+You need to provide a path to a file that contains the pickled Scikit-learn model. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the Scikit-learn framework, complete the following steps:
+
+1. Download the pickled model class and host it in a location that the pipeline can access.
+2. Pass the path of the model to the Sklearn `model_handler` by using the following code:
+   `model_uri=<path_to_pickled_file>` and `model_file_type: <ModelFileType>`, where you can specify
+   `ModelFileType.PICKLE` or `ModelFileType.JOBLIB`, depending on how the model was serialized.
+
+### Use multiple models
+
+You can also use the RunInference transform to add multiple inference models to your pipeline.
+
+#### A/B Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(<model_handler_A>)
+   model_b_predictions = data | RunInference(<model_handler_B>)
+```
+
+#### Ensemble Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(<model_handler_A>)
+   model_b_predictions = model_a_predictions | beam.Map(some_post_processing) | RunInference(<model_handler_B>)
+```
+
+### Use a keyed ModelHandler
+
+If a key is attached to the examples, wrap the `KeyedModelHandler` around the `ModelHandler` object:
+
+```
+from apache_beam.ml.inference.base import KeyedModelHandler
+ 
+keyed_model_handler = KeyedModelHandler(PytorchModelHandlerTensor(...))
+ 
+with pipeline as p:
+   data = p | beam.Create([
+      ('img1', np.array[[1,2,3],[4,5,6],...]),
+      ('img2', np.array[[1,2,3],[4,5,6],...]),
+      ('img3', np.array[[1,2,3],[4,5,6],...]),
+   ])
+   predictions = data | RunInference(<keyed_model_handler>)

Review Comment:
   Good point. I'll fix that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] tvalentyn commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

tvalentyn commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r921708246


##########
sdks/python/apache_beam/examples/snippets/transforms/elementwise/runinference.py:
##########
@@ -0,0 +1,157 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# pytype: skip-file
+
+def torch_unkeyed_model_handler(test=None):
+    # [START torch_unkeyed_model_handler]
+  import apache_beam as beam
+  import numpy
+  import torch
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+
+  class LinearRegression(torch.nn.Module):
+    def __init__(self, input_dim=1, output_dim=1):
+      super().__init__()
+      self.linear = torch.nn.Linear(input_dim, output_dim)
+
+    def forward(self, x):
+      out = self.linear(x)
+      return out
+
+  model_state_dict_path = 'gs://apache-beam-samples/run_inference/five_times_table_torch.pt'  # pylint: disable=line-too-long
+  model_class = LinearRegression
+  model_params = {'input_dim': 1, 'output_dim': 1}
+  model_handler = PytorchModelHandlerTensor(
+      model_class=model_class,
+      model_params=model_params,
+      state_dict_path=model_state_dict_path)
+
+  unkeyed_data = numpy.array([10, 40, 60, 90],
+                             dtype=numpy.float32).reshape(-1, 1)
+
+  with beam.Pipeline() as p:
+    predictions = (
+        p
+        | 'InputData' >> beam.Create(unkeyed_data)
+        | 'ConvertNumpyToTensor' >> beam.Map(torch.Tensor)
+        | 'PytorchRunInference' >> RunInference(model_handler=model_handler)
+        | beam.Map(print))
+    # [END torch_unkeyed_model_handler]
+    if test:
+      test(predictions)
+
+
+def torch_keyed_model_handler(test=None):
+    # [START torch_keyed_model_handler]
+  import apache_beam as beam
+  import torch
+  from apache_beam.ml.inference.base import KeyedModelHandler
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+
+  class LinearRegression(torch.nn.Module):
+    def __init__(self, input_dim=1, output_dim=1):
+      super().__init__()
+      self.linear = torch.nn.Linear(input_dim, output_dim)
+
+    def forward(self, x):
+      out = self.linear(x)
+      return out
+
+  model_state_dict_path = 'gs://apache-beam-samples/run_inference/five_times_table_torch.pt'  # pylint: disable=line-too-long
+  model_class = LinearRegression
+  model_params = {'input_dim': 1, 'output_dim': 1}
+  keyed_model_handler = KeyedModelHandler(
+      PytorchModelHandlerTensor(
+          model_class=model_class,
+          model_params=model_params,
+          state_dict_path=model_state_dict_path))
+
+  keyed_data = [("first_question", 105.00), ("second_question", 108.00),
+                ("third_question", 1000.00), ("fourth_question", 1013.00)]
+
+  with beam.Pipeline() as p:
+    predictions = (
+        p
+        | 'KeyedInputData' >> beam.Create(keyed_data)
+        | "ConvertIntToTensor" >>
+        beam.Map(lambda x: (x[0], torch.Tensor([x[1]])))
+        | 'PytorchRunInference' >>
+        RunInference(model_handler=keyed_model_handler)
+        | beam.Map(print))
+    # [END torch_keyed_model_handler]
+    if test:
+      test(predictions)
+
+
+def sklearn_unkeyed_model_handler(test=None):
+    # [START sklearn_unkeyed_model_handler]
+  import apache_beam as beam
+  import numpy
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.sklearn_inference import ModelFileType
+  from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
+
+  sklearn_model_filename = 'gs://apache-beam-samples/run_inference/five_times_table_sklearn.pkl'  # pylint: disable=line-too-long

Review Comment:
   > Since SKlearn is a installed as a beam dependency in each tox environment, we need to add if GCP is installed as well in every tox test for the Sklearn tests
   
   I may be missing some context here but can we install gcp dependenices for tox environment for sklearn tests similar to https://github.com/apache/beam/blob/64bcc7df2b86b9a531591a5f907e245f1af7998d/sdks/python/tox.ini#L274 ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] AnandInguva commented on pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

AnandInguva commented on PR #22250:
URL: https://github.com/apache/beam/pull/22250#issuecomment-1185749685

   R: @tvalentyn @pabloem we did the review on the docs. Would you be able to do a final review and merge the PR? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] rszper commented on a diff in pull request #22250: Update RunInference documentation

Posted by GitBox <gi...@apache.org>.

rszper commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r922546806


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,201 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference takes advantage of existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, to enable you to use models in your pipelines to create transforms optimized for machine learning inferences. The ability to create arbitrarily complex workflow graphs also allows you to build multi-model pipelines.
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The batched elements are then applied with a transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the

Review Comment:
   Updated



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,201 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference takes advantage of existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, to enable you to use models in your pipelines to create transforms optimized for machine learning inferences. The ability to create arbitrarily complex workflow graphs also allows you to build multi-model pipelines.
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The batched elements are then applied with a transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines can be useful for A/B testing or for building out ensembles that are comprised of models that perform tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, add the following code to your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+with pipeline as p:
+   predictions = ( p |  'Read' >> beam.ReadFromSource('a_source')
+                     | 'RunInference' >> RunInference(<model_handler>)
+```
+Where `model_handler` is the model handler setup code.
+
+To import models, you need to wrap them around a `ModelHandler` object. Which `ModelHandler` you import depends on the framework and type of data structure that contains the inputs. The following examples show some ModelHandlers that you might want to import.

Review Comment:
   Updated



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] rszper commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

rszper commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r921332416


##########
website/www/site/content/en/documentation/sdks/python.md:
##########
@@ -46,7 +46,13 @@ new I/O connectors. See the [Developing I/O connectors overview](/documentation/
 for information about developing new I/O connectors and links to
 language-specific implementation guidance.
 
-## Using Beam Python SDK in your ML pipelines
+## Making machine learning inferences with Python
+
+To integrate machine learning models into your pipelines for making inferences, use the RunInference API for PyTorch and Scikit-learn models. If you are using TensorFlow models, you can make use of the

Review Comment:
   I don't think this is the right place for that link. We've got the content on the RunInference transform page, and links to our PyDocs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] yeandy commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

yeandy commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r921528429


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,195 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference takes advantage of existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, to enable you to use models in your pipelines to create transforms optimized for machine learning inferences. The ability to create arbitrarily complex workflow graphs also allows you to build multi-model pipelines.
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The batched elements are then applied with a transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines can be useful for A/B testing or for building out ensembles that are comprised of models that perform tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, add the following code to your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+with pipeline as p:
+   predictions = ( p |  'Read' >> beam.ReadFromSource('a_source')
+                     | 'RunInference' >> RunInference(<model_handler>)
+```
+Where `model_handler` is the model handler setup code.
+
+To import models, you need to wrap them around a `ModelHandler` object. Which `ModelHandler` you import depends on the framework and type of data structure that contains the inputs. The following examples show some ModelHandlers that you might want to import.
+
+```
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerPandas
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+The section provides requirements for using pre-trained models with PyTorch and Scikit-learn
+
+#### PyTorch
+
+You need to provide a path to a file that contains the model saved weights. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
+
+1. Download the pre-trained weights and host them in a location that the pipeline can access.
+2. Pass the path of the model weights to the PyTorch `ModelHandler` by using the following code: `state_dict_path=<path_to_weights>`.
+
+#### Scikit-learn
+
+You need to provide a path to a file that contains the pickled Scikit-learn model. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the Scikit-learn framework, complete the following steps:
+
+1. Download the pickled model class and host it in a location that the pipeline can access.
+2. Pass the path of the model to the Sklearn `ModelHandler` by using the following code:
+   `model_uri=<path_to_pickled_file>` and `model_file_type: <ModelFileType>`, where you can specify
+   `ModelFileType.PICKLE` or `ModelFileType.JOBLIB`, depending on how the model was serialized.
+
+### Use multiple models
+
+You can also use the RunInference transform to add multiple inference models to your pipeline.
+
+#### A/B Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source')
+   model_a_predictions = data | RunInference(<model_handler_A>)
+   model_b_predictions = data | RunInference(<model_handler_B>)
+```
+
+Where `model_handler_A` and `model_handler_B` are the model handler setup code.
+
+#### Ensemble Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source')
+   model_a_predictions = data | RunInference(<model_handler_A>)
+   model_b_predictions = model_a_predictions | beam.Map(some_post_processing) | RunInference(<model_handler_B>)
+```
+
+Where `model_handler_A` and `model_handler_B` are the model handler setup code.
+
+### Use a keyed ModelHandler
+
+If a key is attached to the examples, wrap the `KeyedModelHandler` around the `ModelHandler` object:
+
+```
+from apache_beam.ml.inference.base import KeyedModelHandler
+keyed_model_handler = KeyedModelHandler(PytorchModelHandlerTensor(...))
+with pipeline as p:
+   data = p | beam.Create([
+      ('img1', torch.tensor([[1,2,3],[4,5,6],...])),
+      ('img2', torch.tensor([[1,2,3],[4,5,6],...])),
+      ('img3', torch.tensor([[1,2,3],[4,5,6],...])),
+   ])
+   predictions = data | RunInference(KeyedModelHandler)
+```
+
+### Use the PredictionResults object
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions.
+
+The `PredictionResult` is a `NamedTuple` object that contains both the input and the inferences, named  `example` and  `inference`, respectively. When keys are passed with the input data to the RunInference transform, the output `PCollection` returns a `Tuple[str, PredictionResult]`, which is the key and the `PredictionResult` object. Your pipeline interacts with a `PredictionResult` object in steps after the RunInference transform.
+
+```
+class PostProcessor(beam.DoFn):
+    def process(self, element: Tuple[str, PredictionResult]):
+       key, prediction_result = element
+       inputs = prediction_result.example
+       predictions = prediction_result.inference
+
+       # Post-processing logic
+       result = ...
+
+       yield (key, result)
+
+with pipeline as p:
+    output = (
+        p | 'Read' >> beam.ReadFromSource('a_source')
+                | 'PyTorchRunInference' >> RunInference(<keyed_model_handler>)
+                | 'ProcessOutput' >> beam.ParDo(PostProcessor()))
+```
+
+If you need to use this object explicitly, include the following line in your pipeline to import the object:
+
+```
+from apache_beam.ml.inference.base import PredictionResult
+```
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65).
+
+## Run a machine learning pipeline
+
+For detailed instructions explaining how to build and run a pipeline that uses ML models, see the
+[Example RunInference API pipelines](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference) on GitHub.
+
+## Troubleshooting
+
+If you run into problems with your pipeline or job, this section lists issues that you might encounter and provides suggestions for how to fix them.
+
+### Incorrect inferences in the PredictionResult object
+
+In some cases, the `PredictionResults` output might not include the correct predictions in the `inferences` field. This issue occurs when you use a model whose inferences return a dictionary that maps keys to predictions and other metadata. An example return type is `Dict[str, Tensor]`.
+
+The RunInference API currently expects outputs to be an `Iterable[Any]`. Example return types are `Iterable[Tensor]` or `Iterable[Dict[str, Tensor]]`. When RunInference zips the inputs with the predictions, the predictions iterate over the dictionary keys instead of the batch elements. The result is that the key name is preserved but the prediction tensors are discarded. For more information, see the [Pytorch RunInference PredictionResult is a Dict](https://github.com/apache/beam/issues/22240) issue in the Apache Beam GitHub project.
+
+To work with the current RunInference implementation, you can create a wrapper class that overrides the `model(input)` call. In PyTorch, for example, your wrapper would override the `forward()` function and return an output with the appropriate format of `List[Dict[str, torch.Tensor]]`. For more information, see our [HuggingFace language modeling example](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py#L49).
+
+### Unable to batch tensor elements
+
+RunInference uses dynamic batching. However, the RunInference API cannot batch tensor elements of different sizes, so samples passed to teh RunInferene transform must be the same dimension or length. If you provide images of different sizes or word embeddings of different lengths, errors might occur.
+

Review Comment:
   ```
     File "/beam/sdks/python/apache_beam/ml/inference/pytorch_inference.py", line 232, in run_inference
       batched_tensors = torch.stack(key_to_tensor_list[key])
   RuntimeError: stack expects each tensor to be equal size, but got [12] at entry 0 and [10] at entry 1 [while running 'PyTorchRunInference/ParDo(_RunInferenceDoFn)']
   ```



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,195 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference takes advantage of existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, to enable you to use models in your pipelines to create transforms optimized for machine learning inferences. The ability to create arbitrarily complex workflow graphs also allows you to build multi-model pipelines.
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The batched elements are then applied with a transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines can be useful for A/B testing or for building out ensembles that are comprised of models that perform tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, add the following code to your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+with pipeline as p:
+   predictions = ( p |  'Read' >> beam.ReadFromSource('a_source')
+                     | 'RunInference' >> RunInference(<model_handler>)
+```
+Where `model_handler` is the model handler setup code.
+
+To import models, you need to wrap them around a `ModelHandler` object. Which `ModelHandler` you import depends on the framework and type of data structure that contains the inputs. The following examples show some ModelHandlers that you might want to import.
+
+```
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerPandas
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+The section provides requirements for using pre-trained models with PyTorch and Scikit-learn
+
+#### PyTorch
+
+You need to provide a path to a file that contains the model saved weights. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
+
+1. Download the pre-trained weights and host them in a location that the pipeline can access.
+2. Pass the path of the model weights to the PyTorch `ModelHandler` by using the following code: `state_dict_path=<path_to_weights>`.
+
+#### Scikit-learn
+
+You need to provide a path to a file that contains the pickled Scikit-learn model. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the Scikit-learn framework, complete the following steps:
+
+1. Download the pickled model class and host it in a location that the pipeline can access.
+2. Pass the path of the model to the Sklearn `ModelHandler` by using the following code:
+   `model_uri=<path_to_pickled_file>` and `model_file_type: <ModelFileType>`, where you can specify
+   `ModelFileType.PICKLE` or `ModelFileType.JOBLIB`, depending on how the model was serialized.
+
+### Use multiple models
+
+You can also use the RunInference transform to add multiple inference models to your pipeline.
+
+#### A/B Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source')
+   model_a_predictions = data | RunInference(<model_handler_A>)
+   model_b_predictions = data | RunInference(<model_handler_B>)
+```
+
+Where `model_handler_A` and `model_handler_B` are the model handler setup code.
+
+#### Ensemble Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source')
+   model_a_predictions = data | RunInference(<model_handler_A>)
+   model_b_predictions = model_a_predictions | beam.Map(some_post_processing) | RunInference(<model_handler_B>)
+```
+
+Where `model_handler_A` and `model_handler_B` are the model handler setup code.
+
+### Use a keyed ModelHandler
+
+If a key is attached to the examples, wrap the `KeyedModelHandler` around the `ModelHandler` object:
+
+```
+from apache_beam.ml.inference.base import KeyedModelHandler
+keyed_model_handler = KeyedModelHandler(PytorchModelHandlerTensor(...))
+with pipeline as p:
+   data = p | beam.Create([
+      ('img1', torch.tensor([[1,2,3],[4,5,6],...])),
+      ('img2', torch.tensor([[1,2,3],[4,5,6],...])),
+      ('img3', torch.tensor([[1,2,3],[4,5,6],...])),
+   ])
+   predictions = data | RunInference(KeyedModelHandler)
+```
+
+### Use the PredictionResults object
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions.
+
+The `PredictionResult` is a `NamedTuple` object that contains both the input and the inferences, named  `example` and  `inference`, respectively. When keys are passed with the input data to the RunInference transform, the output `PCollection` returns a `Tuple[str, PredictionResult]`, which is the key and the `PredictionResult` object. Your pipeline interacts with a `PredictionResult` object in steps after the RunInference transform.
+
+```
+class PostProcessor(beam.DoFn):
+    def process(self, element: Tuple[str, PredictionResult]):
+       key, prediction_result = element
+       inputs = prediction_result.example
+       predictions = prediction_result.inference
+
+       # Post-processing logic
+       result = ...
+
+       yield (key, result)
+
+with pipeline as p:
+    output = (
+        p | 'Read' >> beam.ReadFromSource('a_source')
+                | 'PyTorchRunInference' >> RunInference(<keyed_model_handler>)
+                | 'ProcessOutput' >> beam.ParDo(PostProcessor()))
+```
+
+If you need to use this object explicitly, include the following line in your pipeline to import the object:
+
+```
+from apache_beam.ml.inference.base import PredictionResult
+```
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65).
+
+## Run a machine learning pipeline
+
+For detailed instructions explaining how to build and run a pipeline that uses ML models, see the
+[Example RunInference API pipelines](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference) on GitHub.
+
+## Troubleshooting
+
+If you run into problems with your pipeline or job, this section lists issues that you might encounter and provides suggestions for how to fix them.
+
+### Incorrect inferences in the PredictionResult object
+
+In some cases, the `PredictionResults` output might not include the correct predictions in the `inferences` field. This issue occurs when you use a model whose inferences return a dictionary that maps keys to predictions and other metadata. An example return type is `Dict[str, Tensor]`.
+
+The RunInference API currently expects outputs to be an `Iterable[Any]`. Example return types are `Iterable[Tensor]` or `Iterable[Dict[str, Tensor]]`. When RunInference zips the inputs with the predictions, the predictions iterate over the dictionary keys instead of the batch elements. The result is that the key name is preserved but the prediction tensors are discarded. For more information, see the [Pytorch RunInference PredictionResult is a Dict](https://github.com/apache/beam/issues/22240) issue in the Apache Beam GitHub project.
+
+To work with the current RunInference implementation, you can create a wrapper class that overrides the `model(input)` call. In PyTorch, for example, your wrapper would override the `forward()` function and return an output with the appropriate format of `List[Dict[str, torch.Tensor]]`. For more information, see our [HuggingFace language modeling example](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py#L49).
+
+### Unable to batch tensor elements
+
+RunInference uses dynamic batching. However, the RunInference API cannot batch tensor elements of different sizes, so samples passed to teh RunInferene transform must be the same dimension or length. If you provide images of different sizes or word embeddings of different lengths, errors might occur.

Review Comment:
   ```suggestion
   RunInference uses dynamic batching. However, the RunInference API cannot batch tensor elements of different sizes, so samples passed to the RunInference transform must be the same dimension or length. If you provide images of different sizes or word embeddings of different lengths, the following error may occur:
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] AnandInguva commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

AnandInguva commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r921548816


##########
sdks/python/apache_beam/examples/snippets/transforms/elementwise/runinference.py:
##########
@@ -0,0 +1,157 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# pytype: skip-file
+
+def torch_unkeyed_model_handler(test=None):
+    # [START torch_unkeyed_model_handler]
+  import apache_beam as beam
+  import numpy
+  import torch
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+
+  class LinearRegression(torch.nn.Module):
+    def __init__(self, input_dim=1, output_dim=1):
+      super().__init__()
+      self.linear = torch.nn.Linear(input_dim, output_dim)
+
+    def forward(self, x):
+      out = self.linear(x)
+      return out
+
+  model_state_dict_path = 'gs://apache-beam-samples/run_inference/five_times_table_torch.pt'  # pylint: disable=line-too-long
+  model_class = LinearRegression
+  model_params = {'input_dim': 1, 'output_dim': 1}
+  model_handler = PytorchModelHandlerTensor(
+      model_class=model_class,
+      model_params=model_params,
+      state_dict_path=model_state_dict_path)
+
+  unkeyed_data = numpy.array([10, 40, 60, 90],
+                             dtype=numpy.float32).reshape(-1, 1)
+
+  with beam.Pipeline() as p:
+    predictions = (
+        p
+        | 'InputData' >> beam.Create(unkeyed_data)
+        | 'ConvertNumpyToTensor' >> beam.Map(torch.Tensor)
+        | 'PytorchRunInference' >> RunInference(model_handler=model_handler)
+        | beam.Map(print))
+    # [END torch_unkeyed_model_handler]
+    if test:
+      test(predictions)
+
+
+def torch_keyed_model_handler(test=None):
+    # [START torch_keyed_model_handler]
+  import apache_beam as beam
+  import torch
+  from apache_beam.ml.inference.base import KeyedModelHandler
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+
+  class LinearRegression(torch.nn.Module):
+    def __init__(self, input_dim=1, output_dim=1):
+      super().__init__()
+      self.linear = torch.nn.Linear(input_dim, output_dim)
+
+    def forward(self, x):
+      out = self.linear(x)
+      return out
+
+  model_state_dict_path = 'gs://apache-beam-samples/run_inference/five_times_table_torch.pt'  # pylint: disable=line-too-long
+  model_class = LinearRegression
+  model_params = {'input_dim': 1, 'output_dim': 1}
+  keyed_model_handler = KeyedModelHandler(
+      PytorchModelHandlerTensor(
+          model_class=model_class,
+          model_params=model_params,
+          state_dict_path=model_state_dict_path))
+
+  keyed_data = [("first_question", 105.00), ("second_question", 108.00),
+                ("third_question", 1000.00), ("fourth_question", 1013.00)]
+
+  with beam.Pipeline() as p:
+    predictions = (
+        p
+        | 'KeyedInputData' >> beam.Create(keyed_data)
+        | "ConvertIntToTensor" >>
+        beam.Map(lambda x: (x[0], torch.Tensor([x[1]])))
+        | 'PytorchRunInference' >>
+        RunInference(model_handler=keyed_model_handler)
+        | beam.Map(print))
+    # [END torch_keyed_model_handler]
+    if test:
+      test(predictions)
+
+
+def sklearn_unkeyed_model_handler(test=None):
+    # [START sklearn_unkeyed_model_handler]
+  import apache_beam as beam
+  import numpy
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.sklearn_inference import ModelFileType
+  from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
+
+  sklearn_model_filename = 'gs://apache-beam-samples/run_inference/five_times_table_sklearn.pkl'  # pylint: disable=line-too-long

Review Comment:
   How do we skip if apache_beam[gcp] is not installed? @tvalentyn @yeandy 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] rszper commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

rszper commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r920291223


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.

Review Comment:
   I'm agnostic about whether to have specifics or a generalization. If we make it more generic, what would you suggest?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] rszper commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

rszper commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r920291468


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions without returning the full input data.
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, you add a single line of code in your pipeline:

Review Comment:
   I'll remove that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] rszper commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

rszper commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r920299601


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.

Review Comment:
   What is applied? The batched elements?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] yeandy commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

yeandy commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r920483456


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions without returning the full input data.
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, you add a single line of code in your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p | beam.ReadFromSource('a_source')   
+    | RunInference(configuration)))
+```
+
+To import models, you need to wrap them around a `ModelHandler object`. Add one or more of the following lines of code, depending on the framework and type of data structure that holds the data:
+
+```

Review Comment:
   Thanks. By the way, the imports I originally wrote had some typos, so I fixed them
   ```
   from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
   from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerPandas
   from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
   from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] yeandy commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

yeandy commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r921543808


##########
sdks/python/apache_beam/examples/snippets/transforms/elementwise/runinference.py:
##########
@@ -0,0 +1,157 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# pytype: skip-file
+
+def torch_unkeyed_model_handler(test=None):
+    # [START torch_unkeyed_model_handler]
+  import apache_beam as beam
+  import numpy
+  import torch
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+
+  class LinearRegression(torch.nn.Module):
+    def __init__(self, input_dim=1, output_dim=1):
+      super().__init__()
+      self.linear = torch.nn.Linear(input_dim, output_dim)
+
+    def forward(self, x):
+      out = self.linear(x)
+      return out
+
+  model_state_dict_path = 'gs://apache-beam-samples/run_inference/five_times_table_torch.pt'  # pylint: disable=line-too-long
+  model_class = LinearRegression
+  model_params = {'input_dim': 1, 'output_dim': 1}
+  model_handler = PytorchModelHandlerTensor(
+      model_class=model_class,
+      model_params=model_params,
+      state_dict_path=model_state_dict_path)
+
+  unkeyed_data = numpy.array([10, 40, 60, 90],
+                             dtype=numpy.float32).reshape(-1, 1)
+
+  with beam.Pipeline() as p:
+    predictions = (
+        p
+        | 'InputData' >> beam.Create(unkeyed_data)
+        | 'ConvertNumpyToTensor' >> beam.Map(torch.Tensor)
+        | 'PytorchRunInference' >> RunInference(model_handler=model_handler)
+        | beam.Map(print))
+    # [END torch_unkeyed_model_handler]
+    if test:
+      test(predictions)
+
+
+def torch_keyed_model_handler(test=None):
+    # [START torch_keyed_model_handler]
+  import apache_beam as beam
+  import torch
+  from apache_beam.ml.inference.base import KeyedModelHandler
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+
+  class LinearRegression(torch.nn.Module):
+    def __init__(self, input_dim=1, output_dim=1):
+      super().__init__()
+      self.linear = torch.nn.Linear(input_dim, output_dim)
+
+    def forward(self, x):
+      out = self.linear(x)
+      return out
+
+  model_state_dict_path = 'gs://apache-beam-samples/run_inference/five_times_table_torch.pt'  # pylint: disable=line-too-long
+  model_class = LinearRegression
+  model_params = {'input_dim': 1, 'output_dim': 1}
+  keyed_model_handler = KeyedModelHandler(
+      PytorchModelHandlerTensor(
+          model_class=model_class,
+          model_params=model_params,
+          state_dict_path=model_state_dict_path))
+
+  keyed_data = [("first_question", 105.00), ("second_question", 108.00),
+                ("third_question", 1000.00), ("fourth_question", 1013.00)]
+
+  with beam.Pipeline() as p:
+    predictions = (
+        p
+        | 'KeyedInputData' >> beam.Create(keyed_data)
+        | "ConvertIntToTensor" >>
+        beam.Map(lambda x: (x[0], torch.Tensor([x[1]])))
+        | 'PytorchRunInference' >>
+        RunInference(model_handler=keyed_model_handler)
+        | beam.Map(print))
+    # [END torch_keyed_model_handler]
+    if test:
+      test(predictions)
+
+
+def sklearn_unkeyed_model_handler(test=None):
+    # [START sklearn_unkeyed_model_handler]
+  import apache_beam as beam
+  import numpy
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.sklearn_inference import ModelFileType
+  from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
+
+  sklearn_model_filename = 'gs://apache-beam-samples/run_inference/five_times_table_sklearn.pkl'  # pylint: disable=line-too-long

Review Comment:
   I don't think so. Other pytorch examples don't need to import it.
   
   Do we need to install `apache-beam[gcp]`?



##########
sdks/python/apache_beam/examples/snippets/transforms/elementwise/runinference.py:
##########
@@ -0,0 +1,157 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# pytype: skip-file
+
+def torch_unkeyed_model_handler(test=None):
+    # [START torch_unkeyed_model_handler]
+  import apache_beam as beam
+  import numpy
+  import torch
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+
+  class LinearRegression(torch.nn.Module):
+    def __init__(self, input_dim=1, output_dim=1):
+      super().__init__()
+      self.linear = torch.nn.Linear(input_dim, output_dim)
+
+    def forward(self, x):
+      out = self.linear(x)
+      return out
+
+  model_state_dict_path = 'gs://apache-beam-samples/run_inference/five_times_table_torch.pt'  # pylint: disable=line-too-long
+  model_class = LinearRegression
+  model_params = {'input_dim': 1, 'output_dim': 1}
+  model_handler = PytorchModelHandlerTensor(
+      model_class=model_class,
+      model_params=model_params,
+      state_dict_path=model_state_dict_path)
+
+  unkeyed_data = numpy.array([10, 40, 60, 90],
+                             dtype=numpy.float32).reshape(-1, 1)
+
+  with beam.Pipeline() as p:
+    predictions = (
+        p
+        | 'InputData' >> beam.Create(unkeyed_data)
+        | 'ConvertNumpyToTensor' >> beam.Map(torch.Tensor)
+        | 'PytorchRunInference' >> RunInference(model_handler=model_handler)
+        | beam.Map(print))
+    # [END torch_unkeyed_model_handler]
+    if test:
+      test(predictions)
+
+
+def torch_keyed_model_handler(test=None):
+    # [START torch_keyed_model_handler]
+  import apache_beam as beam
+  import torch
+  from apache_beam.ml.inference.base import KeyedModelHandler
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+
+  class LinearRegression(torch.nn.Module):
+    def __init__(self, input_dim=1, output_dim=1):
+      super().__init__()
+      self.linear = torch.nn.Linear(input_dim, output_dim)
+
+    def forward(self, x):
+      out = self.linear(x)
+      return out
+
+  model_state_dict_path = 'gs://apache-beam-samples/run_inference/five_times_table_torch.pt'  # pylint: disable=line-too-long
+  model_class = LinearRegression
+  model_params = {'input_dim': 1, 'output_dim': 1}
+  keyed_model_handler = KeyedModelHandler(
+      PytorchModelHandlerTensor(
+          model_class=model_class,
+          model_params=model_params,
+          state_dict_path=model_state_dict_path))
+
+  keyed_data = [("first_question", 105.00), ("second_question", 108.00),
+                ("third_question", 1000.00), ("fourth_question", 1013.00)]
+
+  with beam.Pipeline() as p:
+    predictions = (
+        p
+        | 'KeyedInputData' >> beam.Create(keyed_data)
+        | "ConvertIntToTensor" >>
+        beam.Map(lambda x: (x[0], torch.Tensor([x[1]])))
+        | 'PytorchRunInference' >>
+        RunInference(model_handler=keyed_model_handler)
+        | beam.Map(print))
+    # [END torch_keyed_model_handler]
+    if test:
+      test(predictions)
+
+
+def sklearn_unkeyed_model_handler(test=None):
+    # [START sklearn_unkeyed_model_handler]
+  import apache_beam as beam
+  import numpy
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.sklearn_inference import ModelFileType
+  from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
+
+  sklearn_model_filename = 'gs://apache-beam-samples/run_inference/five_times_table_sklearn.pkl'  # pylint: disable=line-too-long

Review Comment:
   `Unable to get filesystem from specified path, please use the correct path or ensure the required dependency is installed, e.g., pip install apache-beam[gcp].`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] rszper commented on pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

rszper commented on PR #22250:
URL: https://github.com/apache/beam/pull/22250#issuecomment-1182481447

   R: @yeandy @rezarokni 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] rszper commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

rszper commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r919467734


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,167 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API allows you to find the input that determined the prediction without returning to the full input data.
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in a worker, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API allows you to build complex multi-model pipelines with minimum effort. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions without returning the full input data.
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, you add a single line of code in your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p | beam.ReadFromSource('a_source')   
+    | RunInference(configuration)))
+```
+
+To import models, you need to wrap them around a `ModelHandler object`. Add one or more of the following lines of code, depending on the framework and type of data structure that holds the data:
+
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerNumpy
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+```
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerPandas
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+You need to provide a path to the model weights that's accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
+
+1. Download the pre-trained weights and host them in a location that the pipeline can access.
+2. Pass the hosted path of the model to the PyTorch `model_handler` by using the following code: `state_dict_path=<path_to_weights>`.
+
+### Use multiple inference models
+
+You can also use the RunInference transform to add multiple inference models to your pipeline.
+
+#### A/B Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(ModelHandlerA)
+   model_b_predictions = data | RunInference(ModelHandlerB)
+```
+
+#### Ensemble Pattern

Review Comment:
   Updated



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] rszper commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

rszper commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r920317341


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions without returning the full input data.
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, you add a single line of code in your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p | beam.ReadFromSource('a_source')   
+    | RunInference(configuration)))
+```
+
+To import models, you need to wrap them around a `ModelHandler object`. Add one or more of the following lines of code, depending on the framework and type of data structure that holds the data:
+
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerNumpy
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+```
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerPandas
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+You need to provide a path to the model weights that's accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
+
+1. Download the pre-trained weights and host them in a location that the pipeline can access.
+2. Pass the hosted path of the model to the PyTorch `model_handler` by using the following code: `state_dict_path=<path_to_weights>`.
+
+### Use multiple inference models
+
+You can also use the RunInference transform to add multiple inference models to your pipeline.
+
+#### A/B Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(ModelHandlerA)
+   model_b_predictions = data | RunInference(ModelHandlerB)
+```
+
+#### Ensemble Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(ModelHandlerA)
+   model_b_predictions = model_a_predictions | beam.Map(some_post_processing) | RunInference(ModelHandlerB)
+```
+
+### Use a key handler
+
+If a key is attached to the examples, use the `KeyedModelHandler`:
+
+```
+from apache_beam.ml.inference.base import KeyedModelHandler
+ 
+keyed_model_handler = KeyedModelHandler(PytorchModelHandlerTensor(...))
+ 
+with pipeline as p:
+   data = p | beam.Create([
+      ('img1', np.array[[1,2,3],[4,5,6],...]),
+      ('img2', np.array[[1,2,3],[4,5,6],...]),
+      ('img3', np.array[[1,2,3],[4,5,6],...]),
+   ])
+   predictions = data | RunInference(keyed_model_handler)
+```
+
+### Use the prediction results object
+
+The `PredictionResult` is a `NamedTuple` object that contains both the input and the inferences, named  `example` and  `inference`, respectively. Your pipeline interacts with a `PredictionResult` object in steps after the RunInference transform.
+
+```
+class PostProcessor(beam.DoFn):
+    def process(self, element: Tuple[str, PredictionResult]):
+       key, prediction_result = element
+       inputs = prediction_result.example
+       predictions = prediction_result.inference
+
+       # Post-processing logic
+       result = ...
+
+       yield (key, result)
+
+with pipeline as p:
+    output = (
+        p | 'Read' >> beam.ReadFromSource('a_source') 
+                | 'PyTorchRunInference' >> RunInference(KeyedModelHandler)
+                | 'ProcessOutput' >> beam.ParDo(PostProcessor()))
+```
+
+If you need to use this object explicitly, include the following line in your pipeline to import the object:
+
+```
+from apache_beam.ml.inference.base import PredictionResult
+```
+
+## Run a machine learning pipeline
+
+For detailed instructions explaining how to build and run a pipeline that uses ML models, see the
+[Example RunInference API pipelines](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference) on GitHub.
+
+## Troubleshooting
+
+If you run into problems with your pipeline or job, this section lists issues that you might encounter and provides suggestions for how to fix them.
+
+### Prediction results missing
+
+When you use a dictionary of tensors, the output might not include the prediction results. This issue occurs because the RunInference API supports tensors but not dictionaries of tensors. 
+
+Many model inferences return a dictionary with the predictions and additional metadata, for example, `Dict[str, Tensor]`. The RunInference API currently expects outputs to be an `Iterable[Any]`, for example, `Iterable[Tensor]` or `Iterable[Dict[str, Tensor]]`.
+
+When RunInference zips the inputs with the predictions, the predictions iterate over the dictionary keys instead of the batch elements. The result is that the key name is preserved but the prediction tensors are discarded. For more information, see the [Pytorch RunInference PredictionResult is a Dict](https://github.com/apache/beam/issues/22240) issue in the Apache Beam GitHub project.
+
+To work with current RunInference implementation, override the `forward()` function and convert the standard Hugging Face forward output into the appropriate format of `List[Dict[str, torch.Tensor]]`. For more information, see an [example with the batching flag added](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py#L49).
+
+### Unable to batch tensor elements
+
+RunInference uses dynamic batching. However, the RunInference API cannot batch tensor elements of different sizes, because `torch.stack()` expects tensors of the same length. If you provide images of different sizes or word embeddings of different lengths, errors might occur.
+
+To avoid this issue:
+
+1. Either use elements that have the same size, or resize image inputs and word embeddings to make them 
+the same size. Depending on the language model and encoding technique, this option might not be available. 

Review Comment:
   What does CV stand for?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] codecov[bot] commented on pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

codecov[bot] commented on PR #22250:
URL: https://github.com/apache/beam/pull/22250#issuecomment-1185582679

   # [Codecov](https://codecov.io/gh/apache/beam/pull/22250?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#22250](https://codecov.io/gh/apache/beam/pull/22250?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (4160d1b) into [master](https://codecov.io/gh/apache/beam/commit/9cf8cf5abfdd947aa59ca993d8dc04d0e28d9f06?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (9cf8cf5) will **increase** coverage by `9.28%`.
   > The diff coverage is `0.00%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #22250      +/-   ##
   ==========================================
   + Coverage   74.25%   83.54%   +9.28%     
   ==========================================
     Files         702      474     -228     
     Lines       92999    65934   -27065     
   ==========================================
   - Hits        69058    55085   -13973     
   + Misses      22674    10849   -11825     
   + Partials     1267        0    -1267     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | go | `?` | |
   | python | `83.54% <0.00%> (-0.10%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/22250?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...es/snippets/transforms/elementwise/runinference.py](https://codecov.io/gh/apache/beam/pull/22250/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvc25pcHBldHMvdHJhbnNmb3Jtcy9lbGVtZW50d2lzZS9ydW5pbmZlcmVuY2UucHk=) | `0.00% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/io/source\_test\_utils.py](https://codecov.io/gh/apache/beam/pull/22250/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vc291cmNlX3Rlc3RfdXRpbHMucHk=) | `88.01% <0.00%> (-1.39%)` | :arrow_down: |
   | [...hon/apache\_beam/runners/direct/test\_stream\_impl.py](https://codecov.io/gh/apache/beam/pull/22250/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9kaXJlY3QvdGVzdF9zdHJlYW1faW1wbC5weQ==) | `93.28% <0.00%> (-0.75%)` | :arrow_down: |
   | [...eam/runners/portability/fn\_api\_runner/execution.py](https://codecov.io/gh/apache/beam/pull/22250/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL2V4ZWN1dGlvbi5weQ==) | `92.44% <0.00%> (-0.65%)` | :arrow_down: |
   | [...ks/python/apache\_beam/runners/worker/sdk\_worker.py](https://codecov.io/gh/apache/beam/pull/22250/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvc2RrX3dvcmtlci5weQ==) | `89.09% <0.00%> (-0.32%)` | :arrow_down: |
   | [sdks/python/apache\_beam/utils/annotations.py](https://codecov.io/gh/apache/beam/pull/22250/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdXRpbHMvYW5ub3RhdGlvbnMucHk=) | `100.00% <0.00%> (ø)` | |
   | [...hon/apache\_beam/runners/worker/bundle\_processor.py](https://codecov.io/gh/apache/beam/pull/22250/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvYnVuZGxlX3Byb2Nlc3Nvci5weQ==) | `93.54% <0.00%> (ø)` | |
   | [sdks/go/pkg/beam/core/runtime/harness/sampler.go](https://codecov.io/gh/apache/beam/pull/22250/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9nby9wa2cvYmVhbS9jb3JlL3J1bnRpbWUvaGFybmVzcy9zYW1wbGVyLmdv) | | |
   | [sdks/go/pkg/beam/core/runtime/exec/encode.go](https://codecov.io/gh/apache/beam/pull/22250/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9nby9wa2cvYmVhbS9jb3JlL3J1bnRpbWUvZXhlYy9lbmNvZGUuZ28=) | | |
   | [sdks/go/pkg/beam/combine.go](https://codecov.io/gh/apache/beam/pull/22250/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9nby9wa2cvYmVhbS9jb21iaW5lLmdv) | | |
   | ... and [227 more](https://codecov.io/gh/apache/beam/pull/22250/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/22250?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/22250?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [9cf8cf5...4160d1b](https://codecov.io/gh/apache/beam/pull/22250?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] rszper commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

rszper commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r920473096


##########
website/www/site/content/en/documentation/transforms/python/elementwise/runinference.md:
##########
@@ -0,0 +1,94 @@
+---
+title: "RunInference"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# RunInference
+
+{{< localstorage language language-py >}}
+
+{{< button-pydoc path="apache_beam.ml.inference" class="RunInference" >}}
+
+Uses models to do local and remote inference. A `RunInference` transform uses a `PCollection` of examples to create a machine learning (ML) model. The transform outputs a `PCollection` that contains the input examples and output predictions.
+
+You must have Apache Beam 2.40.0 or later installed to run these pipelines.
+
+See more [RunInference API pipeline examples](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference).
+
+## PyTorch dependencies
+
+The RunInference API supports the PyTorch framework. To use PyTorch locally, first install `torch`. To install `torch`, in your terminal, run the following command:
+
+`pip install torch==1.10.0`
+
+If you are using pretrained models from Pytorch's `torchvision.models` [subpackage](https://pytorch.org/vision/0.12/models.html#models-and-pre-trained-weights), you also need to install `torchvision`. To install `torchvision`, in your terminal, run the following command:
+
+`pip install torchvision`
+
+If you are using pretrained models from Hugging Face's [`transformers` package](https://huggingface.co/docs/transformers/index), you need to install `transformers`. To install `transformers`, in your terminal, run the following command:
+
+`pip install transformers`
+
+For information about installing the `torch` dependency on a distributed runner such as Dataflow, see the [PyPI dependency instructions](/documentation/sdks/python-pipeline-dependencies/#pypi-dependencies).

Review Comment:
   Removed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] rszper commented on pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

rszper commented on PR #22250:
URL: https://github.com/apache/beam/pull/22250#issuecomment-1183664578

   @ryanthompson591  & @yeandy I think I've made all the changes based on the comments that I can at this point, and I've updated the examples to use the ones provided by Anand. Please let me know what else needs to be done before this is ready.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] rszper commented on a diff in pull request #22250: Update RunInference documentation

Posted by GitBox <gi...@apache.org>.

rszper commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r923698362


##########
website/www/site/layouts/partials/section-menu/en/sdks.html:
##########
@@ -39,6 +39,7 @@
     <li><a href="/documentation/sdks/python-dependencies/">Python SDK dependencies</a></li>
     <li><a href="/documentation/sdks/python-streaming/">Python streaming pipelines</a></li>
     <li><a href="/documentation/sdks/python-type-safety/">Ensuring Python type safety</a></li>
+    <li><a href="/documentation/sdks/python-machine-learning/">Machine Learning</a></li>

Review Comment:
   Yeah, that's a good idea. I'll add that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] AnandInguva commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

AnandInguva commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r921545839


##########
sdks/python/apache_beam/examples/snippets/transforms/elementwise/runinference.py:
##########
@@ -0,0 +1,157 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# pytype: skip-file
+
+def torch_unkeyed_model_handler(test=None):
+    # [START torch_unkeyed_model_handler]
+  import apache_beam as beam
+  import numpy
+  import torch
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+
+  class LinearRegression(torch.nn.Module):
+    def __init__(self, input_dim=1, output_dim=1):
+      super().__init__()
+      self.linear = torch.nn.Linear(input_dim, output_dim)
+
+    def forward(self, x):
+      out = self.linear(x)
+      return out
+
+  model_state_dict_path = 'gs://apache-beam-samples/run_inference/five_times_table_torch.pt'  # pylint: disable=line-too-long
+  model_class = LinearRegression
+  model_params = {'input_dim': 1, 'output_dim': 1}
+  model_handler = PytorchModelHandlerTensor(
+      model_class=model_class,
+      model_params=model_params,
+      state_dict_path=model_state_dict_path)
+
+  unkeyed_data = numpy.array([10, 40, 60, 90],
+                             dtype=numpy.float32).reshape(-1, 1)
+
+  with beam.Pipeline() as p:
+    predictions = (
+        p
+        | 'InputData' >> beam.Create(unkeyed_data)
+        | 'ConvertNumpyToTensor' >> beam.Map(torch.Tensor)
+        | 'PytorchRunInference' >> RunInference(model_handler=model_handler)
+        | beam.Map(print))
+    # [END torch_unkeyed_model_handler]
+    if test:
+      test(predictions)
+
+
+def torch_keyed_model_handler(test=None):
+    # [START torch_keyed_model_handler]
+  import apache_beam as beam
+  import torch
+  from apache_beam.ml.inference.base import KeyedModelHandler
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+
+  class LinearRegression(torch.nn.Module):
+    def __init__(self, input_dim=1, output_dim=1):
+      super().__init__()
+      self.linear = torch.nn.Linear(input_dim, output_dim)
+
+    def forward(self, x):
+      out = self.linear(x)
+      return out
+
+  model_state_dict_path = 'gs://apache-beam-samples/run_inference/five_times_table_torch.pt'  # pylint: disable=line-too-long
+  model_class = LinearRegression
+  model_params = {'input_dim': 1, 'output_dim': 1}
+  keyed_model_handler = KeyedModelHandler(
+      PytorchModelHandlerTensor(
+          model_class=model_class,
+          model_params=model_params,
+          state_dict_path=model_state_dict_path))
+
+  keyed_data = [("first_question", 105.00), ("second_question", 108.00),
+                ("third_question", 1000.00), ("fourth_question", 1013.00)]
+
+  with beam.Pipeline() as p:
+    predictions = (
+        p
+        | 'KeyedInputData' >> beam.Create(keyed_data)
+        | "ConvertIntToTensor" >>
+        beam.Map(lambda x: (x[0], torch.Tensor([x[1]])))
+        | 'PytorchRunInference' >>
+        RunInference(model_handler=keyed_model_handler)
+        | beam.Map(print))
+    # [END torch_keyed_model_handler]
+    if test:
+      test(predictions)
+
+
+def sklearn_unkeyed_model_handler(test=None):
+    # [START sklearn_unkeyed_model_handler]
+  import apache_beam as beam
+  import numpy
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.sklearn_inference import ModelFileType
+  from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
+
+  sklearn_model_filename = 'gs://apache-beam-samples/run_inference/five_times_table_sklearn.pkl'  # pylint: disable=line-too-long

Review Comment:
   Since SKlearn is a installed as a beam dependency in each tox environment, we need to add if GCP is installed as well in every to test



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] yeandy commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

yeandy commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r920327643


##########
website/www/site/content/en/documentation/sdks/python.md:
##########
@@ -48,6 +48,11 @@ language-specific implementation guidance.
 
 ## Using Beam Python SDK in your ML pipelines
 
+To use the Beam Python SDK with your machine learning pipelines, use the RunInference API for PyTorch and Sklearn models. If using Tensorflow model, you can make use of the library from `tfx_bsl`. Further integrations for TensorFlow are planned.

Review Comment:
   👍 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] yeandy commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

yeandy commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r920371691


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -158,24 +169,21 @@ For detailed instructions explaining how to build and run a pipeline that uses M
 
 If you run into problems with your pipeline or job, this section lists issues that you might encounter and provides suggestions for how to fix them.
 
-### Prediction results missing
-
-When you use a dictionary of tensors, the output might not include the prediction results. This issue occurs because the RunInference API supports tensors but not dictionaries of tensors. 
+### Incorrect inferences in the PredictionResult object
 
-Many model inferences return a dictionary with the predictions and additional metadata, for example, `Dict[str, Tensor]`. The RunInference API currently expects outputs to be an `Iterable[Any]`, for example, `Iterable[Tensor]` or `Iterable[Dict[str, Tensor]]`.
+In some cases, the `PredictionResults` output might not include the correct predictions in the `inferences` field. This issue occurs when you use a model whose inferences return a dictionary that maps keys to predictions and that includes additional metadata. An example return type is `Dict[str, Tensor]`.
 
-When RunInference zips the inputs with the predictions, the predictions iterate over the dictionary keys instead of the batch elements. The result is that the key name is preserved but the prediction tensors are discarded. For more information, see the [Pytorch RunInference PredictionResult is a Dict](https://github.com/apache/beam/issues/22240) issue in the Apache Beam GitHub project.
+The RunInference API currently expects outputs to be an `Iterable[Any]`. Example types are `Iterable[Tensor]` or `Iterable[Dict[str, Tensor]]`. When RunInference zips the inputs with the predictions, the predictions iterate over the dictionary keys instead of the batch elements. The result is that the key name is preserved but the prediction tensors are discarded. For more information, see the [Pytorch RunInference PredictionResult is a Dict](https://github.com/apache/beam/issues/22240) issue in the Apache Beam GitHub project.
 
-To work with current RunInference implementation, override the `forward()` function and convert the standard Hugging Face forward output into the appropriate format of `List[Dict[str, torch.Tensor]]`. For more information, see an [example with the batching flag added](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py#L49).
+To work with current RunInference implementation, override the `forward()` function and convert the standard Hugging Face forward output into the appropriate format of `List[Dict[str, torch.Tensor]]`. For more information, see our [HuggingFace language modeling example](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py#L49).

Review Comment:
   ```suggestion
   To work with the current RunInference implementation, you can create a wrapper class that overrides the `model(input)` call. In PyTorch, for example, your wrapper should override `forward()` function and return an output with the appropriate format of `List[Dict[str, torch.Tensor]]`. For more information, see our [HuggingFace language modeling example](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py#L49).
   ```



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -158,24 +169,21 @@ For detailed instructions explaining how to build and run a pipeline that uses M
 
 If you run into problems with your pipeline or job, this section lists issues that you might encounter and provides suggestions for how to fix them.
 
-### Prediction results missing
-
-When you use a dictionary of tensors, the output might not include the prediction results. This issue occurs because the RunInference API supports tensors but not dictionaries of tensors. 
+### Incorrect inferences in the PredictionResult object
 
-Many model inferences return a dictionary with the predictions and additional metadata, for example, `Dict[str, Tensor]`. The RunInference API currently expects outputs to be an `Iterable[Any]`, for example, `Iterable[Tensor]` or `Iterable[Dict[str, Tensor]]`.
+In some cases, the `PredictionResults` output might not include the correct predictions in the `inferences` field. This issue occurs when you use a model whose inferences return a dictionary that maps keys to predictions and that includes additional metadata. An example return type is `Dict[str, Tensor]`.
 
-When RunInference zips the inputs with the predictions, the predictions iterate over the dictionary keys instead of the batch elements. The result is that the key name is preserved but the prediction tensors are discarded. For more information, see the [Pytorch RunInference PredictionResult is a Dict](https://github.com/apache/beam/issues/22240) issue in the Apache Beam GitHub project.
+The RunInference API currently expects outputs to be an `Iterable[Any]`. Example types are `Iterable[Tensor]` or `Iterable[Dict[str, Tensor]]`. When RunInference zips the inputs with the predictions, the predictions iterate over the dictionary keys instead of the batch elements. The result is that the key name is preserved but the prediction tensors are discarded. For more information, see the [Pytorch RunInference PredictionResult is a Dict](https://github.com/apache/beam/issues/22240) issue in the Apache Beam GitHub project.

Review Comment:
   ```suggestion
   The RunInference API currently expects outputs to be an `Iterable[Any]`. Example return types are `Iterable[Tensor]` or `Iterable[Dict[str, Tensor]]`. When RunInference zips the inputs with the predictions, the predictions iterate over the dictionary keys instead of the batch elements. The result is that the key name is preserved but the prediction tensors are discarded. For more information, see the [Pytorch RunInference PredictionResult is a Dict](https://github.com/apache/beam/issues/22240) issue in the Apache Beam GitHub project.
   ```



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -158,24 +169,21 @@ For detailed instructions explaining how to build and run a pipeline that uses M
 
 If you run into problems with your pipeline or job, this section lists issues that you might encounter and provides suggestions for how to fix them.
 
-### Prediction results missing
-
-When you use a dictionary of tensors, the output might not include the prediction results. This issue occurs because the RunInference API supports tensors but not dictionaries of tensors. 
+### Incorrect inferences in the PredictionResult object
 
-Many model inferences return a dictionary with the predictions and additional metadata, for example, `Dict[str, Tensor]`. The RunInference API currently expects outputs to be an `Iterable[Any]`, for example, `Iterable[Tensor]` or `Iterable[Dict[str, Tensor]]`.
+In some cases, the `PredictionResults` output might not include the correct predictions in the `inferences` field. This issue occurs when you use a model whose inferences return a dictionary that maps keys to predictions and that includes additional metadata. An example return type is `Dict[str, Tensor]`.

Review Comment:
   The intention here is to say that there could be a mapping
   {'key1': predictions,
    'key2': metadata2,
    'key3': metadata3,
   }
   
   ```suggestion
   In some cases, the `PredictionResults` output might not include the correct predictions in the `inferences` field. This issue occurs when you use a model whose inferences return a dictionary that maps keys to predictions and other metadata. An example return type is `Dict[str, Tensor]`.
   ```



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -158,24 +169,21 @@ For detailed instructions explaining how to build and run a pipeline that uses M
 
 If you run into problems with your pipeline or job, this section lists issues that you might encounter and provides suggestions for how to fix them.
 
-### Prediction results missing
-
-When you use a dictionary of tensors, the output might not include the prediction results. This issue occurs because the RunInference API supports tensors but not dictionaries of tensors. 
+### Incorrect inferences in the PredictionResult object
 
-Many model inferences return a dictionary with the predictions and additional metadata, for example, `Dict[str, Tensor]`. The RunInference API currently expects outputs to be an `Iterable[Any]`, for example, `Iterable[Tensor]` or `Iterable[Dict[str, Tensor]]`.
+In some cases, the `PredictionResults` output might not include the correct predictions in the `inferences` field. This issue occurs when you use a model whose inferences return a dictionary that maps keys to predictions and that includes additional metadata. An example return type is `Dict[str, Tensor]`.
 
-When RunInference zips the inputs with the predictions, the predictions iterate over the dictionary keys instead of the batch elements. The result is that the key name is preserved but the prediction tensors are discarded. For more information, see the [Pytorch RunInference PredictionResult is a Dict](https://github.com/apache/beam/issues/22240) issue in the Apache Beam GitHub project.
+The RunInference API currently expects outputs to be an `Iterable[Any]`. Example types are `Iterable[Tensor]` or `Iterable[Dict[str, Tensor]]`. When RunInference zips the inputs with the predictions, the predictions iterate over the dictionary keys instead of the batch elements. The result is that the key name is preserved but the prediction tensors are discarded. For more information, see the [Pytorch RunInference PredictionResult is a Dict](https://github.com/apache/beam/issues/22240) issue in the Apache Beam GitHub project.
 
-To work with current RunInference implementation, override the `forward()` function and convert the standard Hugging Face forward output into the appropriate format of `List[Dict[str, torch.Tensor]]`. For more information, see an [example with the batching flag added](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py#L49).
+To work with current RunInference implementation, override the `forward()` function and convert the standard Hugging Face forward output into the appropriate format of `List[Dict[str, torch.Tensor]]`. For more information, see our [HuggingFace language modeling example](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py#L49).
 
 ### Unable to batch tensor elements
 
 RunInference uses dynamic batching. However, the RunInference API cannot batch tensor elements of different sizes, because `torch.stack()` expects tensors of the same length. If you provide images of different sizes or word embeddings of different lengths, errors might occur.
 
-To avoid this issue:
+To avoid this issue, use one of the following solutions:
 
-1. Either use elements that have the same size, or resize image inputs and word embeddings to make them 
-the same size. Depending on the language model and encoding technique, this option might not be available. 
+1. Use elements of the same size or resize the inputs. For computer vision applications, resize image inputs so that they have the same dimensions. For natural language processing (NLP) applications that have text of varying length, resize the text or word embeddings to make them the same length. When working with texts of varying length, resizing might not be possible.

Review Comment:
   ```suggestion
   1. Use elements of the same size or resize the inputs. For computer vision applications, resize image inputs so that they have the same dimensions. For natural language processing (NLP) applications that have text of varying length, resize the text or word embeddings to make them the same length. When working with texts of varying length, resizing might not be possible. If this is the case, consider solution 2.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] AnandInguva commented on pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

AnandInguva commented on PR #22250:
URL: https://github.com/apache/beam/pull/22250#issuecomment-1184048774

   @yeandy @ryanthompson591 can you review the example snippets? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] rszper commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

rszper commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r920450780


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions without returning the full input data.
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, you add a single line of code in your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p | beam.ReadFromSource('a_source')   
+    | RunInference(configuration)))
+```
+
+To import models, you need to wrap them around a `ModelHandler object`. Add one or more of the following lines of code, depending on the framework and type of data structure that holds the data:
+
+```

Review Comment:
   @yeandy How do you want to handle this?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] rszper commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

rszper commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r921337400


##########
website/www/site/content/en/documentation/sdks/python.md:
##########
@@ -46,7 +46,13 @@ new I/O connectors. See the [Developing I/O connectors overview](/documentation/
 for information about developing new I/O connectors and links to
 language-specific implementation guidance.
 
-## Using Beam Python SDK in your ML pipelines
+## Making machine learning inferences with Python
+
+To integrate machine learning models into your pipelines for making inferences, use the RunInference API for PyTorch and Scikit-learn models. If you are using TensorFlow models, you can make use of the
+[library from `tfx_bsl`](https://github.com/tensorflow/tfx-bsl/tree/master/tfx_bsl/beam). Further integrations for TensorFlow are planned. For more information, see [Beam issue #21442](https://github.com/apache/beam/issues/21442).

Review Comment:
   I'll remove it. We shouldn't talk about future releases in documentation.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] AnandInguva commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

AnandInguva commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r921548816


##########
sdks/python/apache_beam/examples/snippets/transforms/elementwise/runinference.py:
##########
@@ -0,0 +1,157 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# pytype: skip-file
+
+def torch_unkeyed_model_handler(test=None):
+    # [START torch_unkeyed_model_handler]
+  import apache_beam as beam
+  import numpy
+  import torch
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+
+  class LinearRegression(torch.nn.Module):
+    def __init__(self, input_dim=1, output_dim=1):
+      super().__init__()
+      self.linear = torch.nn.Linear(input_dim, output_dim)
+
+    def forward(self, x):
+      out = self.linear(x)
+      return out
+
+  model_state_dict_path = 'gs://apache-beam-samples/run_inference/five_times_table_torch.pt'  # pylint: disable=line-too-long
+  model_class = LinearRegression
+  model_params = {'input_dim': 1, 'output_dim': 1}
+  model_handler = PytorchModelHandlerTensor(
+      model_class=model_class,
+      model_params=model_params,
+      state_dict_path=model_state_dict_path)
+
+  unkeyed_data = numpy.array([10, 40, 60, 90],
+                             dtype=numpy.float32).reshape(-1, 1)
+
+  with beam.Pipeline() as p:
+    predictions = (
+        p
+        | 'InputData' >> beam.Create(unkeyed_data)
+        | 'ConvertNumpyToTensor' >> beam.Map(torch.Tensor)
+        | 'PytorchRunInference' >> RunInference(model_handler=model_handler)
+        | beam.Map(print))
+    # [END torch_unkeyed_model_handler]
+    if test:
+      test(predictions)
+
+
+def torch_keyed_model_handler(test=None):
+    # [START torch_keyed_model_handler]
+  import apache_beam as beam
+  import torch
+  from apache_beam.ml.inference.base import KeyedModelHandler
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+
+  class LinearRegression(torch.nn.Module):
+    def __init__(self, input_dim=1, output_dim=1):
+      super().__init__()
+      self.linear = torch.nn.Linear(input_dim, output_dim)
+
+    def forward(self, x):
+      out = self.linear(x)
+      return out
+
+  model_state_dict_path = 'gs://apache-beam-samples/run_inference/five_times_table_torch.pt'  # pylint: disable=line-too-long
+  model_class = LinearRegression
+  model_params = {'input_dim': 1, 'output_dim': 1}
+  keyed_model_handler = KeyedModelHandler(
+      PytorchModelHandlerTensor(
+          model_class=model_class,
+          model_params=model_params,
+          state_dict_path=model_state_dict_path))
+
+  keyed_data = [("first_question", 105.00), ("second_question", 108.00),
+                ("third_question", 1000.00), ("fourth_question", 1013.00)]
+
+  with beam.Pipeline() as p:
+    predictions = (
+        p
+        | 'KeyedInputData' >> beam.Create(keyed_data)
+        | "ConvertIntToTensor" >>
+        beam.Map(lambda x: (x[0], torch.Tensor([x[1]])))
+        | 'PytorchRunInference' >>
+        RunInference(model_handler=keyed_model_handler)
+        | beam.Map(print))
+    # [END torch_keyed_model_handler]
+    if test:
+      test(predictions)
+
+
+def sklearn_unkeyed_model_handler(test=None):
+    # [START sklearn_unkeyed_model_handler]
+  import apache_beam as beam
+  import numpy
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.sklearn_inference import ModelFileType
+  from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
+
+  sklearn_model_filename = 'gs://apache-beam-samples/run_inference/five_times_table_sklearn.pkl'  # pylint: disable=line-too-long

Review Comment:
   How do we skip the test if apache_beam[gcp] is not installed? @tvalentyn @yeandy . My test fetch the file from the GCS bucket. I just provide the GCS location to the RunInference transform



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] tvalentyn commented on a diff in pull request #22250: Update RunInference documentation

Posted by GitBox <gi...@apache.org>.

tvalentyn commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r923676270


##########
website/www/site/layouts/partials/section-menu/en/sdks.html:
##########
@@ -39,6 +39,7 @@
     <li><a href="/documentation/sdks/python-dependencies/">Python SDK dependencies</a></li>
     <li><a href="/documentation/sdks/python-streaming/">Python streaming pipelines</a></li>
     <li><a href="/documentation/sdks/python-type-safety/">Ensuring Python type safety</a></li>
+    <li><a href="/documentation/sdks/python-machine-learning/">Machine Learning</a></li>

Review Comment:
   should we also this content at https://beam.apache.org/documentation/resources/learning-resources/#machine-learning ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] AnandInguva commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

AnandInguva commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r921760079


##########
sdks/python/apache_beam/examples/snippets/transforms/elementwise/runinference.py:
##########
@@ -0,0 +1,157 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# pytype: skip-file
+
+def torch_unkeyed_model_handler(test=None):
+    # [START torch_unkeyed_model_handler]
+  import apache_beam as beam
+  import numpy
+  import torch
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+
+  class LinearRegression(torch.nn.Module):
+    def __init__(self, input_dim=1, output_dim=1):
+      super().__init__()
+      self.linear = torch.nn.Linear(input_dim, output_dim)
+
+    def forward(self, x):
+      out = self.linear(x)
+      return out
+
+  model_state_dict_path = 'gs://apache-beam-samples/run_inference/five_times_table_torch.pt'  # pylint: disable=line-too-long
+  model_class = LinearRegression
+  model_params = {'input_dim': 1, 'output_dim': 1}
+  model_handler = PytorchModelHandlerTensor(
+      model_class=model_class,
+      model_params=model_params,
+      state_dict_path=model_state_dict_path)
+
+  unkeyed_data = numpy.array([10, 40, 60, 90],
+                             dtype=numpy.float32).reshape(-1, 1)
+
+  with beam.Pipeline() as p:
+    predictions = (
+        p
+        | 'InputData' >> beam.Create(unkeyed_data)
+        | 'ConvertNumpyToTensor' >> beam.Map(torch.Tensor)
+        | 'PytorchRunInference' >> RunInference(model_handler=model_handler)
+        | beam.Map(print))
+    # [END torch_unkeyed_model_handler]
+    if test:
+      test(predictions)
+
+
+def torch_keyed_model_handler(test=None):
+    # [START torch_keyed_model_handler]
+  import apache_beam as beam
+  import torch
+  from apache_beam.ml.inference.base import KeyedModelHandler
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+
+  class LinearRegression(torch.nn.Module):
+    def __init__(self, input_dim=1, output_dim=1):
+      super().__init__()
+      self.linear = torch.nn.Linear(input_dim, output_dim)
+
+    def forward(self, x):
+      out = self.linear(x)
+      return out
+
+  model_state_dict_path = 'gs://apache-beam-samples/run_inference/five_times_table_torch.pt'  # pylint: disable=line-too-long
+  model_class = LinearRegression
+  model_params = {'input_dim': 1, 'output_dim': 1}
+  keyed_model_handler = KeyedModelHandler(
+      PytorchModelHandlerTensor(
+          model_class=model_class,
+          model_params=model_params,
+          state_dict_path=model_state_dict_path))
+
+  keyed_data = [("first_question", 105.00), ("second_question", 108.00),
+                ("third_question", 1000.00), ("fourth_question", 1013.00)]
+
+  with beam.Pipeline() as p:
+    predictions = (
+        p
+        | 'KeyedInputData' >> beam.Create(keyed_data)
+        | "ConvertIntToTensor" >>
+        beam.Map(lambda x: (x[0], torch.Tensor([x[1]])))
+        | 'PytorchRunInference' >>
+        RunInference(model_handler=keyed_model_handler)
+        | beam.Map(print))
+    # [END torch_keyed_model_handler]
+    if test:
+      test(predictions)
+
+
+def sklearn_unkeyed_model_handler(test=None):
+    # [START sklearn_unkeyed_model_handler]
+  import apache_beam as beam
+  import numpy
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.sklearn_inference import ModelFileType
+  from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
+
+  sklearn_model_filename = 'gs://apache-beam-samples/run_inference/five_times_table_sklearn.pkl'  # pylint: disable=line-too-long

Review Comment:
   I figured it out. Thanks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] ryanthompson591 commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

ryanthompson591 commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r921251940


##########
website/www/site/content/en/documentation/sdks/python.md:
##########
@@ -46,7 +46,13 @@ new I/O connectors. See the [Developing I/O connectors overview](/documentation/
 for information about developing new I/O connectors and links to
 language-specific implementation guidance.
 
-## Using Beam Python SDK in your ML pipelines
+## Making machine learning inferences with Python
+
+To integrate machine learning models into your pipelines for making inferences, use the RunInference API for PyTorch and Scikit-learn models. If you are using TensorFlow models, you can make use of the
+[library from `tfx_bsl`](https://github.com/tensorflow/tfx-bsl/tree/master/tfx_bsl/beam). Further integrations for TensorFlow are planned. For more information, see [Beam issue #21442](https://github.com/apache/beam/issues/21442).

Review Comment:
   You could say --- Further integrations for popular machine learning frameworks like Tensorflow are planned.
   
   But I'm wondering if its worth saying anything at all, since this documentation could end up out of date soon.



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,198 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference takes advantage of existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, to enable you to use models in your pipelines to create transforms optimized for machine learning inferences. The ability to create arbitrarily complex workflow graphs also allows you to build multi-model pipelines.
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The batched elements are then applied with a transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines can be useful for A/B testing or for building out ensembles that are comprised of models that perform tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, add the following code to your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p |  'Read' >> beam.ReadFromSource('a_source')   
+                     | 'RunInference' >> RunInference(<model_handler>)
+```
+Where `model_handler` is the model handler setup code.
+
+To import models, you need to wrap them around a `ModelHandler` object. Which `ModelHandler` you import depends on the framework and type of data structure that contains the inputs. The following examples show some ModelHandlers that you might want to import.
+
+```
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerPandas
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+The section provides requirements for using pre-trained models with PyTorch and Scikit-learn
+
+#### PyTorch
+
+You need to provide a path to a file that contains the model saved weights. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
+
+1. Download the pre-trained weights and host them in a location that the pipeline can access.
+2. Pass the path of the model to the PyTorch `model_handler` by using the following code: `state_dict_path=<path_to_weights>`.
+
+#### Scikit-learn
+
+You need to provide a path to a file that contains the pickled Scikit-learn model. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the Scikit-learn framework, complete the following steps:
+
+1. Download the pickled model class and host it in a location that the pipeline can access.
+2. Pass the path of the model to the Sklearn `model_handler` by using the following code:
+   `model_uri=<path_to_pickled_file>` and `model_file_type: <ModelFileType>`, where you can specify
+   `ModelFileType.PICKLE` or `ModelFileType.JOBLIB`, depending on how the model was serialized.
+
+### Use multiple models
+
+You can also use the RunInference transform to add multiple inference models to your pipeline.
+
+#### A/B Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(<model_handler_A>)
+   model_b_predictions = data | RunInference(<model_handler_B>)
+```
+
+Where `model_handler_A` and `model_handler_B` are the model handler setup code.
+
+#### Ensemble Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(<model_handler_A>)
+   model_b_predictions = model_a_predictions | beam.Map(some_post_processing) | RunInference(<model_handler_B>)
+```
+
+Where `model_handler_A` and `model_handler_B` are the model handler setup code.
+
+### Use a keyed ModelHandler
+
+If a key is attached to the examples, wrap the `KeyedModelHandler` around the `ModelHandler` object:
+
+```
+from apache_beam.ml.inference.base import KeyedModelHandler
+ 
+keyed_model_handler = KeyedModelHandler(PytorchModelHandlerTensor(...))
+ 
+with pipeline as p:
+   data = p | beam.Create([
+      ('img1', torch.tensor([[1,2,3],[4,5,6],...])),
+      ('img2', torch.tensor([[1,2,3],[4,5,6],...])),
+      ('img3', torch.tensor([[1,2,3],[4,5,6],...])),
+   ])
+   predictions = data | RunInference(KeyedModelHandler)
+```
+
+### Use the PredictionResults object
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions.
+
+The `PredictionResult` is a `NamedTuple` object that contains both the input and the inferences, named  `example` and  `inference`, respectively. When keys are passed with the input data to the RunInference transform, the output `PCollection` returns a `Tuple[str, PredictionResult]`, which is the key and the `PredictionResult` object. Your pipeline interacts with a `PredictionResult` object in steps after the RunInference transform.
+
+```
+class PostProcessor(beam.DoFn):
+    def process(self, element: Tuple[str, PredictionResult]):
+       key, prediction_result = element
+       inputs = prediction_result.example
+       predictions = prediction_result.inference
+
+       # Post-processing logic
+       result = ...
+
+       yield (key, result)
+
+with pipeline as p:
+    output = (
+        p | 'Read' >> beam.ReadFromSource('a_source') 
+                | 'PyTorchRunInference' >> RunInference(<keyed_model_handler>)
+                | 'ProcessOutput' >> beam.ParDo(PostProcessor()))
+```
+
+If you need to use this object explicitly, include the following line in your pipeline to import the object:
+
+```
+from apache_beam.ml.inference.base import PredictionResult
+```
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Run a machine learning pipeline
+
+For detailed instructions explaining how to build and run a pipeline that uses ML models, see the
+[Example RunInference API pipelines](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference) on GitHub.
+
+## Troubleshooting
+
+If you run into problems with your pipeline or job, this section lists issues that you might encounter and provides suggestions for how to fix them.
+
+### Incorrect inferences in the PredictionResult object
+
+In some cases, the `PredictionResults` output might not include the correct predictions in the `inferences` field. This issue occurs when you use a model whose inferences return a dictionary that maps keys to predictions and other metadata. An example return type is `Dict[str, Tensor]`.
+
+The RunInference API currently expects outputs to be an `Iterable[Any]`. Example return types are `Iterable[Tensor]` or `Iterable[Dict[str, Tensor]]`. When RunInference zips the inputs with the predictions, the predictions iterate over the dictionary keys instead of the batch elements. The result is that the key name is preserved but the prediction tensors are discarded. For more information, see the [Pytorch RunInference PredictionResult is a Dict](https://github.com/apache/beam/issues/22240) issue in the Apache Beam GitHub project.
+
+To work with the current RunInference implementation, you can create a wrapper class that overrides the `model(input)` call. In PyTorch, for example, your wrapper would override the `forward()` function and return an output with the appropriate format of `List[Dict[str, torch.Tensor]]`. For more information, see our [HuggingFace language modeling example](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py#L49).
+
+### Unable to batch tensor elements
+
+RunInference uses dynamic batching. However, the RunInference API cannot batch tensor elements of different sizes, because `torch.stack()` expects tensors of the same length. If you provide images of different sizes or word embeddings of different lengths, errors might occur.

Review Comment:
   Yes, that is a good generalization for sklearn as well.  Going forward it may not always going to be the case, TF might have some leeway with the way it sets up protos, but it is a good troubleshooting thing to think about in general.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] rszper commented on a diff in pull request #22250: Update RunInference documentation

Posted by GitBox <gi...@apache.org>.

rszper commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r922547000


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,201 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference takes advantage of existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, to enable you to use models in your pipelines to create transforms optimized for machine learning inferences. The ability to create arbitrarily complex workflow graphs also allows you to build multi-model pipelines.
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The batched elements are then applied with a transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines can be useful for A/B testing or for building out ensembles that are comprised of models that perform tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, add the following code to your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+with pipeline as p:
+   predictions = ( p |  'Read' >> beam.ReadFromSource('a_source')
+                     | 'RunInference' >> RunInference(<model_handler>)
+```
+Where `model_handler` is the model handler setup code.
+
+To import models, you need to wrap them around a `ModelHandler` object. Which `ModelHandler` you import depends on the framework and type of data structure that contains the inputs. The following examples show some ModelHandlers that you might want to import.
+
+```
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerPandas
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+The section provides requirements for using pre-trained models with PyTorch and Scikit-learn
+
+#### PyTorch
+
+You need to provide a path to a file that contains the model saved weights. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
+
+1. Download the pre-trained weights and host them in a location that the pipeline can access.
+2. Pass the path of the model weights to the PyTorch `ModelHandler` by using the following code: `state_dict_path=<path_to_weights>`.
+
+#### Scikit-learn
+
+You need to provide a path to a file that contains the pickled Scikit-learn model. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the Scikit-learn framework, complete the following steps:
+
+1. Download the pickled model class and host it in a location that the pipeline can access.
+2. Pass the path of the model to the Sklearn `ModelHandler` by using the following code:
+   `model_uri=<path_to_pickled_file>` and `model_file_type: <ModelFileType>`, where you can specify
+   `ModelFileType.PICKLE` or `ModelFileType.JOBLIB`, depending on how the model was serialized.
+
+### Use multiple models
+
+You can also use the RunInference transform to add multiple inference models to your pipeline.
+
+#### A/B Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source')
+   model_a_predictions = data | RunInference(<model_handler_A>)
+   model_b_predictions = data | RunInference(<model_handler_B>)
+```
+
+Where `model_handler_A` and `model_handler_B` are the model handler setup code.
+
+#### Ensemble Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source')
+   model_a_predictions = data | RunInference(<model_handler_A>)
+   model_b_predictions = model_a_predictions | beam.Map(some_post_processing) | RunInference(<model_handler_B>)
+```
+
+Where `model_handler_A` and `model_handler_B` are the model handler setup code.
+
+### Use a keyed ModelHandler
+
+If a key is attached to the examples, wrap the `KeyedModelHandler` around the `ModelHandler` object:
+
+```
+from apache_beam.ml.inference.base import KeyedModelHandler
+keyed_model_handler = KeyedModelHandler(PytorchModelHandlerTensor(...))
+with pipeline as p:
+   data = p | beam.Create([
+      ('img1', torch.tensor([[1,2,3],[4,5,6],...])),
+      ('img2', torch.tensor([[1,2,3],[4,5,6],...])),
+      ('img3', torch.tensor([[1,2,3],[4,5,6],...])),
+   ])
+   predictions = data | RunInference(KeyedModelHandler)
+```
+
+### Use the PredictionResults object
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions.
+
+The `PredictionResult` is a `NamedTuple` object that contains both the input and the inferences, named  `example` and  `inference`, respectively. When keys are passed with the input data to the RunInference transform, the output `PCollection` returns a `Tuple[str, PredictionResult]`, which is the key and the `PredictionResult` object. Your pipeline interacts with a `PredictionResult` object in steps after the RunInference transform.
+
+```
+class PostProcessor(beam.DoFn):
+    def process(self, element: Tuple[str, PredictionResult]):
+       key, prediction_result = element
+       inputs = prediction_result.example
+       predictions = prediction_result.inference
+
+       # Post-processing logic
+       result = ...
+
+       yield (key, result)
+
+with pipeline as p:
+    output = (
+        p | 'Read' >> beam.ReadFromSource('a_source')
+                | 'PyTorchRunInference' >> RunInference(<keyed_model_handler>)
+                | 'ProcessOutput' >> beam.ParDo(PostProcessor()))
+```
+
+If you need to use this object explicitly, include the following line in your pipeline to import the object:
+
+```
+from apache_beam.ml.inference.base import PredictionResult
+```
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65).
+
+## Run a machine learning pipeline
+
+For detailed instructions explaining how to build and run a pipeline that uses ML models, see the
+[Example RunInference API pipelines](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference) on GitHub.
+
+## Troubleshooting
+
+If you run into problems with your pipeline or job, this section lists issues that you might encounter and provides suggestions for how to fix them.
+
+### Incorrect inferences in the PredictionResult object
+
+In some cases, the `PredictionResults` output might not include the correct predictions in the `inferences` field. This issue occurs when you use a model whose inferences return a dictionary that maps keys to predictions and other metadata. An example return type is `Dict[str, Tensor]`.
+
+The RunInference API currently expects outputs to be an `Iterable[Any]`. Example return types are `Iterable[Tensor]` or `Iterable[Dict[str, Tensor]]`. When RunInference zips the inputs with the predictions, the predictions iterate over the dictionary keys instead of the batch elements. The result is that the key name is preserved but the prediction tensors are discarded. For more information, see the [Pytorch RunInference PredictionResult is a Dict](https://github.com/apache/beam/issues/22240) issue in the Apache Beam GitHub project.
+
+To work with the current RunInference implementation, you can create a wrapper class that overrides the `model(input)` call. In PyTorch, for example, your wrapper would override the `forward()` function and return an output with the appropriate format of `List[Dict[str, torch.Tensor]]`. For more information, see our [HuggingFace language modeling example](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py#L49).
+
+### Unable to batch tensor elements
+
+RunInference uses dynamic batching. However, the RunInference API cannot batch tensor elements of different sizes, so samples passed to the RunInferene transform must be the same dimension or length. If you provide images of different sizes or word embeddings of different lengths, the following error might occur:
+
+`
+File "/beam/sdks/python/apache_beam/ml/inference/pytorch_inference.py", line 232, in run_inference
+batched_tensors = torch.stack(key_to_tensor_list[key])
+RuntimeError: stack expects each tensor to be equal size, but got [12] at entry 0 and [10] at entry 1 [while running 'PyTorchRunInference/ParDo(_RunInferenceDoFn)']
+`
+
+To avoid this issue, either use elements of the same size, or disable batching.
+
+**Option 1: Use elements of the same size**
+
+Use elements of the same size or resize the inputs. For computer vision applications, resize image inputs so that they have the same dimensions. For natural language processing (NLP) applications that have text of varying length, resize the text or word embeddings to make them the same length. When working with texts of varying length, resizing might not be possible. In this scenario, you could disable batching (see option 2).
+
+**Option 2: Disable batching**
+
+Disable batching by overriding the `batch_elements_kwargs` function in your ModelHandler and setting the maximum batch size (`max_batch_size`) to one: `max_batch_size=1`. For more information, see
+[BatchElements PTransforms](/documentation/sdks/python-machine-learning/#batchelements-ptransform).
+

Review Comment:
   Added



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] yeandy commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

yeandy commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r921248092


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,198 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference takes advantage of existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, to enable you to use models in your pipelines to create transforms optimized for machine learning inferences. The ability to create arbitrarily complex workflow graphs also allows you to build multi-model pipelines.
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The batched elements are then applied with a transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines can be useful for A/B testing or for building out ensembles that are comprised of models that perform tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, add the following code to your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 

Review Comment:
   @rszper there are some whitespaces that fail our formatting tests



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] ryanthompson591 commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

ryanthompson591 commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r920331996


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions without returning the full input data.
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, you add a single line of code in your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p | beam.ReadFromSource('a_source')   

Review Comment:
   All implementations will need some sort of a line with:
   model_handler = ... Add your model handler setup code here...



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions without returning the full input data.
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, you add a single line of code in your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p | beam.ReadFromSource('a_source')   
+    | RunInference(configuration)))

Review Comment:
   This documentation has been very good so far.
   
   I would change configuration to model_handler.



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions without returning the full input data.
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, you add a single line of code in your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p | beam.ReadFromSource('a_source')   
+    | RunInference(configuration)))
+```
+
+To import models, you need to wrap them around a `ModelHandler object`. Add one or more of the following lines of code, depending on the framework and type of data structure that holds the data:
+
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerNumpy
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+```
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerPandas
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+You need to provide a path to the model weights that's accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:

Review Comment:
   This documentation talking about model weights seem pytorch specific.  How about add pytorch to the header that is now Use Pre-trained Models?



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions without returning the full input data.
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, you add a single line of code in your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p | beam.ReadFromSource('a_source')   
+    | RunInference(configuration)))
+```
+
+To import models, you need to wrap them around a `ModelHandler object`. Add one or more of the following lines of code, depending on the framework and type of data structure that holds the data:
+
+```

Review Comment:
   These little chunks of code below here seem out of place.



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions without returning the full input data.
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, you add a single line of code in your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p | beam.ReadFromSource('a_source')   
+    | RunInference(configuration)))
+```
+
+To import models, you need to wrap them around a `ModelHandler object`. Add one or more of the following lines of code, depending on the framework and type of data structure that holds the data:
+
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerNumpy
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+```
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerPandas
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+You need to provide a path to the model weights that's accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
+
+1. Download the pre-trained weights and host them in a location that the pipeline can access.
+2. Pass the hosted path of the model to the PyTorch `model_handler` by using the following code: `state_dict_path=<path_to_weights>`.
+
+### Use multiple inference models
+
+You can also use the RunInference transform to add multiple inference models to your pipeline.
+
+#### A/B Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(ModelHandlerA)
+   model_b_predictions = data | RunInference(ModelHandlerB)
+```
+
+#### Ensemble Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(ModelHandlerA)
+   model_b_predictions = model_a_predictions | beam.Map(some_post_processing) | RunInference(ModelHandlerB)
+```
+
+### Use a key handler
+
+If a key is attached to the examples, use the `KeyedModelHandler`:
+
+```
+from apache_beam.ml.inference.base import KeyedModelHandler
+ 
+keyed_model_handler = KeyedModelHandler(PytorchModelHandlerTensor(...))
+ 
+with pipeline as p:
+   data = p | beam.Create([
+      ('img1', np.array[[1,2,3],[4,5,6],...]),
+      ('img2', np.array[[1,2,3],[4,5,6],...]),
+      ('img3', np.array[[1,2,3],[4,5,6],...]),
+   ])
+   predictions = data | RunInference(keyed_model_handler)
+```
+
+### Use the prediction results object

Review Comment:
   Or an alternate title -- Accessing Output from a PredictionResults object



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] rszper commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

rszper commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r919467258


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,167 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.

Review Comment:
   I added a related links section and included a link to the PyDoc



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] rszper commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

rszper commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r920496197


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).

Review Comment:
   I'll remove it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] yeandy commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

yeandy commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r920297441


##########
website/www/site/content/en/documentation/sdks/python.md:
##########
@@ -48,6 +48,11 @@ language-specific implementation guidance.
 
 ## Using Beam Python SDK in your ML pipelines

Review Comment:
   👍 



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions without returning the full input data.
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, you add a single line of code in your pipeline:

Review Comment:
   👍 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] AnandInguva commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

AnandInguva commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r920778718


##########
website/www/site/content/en/documentation/transforms/python/elementwise/runinference.md:
##########
@@ -0,0 +1,105 @@
+---
+title: "RunInference"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# RunInference
+
+{{< localstorage language language-py >}}
+
+{{< button-pydoc path="apache_beam.ml.inference" class="RunInference" >}}
+
+Uses models to do local and remote inference. A `RunInference` transform uses a `PCollection` of examples to create a machine learning (ML) model. The transform outputs a `PCollection` that contains the input examples and output predictions.

Review Comment:
   ```suggestion
   Uses models to do local and remote inference. A `RunInference` transform performs inference on a `PCollection` of examples using a machine learning (ML) model. The transform outputs a `PCollection` that contains the input examples and output predictions.
   ```



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,198 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference takes advantage of existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, to enable you to use models in your pipelines to create transforms optimized for machine learning inferences. The ability to create arbitrarily complex workflow graphs also allows you to build multi-model pipelines.
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The batched elements are then applied with a transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines can be useful for A/B testing or for building out ensembles that are comprised of models that perform tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, add the following code to your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p |  'Read' >> beam.ReadFromSource('a_source')   
+                     | 'RunInference' >> RunInference(<model_handler>)
+```
+Where `model_handler` is the model handler setup code.
+
+To import models, you need to wrap them around a `ModelHandler` object. Which `ModelHandler` you import depends on the framework and type of data structure that contains the inputs. The following examples show some ModelHandlers that you might want to import.
+
+```
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerPandas
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+The section provides requirements for using pre-trained models with PyTorch and Scikit-learn
+
+#### PyTorch
+
+You need to provide a path to a file that contains the model saved weights. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
+
+1. Download the pre-trained weights and host them in a location that the pipeline can access.
+2. Pass the path of the model to the PyTorch `model_handler` by using the following code: `state_dict_path=<path_to_weights>`.

Review Comment:
   ```suggestion
   2. Pass the path of the model weights to the PyTorch `ModelHandler` by using the following code: `state_dict_path=<path_to_weights>`.
   ```



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,198 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference takes advantage of existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, to enable you to use models in your pipelines to create transforms optimized for machine learning inferences. The ability to create arbitrarily complex workflow graphs also allows you to build multi-model pipelines.
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The batched elements are then applied with a transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines can be useful for A/B testing or for building out ensembles that are comprised of models that perform tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, add the following code to your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p |  'Read' >> beam.ReadFromSource('a_source')   
+                     | 'RunInference' >> RunInference(<model_handler>)
+```
+Where `model_handler` is the model handler setup code.
+
+To import models, you need to wrap them around a `ModelHandler` object. Which `ModelHandler` you import depends on the framework and type of data structure that contains the inputs. The following examples show some ModelHandlers that you might want to import.
+
+```
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerPandas
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+The section provides requirements for using pre-trained models with PyTorch and Scikit-learn
+
+#### PyTorch
+
+You need to provide a path to a file that contains the model saved weights. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
+
+1. Download the pre-trained weights and host them in a location that the pipeline can access.
+2. Pass the path of the model to the PyTorch `model_handler` by using the following code: `state_dict_path=<path_to_weights>`.
+
+#### Scikit-learn
+
+You need to provide a path to a file that contains the pickled Scikit-learn model. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the Scikit-learn framework, complete the following steps:
+
+1. Download the pickled model class and host it in a location that the pipeline can access.
+2. Pass the path of the model to the Sklearn `model_handler` by using the following code:
+   `model_uri=<path_to_pickled_file>` and `model_file_type: <ModelFileType>`, where you can specify
+   `ModelFileType.PICKLE` or `ModelFileType.JOBLIB`, depending on how the model was serialized.
+
+### Use multiple models
+
+You can also use the RunInference transform to add multiple inference models to your pipeline.
+
+#### A/B Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(<model_handler_A>)
+   model_b_predictions = data | RunInference(<model_handler_B>)
+```
+
+Where `model_handler_A` and `model_handler_B` are the model handler setup code.
+
+#### Ensemble Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(<model_handler_A>)
+   model_b_predictions = model_a_predictions | beam.Map(some_post_processing) | RunInference(<model_handler_B>)
+```
+
+Where `model_handler_A` and `model_handler_B` are the model handler setup code.
+
+### Use a keyed ModelHandler
+
+If a key is attached to the examples, wrap the `KeyedModelHandler` around the `ModelHandler` object:
+
+```
+from apache_beam.ml.inference.base import KeyedModelHandler
+ 
+keyed_model_handler = KeyedModelHandler(PytorchModelHandlerTensor(...))
+ 
+with pipeline as p:
+   data = p | beam.Create([
+      ('img1', torch.tensor([[1,2,3],[4,5,6],...])),
+      ('img2', torch.tensor([[1,2,3],[4,5,6],...])),
+      ('img3', torch.tensor([[1,2,3],[4,5,6],...])),
+   ])
+   predictions = data | RunInference(KeyedModelHandler)
+```
+
+### Use the PredictionResults object
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions.
+
+The `PredictionResult` is a `NamedTuple` object that contains both the input and the inferences, named  `example` and  `inference`, respectively. When keys are passed with the input data to the RunInference transform, the output `PCollection` returns a `Tuple[str, PredictionResult]`, which is the key and the `PredictionResult` object. Your pipeline interacts with a `PredictionResult` object in steps after the RunInference transform.
+
+```
+class PostProcessor(beam.DoFn):
+    def process(self, element: Tuple[str, PredictionResult]):
+       key, prediction_result = element
+       inputs = prediction_result.example
+       predictions = prediction_result.inference
+
+       # Post-processing logic
+       result = ...
+
+       yield (key, result)
+
+with pipeline as p:
+    output = (
+        p | 'Read' >> beam.ReadFromSource('a_source') 
+                | 'PyTorchRunInference' >> RunInference(<keyed_model_handler>)
+                | 'ProcessOutput' >> beam.ParDo(PostProcessor()))
+```
+
+If you need to use this object explicitly, include the following line in your pipeline to import the object:
+
+```
+from apache_beam.ml.inference.base import PredictionResult
+```
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Run a machine learning pipeline
+
+For detailed instructions explaining how to build and run a pipeline that uses ML models, see the
+[Example RunInference API pipelines](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference) on GitHub.
+
+## Troubleshooting
+
+If you run into problems with your pipeline or job, this section lists issues that you might encounter and provides suggestions for how to fix them.
+
+### Incorrect inferences in the PredictionResult object
+
+In some cases, the `PredictionResults` output might not include the correct predictions in the `inferences` field. This issue occurs when you use a model whose inferences return a dictionary that maps keys to predictions and other metadata. An example return type is `Dict[str, Tensor]`.
+
+The RunInference API currently expects outputs to be an `Iterable[Any]`. Example return types are `Iterable[Tensor]` or `Iterable[Dict[str, Tensor]]`. When RunInference zips the inputs with the predictions, the predictions iterate over the dictionary keys instead of the batch elements. The result is that the key name is preserved but the prediction tensors are discarded. For more information, see the [Pytorch RunInference PredictionResult is a Dict](https://github.com/apache/beam/issues/22240) issue in the Apache Beam GitHub project.
+
+To work with the current RunInference implementation, you can create a wrapper class that overrides the `model(input)` call. In PyTorch, for example, your wrapper would override the `forward()` function and return an output with the appropriate format of `List[Dict[str, torch.Tensor]]`. For more information, see our [HuggingFace language modeling example](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py#L49).
+
+### Unable to batch tensor elements
+
+RunInference uses dynamic batching. However, the RunInference API cannot batch tensor elements of different sizes, because `torch.stack()` expects tensors of the same length. If you provide images of different sizes or word embeddings of different lengths, errors might occur.

Review Comment:
   @ryanthompson591 is it the same case for `numpy` as well? if yes, we can generalize the statement saying something like  all the samples passed to the RunInference transform must be of the same dimension/length.



##########
website/www/site/content/en/documentation/sdks/python.md:
##########
@@ -46,7 +46,13 @@ new I/O connectors. See the [Developing I/O connectors overview](/documentation/
 for information about developing new I/O connectors and links to
 language-specific implementation guidance.
 
-## Using Beam Python SDK in your ML pipelines
+## Making machine learning inferences with Python
+
+To integrate machine learning models into your pipelines for making inferences, use the RunInference API for PyTorch and Scikit-learn models. If you are using TensorFlow models, you can make use of the

Review Comment:
   To integrate machine learning models into your pipelines for making inferences, use the RunInference API.....after this can we add a link to our repo just like how you did with the tfx_bsl here?
   
   https://github.com/apache/beam/tree/master/sdks/python/apache_beam/ml/inference



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,198 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference takes advantage of existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, to enable you to use models in your pipelines to create transforms optimized for machine learning inferences. The ability to create arbitrarily complex workflow graphs also allows you to build multi-model pipelines.
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The batched elements are then applied with a transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines can be useful for A/B testing or for building out ensembles that are comprised of models that perform tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, add the following code to your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p |  'Read' >> beam.ReadFromSource('a_source')   
+                     | 'RunInference' >> RunInference(<model_handler>)
+```
+Where `model_handler` is the model handler setup code.
+
+To import models, you need to wrap them around a `ModelHandler` object. Which `ModelHandler` you import depends on the framework and type of data structure that contains the inputs. The following examples show some ModelHandlers that you might want to import.
+
+```
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerPandas
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+The section provides requirements for using pre-trained models with PyTorch and Scikit-learn
+
+#### PyTorch
+
+You need to provide a path to a file that contains the model saved weights. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
+
+1. Download the pre-trained weights and host them in a location that the pipeline can access.
+2. Pass the path of the model to the PyTorch `model_handler` by using the following code: `state_dict_path=<path_to_weights>`.

Review Comment:
   I suggest model weights here because we save the state_dict of the Pytorch model, which is the weights of the model.



##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,198 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference takes advantage of existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, to enable you to use models in your pipelines to create transforms optimized for machine learning inferences. The ability to create arbitrarily complex workflow graphs also allows you to build multi-model pipelines.
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The batched elements are then applied with a transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines can be useful for A/B testing or for building out ensembles that are comprised of models that perform tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, add the following code to your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p |  'Read' >> beam.ReadFromSource('a_source')   
+                     | 'RunInference' >> RunInference(<model_handler>)
+```
+Where `model_handler` is the model handler setup code.
+
+To import models, you need to wrap them around a `ModelHandler` object. Which `ModelHandler` you import depends on the framework and type of data structure that contains the inputs. The following examples show some ModelHandlers that you might want to import.
+
+```
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerPandas
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+The section provides requirements for using pre-trained models with PyTorch and Scikit-learn
+
+#### PyTorch
+
+You need to provide a path to a file that contains the model saved weights. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
+
+1. Download the pre-trained weights and host them in a location that the pipeline can access.
+2. Pass the path of the model to the PyTorch `model_handler` by using the following code: `state_dict_path=<path_to_weights>`.
+
+#### Scikit-learn
+
+You need to provide a path to a file that contains the pickled Scikit-learn model. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the Scikit-learn framework, complete the following steps:
+
+1. Download the pickled model class and host it in a location that the pipeline can access.
+2. Pass the path of the model to the Sklearn `model_handler` by using the following code:

Review Comment:
   ```suggestion
   2. Pass the path of the model to the Sklearn `ModelHandler` by using the following code:
   ```



##########
website/www/site/content/en/documentation/transforms/python/elementwise/runinference.md:
##########
@@ -0,0 +1,105 @@
+---
+title: "RunInference"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# RunInference
+
+{{< localstorage language language-py >}}
+
+{{< button-pydoc path="apache_beam.ml.inference" class="RunInference" >}}
+
+Uses models to do local and remote inference. A `RunInference` transform uses a `PCollection` of examples to create a machine learning (ML) model. The transform outputs a `PCollection` that contains the input examples and output predictions.

Review Comment:
   Create might cause confusion since we only instantiate or unpickle the already created/trained model.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] AnandInguva commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

AnandInguva commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r920790329


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,198 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference takes advantage of existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, to enable you to use models in your pipelines to create transforms optimized for machine learning inferences. The ability to create arbitrarily complex workflow graphs also allows you to build multi-model pipelines.
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The batched elements are then applied with a transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines can be useful for A/B testing or for building out ensembles that are comprised of models that perform tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, add the following code to your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p |  'Read' >> beam.ReadFromSource('a_source')   
+                     | 'RunInference' >> RunInference(<model_handler>)
+```
+Where `model_handler` is the model handler setup code.
+
+To import models, you need to wrap them around a `ModelHandler` object. Which `ModelHandler` you import depends on the framework and type of data structure that contains the inputs. The following examples show some ModelHandlers that you might want to import.
+
+```
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerPandas
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+The section provides requirements for using pre-trained models with PyTorch and Scikit-learn
+
+#### PyTorch
+
+You need to provide a path to a file that contains the model saved weights. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
+
+1. Download the pre-trained weights and host them in a location that the pipeline can access.
+2. Pass the path of the model to the PyTorch `model_handler` by using the following code: `state_dict_path=<path_to_weights>`.
+
+#### Scikit-learn
+
+You need to provide a path to a file that contains the pickled Scikit-learn model. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the Scikit-learn framework, complete the following steps:
+
+1. Download the pickled model class and host it in a location that the pipeline can access.
+2. Pass the path of the model to the Sklearn `model_handler` by using the following code:
+   `model_uri=<path_to_pickled_file>` and `model_file_type: <ModelFileType>`, where you can specify
+   `ModelFileType.PICKLE` or `ModelFileType.JOBLIB`, depending on how the model was serialized.
+
+### Use multiple models
+
+You can also use the RunInference transform to add multiple inference models to your pipeline.
+
+#### A/B Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(<model_handler_A>)
+   model_b_predictions = data | RunInference(<model_handler_B>)
+```
+
+Where `model_handler_A` and `model_handler_B` are the model handler setup code.
+
+#### Ensemble Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source') 
+   model_a_predictions = data | RunInference(<model_handler_A>)
+   model_b_predictions = model_a_predictions | beam.Map(some_post_processing) | RunInference(<model_handler_B>)
+```
+
+Where `model_handler_A` and `model_handler_B` are the model handler setup code.
+
+### Use a keyed ModelHandler
+
+If a key is attached to the examples, wrap the `KeyedModelHandler` around the `ModelHandler` object:
+
+```
+from apache_beam.ml.inference.base import KeyedModelHandler
+ 
+keyed_model_handler = KeyedModelHandler(PytorchModelHandlerTensor(...))
+ 
+with pipeline as p:
+   data = p | beam.Create([
+      ('img1', torch.tensor([[1,2,3],[4,5,6],...])),
+      ('img2', torch.tensor([[1,2,3],[4,5,6],...])),
+      ('img3', torch.tensor([[1,2,3],[4,5,6],...])),
+   ])
+   predictions = data | RunInference(KeyedModelHandler)
+```
+
+### Use the PredictionResults object
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions.
+
+The `PredictionResult` is a `NamedTuple` object that contains both the input and the inferences, named  `example` and  `inference`, respectively. When keys are passed with the input data to the RunInference transform, the output `PCollection` returns a `Tuple[str, PredictionResult]`, which is the key and the `PredictionResult` object. Your pipeline interacts with a `PredictionResult` object in steps after the RunInference transform.
+
+```
+class PostProcessor(beam.DoFn):
+    def process(self, element: Tuple[str, PredictionResult]):
+       key, prediction_result = element
+       inputs = prediction_result.example
+       predictions = prediction_result.inference
+
+       # Post-processing logic
+       result = ...
+
+       yield (key, result)
+
+with pipeline as p:
+    output = (
+        p | 'Read' >> beam.ReadFromSource('a_source') 
+                | 'PyTorchRunInference' >> RunInference(<keyed_model_handler>)
+                | 'ProcessOutput' >> beam.ParDo(PostProcessor()))
+```
+
+If you need to use this object explicitly, include the following line in your pipeline to import the object:
+
+```
+from apache_beam.ml.inference.base import PredictionResult
+```
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Run a machine learning pipeline
+
+For detailed instructions explaining how to build and run a pipeline that uses ML models, see the
+[Example RunInference API pipelines](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference) on GitHub.
+
+## Troubleshooting
+
+If you run into problems with your pipeline or job, this section lists issues that you might encounter and provides suggestions for how to fix them.
+
+### Incorrect inferences in the PredictionResult object
+
+In some cases, the `PredictionResults` output might not include the correct predictions in the `inferences` field. This issue occurs when you use a model whose inferences return a dictionary that maps keys to predictions and other metadata. An example return type is `Dict[str, Tensor]`.
+
+The RunInference API currently expects outputs to be an `Iterable[Any]`. Example return types are `Iterable[Tensor]` or `Iterable[Dict[str, Tensor]]`. When RunInference zips the inputs with the predictions, the predictions iterate over the dictionary keys instead of the batch elements. The result is that the key name is preserved but the prediction tensors are discarded. For more information, see the [Pytorch RunInference PredictionResult is a Dict](https://github.com/apache/beam/issues/22240) issue in the Apache Beam GitHub project.
+
+To work with the current RunInference implementation, you can create a wrapper class that overrides the `model(input)` call. In PyTorch, for example, your wrapper would override the `forward()` function and return an output with the appropriate format of `List[Dict[str, torch.Tensor]]`. For more information, see our [HuggingFace language modeling example](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py#L49).
+
+### Unable to batch tensor elements
+
+RunInference uses dynamic batching. However, the RunInference API cannot batch tensor elements of different sizes, because `torch.stack()` expects tensors of the same length. If you provide images of different sizes or word embeddings of different lengths, errors might occur.

Review Comment:
   @ryanthompson591 is it the same case for `numpy` as well? if yes, we can generalize the statement saying something like  all the `samples passed to the RunInference transform must be of the same dimension/length`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] rszper commented on pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

rszper commented on PR #22250:
URL: https://github.com/apache/beam/pull/22250#issuecomment-1183513336

   @yeandy I believe all updates are made based on your comments, except I haven't updated the examples yet.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] rszper commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

rszper commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r920481889


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions without returning the full input data.
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, you add a single line of code in your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p | beam.ReadFromSource('a_source')   
+    | RunInference(configuration)))
+```
+
+To import models, you need to wrap them around a `ModelHandler object`. Add one or more of the following lines of code, depending on the framework and type of data structure that holds the data:
+
+```

Review Comment:
   I made some updates. Take a look and let me know if we need more changes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] rszper commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

rszper commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r920494546


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions without returning the full input data.
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, you add a single line of code in your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p | beam.ReadFromSource('a_source')   
+    | RunInference(configuration)))
+```
+
+To import models, you need to wrap them around a `ModelHandler object`. Add one or more of the following lines of code, depending on the framework and type of data structure that holds the data:
+
+```

Review Comment:
   Updated to fix the typos



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] rszper commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

rszper commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r919469052


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,167 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API allows you to find the input that determined the prediction without returning to the full input data.
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in a worker, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API allows you to build complex multi-model pipelines with minimum effort. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.

Review Comment:
   Updated



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] rszper commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

rszper commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r920344794


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results

Review Comment:
   I removed this section and moved the content into the section lower down. I also updated it based on Anand's comments in the Google doc.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] rszper commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

rszper commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r920448995


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions without returning the full input data.
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, you add a single line of code in your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p | beam.ReadFromSource('a_source')   
+    | RunInference(configuration)))
+```
+
+To import models, you need to wrap them around a `ModelHandler object`. Add one or more of the following lines of code, depending on the framework and type of data structure that holds the data:
+
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerNumpy
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+```
+```
+from apache_beam.ml.inference.pytorch_inference import SklearnModelHandlerPandas
+```
+```
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+You need to provide a path to the model weights that's accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:

Review Comment:
   This has been updated to include a PyTorch section and a Scikit-learn section.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] rszper commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

rszper commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r920449421


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,186 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference leverages existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, and it allows you to build multi-model pipelines. In addition, the RunInference API has built in capabilities for dealing with [keyed values](#use-the-prediction-results-object).
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The resulting batch is used to make the appropriate transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines are useful for A/B testing and for building out ensembles for tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+### Prediction results
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions without returning the full input data.
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65). 
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, you add a single line of code in your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+ 
+with pipeline as p:
+   predictions = ( p | beam.ReadFromSource('a_source')   
+    | RunInference(configuration)))

Review Comment:
   Updated



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] rszper commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

rszper commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r920326369


##########
website/www/site/content/en/documentation/sdks/python.md:
##########
@@ -48,6 +48,11 @@ language-specific implementation guidance.
 
 ## Using Beam Python SDK in your ML pipelines
 
+To use the Beam Python SDK with your machine learning pipelines, use the RunInference API for PyTorch and Sklearn models. If using Tensorflow model, you can make use of the library from `tfx_bsl`. Further integrations for TensorFlow are planned.

Review Comment:
   Thank you! I'm adding the link for the library. I don't know if this is the right place to link to the beam issue with future work, but I'm adding it for now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] rszper commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

rszper commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r921344872


##########
sdks/python/apache_beam/examples/snippets/transforms/elementwise/runinference_test.py:
##########
@@ -0,0 +1,93 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# pytype: skip-file
+
+import unittest
+from io import StringIO
+
+import mock
+
+from apache_beam.examples.snippets.util import assert_matches_stdout
+from apache_beam.testing.test_pipeline import TestPipeline
+
+from . import runinference
+
+def check_torch_keyed_model_handler(actual):
+  expected = '''[START torch_keyed_model_handler]
+('first_question', PredictionResult(example=tensor([105.]), inference=tensor([523.6982], grad_fn=<UnbindBackward0>)))
+('second_question', PredictionResult(example=tensor([108.]), inference=tensor([538.5867], grad_fn=<UnbindBackward0>)))
+('third_question', PredictionResult(example=tensor([1000.]), inference=tensor([4965.4019], grad_fn=<UnbindBackward0>)))
+('fourth_question', PredictionResult(example=tensor([1013.]), inference=tensor([5029.9180], grad_fn=<UnbindBackward0>)))
+[END torch_keyed_model_handler]'''.splitlines()[1:-1]
+  assert_matches_stdout(actual, expected)
+
+
+def check_sklearn_keyed_model_handler(actual):
+  expected = '''[START sklearn_keyed_model_handler]
+('first_question', PredictionResult(example=[105.0], inference=array([525.])))
+('second_question', PredictionResult(example=[108.0], inference=array([540.])))
+('third_question', PredictionResult(example=[1000.0], inference=array([5000.])))
+('fourth_question', PredictionResult(example=[1013.0], inference=array([5065.])))
+[END sklearn_keyed_model_handler] '''.splitlines()[1:-1]
+  assert_matches_stdout(actual, expected)
+
+
+def check_torch_unkeyed_model_handler(actual):
+  expected = '''[START torch_unkeyed_model_handler]
+PredictionResult(example=tensor([10.]), inference=tensor([52.2325], grad_fn=<UnbindBackward0>))
+PredictionResult(example=tensor([40.]), inference=tensor([201.1165], grad_fn=<UnbindBackward0>))
+PredictionResult(example=tensor([60.]), inference=tensor([300.3724], grad_fn=<UnbindBackward0>))
+PredictionResult(example=tensor([90.]), inference=tensor([449.2563], grad_fn=<UnbindBackward0>))
+[END torch_unkeyed_model_handler] '''.splitlines()[1:-1]
+  assert_matches_stdout(actual, expected)
+
+
+def check_sklearn_unkeyed_model_handler(actual):
+  expected = '''[START sklearn_unkeyed_model_handler]
+PredictionResult(example=array([20.], dtype=float32), inference=array([100.], dtype=float32))
+PredictionResult(example=array([40.], dtype=float32), inference=array([200.], dtype=float32))
+PredictionResult(example=array([60.], dtype=float32), inference=array([300.], dtype=float32))
+PredictionResult(example=array([90.], dtype=float32), inference=array([450.], dtype=float32))
+[END sklearn_unkeyed_model_handler]  '''.splitlines()[1:-1]
+  assert_matches_stdout(actual, expected)
+
+@mock.patch('apache_beam.Pipeline', TestPipeline)
+@mock.patch(
+    'apache_beam.examples.snippets.transforms.elementwise.runinference.print', str)
+class RunInferenceTest(unittest.TestCase):
+  def test_torch_unkeyed_model_handler(self):
+    runinference.torch_unkeyed_model_handler(check_torch_unkeyed_model_handler)
+
+  def test_torch_keyed_model_handler(self):
+    runinference.torch_keyed_model_handler(check_torch_keyed_model_handler)
+
+  def test_sklearn_unkeyed_model_handler(self):
+    runinference.sklearn_unkeyed_model_handler(check_sklearn_unkeyed_model_handler)
+
+  def test_sklearn_keyed_model_handler(self):
+    runinference.sklearn_keyed_model_handler(check_sklearn_keyed_model_handler)
+
+  def test_images(self):
+    runinference.images(check_images)
+
+  def test_digits(self):
+    runinference.digits(check_digits)
+

Review Comment:
   Thanks. I missed that when I removed everything else.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] AnandInguva commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

AnandInguva commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r921545839


##########
sdks/python/apache_beam/examples/snippets/transforms/elementwise/runinference.py:
##########
@@ -0,0 +1,157 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# pytype: skip-file
+
+def torch_unkeyed_model_handler(test=None):
+    # [START torch_unkeyed_model_handler]
+  import apache_beam as beam
+  import numpy
+  import torch
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+
+  class LinearRegression(torch.nn.Module):
+    def __init__(self, input_dim=1, output_dim=1):
+      super().__init__()
+      self.linear = torch.nn.Linear(input_dim, output_dim)
+
+    def forward(self, x):
+      out = self.linear(x)
+      return out
+
+  model_state_dict_path = 'gs://apache-beam-samples/run_inference/five_times_table_torch.pt'  # pylint: disable=line-too-long
+  model_class = LinearRegression
+  model_params = {'input_dim': 1, 'output_dim': 1}
+  model_handler = PytorchModelHandlerTensor(
+      model_class=model_class,
+      model_params=model_params,
+      state_dict_path=model_state_dict_path)
+
+  unkeyed_data = numpy.array([10, 40, 60, 90],
+                             dtype=numpy.float32).reshape(-1, 1)
+
+  with beam.Pipeline() as p:
+    predictions = (
+        p
+        | 'InputData' >> beam.Create(unkeyed_data)
+        | 'ConvertNumpyToTensor' >> beam.Map(torch.Tensor)
+        | 'PytorchRunInference' >> RunInference(model_handler=model_handler)
+        | beam.Map(print))
+    # [END torch_unkeyed_model_handler]
+    if test:
+      test(predictions)
+
+
+def torch_keyed_model_handler(test=None):
+    # [START torch_keyed_model_handler]
+  import apache_beam as beam
+  import torch
+  from apache_beam.ml.inference.base import KeyedModelHandler
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+
+  class LinearRegression(torch.nn.Module):
+    def __init__(self, input_dim=1, output_dim=1):
+      super().__init__()
+      self.linear = torch.nn.Linear(input_dim, output_dim)
+
+    def forward(self, x):
+      out = self.linear(x)
+      return out
+
+  model_state_dict_path = 'gs://apache-beam-samples/run_inference/five_times_table_torch.pt'  # pylint: disable=line-too-long
+  model_class = LinearRegression
+  model_params = {'input_dim': 1, 'output_dim': 1}
+  keyed_model_handler = KeyedModelHandler(
+      PytorchModelHandlerTensor(
+          model_class=model_class,
+          model_params=model_params,
+          state_dict_path=model_state_dict_path))
+
+  keyed_data = [("first_question", 105.00), ("second_question", 108.00),
+                ("third_question", 1000.00), ("fourth_question", 1013.00)]
+
+  with beam.Pipeline() as p:
+    predictions = (
+        p
+        | 'KeyedInputData' >> beam.Create(keyed_data)
+        | "ConvertIntToTensor" >>
+        beam.Map(lambda x: (x[0], torch.Tensor([x[1]])))
+        | 'PytorchRunInference' >>
+        RunInference(model_handler=keyed_model_handler)
+        | beam.Map(print))
+    # [END torch_keyed_model_handler]
+    if test:
+      test(predictions)
+
+
+def sklearn_unkeyed_model_handler(test=None):
+    # [START sklearn_unkeyed_model_handler]
+  import apache_beam as beam
+  import numpy
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.sklearn_inference import ModelFileType
+  from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
+
+  sklearn_model_filename = 'gs://apache-beam-samples/run_inference/five_times_table_sklearn.pkl'  # pylint: disable=line-too-long

Review Comment:
   Since SKlearn is a installed as a beam dependency in each tox environment, we need to add if GCP is installed as well in every tox test for the Sklearn tests



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] AnandInguva commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

AnandInguva commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r921759841


##########
sdks/python/apache_beam/examples/snippets/transforms/elementwise/runinference.py:
##########
@@ -0,0 +1,157 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# pytype: skip-file
+
+def torch_unkeyed_model_handler(test=None):
+    # [START torch_unkeyed_model_handler]
+  import apache_beam as beam
+  import numpy
+  import torch
+  from apache_beam.ml.inference.base import RunInference
+  from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+
+  class LinearRegression(torch.nn.Module):
+    def __init__(self, input_dim=1, output_dim=1):
+      super().__init__()
+      self.linear = torch.nn.Linear(input_dim, output_dim)
+
+    def forward(self, x):
+      out = self.linear(x)
+      return out
+
+  model_state_dict_path = 'gs://apache-beam-samples/run_inference/five_times_table_torch.pt'  # pylint: disable=line-too-long
+  model_class = LinearRegression
+  model_params = {'input_dim': 1, 'output_dim': 1}
+  model_handler = PytorchModelHandlerTensor(
+      model_class=model_class,
+      model_params=model_params,
+      state_dict_path=model_state_dict_path)
+
+  unkeyed_data = numpy.array([10, 40, 60, 90],
+                             dtype=numpy.float32).reshape(-1, 1)
+
+  with beam.Pipeline() as p:
+    predictions = (
+        p
+        | 'InputData' >> beam.Create(unkeyed_data)
+        | 'ConvertNumpyToTensor' >> beam.Map(torch.Tensor)
+        | 'PytorchRunInference' >> RunInference(model_handler=model_handler)
+        | beam.Map(print))
+    # [END torch_unkeyed_model_handler]
+    if test:
+      test(predictions)
+
+
+def torch_keyed_model_handler(test=None):
+    # [START torch_keyed_model_handler]
+  import apache_beam as beam
+  import torch

Review Comment:
   Tox. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] tvalentyn commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

tvalentyn commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r922494582


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,201 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference takes advantage of existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, to enable you to use models in your pipelines to create transforms optimized for machine learning inferences. The ability to create arbitrarily complex workflow graphs also allows you to build multi-model pipelines.
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The batched elements are then applied with a transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the
+[`Shared` class documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/shared.py#L20).
+
+### Multi-model pipelines
+
+The RunInference API can be composed into multi-model pipelines. Multi-model pipelines can be useful for A/B testing or for building out ensembles that are comprised of models that perform tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, language detection, coreference resolution, and more.
+
+## Modify a pipeline to use an ML model
+
+To use the RunInference transform, add the following code to your pipeline:
+
+```
+from apache_beam.ml.inference.base import RunInference
+with pipeline as p:
+   predictions = ( p |  'Read' >> beam.ReadFromSource('a_source')
+                     | 'RunInference' >> RunInference(<model_handler>)
+```
+Where `model_handler` is the model handler setup code.
+
+To import models, you need to wrap them around a `ModelHandler` object. Which `ModelHandler` you import depends on the framework and type of data structure that contains the inputs. The following examples show some ModelHandlers that you might want to import.
+
+```
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy
+from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerPandas
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
+from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
+```
+### Use pre-trained models
+
+The section provides requirements for using pre-trained models with PyTorch and Scikit-learn
+
+#### PyTorch
+
+You need to provide a path to a file that contains the model saved weights. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the PyTorch framework, complete the following steps:
+
+1. Download the pre-trained weights and host them in a location that the pipeline can access.
+2. Pass the path of the model weights to the PyTorch `ModelHandler` by using the following code: `state_dict_path=<path_to_weights>`.
+
+#### Scikit-learn
+
+You need to provide a path to a file that contains the pickled Scikit-learn model. This path must be accessible by the pipeline. To use pre-trained models with the RunInference API and the Scikit-learn framework, complete the following steps:
+
+1. Download the pickled model class and host it in a location that the pipeline can access.
+2. Pass the path of the model to the Sklearn `ModelHandler` by using the following code:
+   `model_uri=<path_to_pickled_file>` and `model_file_type: <ModelFileType>`, where you can specify
+   `ModelFileType.PICKLE` or `ModelFileType.JOBLIB`, depending on how the model was serialized.
+
+### Use multiple models
+
+You can also use the RunInference transform to add multiple inference models to your pipeline.
+
+#### A/B Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source')
+   model_a_predictions = data | RunInference(<model_handler_A>)
+   model_b_predictions = data | RunInference(<model_handler_B>)
+```
+
+Where `model_handler_A` and `model_handler_B` are the model handler setup code.
+
+#### Ensemble Pattern
+
+```
+with pipeline as p:
+   data = p | 'Read' >> beam.ReadFromSource('a_source')
+   model_a_predictions = data | RunInference(<model_handler_A>)
+   model_b_predictions = model_a_predictions | beam.Map(some_post_processing) | RunInference(<model_handler_B>)
+```
+
+Where `model_handler_A` and `model_handler_B` are the model handler setup code.
+
+### Use a keyed ModelHandler
+
+If a key is attached to the examples, wrap the `KeyedModelHandler` around the `ModelHandler` object:
+
+```
+from apache_beam.ml.inference.base import KeyedModelHandler
+keyed_model_handler = KeyedModelHandler(PytorchModelHandlerTensor(...))
+with pipeline as p:
+   data = p | beam.Create([
+      ('img1', torch.tensor([[1,2,3],[4,5,6],...])),
+      ('img2', torch.tensor([[1,2,3],[4,5,6],...])),
+      ('img3', torch.tensor([[1,2,3],[4,5,6],...])),
+   ])
+   predictions = data | RunInference(KeyedModelHandler)
+```
+
+### Use the PredictionResults object
+
+When doing a prediction in Apache Beam, the output `PCollection` includes both the keys of the input examples and the inferences. Including both these items in the output allows you to find the input that determined the predictions.
+
+The `PredictionResult` is a `NamedTuple` object that contains both the input and the inferences, named  `example` and  `inference`, respectively. When keys are passed with the input data to the RunInference transform, the output `PCollection` returns a `Tuple[str, PredictionResult]`, which is the key and the `PredictionResult` object. Your pipeline interacts with a `PredictionResult` object in steps after the RunInference transform.
+
+```
+class PostProcessor(beam.DoFn):
+    def process(self, element: Tuple[str, PredictionResult]):
+       key, prediction_result = element
+       inputs = prediction_result.example
+       predictions = prediction_result.inference
+
+       # Post-processing logic
+       result = ...
+
+       yield (key, result)
+
+with pipeline as p:
+    output = (
+        p | 'Read' >> beam.ReadFromSource('a_source')
+                | 'PyTorchRunInference' >> RunInference(<keyed_model_handler>)
+                | 'ProcessOutput' >> beam.ParDo(PostProcessor()))
+```
+
+If you need to use this object explicitly, include the following line in your pipeline to import the object:
+
+```
+from apache_beam.ml.inference.base import PredictionResult
+```
+
+For more information, see the [`PredictionResult` documentation](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/ml/inference/base.py#L65).
+
+## Run a machine learning pipeline
+
+For detailed instructions explaining how to build and run a pipeline that uses ML models, see the
+[Example RunInference API pipelines](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference) on GitHub.
+
+## Troubleshooting
+
+If you run into problems with your pipeline or job, this section lists issues that you might encounter and provides suggestions for how to fix them.
+
+### Incorrect inferences in the PredictionResult object
+
+In some cases, the `PredictionResults` output might not include the correct predictions in the `inferences` field. This issue occurs when you use a model whose inferences return a dictionary that maps keys to predictions and other metadata. An example return type is `Dict[str, Tensor]`.
+
+The RunInference API currently expects outputs to be an `Iterable[Any]`. Example return types are `Iterable[Tensor]` or `Iterable[Dict[str, Tensor]]`. When RunInference zips the inputs with the predictions, the predictions iterate over the dictionary keys instead of the batch elements. The result is that the key name is preserved but the prediction tensors are discarded. For more information, see the [Pytorch RunInference PredictionResult is a Dict](https://github.com/apache/beam/issues/22240) issue in the Apache Beam GitHub project.
+
+To work with the current RunInference implementation, you can create a wrapper class that overrides the `model(input)` call. In PyTorch, for example, your wrapper would override the `forward()` function and return an output with the appropriate format of `List[Dict[str, torch.Tensor]]`. For more information, see our [HuggingFace language modeling example](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/pytorch_language_modeling.py#L49).
+
+### Unable to batch tensor elements
+
+RunInference uses dynamic batching. However, the RunInference API cannot batch tensor elements of different sizes, so samples passed to the RunInferene transform must be the same dimension or length. If you provide images of different sizes or word embeddings of different lengths, the following error might occur:
+
+`
+File "/beam/sdks/python/apache_beam/ml/inference/pytorch_inference.py", line 232, in run_inference
+batched_tensors = torch.stack(key_to_tensor_list[key])
+RuntimeError: stack expects each tensor to be equal size, but got [12] at entry 0 and [10] at entry 1 [while running 'PyTorchRunInference/ParDo(_RunInferenceDoFn)']
+`
+
+To avoid this issue, either use elements of the same size, or disable batching.
+
+**Option 1: Use elements of the same size**
+
+Use elements of the same size or resize the inputs. For computer vision applications, resize image inputs so that they have the same dimensions. For natural language processing (NLP) applications that have text of varying length, resize the text or word embeddings to make them the same length. When working with texts of varying length, resizing might not be possible. In this scenario, you could disable batching (see option 2).
+
+**Option 2: Disable batching**
+
+Disable batching by overriding the `batch_elements_kwargs` function in your ModelHandler and setting the maximum batch size (`max_batch_size`) to one: `max_batch_size=1`. For more information, see
+[BatchElements PTransforms](/documentation/sdks/python-machine-learning/#batchelements-ptransform).
+

Review Comment:
   How about we also link apache_beam/examples/inference/pytorch_language_modeling.py as an example that does this?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] tvalentyn commented on a diff in pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

tvalentyn commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r922486663


##########
website/www/site/content/en/documentation/sdks/python-machine-learning.md:
##########
@@ -0,0 +1,201 @@
+---
+type: languages
+title: "Apache Beam Python Machine Learning"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Machine Learning
+
+You can use Apache Beam with the RunInference API to use machine learning (ML) models to do local and remote inference with batch and streaming pipelines. Starting with Apache Beam 2.40.0, PyTorch and Scikit-learn frameworks are supported. You can create multiple types of transforms using the RunInference API: the API takes multiple types of setup parameters from model handlers, and the parameter type determines the model implementation.
+
+## Why use the RunInference API?
+
+RunInference takes advantage of existing Apache Beam concepts, such as the the `BatchElements` transform and the `Shared` class, to enable you to use models in your pipelines to create transforms optimized for machine learning inferences. The ability to create arbitrarily complex workflow graphs also allows you to build multi-model pipelines.
+
+### BatchElements PTransform
+
+To take advantage of the optimizations of vectorized inference that many models implement, we added the `BatchElements` transform as an intermediate step before making the prediction for the model. This transform batches elements together. The batched elements are then applied with a transformation for the particular framework of RunInference. For example, for numpy `ndarrays`, we call `numpy.stack()`,  and for torch `Tensor` elements, we call `torch.stack()`.
+
+To customize the settings for `beam.BatchElements`, in `ModelHandler`, override the `batch_elements_kwargs` function. For example, use `min_batch_size` to set the lowest number of elements per batch or `max_batch_size` to set the highest number of elements per batch.
+
+For more information, see the [`BatchElements` transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements).
+
+### Shared helper class
+
+Instead of loading a model for each thread in the process, we use the `Shared` class, which allows us to load one model that is shared across all threads of each worker in a DoFn. For more information, see the

Review Comment:
   How about: 
   
   Using the `Shared` class within RunInference implementation allows us to load the model  only once per process and share it with all DoFn instances created in that process. This reduces the memory consumption and model loading time. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] yeandy commented on pull request #22250: Rszper run inference docs

Posted by GitBox <gi...@apache.org>.

yeandy commented on PR #22250:
URL: https://github.com/apache/beam/pull/22250#issuecomment-1182483253

   R: @AnandInguva @ryanthompson591 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] tvalentyn commented on a diff in pull request #22250: Update RunInference documentation

Posted by GitBox <gi...@apache.org>.

tvalentyn commented on code in PR #22250:
URL: https://github.com/apache/beam/pull/22250#discussion_r924581965


##########
website/www/site/layouts/partials/section-menu/en/sdks.html:
##########
@@ -39,6 +39,7 @@
     <li><a href="/documentation/sdks/python-dependencies/">Python SDK dependencies</a></li>
     <li><a href="/documentation/sdks/python-streaming/">Python streaming pipelines</a></li>
     <li><a href="/documentation/sdks/python-type-safety/">Ensuring Python type safety</a></li>
+    <li><a href="/documentation/sdks/python-machine-learning/">Machine Learning</a></li>

Review Comment:
   Done in https://github.com/apache/beam/pull/22325, thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org