Posted to github@beam.apache.org by "damccorm (via GitHub)" <gi...@apache.org> on 2023/09/06 14:43:45 UTC

[GitHub] [beam] damccorm opened a new pull request, #28327: Add notebook for per key models

damccorm opened a new pull request, #28327:
URL: https://github.com/apache/beam/pull/28327

   ✨ [RENDERED](https://github.com/damccorm/beam/blob/703e9c64c3d067ec35038a511bdc92bc4270aef4/examples/notebooks/beam-ml/per_key_models.ipynb) ✨ 
   
   This should not be merged until 2.51 is released. It was, however, tested by replacing the Beam install with a clone/install from the Beam repo.
   
   Resolves #27628
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI or the [workflows README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on a diff in pull request #28327: Add notebook for per key models

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on code in PR #28327:
URL: https://github.com/apache/beam/pull/28327#discussion_r1320229732


##########
examples/notebooks/beam-ml/per_key_models.ipynb:
##########
@@ -0,0 +1,597 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "code",
+      "source": [
+        "# @title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License"
+      ],
+      "metadata": {
+        "id": "OsFaZscKSPvo"
+      },
+      "execution_count": 1,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "# Run ML Inference with Different Models Per Key\n",
+        "\n",
+        "<table align=\"left\">\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/per_key_models.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/colab_32px.png\" />Run in Google Colab</a>\n",
+        "  </td>\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/per_key_models.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/github_32px.png\" />View source on GitHub</a>\n",
+        "  </td>\n",
+        "</table>\n"
+      ],
+      "metadata": {
+        "id": "ZUSiAR62SgO8"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Often users desire to run inference with many differently trained models performing the same task. This can be helpful if you are comparing the performance of multiple different models, or if you have models trained on different datasets which you would like to conditionally use based on additional metadata.\n",
+        "\n",
+        "In Apache Beam, the recommended way to run inference is with the `RunInference` transform. Using a `KeyedModelHandler`, you can efficiently run inference with O(100s) of models without worrying about managing memory yourself.\n",
+        "\n",
+        "This notebook demonstrates how you can use a `KeyedModelHandler` to run inference in a Beam model with multiple different models on a per key basis. This notebook uses pretrained pipelines pulled from Hugging Face. It is recommended that you walk through the [beginner RunInference notebook](https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb) before continuing with this notebook."
+      ],
+      "metadata": {
+        "id": "ZAVOrrW2An1n"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Install Dependencies\n",
+        "\n",
+        "We will first install Beam and some dependencies needed by Hugging Face"
+      ],
+      "metadata": {
+        "id": "_fNyheQoDgGt"
+      }
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 11,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "B-ENznuJqArA",
+        "outputId": "f72963fc-82db-4d0d-9225-07f6b501e256"
+      },
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            ""
+          ]
+        }
+      ],
+      "source": [
+        "!pip install apache_beam[gcp]>=2.51.0 --quiet\n",
+        "!pip install torch --quiet\n",
+        "!pip install transformers --quiet\n",
+        "\n",
+        "# To use the newly installed versions, restart the runtime.\n",
+        "exit()"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "from typing import Dict\n",
+        "from typing import Iterable\n",
+        "from typing import Tuple\n",
+        "\n",
+        "from transformers import pipeline\n",
+        "\n",
+        "import apache_beam as beam\n",
+        "from apache_beam.ml.inference.base import KeyedModelHandler\n",
+        "from apache_beam.ml.inference.base import KeyModelMapping\n",
+        "from apache_beam.ml.inference.base import PredictionResult\n",
+        "from apache_beam.ml.inference.huggingface_inference import HuggingFacePipelineModelHandler\n",
+        "from apache_beam.ml.inference.base import RunInference"
+      ],
+      "metadata": {
+        "id": "wUmBEglvsOYW"
+      },
+      "execution_count": 1,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Define Configuration for our Models\n",
+        "\n",
+        "A `ModelHandler` is Beam's method for defining the configuration needed to load and invoke your model. Since we want to use multiple models, we will define 2 ModelHandlers, one for each model we're using in this example. Since both models being used are incapsulated by Hugging Face pipelines, we will use `HuggingFacePipelineModelHandler`.\n",
+        "\n",
+        "In this notebook, we will also load the models using Hugging Face and run them against an example. Note that they produce different outputs."
+      ],
+      "metadata": {
+        "id": "uEqljVgCD7hx"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "distilbert_mh = HuggingFacePipelineModelHandler('text-classification', model=\"distilbert-base-uncased-finetuned-sst-2-english\")\n",
+        "roberta_mh = HuggingFacePipelineModelHandler('text-classification', model=\"roberta-large-mnli\")\n",
+        "\n",
+        "distilbert_pipe = pipeline('text-classification', model=\"distilbert-base-uncased-finetuned-sst-2-english\")\n",
+        "roberta_large_pipe = pipeline(model=\"roberta-large-mnli\")"
+      ],
+      "metadata": {
+        "id": "v2NJT5ZcxgH5",
+        "outputId": "3924d72e-5c49-477d-c50f-6d9098f5a4b2"
+      },
+      "execution_count": 2,
+      "outputs": [
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)lve/main/config.json:   0%|          | 0.00/629 [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "b7cb51663677434ca42de6b5e6f37420"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading model.safetensors:   0%|          | 0.00/268M [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "3702756019854683a9dea9f8af0a29d0"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)okenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "52b9fdb51d514c2e8b9fa5813972ab01"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)solve/main/vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "eca24b7b7b1847c1aed6aa59a44ed63a"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)lve/main/config.json:   0%|          | 0.00/688 [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "4d4cfe9a0ca54897aa991420bee01ff9"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading model.safetensors:   0%|          | 0.00/1.43G [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "aee85cd919d24125acff1663fba0b47c"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)olve/main/vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "0af8ad4eed2d49878fa83b5828d58a96"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)olve/main/merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "1ed943a51c704ab7a72101b5b6182772"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)/main/tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "5b1dcbb533894267b184fd591d8ccdbc"
+            }
+          },
+          "metadata": {}
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "distilbert_pipe(\"This restaurant is awesome\")"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "H3nYX9thy8ec",
+        "outputId": "826e3285-24b9-47a8-d2a6-835543fdcae7"
+      },
+      "execution_count": 3,
+      "outputs": [
+        {
+          "output_type": "execute_result",
+          "data": {
+            "text/plain": [
+              "[{'label': 'POSITIVE', 'score': 0.9998743534088135}]"
+            ]
+          },
+          "metadata": {},
+          "execution_count": 3
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "roberta_large_pipe(\"This restaurant is awesome\")\n"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "IIfc94ODyjUg",
+        "outputId": "94ec8afb-ebfb-47ce-9813-48358741bc6b"
+      },
+      "execution_count": 4,
+      "outputs": [
+        {
+          "output_type": "execute_result",
+          "data": {
+            "text/plain": [
+              "[{'label': 'NEUTRAL', 'score': 0.7313134670257568}]"
+            ]
+          },
+          "metadata": {},
+          "execution_count": 4
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Define our Examples\n",
+        "\n",
+        "Next, we will define some examples that we can input into our pipeline, along with their correct classifications."
+      ],
+      "metadata": {
+        "id": "yd92MC7YEsTf"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "examples = [\n",
+        "    (\"This restaurant is awesome\", \"positive\"),\n",
+        "    (\"This restaurant is bad\", \"negative\"),\n",
+        "    (\"I feel fine\", \"neutral\"),\n",
+        "    (\"I love chocolate\", \"positive\"),\n",
+        "]"
+      ],
+      "metadata": {
+        "id": "5HAziWEavQws"
+      },
+      "execution_count": 5,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "To feed our examples into RunInference, we need to have distinct keys that can easily map to our model. In this case, we will define keys of the form `<model_name>-<actual_sentiment>` so that we can extract the actual sentiment of the example later."
+      ],
+      "metadata": {
+        "id": "r6GXL5PLFBY7"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "class FormatExamples(beam.DoFn):\n",
+        "  \"\"\"\n",
+        "  Map each example to a tuple of ('<model_name>-<actual_sentiment>', 'example').\n",
+        "  We will use these keyes to map our elements to the correct models.\n",
+        "  \"\"\"\n",
+        "  def process(self, element: Tuple[str, str]) -> Iterable[Tuple[str, str]]:\n",
+        "    yield (f'distilbert-{element[1]}', element[0])\n",
+        "    yield (f'roberta-{element[1]}', element[0])"
+      ],
+      "metadata": {
+        "id": "p2uVwws8zRpg"
+      },
+      "execution_count": 6,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Using the formatted keys, we will define a `KeyedModelHandler` which maps keys to the ModelHandler we should use for those keys. `KeyedModelHandler` also allows you to define an optional `max_models_per_worker_hint` which will limit the number of models that can be held in a single worker process at once. This is useful if you are worried about your worker running out of memory. See https://beam.apache.org/documentation/sdks/python-machine-learning/index.html#use-a-keyed-modelhandler for more info on managing memory."
+      ],
+      "metadata": {
+        "id": "IP65_5nNGIb8"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "per_key_mhs = [\n",
+        "    KeyModelMapping(['distilbert-positive', 'distilbert-neutral', 'distilbert-negative'], distilbert_mh),\n",
+        "    KeyModelMapping(['roberta-positive', 'roberta-neutral', 'roberta-negative'], roberta_mh)\n",
+        "]\n",
+        "mh = KeyedModelHandler(per_key_mhs, max_models_per_worker_hint=2)"
+      ],
+      "metadata": {
+        "id": "DZpfjeGL2hMG"
+      },
+      "execution_count": 7,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Postprocess our results\n",
+        "\n",
+        "The `RunInference` transform returns a Tuple of the original key and a `PredictionResult` object that contains the original example and the inference. From that, we will extract the data we care about. We will then group this data by the original example in order to compare each model's prediction."

Review Comment:
   This is the relevant diff - https://github.com/apache/beam/pull/28327/commits/056d6d2ebc509938686497e1a0baf8946cdcf136 - the other commit is directly applying all other suggestions
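
The quoted hunk cuts off before the postprocessing code itself. A minimal sketch of how that step might look, reusing `examples`, `FormatExamples`, and `mh` from the cells quoted above; the `ExtractResults` DoFn and the exact output shape below are illustrative assumptions, not the notebook's actual code:

    from typing import Dict, Iterable, Tuple

    import apache_beam as beam
    from apache_beam.ml.inference.base import PredictionResult, RunInference

    class ExtractResults(beam.DoFn):
      """Unpack RunInference output for per-example comparison.

      With a KeyedModelHandler, RunInference emits tuples of
      ('<model_name>-<actual_sentiment>', PredictionResult); the
      PredictionResult carries the original example and the inference.
      """
      def process(
          self, element: Tuple[str, PredictionResult]
      ) -> Iterable[Tuple[str, Dict[str, str]]]:
        key, result = element
        model_name, actual_sentiment = key.split('-')
        yield (result.example, {
            'model': model_name,
            'actual_sentiment': actual_sentiment,
            # The inference format depends on the pipeline task; for
            # text-classification it holds a label and a score.
            'prediction': result.inference,
        })

    with beam.Pipeline() as p:
      _ = (
          p
          | beam.Create(examples)
          | beam.ParDo(FormatExamples())
          | RunInference(mh)
          | beam.ParDo(ExtractResults())
          | beam.GroupByKey()  # one group per original example
          | beam.Map(print))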





[GitHub] [beam] github-actions[bot] commented on pull request #28327: Add notebook for per key models

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #28327:
URL: https://github.com/apache/beam/pull/28327#issuecomment-1708573146

   Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control




[GitHub] [beam] rszper commented on pull request #28327: Add notebook for per key models

Posted by "rszper (via GitHub)" <gi...@apache.org>.
rszper commented on PR #28327:
URL: https://github.com/apache/beam/pull/28327#issuecomment-1712043297

   Should we also include this in the ReadMe list? https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/README.md





[GitHub] [beam] damccorm commented on a diff in pull request #28327: Add notebook for per key models

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on code in PR #28327:
URL: https://github.com/apache/beam/pull/28327#discussion_r1320229431


##########
examples/notebooks/beam-ml/per_key_models.ipynb:
##########
Review Comment:
   That changes the meaning (which means it definitely wasn't clear 😅 ). Let me know if it reads better now





Re: [PR] Add notebook for per key models [beam]

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm merged PR #28327:
URL: https://github.com/apache/beam/pull/28327




Re: [PR] Add notebook for per key models [beam]

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on PR #28327:
URL: https://github.com/apache/beam/pull/28327#issuecomment-1745468277

   I'm going to merge this in its current pull-from-master form so that it can be easily used for demos. I'll follow up to update the Beam install once 2.51 is released.




[GitHub] [beam] damccorm commented on pull request #28327: Add notebook for per key models

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on PR #28327:
URL: https://github.com/apache/beam/pull/28327#issuecomment-1712066475

   > Should we also include this in the ReadMe list? https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/README.md
   
   Yes, done!




[GitHub] [beam] rszper commented on a diff in pull request #28327: Add notebook for per key models

Posted by "rszper (via GitHub)" <gi...@apache.org>.
rszper commented on code in PR #28327:
URL: https://github.com/apache/beam/pull/28327#discussion_r1320192582


##########
examples/notebooks/beam-ml/per_key_models.ipynb:
##########
@@ -0,0 +1,597 @@
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Often users desire to run inference with many differently trained models performing the same task. This can be helpful if you are comparing the performance of multiple different models, or if you have models trained on different datasets which you would like to conditionally use based on additional metadata.\n",
+        "\n",
+        "In Apache Beam, the recommended way to run inference is with the `RunInference` transform. Using a `KeyedModelHandler`, you can efficiently run inference with O(100s) of models without worrying about managing memory yourself.\n",

Review Comment:
   ```suggestion
           "In Apache Beam, the recommended way to run inference is to use the `RunInference` transform. By using a `KeyedModelHandler`, you can efficiently run inference with O(100s) of models without having to manage memory yourself.\n",
   ```
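
   As a sketch of how that plays out in practice, here is a condensed version of the handler-mapping cells that appear further down this notebook (the model names, keys, and `max_models_per_worker_hint=2` value all come from those cells):

   ```python
   from apache_beam.ml.inference.base import KeyedModelHandler
   from apache_beam.ml.inference.base import KeyModelMapping
   from apache_beam.ml.inference.huggingface_inference import HuggingFacePipelineModelHandler

   distilbert_mh = HuggingFacePipelineModelHandler(
       'text-classification', model="distilbert-base-uncased-finetuned-sst-2-english")
   roberta_mh = HuggingFacePipelineModelHandler(
       'text-classification', model="roberta-large-mnli")

   # Route each key to the handler that serves it. The optional hint caps how
   # many loaded models a single worker process holds at once; models over the
   # cap are unloaded until a key needs them again.
   mh = KeyedModelHandler(
       [KeyModelMapping(['distilbert-positive', 'distilbert-neutral', 'distilbert-negative'], distilbert_mh),
        KeyModelMapping(['roberta-positive', 'roberta-neutral', 'roberta-negative'], roberta_mh)],
       max_models_per_worker_hint=2)
   ```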



##########
examples/notebooks/beam-ml/per_key_models.ipynb:
##########
@@ -0,0 +1,597 @@
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Install Dependencies\n",

Review Comment:
   ```suggestion
           "## Install dependencies\n",
   ```



##########
examples/notebooks/beam-ml/per_key_models.ipynb:
##########
@@ -0,0 +1,597 @@
+    {
+      "cell_type": "code",
+      "source": [
+        "class FormatExamples(beam.DoFn):\n",
+        "  \"\"\"\n",
+        "  Map each example to a tuple of ('<model_name>-<actual_sentiment>', 'example').\n",
+        "  We will use these keyes to map our elements to the correct models.\n",

Review Comment:
   ```suggestion
           "  We use these keys to map our elements to the correct models.\n",
   ```
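
   The fan-out that docstring describes is easy to check outside of a pipeline. A minimal sketch, assuming the `FormatExamples` DoFn from the cell quoted above:

   ```python
   fe = FormatExamples()
   print(list(fe.process(("I feel fine", "neutral"))))
   # [('distilbert-neutral', 'I feel fine'), ('roberta-neutral', 'I feel fine')]
   ```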



##########
examples/notebooks/beam-ml/per_key_models.ipynb:
##########
@@ -0,0 +1,597 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "code",
+      "source": [
+        "# @title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License"
+      ],
+      "metadata": {
+        "id": "OsFaZscKSPvo"
+      },
+      "execution_count": 1,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "# Run ML Inference with Different Models Per Key\n",
+        "\n",
+        "<table align=\"left\">\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/per_key_models.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/colab_32px.png\" />Run in Google Colab</a>\n",
+        "  </td>\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/per_key_models.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/github_32px.png\" />View source on GitHub</a>\n",
+        "  </td>\n",
+        "</table>\n"
+      ],
+      "metadata": {
+        "id": "ZUSiAR62SgO8"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Often users desire to run inference with many differently trained models performing the same task. This can be helpful if you are comparing the performance of multiple different models, or if you have models trained on different datasets which you would like to conditionally use based on additional metadata.\n",
+        "\n",
+        "In Apache Beam, the recommended way to run inference is with the `RunInference` transform. Using a `KeyedModelHandler`, you can efficiently run inference with O(100s) of models without worrying about managing memory yourself.\n",
+        "\n",
+        "This notebook demonstrates how you can use a `KeyedModelHandler` to run inference in a Beam model with multiple different models on a per key basis. This notebook uses pretrained pipelines pulled from Hugging Face. It is recommended that you walk through the [beginner RunInference notebook](https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb) before continuing with this notebook."
+      ],
+      "metadata": {
+        "id": "ZAVOrrW2An1n"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Install Dependencies\n",
+        "\n",
+        "We will first install Beam and some dependencies needed by Hugging Face"
+      ],
+      "metadata": {
+        "id": "_fNyheQoDgGt"
+      }
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 11,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "B-ENznuJqArA",
+        "outputId": "f72963fc-82db-4d0d-9225-07f6b501e256"
+      },
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            ""
+          ]
+        }
+      ],
+      "source": [
+        "!pip install apache_beam[gcp]>=2.51.0 --quiet\n",
+        "!pip install torch --quiet\n",
+        "!pip install transformers --quiet\n",
+        "\n",
+        "# To use the newly installed versions, restart the runtime.\n",
+        "exit()"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "from typing import Dict\n",
+        "from typing import Iterable\n",
+        "from typing import Tuple\n",
+        "\n",
+        "from transformers import pipeline\n",
+        "\n",
+        "import apache_beam as beam\n",
+        "from apache_beam.ml.inference.base import KeyedModelHandler\n",
+        "from apache_beam.ml.inference.base import KeyModelMapping\n",
+        "from apache_beam.ml.inference.base import PredictionResult\n",
+        "from apache_beam.ml.inference.huggingface_inference import HuggingFacePipelineModelHandler\n",
+        "from apache_beam.ml.inference.base import RunInference"
+      ],
+      "metadata": {
+        "id": "wUmBEglvsOYW"
+      },
+      "execution_count": 1,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Define Configuration for our Models\n",
+        "\n",
+        "A `ModelHandler` is Beam's method for defining the configuration needed to load and invoke your model. Since we want to use multiple models, we will define 2 ModelHandlers, one for each model we're using in this example. Since both models being used are incapsulated by Hugging Face pipelines, we will use `HuggingFacePipelineModelHandler`.\n",
+        "\n",
+        "In this notebook, we will also load the models using Hugging Face and run them against an example. Note that they produce different outputs."
+      ],
+      "metadata": {
+        "id": "uEqljVgCD7hx"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "distilbert_mh = HuggingFacePipelineModelHandler('text-classification', model=\"distilbert-base-uncased-finetuned-sst-2-english\")\n",
+        "roberta_mh = HuggingFacePipelineModelHandler('text-classification', model=\"roberta-large-mnli\")\n",
+        "\n",
+        "distilbert_pipe = pipeline('text-classification', model=\"distilbert-base-uncased-finetuned-sst-2-english\")\n",
+        "roberta_large_pipe = pipeline(model=\"roberta-large-mnli\")"
+      ],
+      "metadata": {
+        "id": "v2NJT5ZcxgH5",
+        "outputId": "3924d72e-5c49-477d-c50f-6d9098f5a4b2"
+      },
+      "execution_count": 2,
+      "outputs": [
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)lve/main/config.json:   0%|          | 0.00/629 [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "b7cb51663677434ca42de6b5e6f37420"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading model.safetensors:   0%|          | 0.00/268M [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "3702756019854683a9dea9f8af0a29d0"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)okenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "52b9fdb51d514c2e8b9fa5813972ab01"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)solve/main/vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "eca24b7b7b1847c1aed6aa59a44ed63a"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)lve/main/config.json:   0%|          | 0.00/688 [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "4d4cfe9a0ca54897aa991420bee01ff9"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading model.safetensors:   0%|          | 0.00/1.43G [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "aee85cd919d24125acff1663fba0b47c"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)olve/main/vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "0af8ad4eed2d49878fa83b5828d58a96"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)olve/main/merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "1ed943a51c704ab7a72101b5b6182772"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)/main/tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "5b1dcbb533894267b184fd591d8ccdbc"
+            }
+          },
+          "metadata": {}
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "distilbert_pipe(\"This restaurant is awesome\")"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "H3nYX9thy8ec",
+        "outputId": "826e3285-24b9-47a8-d2a6-835543fdcae7"
+      },
+      "execution_count": 3,
+      "outputs": [
+        {
+          "output_type": "execute_result",
+          "data": {
+            "text/plain": [
+              "[{'label': 'POSITIVE', 'score': 0.9998743534088135}]"
+            ]
+          },
+          "metadata": {},
+          "execution_count": 3
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "roberta_large_pipe(\"This restaurant is awesome\")\n"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "IIfc94ODyjUg",
+        "outputId": "94ec8afb-ebfb-47ce-9813-48358741bc6b"
+      },
+      "execution_count": 4,
+      "outputs": [
+        {
+          "output_type": "execute_result",
+          "data": {
+            "text/plain": [
+              "[{'label': 'NEUTRAL', 'score': 0.7313134670257568}]"
+            ]
+          },
+          "metadata": {},
+          "execution_count": 4
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Define our Examples\n",
+        "\n",
+        "Next, we will define some examples that we can input into our pipeline, along with their correct classifications."
+      ],
+      "metadata": {
+        "id": "yd92MC7YEsTf"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "examples = [\n",
+        "    (\"This restaurant is awesome\", \"positive\"),\n",
+        "    (\"This restaurant is bad\", \"negative\"),\n",
+        "    (\"I feel fine\", \"neutral\"),\n",
+        "    (\"I love chocolate\", \"positive\"),\n",
+        "]"
+      ],
+      "metadata": {
+        "id": "5HAziWEavQws"
+      },
+      "execution_count": 5,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "To feed our examples into RunInference, we need to have distinct keys that can easily map to our model. In this case, we will define keys of the form `<model_name>-<actual_sentiment>` so that we can extract the actual sentiment of the example later."
+      ],
+      "metadata": {
+        "id": "r6GXL5PLFBY7"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "class FormatExamples(beam.DoFn):\n",
+        "  \"\"\"\n",
+        "  Map each example to a tuple of ('<model_name>-<actual_sentiment>', 'example').\n",
+        "  We will use these keyes to map our elements to the correct models.\n",
+        "  \"\"\"\n",
+        "  def process(self, element: Tuple[str, str]) -> Iterable[Tuple[str, str]]:\n",
+        "    yield (f'distilbert-{element[1]}', element[0])\n",
+        "    yield (f'roberta-{element[1]}', element[0])"
+      ],
+      "metadata": {
+        "id": "p2uVwws8zRpg"
+      },
+      "execution_count": 6,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Using the formatted keys, we will define a `KeyedModelHandler` which maps keys to the ModelHandler we should use for those keys. `KeyedModelHandler` also allows you to define an optional `max_models_per_worker_hint` which will limit the number of models that can be held in a single worker process at once. This is useful if you are worried about your worker running out of memory. See https://beam.apache.org/documentation/sdks/python-machine-learning/index.html#use-a-keyed-modelhandler for more info on managing memory."
+      ],
+      "metadata": {
+        "id": "IP65_5nNGIb8"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "per_key_mhs = [\n",
+        "    KeyModelMapping(['distilbert-positive', 'distilbert-neutral', 'distilbert-negative'], distilbert_mh),\n",
+        "    KeyModelMapping(['roberta-positive', 'roberta-neutral', 'roberta-negative'], roberta_mh)\n",
+        "]\n",
+        "mh = KeyedModelHandler(per_key_mhs, max_models_per_worker_hint=2)"
+      ],
+      "metadata": {
+        "id": "DZpfjeGL2hMG"
+      },
+      "execution_count": 7,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Postprocess our results\n",
+        "\n",
+        "The `RunInference` transform returns a Tuple of the original key and a `PredictionResult` object that contains the original example and the inference. From that, we will extract the data we care about. We will then group this data by the original example in order to compare each model's prediction."
+      ],
+      "metadata": {
+        "id": "_a4ZmnD5FSeG"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "class ExtractResults(beam.DoFn):\n",
+        "  \"\"\"\n",
+        "  Extract the data we care about from the PredictionResult object.\n",
+        "  \"\"\"\n",
+        "  def process(self, element: Tuple[str, PredictionResult]) -> Iterable[Tuple[str, Dict[str, str]]]:\n",
+        "    actual_sentiment = element[0].split('-')[1]\n",
+        "    model = element[0].split('-')[0]\n",
+        "    result = element[1]\n",
+        "    example = result.example\n",
+        "    predicted_sentiment = result.inference[0]['label']\n",
+        "\n",
+        "    yield (example, {'model': model, 'actual_sentiment': actual_sentiment, 'predicted_sentiment': predicted_sentiment})"
+      ],
+      "metadata": {
+        "id": "FOwFNQA053TG"
+      },
+      "execution_count": 8,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Finally, we will print the results produced by each model."
+      ],
+      "metadata": {
+        "id": "JVnv4gGbFohk"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "class PrintResults(beam.DoFn):\n",
+        "  \"\"\"\n",
+        "  Print the results produced by each model along with the actual sentiment.\n",
+        "  \"\"\"\n",
+        "  def process(self, element: Tuple[str, Iterable[Dict[str, str]]]):\n",
+        "    example = element[0]\n",
+        "    actual_sentiment = element[1][0]['actual_sentiment']\n",
+        "    predicted_sentiment_1 = element[1][0]['predicted_sentiment']\n",
+        "    model_1 = element[1][0]['model']\n",
+        "    predicted_sentiment_2 = element[1][1]['predicted_sentiment']\n",
+        "    model_2 = element[1][1]['model']\n",
+        "\n",
+        "    if model_1 == 'distilbert':\n",
+        "      distilbert_prediction = predicted_sentiment_1\n",
+        "      roberta_prediction = predicted_sentiment_2\n",
+        "    else:\n",
+        "      roberta_prediction = predicted_sentiment_1\n",
+        "      distilbert_prediction = predicted_sentiment_2\n",
+        "\n",
+        "    print(f'Example: {example}\\nActual Sentiment: {actual_sentiment}\\n'\n",
+        "          f'Distilbert Prediction: {distilbert_prediction}\\n'\n",
+        "          f'Roberta Prediction: {roberta_prediction}\\n------------')"
+      ],
+      "metadata": {
+        "id": "kUQJNYOa9Q5-"
+      },
+      "execution_count": 9,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Run Your Pipeline\n",
+        "\n",
+        "We're now ready to put together all of the pieces into a single Beam pipeline!"

Review Comment:
   ```suggestion
           "Put together all of the pieces to run a single Apache Beam pipeline."
   ```
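
For context, the pipeline that these pieces feed into can be sketched as follows. The `examples` list here is a hypothetical stand-in (the notebook defines its own keyed inputs), while `mh`, `ExtractResults`, and `PrintResults` are the objects defined in the cells quoted above.

```python
import apache_beam as beam
from apache_beam.ml.inference.base import RunInference

# Hypothetical keyed inputs in the '<model>-<sentiment>' key format used by
# KeyModelMapping and ExtractResults; the notebook's real list is not shown here.
examples = [
    ('distilbert-positive', 'This restaurant is awesome'),
    ('roberta-positive', 'This restaurant is awesome'),
]

with beam.Pipeline() as pipeline:
    _ = (
        pipeline
        | 'CreateExamples' >> beam.Create(examples)
        | 'RunInference' >> RunInference(mh)                 # key picks the model
        | 'ExtractResults' >> beam.ParDo(ExtractResults())   # -> (example, details)
        | 'GroupByExample' >> beam.GroupByKey()              # one group per sentence
        | 'PrintResults' >> beam.ParDo(PrintResults())
    )
```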



##########
examples/notebooks/beam-ml/per_key_models.ipynb:
##########
@@ -0,0 +1,597 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "code",
+      "source": [
+        "# @title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License"
+      ],
+      "metadata": {
+        "id": "OsFaZscKSPvo"
+      },
+      "execution_count": 1,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "# Run ML Inference with Different Models Per Key\n",
+        "\n",
+        "<table align=\"left\">\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/per_key_models.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/colab_32px.png\" />Run in Google Colab</a>\n",
+        "  </td>\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/per_key_models.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/github_32px.png\" />View source on GitHub</a>\n",
+        "  </td>\n",
+        "</table>\n"
+      ],
+      "metadata": {
+        "id": "ZUSiAR62SgO8"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Often users desire to run inference with many differently trained models performing the same task. This can be helpful if you are comparing the performance of multiple different models, or if you have models trained on different datasets which you would like to conditionally use based on additional metadata.\n",
+        "\n",
+        "In Apache Beam, the recommended way to run inference is with the `RunInference` transform. Using a `KeyedModelHandler`, you can efficiently run inference with O(100s) of models without worrying about managing memory yourself.\n",
+        "\n",
+        "This notebook demonstrates how you can use a `KeyedModelHandler` to run inference in a Beam model with multiple different models on a per key basis. This notebook uses pretrained pipelines pulled from Hugging Face. It is recommended that you walk through the [beginner RunInference notebook](https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb) before continuing with this notebook."
+      ],
+      "metadata": {
+        "id": "ZAVOrrW2An1n"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Install Dependencies\n",
+        "\n",
+        "We will first install Beam and some dependencies needed by Hugging Face"
+      ],
+      "metadata": {
+        "id": "_fNyheQoDgGt"
+      }
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 11,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "B-ENznuJqArA",
+        "outputId": "f72963fc-82db-4d0d-9225-07f6b501e256"
+      },
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            ""
+          ]
+        }
+      ],
+      "source": [
+        "!pip install apache_beam[gcp]>=2.51.0 --quiet\n",
+        "!pip install torch --quiet\n",
+        "!pip install transformers --quiet\n",
+        "\n",
+        "# To use the newly installed versions, restart the runtime.\n",
+        "exit()"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "from typing import Dict\n",
+        "from typing import Iterable\n",
+        "from typing import Tuple\n",
+        "\n",
+        "from transformers import pipeline\n",
+        "\n",
+        "import apache_beam as beam\n",
+        "from apache_beam.ml.inference.base import KeyedModelHandler\n",
+        "from apache_beam.ml.inference.base import KeyModelMapping\n",
+        "from apache_beam.ml.inference.base import PredictionResult\n",
+        "from apache_beam.ml.inference.huggingface_inference import HuggingFacePipelineModelHandler\n",
+        "from apache_beam.ml.inference.base import RunInference"
+      ],
+      "metadata": {
+        "id": "wUmBEglvsOYW"
+      },
+      "execution_count": 1,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Define Configuration for our Models\n",
+        "\n",
+        "A `ModelHandler` is Beam's method for defining the configuration needed to load and invoke your model. Since we want to use multiple models, we will define 2 ModelHandlers, one for each model we're using in this example. Since both models being used are incapsulated by Hugging Face pipelines, we will use `HuggingFacePipelineModelHandler`.\n",

Review Comment:
   ```suggestion
           "A model handler is the Apache Beam method used to define the configuration needed to load and invoke models. Because this example uses two models, we define two model handlers, one for each model. Because both models are incapsulated within Hugging Face pipelines, we use the model handler `HuggingFacePipelineModelHandler`.\n",
   ```



##########
examples/notebooks/beam-ml/per_key_models.ipynb:
##########
@@ -0,0 +1,597 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "code",
+      "source": [
+        "# @title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License"
+      ],
+      "metadata": {
+        "id": "OsFaZscKSPvo"
+      },
+      "execution_count": 1,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "# Run ML Inference with Different Models Per Key\n",
+        "\n",
+        "<table align=\"left\">\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/per_key_models.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/colab_32px.png\" />Run in Google Colab</a>\n",
+        "  </td>\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/per_key_models.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/github_32px.png\" />View source on GitHub</a>\n",
+        "  </td>\n",
+        "</table>\n"
+      ],
+      "metadata": {
+        "id": "ZUSiAR62SgO8"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Often users desire to run inference with many differently trained models performing the same task. This can be helpful if you are comparing the performance of multiple different models, or if you have models trained on different datasets which you would like to conditionally use based on additional metadata.\n",

Review Comment:
   ```suggestion
           "Running inference with multiple differently-trained models performing the same task is useful in many scenarios, including the following examples:\n",
           "\n",
           "* You want to compare the performance of multiple different models.\n",
           "* You have models trained on different datasets that you want to use conditionally based on additional metadata.\n",
   ```



##########
examples/notebooks/beam-ml/per_key_models.ipynb:
##########
@@ -0,0 +1,597 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "code",
+      "source": [
+        "# @title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License"
+      ],
+      "metadata": {
+        "id": "OsFaZscKSPvo"
+      },
+      "execution_count": 1,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "# Run ML Inference with Different Models Per Key\n",

Review Comment:
   ```suggestion
           "# Run ML inference with multiple differently-trained models\n",
   ```



##########
examples/notebooks/beam-ml/per_key_models.ipynb:
##########
@@ -0,0 +1,597 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "code",
+      "source": [
+        "# @title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License"
+      ],
+      "metadata": {
+        "id": "OsFaZscKSPvo"
+      },
+      "execution_count": 1,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "# Run ML Inference with Different Models Per Key\n",
+        "\n",
+        "<table align=\"left\">\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/per_key_models.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/colab_32px.png\" />Run in Google Colab</a>\n",
+        "  </td>\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/per_key_models.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/github_32px.png\" />View source on GitHub</a>\n",
+        "  </td>\n",
+        "</table>\n"
+      ],
+      "metadata": {
+        "id": "ZUSiAR62SgO8"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Often users desire to run inference with many differently trained models performing the same task. This can be helpful if you are comparing the performance of multiple different models, or if you have models trained on different datasets which you would like to conditionally use based on additional metadata.\n",
+        "\n",
+        "In Apache Beam, the recommended way to run inference is with the `RunInference` transform. Using a `KeyedModelHandler`, you can efficiently run inference with O(100s) of models without worrying about managing memory yourself.\n",
+        "\n",
+        "This notebook demonstrates how you can use a `KeyedModelHandler` to run inference in a Beam model with multiple different models on a per key basis. This notebook uses pretrained pipelines pulled from Hugging Face. It is recommended that you walk through the [beginner RunInference notebook](https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb) before continuing with this notebook."
+      ],
+      "metadata": {
+        "id": "ZAVOrrW2An1n"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Install Dependencies\n",
+        "\n",
+        "We will first install Beam and some dependencies needed by Hugging Face"
+      ],
+      "metadata": {
+        "id": "_fNyheQoDgGt"
+      }
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 11,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "B-ENznuJqArA",
+        "outputId": "f72963fc-82db-4d0d-9225-07f6b501e256"
+      },
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            ""
+          ]
+        }
+      ],
+      "source": [
+        "!pip install apache_beam[gcp]>=2.51.0 --quiet\n",
+        "!pip install torch --quiet\n",
+        "!pip install transformers --quiet\n",
+        "\n",
+        "# To use the newly installed versions, restart the runtime.\n",
+        "exit()"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "from typing import Dict\n",
+        "from typing import Iterable\n",
+        "from typing import Tuple\n",
+        "\n",
+        "from transformers import pipeline\n",
+        "\n",
+        "import apache_beam as beam\n",
+        "from apache_beam.ml.inference.base import KeyedModelHandler\n",
+        "from apache_beam.ml.inference.base import KeyModelMapping\n",
+        "from apache_beam.ml.inference.base import PredictionResult\n",
+        "from apache_beam.ml.inference.huggingface_inference import HuggingFacePipelineModelHandler\n",
+        "from apache_beam.ml.inference.base import RunInference"
+      ],
+      "metadata": {
+        "id": "wUmBEglvsOYW"
+      },
+      "execution_count": 1,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Define Configuration for our Models\n",

Review Comment:
   ```suggestion
           "## Define the model configurations\n",
   ```



##########
examples/notebooks/beam-ml/per_key_models.ipynb:
##########
@@ -0,0 +1,597 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "code",
+      "source": [
+        "# @title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License"
+      ],
+      "metadata": {
+        "id": "OsFaZscKSPvo"
+      },
+      "execution_count": 1,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "# Run ML Inference with Different Models Per Key\n",
+        "\n",
+        "<table align=\"left\">\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/per_key_models.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/colab_32px.png\" />Run in Google Colab</a>\n",
+        "  </td>\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/per_key_models.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/github_32px.png\" />View source on GitHub</a>\n",
+        "  </td>\n",
+        "</table>\n"
+      ],
+      "metadata": {
+        "id": "ZUSiAR62SgO8"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Often users desire to run inference with many differently trained models performing the same task. This can be helpful if you are comparing the performance of multiple different models, or if you have models trained on different datasets which you would like to conditionally use based on additional metadata.\n",
+        "\n",
+        "In Apache Beam, the recommended way to run inference is with the `RunInference` transform. Using a `KeyedModelHandler`, you can efficiently run inference with O(100s) of models without worrying about managing memory yourself.\n",
+        "\n",
+        "This notebook demonstrates how you can use a `KeyedModelHandler` to run inference in a Beam model with multiple different models on a per key basis. This notebook uses pretrained pipelines pulled from Hugging Face. It is recommended that you walk through the [beginner RunInference notebook](https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb) before continuing with this notebook."
+      ],
+      "metadata": {
+        "id": "ZAVOrrW2An1n"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Install Dependencies\n",
+        "\n",
+        "We will first install Beam and some dependencies needed by Hugging Face"

Review Comment:
   ```suggestion
           "First, install both Apache Beam and the dependencies needed by Hugging Face."
   ```



##########
examples/notebooks/beam-ml/per_key_models.ipynb:
##########
@@ -0,0 +1,597 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "code",
+      "source": [
+        "# @title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License"
+      ],
+      "metadata": {
+        "id": "OsFaZscKSPvo"
+      },
+      "execution_count": 1,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "# Run ML Inference with Different Models Per Key\n",
+        "\n",
+        "<table align=\"left\">\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/per_key_models.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/colab_32px.png\" />Run in Google Colab</a>\n",
+        "  </td>\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/per_key_models.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/github_32px.png\" />View source on GitHub</a>\n",
+        "  </td>\n",
+        "</table>\n"
+      ],
+      "metadata": {
+        "id": "ZUSiAR62SgO8"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Often users desire to run inference with many differently trained models performing the same task. This can be helpful if you are comparing the performance of multiple different models, or if you have models trained on different datasets which you would like to conditionally use based on additional metadata.\n",
+        "\n",
+        "In Apache Beam, the recommended way to run inference is with the `RunInference` transform. Using a `KeyedModelHandler`, you can efficiently run inference with O(100s) of models without worrying about managing memory yourself.\n",
+        "\n",
+        "This notebook demonstrates how you can use a `KeyedModelHandler` to run inference in a Beam model with multiple different models on a per key basis. This notebook uses pretrained pipelines pulled from Hugging Face. It is recommended that you walk through the [beginner RunInference notebook](https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb) before continuing with this notebook."

Review Comment:
   ```suggestion
           "This notebook demonstrates how to use a `KeyedModelHandler` to run inference in an Apache Beam model with multiple different models on a per-key basis. This notebook uses pretrained pipelines from Hugging Face. Before continuing with this notebook, it is recommended that you walk through the [beginner RunInference notebook](https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb)."
   ```
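
To make the keyed data flow concrete: with a `KeyedModelHandler`, each element entering `RunInference` is a `(key, example)` tuple, and each output pairs the same key with a `PredictionResult`. A minimal sketch of the shapes involved, with an illustrative key and score:

```python
from apache_beam.ml.inference.base import PredictionResult

# Input element: the key selects which model to run; the value is the text.
keyed_input = ('distilbert-positive', 'This restaurant is awesome')

# Output element: the same key, now paired with a PredictionResult whose
# `example` is the original text and whose `inference` holds the Hugging Face
# pipeline output (the score shown is illustrative).
keyed_output = ('distilbert-positive',
                PredictionResult(example='This restaurant is awesome',
                                 inference=[{'label': 'POSITIVE', 'score': 0.9999}]))
```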



##########
examples/notebooks/beam-ml/per_key_models.ipynb:
##########
@@ -0,0 +1,597 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "code",
+      "source": [
+        "# @title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License"
+      ],
+      "metadata": {
+        "id": "OsFaZscKSPvo"
+      },
+      "execution_count": 1,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "# Run ML Inference with Different Models Per Key\n",
+        "\n",
+        "<table align=\"left\">\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/per_key_models.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/colab_32px.png\" />Run in Google Colab</a>\n",
+        "  </td>\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/per_key_models.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/github_32px.png\" />View source on GitHub</a>\n",
+        "  </td>\n",
+        "</table>\n"
+      ],
+      "metadata": {
+        "id": "ZUSiAR62SgO8"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Often users desire to run inference with many differently trained models performing the same task. This can be helpful if you are comparing the performance of multiple different models, or if you have models trained on different datasets which you would like to conditionally use based on additional metadata.\n",
+        "\n",
+        "In Apache Beam, the recommended way to run inference is with the `RunInference` transform. Using a `KeyedModelHandler`, you can efficiently run inference with O(100s) of models without worrying about managing memory yourself.\n",
+        "\n",
+        "This notebook demonstrates how you can use a `KeyedModelHandler` to run inference in a Beam model with multiple different models on a per key basis. This notebook uses pretrained pipelines pulled from Hugging Face. It is recommended that you walk through the [beginner RunInference notebook](https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb) before continuing with this notebook."
+      ],
+      "metadata": {
+        "id": "ZAVOrrW2An1n"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Install Dependencies\n",
+        "\n",
+        "We will first install Beam and some dependencies needed by Hugging Face"
+      ],
+      "metadata": {
+        "id": "_fNyheQoDgGt"
+      }
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 11,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "B-ENznuJqArA",
+        "outputId": "f72963fc-82db-4d0d-9225-07f6b501e256"
+      },
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            ""
+          ]
+        }
+      ],
+      "source": [
+        "!pip install apache_beam[gcp]>=2.51.0 --quiet\n",
+        "!pip install torch --quiet\n",
+        "!pip install transformers --quiet\n",
+        "\n",
+        "# To use the newly installed versions, restart the runtime.\n",
+        "exit()"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "from typing import Dict\n",
+        "from typing import Iterable\n",
+        "from typing import Tuple\n",
+        "\n",
+        "from transformers import pipeline\n",
+        "\n",
+        "import apache_beam as beam\n",
+        "from apache_beam.ml.inference.base import KeyedModelHandler\n",
+        "from apache_beam.ml.inference.base import KeyModelMapping\n",
+        "from apache_beam.ml.inference.base import PredictionResult\n",
+        "from apache_beam.ml.inference.huggingface_inference import HuggingFacePipelineModelHandler\n",
+        "from apache_beam.ml.inference.base import RunInference"
+      ],
+      "metadata": {
+        "id": "wUmBEglvsOYW"
+      },
+      "execution_count": 1,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Define Configuration for our Models\n",
+        "\n",
+        "A `ModelHandler` is Beam's method for defining the configuration needed to load and invoke your model. Since we want to use multiple models, we will define 2 ModelHandlers, one for each model we're using in this example. Since both models being used are incapsulated by Hugging Face pipelines, we will use `HuggingFacePipelineModelHandler`.\n",
+        "\n",
+        "In this notebook, we will also load the models using Hugging Face and run them against an example. Note that they produce different outputs."

Review Comment:
   ```suggestion
           "In this notebook, we load the models using Hugging Face and run them against an example. The models produce different outputs."
   ```



##########
examples/notebooks/beam-ml/per_key_models.ipynb:
##########
@@ -0,0 +1,597 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "code",
+      "source": [
+        "# @title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License"
+      ],
+      "metadata": {
+        "id": "OsFaZscKSPvo"
+      },
+      "execution_count": 1,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "# Run ML Inference with Different Models Per Key\n",
+        "\n",
+        "<table align=\"left\">\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/per_key_models.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/colab_32px.png\" />Run in Google Colab</a>\n",
+        "  </td>\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/per_key_models.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/github_32px.png\" />View source on GitHub</a>\n",
+        "  </td>\n",
+        "</table>\n"
+      ],
+      "metadata": {
+        "id": "ZUSiAR62SgO8"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Often users desire to run inference with many differently trained models performing the same task. This can be helpful if you are comparing the performance of multiple different models, or if you have models trained on different datasets which you would like to conditionally use based on additional metadata.\n",
+        "\n",
+        "In Apache Beam, the recommended way to run inference is with the `RunInference` transform. Using a `KeyedModelHandler`, you can efficiently run inference with O(100s) of models without worrying about managing memory yourself.\n",
+        "\n",
+        "This notebook demonstrates how you can use a `KeyedModelHandler` to run inference in a Beam model with multiple different models on a per key basis. This notebook uses pretrained pipelines pulled from Hugging Face. It is recommended that you walk through the [beginner RunInference notebook](https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb) before continuing with this notebook."
+      ],
+      "metadata": {
+        "id": "ZAVOrrW2An1n"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Install Dependencies\n",
+        "\n",
+        "We will first install Beam and some dependencies needed by Hugging Face"
+      ],
+      "metadata": {
+        "id": "_fNyheQoDgGt"
+      }
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 11,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "B-ENznuJqArA",
+        "outputId": "f72963fc-82db-4d0d-9225-07f6b501e256"
+      },
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            ""
+          ]
+        }
+      ],
+      "source": [
+        "!pip install apache_beam[gcp]>=2.51.0 --quiet\n",
+        "!pip install torch --quiet\n",
+        "!pip install transformers --quiet\n",
+        "\n",
+        "# To use the newly installed versions, restart the runtime.\n",
+        "exit()"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "from typing import Dict\n",
+        "from typing import Iterable\n",
+        "from typing import Tuple\n",
+        "\n",
+        "from transformers import pipeline\n",
+        "\n",
+        "import apache_beam as beam\n",
+        "from apache_beam.ml.inference.base import KeyedModelHandler\n",
+        "from apache_beam.ml.inference.base import KeyModelMapping\n",
+        "from apache_beam.ml.inference.base import PredictionResult\n",
+        "from apache_beam.ml.inference.huggingface_inference import HuggingFacePipelineModelHandler\n",
+        "from apache_beam.ml.inference.base import RunInference"
+      ],
+      "metadata": {
+        "id": "wUmBEglvsOYW"
+      },
+      "execution_count": 1,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Define Configuration for our Models\n",
+        "\n",
+        "A `ModelHandler` is Beam's method for defining the configuration needed to load and invoke your model. Since we want to use multiple models, we will define 2 ModelHandlers, one for each model we're using in this example. Since both models being used are incapsulated by Hugging Face pipelines, we will use `HuggingFacePipelineModelHandler`.\n",
+        "\n",
+        "In this notebook, we will also load the models using Hugging Face and run them against an example. Note that they produce different outputs."
+      ],
+      "metadata": {
+        "id": "uEqljVgCD7hx"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "distilbert_mh = HuggingFacePipelineModelHandler('text-classification', model=\"distilbert-base-uncased-finetuned-sst-2-english\")\n",
+        "roberta_mh = HuggingFacePipelineModelHandler('text-classification', model=\"roberta-large-mnli\")\n",
+        "\n",
+        "distilbert_pipe = pipeline('text-classification', model=\"distilbert-base-uncased-finetuned-sst-2-english\")\n",
+        "roberta_large_pipe = pipeline(model=\"roberta-large-mnli\")"
+      ],
+      "metadata": {
+        "id": "v2NJT5ZcxgH5",
+        "outputId": "3924d72e-5c49-477d-c50f-6d9098f5a4b2"
+      },
+      "execution_count": 2,
+      "outputs": [
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)lve/main/config.json:   0%|          | 0.00/629 [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "b7cb51663677434ca42de6b5e6f37420"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading model.safetensors:   0%|          | 0.00/268M [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "3702756019854683a9dea9f8af0a29d0"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)okenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "52b9fdb51d514c2e8b9fa5813972ab01"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)solve/main/vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "eca24b7b7b1847c1aed6aa59a44ed63a"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)lve/main/config.json:   0%|          | 0.00/688 [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "4d4cfe9a0ca54897aa991420bee01ff9"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading model.safetensors:   0%|          | 0.00/1.43G [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "aee85cd919d24125acff1663fba0b47c"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)olve/main/vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "0af8ad4eed2d49878fa83b5828d58a96"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)olve/main/merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "1ed943a51c704ab7a72101b5b6182772"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)/main/tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "5b1dcbb533894267b184fd591d8ccdbc"
+            }
+          },
+          "metadata": {}
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "distilbert_pipe(\"This restaurant is awesome\")"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "H3nYX9thy8ec",
+        "outputId": "826e3285-24b9-47a8-d2a6-835543fdcae7"
+      },
+      "execution_count": 3,
+      "outputs": [
+        {
+          "output_type": "execute_result",
+          "data": {
+            "text/plain": [
+              "[{'label': 'POSITIVE', 'score': 0.9998743534088135}]"
+            ]
+          },
+          "metadata": {},
+          "execution_count": 3
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "roberta_large_pipe(\"This restaurant is awesome\")\n"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "IIfc94ODyjUg",
+        "outputId": "94ec8afb-ebfb-47ce-9813-48358741bc6b"
+      },
+      "execution_count": 4,
+      "outputs": [
+        {
+          "output_type": "execute_result",
+          "data": {
+            "text/plain": [
+              "[{'label': 'NEUTRAL', 'score': 0.7313134670257568}]"
+            ]
+          },
+          "metadata": {},
+          "execution_count": 4
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Define our Examples\n",

Review Comment:
   ```suggestion
           "## Define the examples\n",
   ```



##########
examples/notebooks/beam-ml/per_key_models.ipynb:
##########
@@ -0,0 +1,597 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "code",
+      "source": [
+        "# @title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License"
+      ],
+      "metadata": {
+        "id": "OsFaZscKSPvo"
+      },
+      "execution_count": 1,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "# Run ML Inference with Different Models Per Key\n",
+        "\n",
+        "<table align=\"left\">\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/per_key_models.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/colab_32px.png\" />Run in Google Colab</a>\n",
+        "  </td>\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/per_key_models.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/github_32px.png\" />View source on GitHub</a>\n",
+        "  </td>\n",
+        "</table>\n"
+      ],
+      "metadata": {
+        "id": "ZUSiAR62SgO8"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Often users desire to run inference with many differently trained models performing the same task. This can be helpful if you are comparing the performance of multiple different models, or if you have models trained on different datasets which you would like to conditionally use based on additional metadata.\n",
+        "\n",
+        "In Apache Beam, the recommended way to run inference is with the `RunInference` transform. Using a `KeyedModelHandler`, you can efficiently run inference with O(100s) of models without worrying about managing memory yourself.\n",
+        "\n",
+        "This notebook demonstrates how you can use a `KeyedModelHandler` to run inference in a Beam model with multiple different models on a per key basis. This notebook uses pretrained pipelines pulled from Hugging Face. It is recommended that you walk through the [beginner RunInference notebook](https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb) before continuing with this notebook."
+      ],
+      "metadata": {
+        "id": "ZAVOrrW2An1n"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Install Dependencies\n",
+        "\n",
+        "We will first install Beam and some dependencies needed by Hugging Face"
+      ],
+      "metadata": {
+        "id": "_fNyheQoDgGt"
+      }
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 11,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "B-ENznuJqArA",
+        "outputId": "f72963fc-82db-4d0d-9225-07f6b501e256"
+      },
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            ""
+          ]
+        }
+      ],
+      "source": [
+        "!pip install apache_beam[gcp]>=2.51.0 --quiet\n",
+        "!pip install torch --quiet\n",
+        "!pip install transformers --quiet\n",
+        "\n",
+        "# To use the newly installed versions, restart the runtime.\n",
+        "exit()"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "from typing import Dict\n",
+        "from typing import Iterable\n",
+        "from typing import Tuple\n",
+        "\n",
+        "from transformers import pipeline\n",
+        "\n",
+        "import apache_beam as beam\n",
+        "from apache_beam.ml.inference.base import KeyedModelHandler\n",
+        "from apache_beam.ml.inference.base import KeyModelMapping\n",
+        "from apache_beam.ml.inference.base import PredictionResult\n",
+        "from apache_beam.ml.inference.huggingface_inference import HuggingFacePipelineModelHandler\n",
+        "from apache_beam.ml.inference.base import RunInference"
+      ],
+      "metadata": {
+        "id": "wUmBEglvsOYW"
+      },
+      "execution_count": 1,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Define Configuration for our Models\n",
+        "\n",
+        "A `ModelHandler` is Beam's method for defining the configuration needed to load and invoke your model. Since we want to use multiple models, we will define 2 ModelHandlers, one for each model we're using in this example. Since both models being used are incapsulated by Hugging Face pipelines, we will use `HuggingFacePipelineModelHandler`.\n",
+        "\n",
+        "In this notebook, we will also load the models using Hugging Face and run them against an example. Note that they produce different outputs."
+      ],
+      "metadata": {
+        "id": "uEqljVgCD7hx"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "distilbert_mh = HuggingFacePipelineModelHandler('text-classification', model=\"distilbert-base-uncased-finetuned-sst-2-english\")\n",
+        "roberta_mh = HuggingFacePipelineModelHandler('text-classification', model=\"roberta-large-mnli\")\n",
+        "\n",
+        "distilbert_pipe = pipeline('text-classification', model=\"distilbert-base-uncased-finetuned-sst-2-english\")\n",
+        "roberta_large_pipe = pipeline(model=\"roberta-large-mnli\")"
+      ],
+      "metadata": {
+        "id": "v2NJT5ZcxgH5",
+        "outputId": "3924d72e-5c49-477d-c50f-6d9098f5a4b2"
+      },
+      "execution_count": 2,
+      "outputs": [
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)lve/main/config.json:   0%|          | 0.00/629 [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "b7cb51663677434ca42de6b5e6f37420"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading model.safetensors:   0%|          | 0.00/268M [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "3702756019854683a9dea9f8af0a29d0"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)okenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "52b9fdb51d514c2e8b9fa5813972ab01"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)solve/main/vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "eca24b7b7b1847c1aed6aa59a44ed63a"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)lve/main/config.json:   0%|          | 0.00/688 [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "4d4cfe9a0ca54897aa991420bee01ff9"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading model.safetensors:   0%|          | 0.00/1.43G [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "aee85cd919d24125acff1663fba0b47c"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)olve/main/vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "0af8ad4eed2d49878fa83b5828d58a96"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)olve/main/merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "1ed943a51c704ab7a72101b5b6182772"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)/main/tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "5b1dcbb533894267b184fd591d8ccdbc"
+            }
+          },
+          "metadata": {}
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "distilbert_pipe(\"This restaurant is awesome\")"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "H3nYX9thy8ec",
+        "outputId": "826e3285-24b9-47a8-d2a6-835543fdcae7"
+      },
+      "execution_count": 3,
+      "outputs": [
+        {
+          "output_type": "execute_result",
+          "data": {
+            "text/plain": [
+              "[{'label': 'POSITIVE', 'score': 0.9998743534088135}]"
+            ]
+          },
+          "metadata": {},
+          "execution_count": 3
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "roberta_large_pipe(\"This restaurant is awesome\")\n"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "IIfc94ODyjUg",
+        "outputId": "94ec8afb-ebfb-47ce-9813-48358741bc6b"
+      },
+      "execution_count": 4,
+      "outputs": [
+        {
+          "output_type": "execute_result",
+          "data": {
+            "text/plain": [
+              "[{'label': 'NEUTRAL', 'score': 0.7313134670257568}]"
+            ]
+          },
+          "metadata": {},
+          "execution_count": 4
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Define our Examples\n",
+        "\n",
+        "Next, we will define some examples that we can input into our pipeline, along with their correct classifications."

Review Comment:
   ```suggestion
           "Next, define examples to input into the pipeline. The examples include their correct classifications."
   ```



##########
examples/notebooks/beam-ml/per_key_models.ipynb:
##########
@@ -0,0 +1,597 @@
+    {
+      "cell_type": "code",
+      "source": [
+        "examples = [\n",
+        "    (\"This restaurant is awesome\", \"positive\"),\n",
+        "    (\"This restaurant is bad\", \"negative\"),\n",
+        "    (\"I feel fine\", \"neutral\"),\n",
+        "    (\"I love chocolate\", \"positive\"),\n",
+        "]"
+      ],
+      "metadata": {
+        "id": "5HAziWEavQws"
+      },
+      "execution_count": 5,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "To feed our examples into RunInference, we need to have distinct keys that can easily map to our model. In this case, we will define keys of the form `<model_name>-<actual_sentiment>` so that we can extract the actual sentiment of the example later."
+      ],
+      "metadata": {
+        "id": "r6GXL5PLFBY7"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "class FormatExamples(beam.DoFn):\n",
+        "  \"\"\"\n",
+        "  Map each example to a tuple of ('<model_name>-<actual_sentiment>', 'example').\n",
+        "  We will use these keyes to map our elements to the correct models.\n",
+        "  \"\"\"\n",
+        "  def process(self, element: Tuple[str, str]) -> Iterable[Tuple[str, str]]:\n",
+        "    yield (f'distilbert-{element[1]}', element[0])\n",
+        "    yield (f'roberta-{element[1]}', element[0])"
+      ],
+      "metadata": {
+        "id": "p2uVwws8zRpg"
+      },
+      "execution_count": 6,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Using the formatted keys, we will define a `KeyedModelHandler` which maps keys to the ModelHandler we should use for those keys. `KeyedModelHandler` also allows you to define an optional `max_models_per_worker_hint` which will limit the number of models that can be held in a single worker process at once. This is useful if you are worried about your worker running out of memory. See https://beam.apache.org/documentation/sdks/python-machine-learning/index.html#use-a-keyed-modelhandler for more info on managing memory."

Review Comment:
   ```suggestion
           "Use the formatted keys to define a `KeyedModelHandler` that maps keys to the `ModelHandler` used for those keys. The `KeyedModelHandler` method lets you define an optional `max_models_per_worker_hint`, which limits the number of models that can be held in a single worker process at one time. If you're worried about your worker running out of memory, use this option. For more information about managing memory, see [Use a keyed ModelHandler](https://beam.apache.org/documentation/sdks/python-machine-learning/index.html#use-a-keyed-modelhandler)."
   ```
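
   For readers following the thread, the cell that this paragraph introduces (it appears further down in the diff) wires the handlers together roughly as follows; this is a condensed restatement of the notebook's own code, not a proposed change:

   ```python
   # Map each key to the ModelHandler that should serve it.
   per_key_mhs = [
       KeyModelMapping(['distilbert-positive', 'distilbert-neutral', 'distilbert-negative'], distilbert_mh),
       KeyModelMapping(['roberta-positive', 'roberta-neutral', 'roberta-negative'], roberta_mh),
   ]
   # Hint that at most two models should be held in a single worker process at once.
   mh = KeyedModelHandler(per_key_mhs, max_models_per_worker_hint=2)
   ```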



##########
examples/notebooks/beam-ml/per_key_models.ipynb:
##########
@@ -0,0 +1,597 @@
+        "To feed our examples into RunInference, we need to have distinct keys that can easily map to our model. In this case, we will define keys of the form `<model_name>-<actual_sentiment>` so that we can extract the actual sentiment of the example later."

Review Comment:
   ```suggestion
           "To feed the examples into RunInference, you need distinct keys that can map to the model. In this case, to make it possible to extract the actual sentiment of the example later, define keys in the form `<model_name>-<actual_sentiment>`."
   ```
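
   To make the key convention concrete, here is a minimal sketch of how such a key is built and later split back apart, mirroring the `FormatExamples` and `ExtractResults` DoFns in this notebook:

   ```python
   actual_sentiment = 'positive'
   key = f'distilbert-{actual_sentiment}'  # '<model_name>-<actual_sentiment>'
   model, sentiment = key.split('-')       # ('distilbert', 'positive'), recovered downstream
   ```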



##########
examples/notebooks/beam-ml/per_key_models.ipynb:
##########
@@ -0,0 +1,597 @@
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Using the formatted keys, we will define a `KeyedModelHandler` which maps keys to the ModelHandler we should use for those keys. `KeyedModelHandler` also allows you to define an optional `max_models_per_worker_hint` which will limit the number of models that can be held in a single worker process at once. This is useful if you are worried about your worker running out of memory. See https://beam.apache.org/documentation/sdks/python-machine-learning/index.html#use-a-keyed-modelhandler for more info on managing memory."
+      ],
+      "metadata": {
+        "id": "IP65_5nNGIb8"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "per_key_mhs = [\n",
+        "    KeyModelMapping(['distilbert-positive', 'distilbert-neutral', 'distilbert-negative'], distilbert_mh),\n",
+        "    KeyModelMapping(['roberta-positive', 'roberta-neutral', 'roberta-negative'], roberta_mh)\n",
+        "]\n",
+        "mh = KeyedModelHandler(per_key_mhs, max_models_per_worker_hint=2)"
+      ],
+      "metadata": {
+        "id": "DZpfjeGL2hMG"
+      },
+      "execution_count": 7,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Postprocess our results\n",
+        "\n",
+        "The `RunInference` transform returns a Tuple of the original key and a `PredictionResult` object that contains the original example and the inference. From that, we will extract the data we care about. We will then group this data by the original example in order to compare each model's prediction."
+      ],
+      "metadata": {
+        "id": "_a4ZmnD5FSeG"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "class ExtractResults(beam.DoFn):\n",
+        "  \"\"\"\n",
+        "  Extract the data we care about from the PredictionResult object.\n",
+        "  \"\"\"\n",
+        "  def process(self, element: Tuple[str, PredictionResult]) -> Iterable[Tuple[str, Dict[str, str]]]:\n",
+        "    actual_sentiment = element[0].split('-')[1]\n",
+        "    model = element[0].split('-')[0]\n",
+        "    result = element[1]\n",
+        "    example = result.example\n",
+        "    predicted_sentiment = result.inference[0]['label']\n",
+        "\n",
+        "    yield (example, {'model': model, 'actual_sentiment': actual_sentiment, 'predicted_sentiment': predicted_sentiment})"
+      ],
+      "metadata": {
+        "id": "FOwFNQA053TG"
+      },
+      "execution_count": 8,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Finally, we will print the results produced by each model."

Review Comment:
   ```suggestion
           "Finally, print the results produced by each model."
   ```
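
   For orientation, the pieces reviewed above compose in the usual `RunInference` order. A minimal sketch, assuming the notebook's final cell (beyond this excerpt) groups results by example before printing:

   ```python
   with beam.Pipeline() as pipeline:
       _ = (
           pipeline
           | 'CreateExamples' >> beam.Create(examples)
           | 'FormatKeys' >> beam.ParDo(FormatExamples())
           | 'RunInference' >> RunInference(mh)
           | 'ExtractResults' >> beam.ParDo(ExtractResults())
           | 'GroupByExample' >> beam.GroupByKey()
           | 'Print' >> beam.Map(print)
       )
   ```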



##########
examples/notebooks/beam-ml/per_key_models.ipynb:
##########
@@ -0,0 +1,597 @@
+          "data": {
+            "text/plain": [
+              "[{'label': 'POSITIVE', 'score': 0.9998743534088135}]"
+            ]
+          },
+          "metadata": {},
+          "execution_count": 3
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "roberta_large_pipe(\"This restaurant is awesome\")\n"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "IIfc94ODyjUg",
+        "outputId": "94ec8afb-ebfb-47ce-9813-48358741bc6b"
+      },
+      "execution_count": 4,
+      "outputs": [
+        {
+          "output_type": "execute_result",
+          "data": {
+            "text/plain": [
+              "[{'label': 'NEUTRAL', 'score': 0.7313134670257568}]"
+            ]
+          },
+          "metadata": {},
+          "execution_count": 4
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Define our Examples\n",
+        "\n",
+        "Next, we will define some examples to input into our pipeline, along with their correct classifications."
+      ],
+      "metadata": {
+        "id": "yd92MC7YEsTf"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "examples = [\n",
+        "    (\"This restaurant is awesome\", \"positive\"),\n",
+        "    (\"This restaurant is bad\", \"negative\"),\n",
+        "    (\"I feel fine\", \"neutral\"),\n",
+        "    (\"I love chocolate\", \"positive\"),\n",
+        "]"
+      ],
+      "metadata": {
+        "id": "5HAziWEavQws"
+      },
+      "execution_count": 5,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "To feed our examples into RunInference, we need distinct keys that map to our models. In this case, we will define keys of the form `<model_name>-<actual_sentiment>` so that we can extract the actual sentiment of each example later."
+      ],
+      "metadata": {
+        "id": "r6GXL5PLFBY7"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "class FormatExamples(beam.DoFn):\n",
+        "  \"\"\"\n",
+        "  Map each example to a tuple of ('<model_name>-<actual_sentiment>', 'example').\n",
+        "  We will use these keys to map our elements to the correct models.\n",
+        "  \"\"\"\n",
+        "  def process(self, element: Tuple[str, str]) -> Iterable[Tuple[str, str]]:\n",
+        "    yield (f'distilbert-{element[1]}', element[0])\n",
+        "    yield (f'roberta-{element[1]}', element[0])"
+      ],
+      "metadata": {
+        "id": "p2uVwws8zRpg"
+      },
+      "execution_count": 6,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Using the formatted keys, we will define a `KeyedModelHandler`, which maps each key to the ModelHandler to use for that key. `KeyedModelHandler` also lets you define an optional `max_models_per_worker_hint`, which limits the number of models that can be held in a single worker process at once. This is useful if you are worried about your workers running out of memory. See https://beam.apache.org/documentation/sdks/python-machine-learning/index.html#use-a-keyed-modelhandler for more information about managing memory."
+      ],
+      "metadata": {
+        "id": "IP65_5nNGIb8"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "per_key_mhs = [\n",
+        "    KeyModelMapping(['distilbert-positive', 'distilbert-neutral', 'distilbert-negative'], distilbert_mh),\n",
+        "    KeyModelMapping(['roberta-positive', 'roberta-neutral', 'roberta-negative'], roberta_mh)\n",
+        "]\n",
+        "mh = KeyedModelHandler(per_key_mhs, max_models_per_worker_hint=2)"
+      ],
+      "metadata": {
+        "id": "DZpfjeGL2hMG"
+      },
+      "execution_count": 7,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Postprocess our results\n",

Review Comment:
   ```suggestion
           "## Postprocess the results\n",
   ```
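
For readers skimming the diff: with a `KeyedModelHandler`, `RunInference` emits pairs of the key and a `PredictionResult`. The sketch below is illustrative only (it is not part of the PR) and uses made-up values; it touches only the `example` and `inference` fields that the notebook itself relies on.

```python
# Illustrative sketch, not part of the PR: the element shape RunInference
# emits when a KeyedModelHandler is used. All values here are made up.
from apache_beam.ml.inference.base import PredictionResult

element = (
    'distilbert-positive',  # a key produced by FormatExamples
    PredictionResult(
        example='This restaurant is awesome',
        inference=[{'label': 'POSITIVE', 'score': 0.99}],
    ),
)

key, result = element
model, actual_sentiment = key.split('-', 1)
predicted_sentiment = result.inference[0]['label']
print(model, actual_sentiment, predicted_sentiment)  # distilbert positive POSITIVE
```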



##########
examples/notebooks/beam-ml/per_key_models.ipynb:
##########
@@ -0,0 +1,597 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "code",
+      "source": [
+        "# @title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License"
+      ],
+      "metadata": {
+        "id": "OsFaZscKSPvo"
+      },
+      "execution_count": 1,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "# Run ML Inference with Different Models Per Key\n",
+        "\n",
+        "<table align=\"left\">\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/per_key_models.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/colab_32px.png\" />Run in Google Colab</a>\n",
+        "  </td>\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/per_key_models.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/github_32px.png\" />View source on GitHub</a>\n",
+        "  </td>\n",
+        "</table>\n"
+      ],
+      "metadata": {
+        "id": "ZUSiAR62SgO8"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Users often want to run inference with many differently trained models that perform the same task. This can be helpful if you are comparing the performance of multiple models, or if you have models trained on different datasets that you would like to use conditionally based on additional metadata.\n",
+        "\n",
+        "In Apache Beam, the recommended way to run inference is with the `RunInference` transform. Using a `KeyedModelHandler`, you can efficiently run inference with O(100s) of models without worrying about managing memory yourself.\n",
+        "\n",
+        "This notebook demonstrates how to use a `KeyedModelHandler` to run inference in a Beam pipeline with multiple models on a per-key basis. It uses pretrained pipelines pulled from Hugging Face. It is recommended that you walk through the [beginner RunInference notebook](https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb) before continuing with this notebook."
+      ],
+      "metadata": {
+        "id": "ZAVOrrW2An1n"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Install Dependencies\n",
+        "\n",
+        "We will first install Apache Beam and the dependencies needed by Hugging Face."
+      ],
+      "metadata": {
+        "id": "_fNyheQoDgGt"
+      }
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 11,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "B-ENznuJqArA",
+        "outputId": "f72963fc-82db-4d0d-9225-07f6b501e256"
+      },
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            ""
+          ]
+        }
+      ],
+      "source": [
+        "!pip install 'apache_beam[gcp]>=2.51.0' --quiet\n",
+        "!pip install torch --quiet\n",
+        "!pip install transformers --quiet\n",
+        "\n",
+        "# To use the newly installed versions, restart the runtime.\n",
+        "exit()"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "from typing import Dict\n",
+        "from typing import Iterable\n",
+        "from typing import Tuple\n",
+        "\n",
+        "from transformers import pipeline\n",
+        "\n",
+        "import apache_beam as beam\n",
+        "from apache_beam.ml.inference.base import KeyedModelHandler\n",
+        "from apache_beam.ml.inference.base import KeyModelMapping\n",
+        "from apache_beam.ml.inference.base import PredictionResult\n",
+        "from apache_beam.ml.inference.huggingface_inference import HuggingFacePipelineModelHandler\n",
+        "from apache_beam.ml.inference.base import RunInference"
+      ],
+      "metadata": {
+        "id": "wUmBEglvsOYW"
+      },
+      "execution_count": 1,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Define Configuration for our Models\n",
+        "\n",
+        "A `ModelHandler` is Beam's method for defining the configuration needed to load and invoke your model. Since we want to use multiple models, we will define two ModelHandlers, one for each model used in this example. Because both models are encapsulated by Hugging Face pipelines, we will use `HuggingFacePipelineModelHandler`.\n",
+        "\n",
+        "In this notebook, we will also load the models using Hugging Face and run them against an example. Note that they produce different outputs."
+      ],
+      "metadata": {
+        "id": "uEqljVgCD7hx"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "distilbert_mh = HuggingFacePipelineModelHandler('text-classification', model=\"distilbert-base-uncased-finetuned-sst-2-english\")\n",
+        "roberta_mh = HuggingFacePipelineModelHandler('text-classification', model=\"roberta-large-mnli\")\n",
+        "\n",
+        "distilbert_pipe = pipeline('text-classification', model=\"distilbert-base-uncased-finetuned-sst-2-english\")\n",
+        "roberta_large_pipe = pipeline(model=\"roberta-large-mnli\")"
+      ],
+      "metadata": {
+        "id": "v2NJT5ZcxgH5",
+        "outputId": "3924d72e-5c49-477d-c50f-6d9098f5a4b2"
+      },
+      "execution_count": 2,
+      "outputs": [
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)lve/main/config.json:   0%|          | 0.00/629 [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "b7cb51663677434ca42de6b5e6f37420"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading model.safetensors:   0%|          | 0.00/268M [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "3702756019854683a9dea9f8af0a29d0"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)okenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "52b9fdb51d514c2e8b9fa5813972ab01"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)solve/main/vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "eca24b7b7b1847c1aed6aa59a44ed63a"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)lve/main/config.json:   0%|          | 0.00/688 [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "4d4cfe9a0ca54897aa991420bee01ff9"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading model.safetensors:   0%|          | 0.00/1.43G [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "aee85cd919d24125acff1663fba0b47c"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)olve/main/vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "0af8ad4eed2d49878fa83b5828d58a96"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)olve/main/merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "1ed943a51c704ab7a72101b5b6182772"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)/main/tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "5b1dcbb533894267b184fd591d8ccdbc"
+            }
+          },
+          "metadata": {}
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "distilbert_pipe(\"This restaurant is awesome\")"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "H3nYX9thy8ec",
+        "outputId": "826e3285-24b9-47a8-d2a6-835543fdcae7"
+      },
+      "execution_count": 3,
+      "outputs": [
+        {
+          "output_type": "execute_result",
+          "data": {
+            "text/plain": [
+              "[{'label': 'POSITIVE', 'score': 0.9998743534088135}]"
+            ]
+          },
+          "metadata": {},
+          "execution_count": 3
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "roberta_large_pipe(\"This restaurant is awesome\")\n"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "IIfc94ODyjUg",
+        "outputId": "94ec8afb-ebfb-47ce-9813-48358741bc6b"
+      },
+      "execution_count": 4,
+      "outputs": [
+        {
+          "output_type": "execute_result",
+          "data": {
+            "text/plain": [
+              "[{'label': 'NEUTRAL', 'score': 0.7313134670257568}]"
+            ]
+          },
+          "metadata": {},
+          "execution_count": 4
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Define our Examples\n",
+        "\n",
+        "Next, we will define some examples to input into our pipeline, along with their correct classifications."
+      ],
+      "metadata": {
+        "id": "yd92MC7YEsTf"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "examples = [\n",
+        "    (\"This restaurant is awesome\", \"positive\"),\n",
+        "    (\"This restaurant is bad\", \"negative\"),\n",
+        "    (\"I feel fine\", \"neutral\"),\n",
+        "    (\"I love chocolate\", \"positive\"),\n",
+        "]"
+      ],
+      "metadata": {
+        "id": "5HAziWEavQws"
+      },
+      "execution_count": 5,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "To feed our examples into RunInference, we need distinct keys that map to our models. In this case, we will define keys of the form `<model_name>-<actual_sentiment>` so that we can extract the actual sentiment of each example later."
+      ],
+      "metadata": {
+        "id": "r6GXL5PLFBY7"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "class FormatExamples(beam.DoFn):\n",
+        "  \"\"\"\n",
+        "  Map each example to a tuple of ('<model_name>-<actual_sentiment>', 'example').\n",
+        "  We will use these keys to map our elements to the correct models.\n",
+        "  \"\"\"\n",
+        "  def process(self, element: Tuple[str, str]) -> Iterable[Tuple[str, str]]:\n",
+        "    yield (f'distilbert-{element[1]}', element[0])\n",
+        "    yield (f'roberta-{element[1]}', element[0])"
+      ],
+      "metadata": {
+        "id": "p2uVwws8zRpg"
+      },
+      "execution_count": 6,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Using the formatted keys, we will define a `KeyedModelHandler`, which maps each key to the ModelHandler to use for that key. `KeyedModelHandler` also lets you define an optional `max_models_per_worker_hint`, which limits the number of models that can be held in a single worker process at once. This is useful if you are worried about your workers running out of memory. See https://beam.apache.org/documentation/sdks/python-machine-learning/index.html#use-a-keyed-modelhandler for more information about managing memory."
+      ],
+      "metadata": {
+        "id": "IP65_5nNGIb8"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "per_key_mhs = [\n",
+        "    KeyModelMapping(['distilbert-positive', 'distilbert-neutral', 'distilbert-negative'], distilbert_mh),\n",
+        "    KeyModelMapping(['roberta-positive', 'roberta-neutral', 'roberta-negative'], roberta_mh)\n",
+        "]\n",
+        "mh = KeyedModelHandler(per_key_mhs, max_models_per_worker_hint=2)"
+      ],
+      "metadata": {
+        "id": "DZpfjeGL2hMG"
+      },
+      "execution_count": 7,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Postprocess our results\n",
+        "\n",
+        "The `RunInference` transform returns a Tuple of the original key and a `PredictionResult` object that contains the original example and the inference. From that, we will extract the data we care about. We will then group this data by the original example in order to compare each model's prediction."
+      ],
+      "metadata": {
+        "id": "_a4ZmnD5FSeG"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "class ExtractResults(beam.DoFn):\n",
+        "  \"\"\"\n",
+        "  Extract the data we care about from the PredictionResult object.\n",
+        "  \"\"\"\n",
+        "  def process(self, element: Tuple[str, PredictionResult]) -> Iterable[Tuple[str, Dict[str, str]]]:\n",
+        "    actual_sentiment = element[0].split('-')[1]\n",
+        "    model = element[0].split('-')[0]\n",
+        "    result = element[1]\n",
+        "    example = result.example\n",
+        "    predicted_sentiment = result.inference[0]['label']\n",
+        "\n",
+        "    yield (example, {'model': model, 'actual_sentiment': actual_sentiment, 'predicted_sentiment': predicted_sentiment})"
+      ],
+      "metadata": {
+        "id": "FOwFNQA053TG"
+      },
+      "execution_count": 8,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Finally, we will print the results produced by each model."
+      ],
+      "metadata": {
+        "id": "JVnv4gGbFohk"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "class PrintResults(beam.DoFn):\n",
+        "  \"\"\"\n",
+        "  Print the results produced by each model along with the actual sentiment.\n",
+        "  \"\"\"\n",
+        "  def process(self, element: Tuple[str, Iterable[Dict[str, str]]]):\n",
+        "    example = element[0]\n",
+        "    actual_sentiment = element[1][0]['actual_sentiment']\n",
+        "    predicted_sentiment_1 = element[1][0]['predicted_sentiment']\n",
+        "    model_1 = element[1][0]['model']\n",
+        "    predicted_sentiment_2 = element[1][1]['predicted_sentiment']\n",
+        "    model_2 = element[1][1]['model']\n",
+        "\n",
+        "    if model_1 == 'distilbert':\n",
+        "      distilbert_prediction = predicted_sentiment_1\n",
+        "      roberta_prediction = predicted_sentiment_2\n",
+        "    else:\n",
+        "      roberta_prediction = predicted_sentiment_1\n",
+        "      distilbert_prediction = predicted_sentiment_2\n",
+        "\n",
+        "    print(f'Example: {example}\\nActual Sentiment: {actual_sentiment}\\n'\n",
+        "          f'Distilbert Prediction: {distilbert_prediction}\\n'\n",
+        "          f'Roberta Prediction: {roberta_prediction}\\n------------')"
+      ],
+      "metadata": {
+        "id": "kUQJNYOa9Q5-"
+      },
+      "execution_count": 9,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Run Your Pipeline\n",

Review Comment:
   ```suggestion
           "## Run the pipeline\n",
   ```
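
Since the cell body for this section appears further down in the diff, here is a hedged reconstruction of how the pieces defined above could compose into one pipeline. It reuses the notebook's own names (`examples`, `FormatExamples`, `mh`, `ExtractResults`, `PrintResults`) and may not match the PR's final cell exactly.

```python
# Hedged reconstruction of the final pipeline, assuming the DoFns and the
# keyed model handler defined earlier in this diff; the PR's actual cell
# may differ.
import apache_beam as beam
from apache_beam.ml.inference.base import RunInference

with beam.Pipeline() as p:
    _ = (
        p
        | 'CreateExamples' >> beam.Create(examples)
        | 'KeyExamples' >> beam.ParDo(FormatExamples())
        | 'RunInference' >> RunInference(mh)
        | 'ExtractResults' >> beam.ParDo(ExtractResults())
        # Group the per-model results for each example together so that
        # PrintResults can compare the two predictions side by side.
        | 'GroupByExample' >> beam.GroupByKey()
        | 'PrintResults' >> beam.ParDo(PrintResults())
    )
```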



##########
examples/notebooks/beam-ml/per_key_models.ipynb:
##########
@@ -0,0 +1,597 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "code",
+      "source": [
+        "# @title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License"
+      ],
+      "metadata": {
+        "id": "OsFaZscKSPvo"
+      },
+      "execution_count": 1,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "# Run ML Inference with Different Models Per Key\n",
+        "\n",
+        "<table align=\"left\">\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/per_key_models.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/colab_32px.png\" />Run in Google Colab</a>\n",
+        "  </td>\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/per_key_models.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/github_32px.png\" />View source on GitHub</a>\n",
+        "  </td>\n",
+        "</table>\n"
+      ],
+      "metadata": {
+        "id": "ZUSiAR62SgO8"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Users often want to run inference with many differently trained models that perform the same task. This can be helpful if you are comparing the performance of multiple models, or if you have models trained on different datasets that you would like to use conditionally based on additional metadata.\n",
+        "\n",
+        "In Apache Beam, the recommended way to run inference is with the `RunInference` transform. Using a `KeyedModelHandler`, you can efficiently run inference with O(100s) of models without worrying about managing memory yourself.\n",
+        "\n",
+        "This notebook demonstrates how to use a `KeyedModelHandler` to run inference in a Beam pipeline with multiple models on a per-key basis. It uses pretrained pipelines pulled from Hugging Face. It is recommended that you walk through the [beginner RunInference notebook](https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb) before continuing with this notebook."
+      ],
+      "metadata": {
+        "id": "ZAVOrrW2An1n"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Install Dependencies\n",
+        "\n",
+        "We will first install Apache Beam and the dependencies needed by Hugging Face."
+      ],
+      "metadata": {
+        "id": "_fNyheQoDgGt"
+      }
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 11,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "B-ENznuJqArA",
+        "outputId": "f72963fc-82db-4d0d-9225-07f6b501e256"
+      },
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            ""
+          ]
+        }
+      ],
+      "source": [
+        "!pip install 'apache_beam[gcp]>=2.51.0' --quiet\n",
+        "!pip install torch --quiet\n",
+        "!pip install transformers --quiet\n",
+        "\n",
+        "# To use the newly installed versions, restart the runtime.\n",
+        "exit()"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "from typing import Dict\n",
+        "from typing import Iterable\n",
+        "from typing import Tuple\n",
+        "\n",
+        "from transformers import pipeline\n",
+        "\n",
+        "import apache_beam as beam\n",
+        "from apache_beam.ml.inference.base import KeyedModelHandler\n",
+        "from apache_beam.ml.inference.base import KeyModelMapping\n",
+        "from apache_beam.ml.inference.base import PredictionResult\n",
+        "from apache_beam.ml.inference.huggingface_inference import HuggingFacePipelineModelHandler\n",
+        "from apache_beam.ml.inference.base import RunInference"
+      ],
+      "metadata": {
+        "id": "wUmBEglvsOYW"
+      },
+      "execution_count": 1,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Define Configuration for our Models\n",
+        "\n",
+        "A `ModelHandler` is Beam's method for defining the configuration needed to load and invoke your model. Since we want to use multiple models, we will define two ModelHandlers, one for each model used in this example. Because both models are encapsulated by Hugging Face pipelines, we will use `HuggingFacePipelineModelHandler`.\n",
+        "\n",
+        "In this notebook, we will also load the models using Hugging Face and run them against an example. Note that they produce different outputs."
+      ],
+      "metadata": {
+        "id": "uEqljVgCD7hx"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "distilbert_mh = HuggingFacePipelineModelHandler('text-classification', model=\"distilbert-base-uncased-finetuned-sst-2-english\")\n",
+        "roberta_mh = HuggingFacePipelineModelHandler('text-classification', model=\"roberta-large-mnli\")\n",
+        "\n",
+        "distilbert_pipe = pipeline('text-classification', model=\"distilbert-base-uncased-finetuned-sst-2-english\")\n",
+        "roberta_large_pipe = pipeline(model=\"roberta-large-mnli\")"
+      ],
+      "metadata": {
+        "id": "v2NJT5ZcxgH5",
+        "outputId": "3924d72e-5c49-477d-c50f-6d9098f5a4b2"
+      },
+      "execution_count": 2,
+      "outputs": [
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)lve/main/config.json:   0%|          | 0.00/629 [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "b7cb51663677434ca42de6b5e6f37420"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading model.safetensors:   0%|          | 0.00/268M [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "3702756019854683a9dea9f8af0a29d0"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)okenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "52b9fdb51d514c2e8b9fa5813972ab01"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)solve/main/vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "eca24b7b7b1847c1aed6aa59a44ed63a"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)lve/main/config.json:   0%|          | 0.00/688 [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "4d4cfe9a0ca54897aa991420bee01ff9"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading model.safetensors:   0%|          | 0.00/1.43G [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "aee85cd919d24125acff1663fba0b47c"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)olve/main/vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "0af8ad4eed2d49878fa83b5828d58a96"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)olve/main/merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "1ed943a51c704ab7a72101b5b6182772"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)/main/tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "5b1dcbb533894267b184fd591d8ccdbc"
+            }
+          },
+          "metadata": {}
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "distilbert_pipe(\"This restaurant is awesome\")"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "H3nYX9thy8ec",
+        "outputId": "826e3285-24b9-47a8-d2a6-835543fdcae7"
+      },
+      "execution_count": 3,
+      "outputs": [
+        {
+          "output_type": "execute_result",
+          "data": {
+            "text/plain": [
+              "[{'label': 'POSITIVE', 'score': 0.9998743534088135}]"
+            ]
+          },
+          "metadata": {},
+          "execution_count": 3
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "roberta_large_pipe(\"This restaurant is awesome\")\n"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "IIfc94ODyjUg",
+        "outputId": "94ec8afb-ebfb-47ce-9813-48358741bc6b"
+      },
+      "execution_count": 4,
+      "outputs": [
+        {
+          "output_type": "execute_result",
+          "data": {
+            "text/plain": [
+              "[{'label': 'NEUTRAL', 'score': 0.7313134670257568}]"
+            ]
+          },
+          "metadata": {},
+          "execution_count": 4
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Define our Examples\n",
+        "\n",
+        "Next, we will define some examples to input into our pipeline, along with their correct classifications."
+      ],
+      "metadata": {
+        "id": "yd92MC7YEsTf"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "examples = [\n",
+        "    (\"This restaurant is awesome\", \"positive\"),\n",
+        "    (\"This restaurant is bad\", \"negative\"),\n",
+        "    (\"I feel fine\", \"neutral\"),\n",
+        "    (\"I love chocolate\", \"positive\"),\n",
+        "]"
+      ],
+      "metadata": {
+        "id": "5HAziWEavQws"
+      },
+      "execution_count": 5,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "To feed our examples into RunInference, we need distinct keys that map to our models. In this case, we will define keys of the form `<model_name>-<actual_sentiment>` so that we can extract the actual sentiment of each example later."
+      ],
+      "metadata": {
+        "id": "r6GXL5PLFBY7"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "class FormatExamples(beam.DoFn):\n",
+        "  \"\"\"\n",
+        "  Map each example to a tuple of ('<model_name>-<actual_sentiment>', 'example').\n",
+        "  We will use these keys to map our elements to the correct models.\n",
+        "  \"\"\"\n",
+        "  def process(self, element: Tuple[str, str]) -> Iterable[Tuple[str, str]]:\n",
+        "    yield (f'distilbert-{element[1]}', element[0])\n",
+        "    yield (f'roberta-{element[1]}', element[0])"
+      ],
+      "metadata": {
+        "id": "p2uVwws8zRpg"
+      },
+      "execution_count": 6,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Using the formatted keys, we will define a `KeyedModelHandler`, which maps each key to the ModelHandler to use for that key. `KeyedModelHandler` also lets you define an optional `max_models_per_worker_hint`, which limits the number of models that can be held in a single worker process at once. This is useful if you are worried about your workers running out of memory. See https://beam.apache.org/documentation/sdks/python-machine-learning/index.html#use-a-keyed-modelhandler for more information about managing memory."
+      ],
+      "metadata": {
+        "id": "IP65_5nNGIb8"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "per_key_mhs = [\n",
+        "    KeyModelMapping(['distilbert-positive', 'distilbert-neutral', 'distilbert-negative'], distilbert_mh),\n",
+        "    KeyModelMapping(['roberta-positive', 'roberta-neutral', 'roberta-negative'], roberta_mh)\n",
+        "]\n",
+        "mh = KeyedModelHandler(per_key_mhs, max_models_per_worker_hint=2)"
+      ],
+      "metadata": {
+        "id": "DZpfjeGL2hMG"
+      },
+      "execution_count": 7,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Postprocess our results\n",
+        "\n",
+        "The `RunInference` transform returns a Tuple of the original key and a `PredictionResult` object that contains the original example and the inference. From that, we will extract the data we care about. We will then group this data by the original example in order to compare each model's prediction."

Review Comment:
   ```suggestion
           "The `RunInference` transform returns the following items:\n",
           "\n",
           "* A tuple of the original key\n",
           "* A `PredictionResult` object that contains the original example and the inference\n",
           "\n",
           "Use those outputs to extract the relevant data. Then, to compare each model's prediction, group this data by the original example."
   ```
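
To make the grouping concrete: after `ExtractResults` and a `GroupByKey`, each element pairs the original sentence with one result dict per model. The value below is hypothetical, shown only to illustrate the shape that `PrintResults` consumes.

```python
# Hypothetical grouped element (values invented for illustration): the
# original example text paired with one result dict per model, which is
# the shape PrintResults consumes.
grouped_element = (
    'This restaurant is awesome',
    [
        {'model': 'distilbert', 'actual_sentiment': 'positive',
         'predicted_sentiment': 'POSITIVE'},
        {'model': 'roberta', 'actual_sentiment': 'positive',
         'predicted_sentiment': 'NEUTRAL'},
    ],
)
```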



##########
examples/notebooks/beam-ml/per_key_models.ipynb:
##########
@@ -0,0 +1,597 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "code",
+      "source": [
+        "# @title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License"
+      ],
+      "metadata": {
+        "id": "OsFaZscKSPvo"
+      },
+      "execution_count": 1,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "# Run ML Inference with Different Models Per Key\n",
+        "\n",
+        "<table align=\"left\">\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/per_key_models.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/colab_32px.png\" />Run in Google Colab</a>\n",
+        "  </td>\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/per_key_models.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/github_32px.png\" />View source on GitHub</a>\n",
+        "  </td>\n",
+        "</table>\n"
+      ],
+      "metadata": {
+        "id": "ZUSiAR62SgO8"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Users often want to run inference with many differently trained models that perform the same task. This can be helpful if you are comparing the performance of multiple models, or if you have models trained on different datasets that you would like to use conditionally based on additional metadata.\n",
+        "\n",
+        "In Apache Beam, the recommended way to run inference is with the `RunInference` transform. Using a `KeyedModelHandler`, you can efficiently run inference with O(100s) of models without worrying about managing memory yourself.\n",
+        "\n",
+        "This notebook demonstrates how to use a `KeyedModelHandler` to run inference in a Beam pipeline with multiple models on a per-key basis. It uses pretrained pipelines pulled from Hugging Face. It is recommended that you walk through the [beginner RunInference notebook](https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb) before continuing with this notebook."
+      ],
+      "metadata": {
+        "id": "ZAVOrrW2An1n"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Install Dependencies\n",
+        "\n",
+        "We will first install Apache Beam and the dependencies needed by Hugging Face."
+      ],
+      "metadata": {
+        "id": "_fNyheQoDgGt"
+      }
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 11,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "B-ENznuJqArA",
+        "outputId": "f72963fc-82db-4d0d-9225-07f6b501e256"
+      },
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            ""
+          ]
+        }
+      ],
+      "source": [
+        "!pip install 'apache_beam[gcp]>=2.51.0' --quiet\n",
+        "!pip install torch --quiet\n",
+        "!pip install transformers --quiet\n",
+        "\n",
+        "# To use the newly installed versions, restart the runtime.\n",
+        "exit()"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "from typing import Dict\n",
+        "from typing import Iterable\n",
+        "from typing import Tuple\n",
+        "\n",
+        "from transformers import pipeline\n",
+        "\n",
+        "import apache_beam as beam\n",
+        "from apache_beam.ml.inference.base import KeyedModelHandler\n",
+        "from apache_beam.ml.inference.base import KeyModelMapping\n",
+        "from apache_beam.ml.inference.base import PredictionResult\n",
+        "from apache_beam.ml.inference.huggingface_inference import HuggingFacePipelineModelHandler\n",
+        "from apache_beam.ml.inference.base import RunInference"
+      ],
+      "metadata": {
+        "id": "wUmBEglvsOYW"
+      },
+      "execution_count": 1,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Define Configuration for our Models\n",
+        "\n",
+        "A `ModelHandler` is Beam's method for defining the configuration needed to load and invoke your model. Since we want to use multiple models, we will define two ModelHandlers, one for each model used in this example. Because both models are encapsulated by Hugging Face pipelines, we will use `HuggingFacePipelineModelHandler`.\n",
+        "\n",
+        "In this notebook, we will also load the models using Hugging Face and run them against an example. Note that they produce different outputs."
+      ],
+      "metadata": {
+        "id": "uEqljVgCD7hx"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "distilbert_mh = HuggingFacePipelineModelHandler('text-classification', model=\"distilbert-base-uncased-finetuned-sst-2-english\")\n",
+        "roberta_mh = HuggingFacePipelineModelHandler('text-classification', model=\"roberta-large-mnli\")\n",
+        "\n",
+        "distilbert_pipe = pipeline('text-classification', model=\"distilbert-base-uncased-finetuned-sst-2-english\")\n",
+        "roberta_large_pipe = pipeline(model=\"roberta-large-mnli\")"
+      ],
+      "metadata": {
+        "id": "v2NJT5ZcxgH5",
+        "outputId": "3924d72e-5c49-477d-c50f-6d9098f5a4b2"
+      },
+      "execution_count": 2,
+      "outputs": [
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)lve/main/config.json:   0%|          | 0.00/629 [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "b7cb51663677434ca42de6b5e6f37420"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading model.safetensors:   0%|          | 0.00/268M [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "3702756019854683a9dea9f8af0a29d0"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)okenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "52b9fdb51d514c2e8b9fa5813972ab01"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)solve/main/vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "eca24b7b7b1847c1aed6aa59a44ed63a"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)lve/main/config.json:   0%|          | 0.00/688 [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "4d4cfe9a0ca54897aa991420bee01ff9"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading model.safetensors:   0%|          | 0.00/1.43G [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "aee85cd919d24125acff1663fba0b47c"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)olve/main/vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "0af8ad4eed2d49878fa83b5828d58a96"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)olve/main/merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "1ed943a51c704ab7a72101b5b6182772"
+            }
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Downloading (…)/main/tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "5b1dcbb533894267b184fd591d8ccdbc"
+            }
+          },
+          "metadata": {}
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "distilbert_pipe(\"This restaurant is awesome\")"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "H3nYX9thy8ec",
+        "outputId": "826e3285-24b9-47a8-d2a6-835543fdcae7"
+      },
+      "execution_count": 3,
+      "outputs": [
+        {
+          "output_type": "execute_result",
+          "data": {
+            "text/plain": [
+              "[{'label': 'POSITIVE', 'score': 0.9998743534088135}]"
+            ]
+          },
+          "metadata": {},
+          "execution_count": 3
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "roberta_large_pipe(\"This restaurant is awesome\")\n"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "IIfc94ODyjUg",
+        "outputId": "94ec8afb-ebfb-47ce-9813-48358741bc6b"
+      },
+      "execution_count": 4,
+      "outputs": [
+        {
+          "output_type": "execute_result",
+          "data": {
+            "text/plain": [
+              "[{'label': 'NEUTRAL', 'score': 0.7313134670257568}]"
+            ]
+          },
+          "metadata": {},
+          "execution_count": 4
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Define our Examples\n",
+        "\n",
+        "Next, we will define some examples to input into our pipeline, along with their correct classifications."
+      ],
+      "metadata": {
+        "id": "yd92MC7YEsTf"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "examples = [\n",
+        "    (\"This restaurant is awesome\", \"positive\"),\n",
+        "    (\"This restaurant is bad\", \"negative\"),\n",
+        "    (\"I feel fine\", \"neutral\"),\n",
+        "    (\"I love chocolate\", \"positive\"),\n",
+        "]"
+      ],
+      "metadata": {
+        "id": "5HAziWEavQws"
+      },
+      "execution_count": 5,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "To feed our examples into RunInference, we need distinct keys that map to our models. In this case, we will define keys of the form `<model_name>-<actual_sentiment>` so that we can extract the actual sentiment of each example later."
+      ],
+      "metadata": {
+        "id": "r6GXL5PLFBY7"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "class FormatExamples(beam.DoFn):\n",
+        "  \"\"\"\n",
+        "  Map each example to a tuple of ('<model_name>-<actual_sentiment>', 'example').\n",
+        "  We will use these keyes to map our elements to the correct models.\n",
+        "  \"\"\"\n",
+        "  def process(self, element: Tuple[str, str]) -> Iterable[Tuple[str, str]]:\n",
+        "    yield (f'distilbert-{element[1]}', element[0])\n",
+        "    yield (f'roberta-{element[1]}', element[0])"
+      ],
+      "metadata": {
+        "id": "p2uVwws8zRpg"
+      },
+      "execution_count": 6,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Using the formatted keys, we will define a `KeyedModelHandler` which maps keys to the ModelHandler we should use for those keys. `KeyedModelHandler` also allows you to define an optional `max_models_per_worker_hint` which will limit the number of models that can be held in a single worker process at once. This is useful if you are worried about your worker running out of memory. See https://beam.apache.org/documentation/sdks/python-machine-learning/index.html#use-a-keyed-modelhandler for more info on managing memory."
+      ],
+      "metadata": {
+        "id": "IP65_5nNGIb8"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "per_key_mhs = [\n",
+        "    KeyModelMapping(['distilbert-positive', 'distilbert-neutral', 'distilbert-negative'], distilbert_mh),\n",
+        "    KeyModelMapping(['roberta-positive', 'roberta-neutral', 'roberta-negative'], roberta_mh)\n",
+        "]\n",
+        "mh = KeyedModelHandler(per_key_mhs, max_models_per_worker_hint=2)"
+      ],
+      "metadata": {
+        "id": "DZpfjeGL2hMG"
+      },
+      "execution_count": 7,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Postprocess our results\n",
+        "\n",
+        "The `RunInference` transform returns a Tuple of the original key and a `PredictionResult` object that contains the original example and the inference. From that, we will extract the data we care about. We will then group this data by the original example in order to compare each model's prediction."
+      ],
+      "metadata": {
+        "id": "_a4ZmnD5FSeG"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "class ExtractResults(beam.DoFn):\n",
+        "  \"\"\"\n",
+        "  Extract the data we care about from the PredictionResult object.\n",

Review Comment:
   ```suggestion
           "  Extract the relevant data from the PredictionResult object.\n",
   ```
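
The hunk above is truncated mid-docstring. Below is a minimal sketch of how such a DoFn could be completed, based only on the surrounding prose. `PredictionResult` exposes `example` and `inference`; the exact shape of `inference` depends on the model handler used, so the `['label']` access here is an assumption:

```python
from typing import Dict, Iterable, Tuple

import apache_beam as beam
from apache_beam.ml.inference.base import PredictionResult


class ExtractResults(beam.DoFn):
  """
  Extract the relevant data from the PredictionResult object.
  """
  def process(
      self, element: Tuple[str, PredictionResult]
  ) -> Iterable[Tuple[str, Dict[str, str]]]:
    # Keys were formatted upstream as '<model_name>-<actual_sentiment>'.
    model_name, actual_sentiment = element[0].split('-')
    result = element[1]
    example = result.example
    # Assumed output shape: {'label': 'POSITIVE', 'score': 0.99...}.
    predicted_sentiment = result.inference['label']
    yield (example, {
        'model': model_name,
        'actual_sentiment': actual_sentiment,
        'predicted_sentiment': predicted_sentiment,
    })
```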


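Under the same caveats, a sketch of how the pieces defined in the notebook (`examples`, `FormatExamples`, the keyed handler `mh`, and `ExtractResults`) might be wired together; the final grouping and printing stage is illustrative rather than taken from the notebook:

```python
from apache_beam.ml.inference.base import RunInference

with beam.Pipeline() as pipeline:
  _ = (
      pipeline
      | 'CreateExamples' >> beam.Create(examples)
      | 'FormatExamples' >> beam.ParDo(FormatExamples())
      | 'RunInference' >> RunInference(mh)
      | 'ExtractResults' >> beam.ParDo(ExtractResults())
      # Group by the original sentence so both models' predictions line up.
      | 'GroupByExample' >> beam.GroupByKey()
      | 'PrintResults' >> beam.Map(print))
```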



[GitHub] [beam] damccorm commented on pull request #28327: Add notebook for per key models

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on PR #28327:
URL: https://github.com/apache/beam/pull/28327#issuecomment-1708568950

   R: @riteshghorse @rszper 

