You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by "jaxpr (via GitHub)" <gi...@apache.org> on 2023/03/20 15:16:29 UTC

[GitHub] [beam] jaxpr opened a new pull request, #25904: Add XGBoost example notebook

jaxpr opened a new pull request, #25904:
URL: https://github.com/apache/beam/pull/25904

   This PR adds the notebook example for the XGBoostModelhandler
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/get-started-contributing/#make-the-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] jaxpr commented on pull request #25904: Add XGBoost example notebook

Posted by "jaxpr (via GitHub)" <gi...@apache.org>.
jaxpr commented on PR #25904:
URL: https://github.com/apache/beam/pull/25904#issuecomment-1494231250

   @damccorm I have updated this PR and resolved the comments.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on a diff in pull request #25904: Add XGBoost example notebook

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on code in PR #25904:
URL: https://github.com/apache/beam/pull/25904#discussion_r1156030359


##########
examples/notebooks/beam-ml/run_inference_xgboost.ipynb:
##########
@@ -0,0 +1,374 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "cellView": "form",
+        "id": "XobBB6Sv8mB3"
+      },
+      "outputs": [],
+      "source": [
+        "# @title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "# Apache Beam RunInference for XGBoost\n",
+        "\n",
+        "<table align=\"left\">\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_xgboost.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/colab_32px.png\" />Run in Google Colab</a>\n",
+        "  </td>\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_xgboost.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/github_32px.png\" />View source on GitHub</a>\n",
+        "  </td>\n",
+        "</table>\n"
+      ],
+      "metadata": {
+        "id": "DUGbrRuv89CS"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "This notebook demonstrates the use of the RunInference transform for XGBoost. Apache Beam RunInference has implementations of the ModelHandler class prebuilt for XGBoost. For more information about the RunInference API, see the [Machine Learning section of the Apache Beam documentation](https://beam.apache.org/documentation/ml/overview/).\n",
+        "\n",
+        "You can choose the appropriate model handler based on your input data type:\n",
+        "\n",
+        "- NumPy model handler\n",
+        "- Pandas DataFrame model handler\n",
+        "- Datatable model handler\n",
+        "- SciPy model handler\n",
+        "\n",
+        "With RunInference, these model handlers manage batching, vectorization, and prediction optimization for your XGBoost pipeline or model.\n",
+        "\n",
+        "This notebook demonstrates the following common RunInference patterns:\n",
+        "\n",
+        "- Generate predictions\n",
+        "- Postprocess results after RunInference\n",
+        "- One model to showcase classification of Iris flowers\n",
+        "- One regression model to showcase prediction of housing prices"
+      ],
+      "metadata": {
+        "id": "6nh2h-sIOAOg"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Before you begin\n",
+        "Complete the following setup steps:\n",
+        "- Install dependencies for Apache Beam."
+      ],
+      "metadata": {
+        "id": "nRCJBcTUOq1k"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# !pip install apache-beam[gcp,dataframe] --quiet\n",
+        "!pip install git+https://github.com/apache/beam.git"
+      ],
+      "metadata": {
+        "id": "gbmH329jOuj1"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "import xgboost\n",
+        "import apache_beam as beam\n",
+        "from sklearn.datasets import fetch_california_housing\n",
+        "from sklearn.datasets import load_iris\n",
+        "from sklearn.model_selection import train_test_split\n",
+        "\n",
+        "from apache_beam.ml.inference import RunInference\n",
+        "from apache_beam.ml.inference.xgboost_inference import XGBoostModelHandlerNumpy\n",
+        "from apache_beam.options.pipeline_options import PipelineOptions"
+      ],
+      "metadata": {
+        "id": "_O0BN_XqOwp1"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "SEED = 999\n",
+        "CLASSIFICATION_MODEL_STATE = '/tmp/classification_model.json'\n",
+        "REGRESSION_MODEL_STATE = '/tmp/regression_model.json'"
+      ],
+      "metadata": {
+        "id": "ue_5a-oaO-Lz"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Load the data from scikit-learn and train XGBoost models\n",
+        "This section demonstrates the following steps:\n",
+        "1. Load the iris and Califorina Housing datasets from scikit-learn and create a classification and regression model.\n",
+        "2. Train the classification and regression model.\n",
+        "3. Save the models in a JSON file using `mode.save_model`. (https://xgboost.readthedocs.io/en/stable/tutorials/saving_model.html)\n",
+        "\n",
+        "In this example, you create two models, one to classify Iris flowers and one to predict housing prices in California."
+      ],
+      "metadata": {
+        "id": "74oE5pGgPE0M"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# Train the classification model\n",
+        "iris_dataset = load_iris()\n",
+        "x_train_classification, x_test_classification, y_train_classification, y_test_classification = train_test_split(\n",
+        "    iris_dataset['data'], iris_dataset['target'], test_size=.2, random_state=SEED)\n",
+        "booster = xgboost.XGBClassifier(\n",
+        "    n_estimators=2, max_depth=2, learning_rate=1, objective='binary:logistic')\n",
+        "booster.fit(x_train_classification, y_train_classification)\n",
+        "booster.save_model(CLASSIFICATION_MODEL_STATE)\n",
+        "\n",
+        "# Train the regression model\n",
+        "california_dataset = fetch_california_housing()\n",
+        "x_train_regression, x_test_regression, y_train_regression, y_test_regression = train_test_split(\n",
+        "    california_dataset['data'], california_dataset['target'], test_size=.2, random_state=SEED)\n",
+        "model = xgboost.XGBRegressor(\n",
+        "    n_estimators=1000,\n",
+        "    max_depth=8,\n",
+        "    eta=0.1,\n",
+        "    subsample=0.75,\n",
+        "    colsample_bytree=0.8)\n",
+        "model.fit(x_train_regression, y_train_regression)\n",
+        "model.save_model(REGRESSION_MODEL_STATE)\n",
+        "\n",
+        "\n",
+        "# Reshape the test data as XGBoost expects a batch instead of a single element\n",
+        "# More information: https://xgboost.readthedocs.io/en/stable/prediction.html\n",
+        "x_test_classification = x_test.reshape(5, 6, 4)\n",
+        "x_test_regression = x_test_regression.reshape(258, 16, 8)"
+      ],
+      "metadata": {
+        "id": "KVSKt3pFPBnj"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### Postprocessing helper functions"
+      ],
+      "metadata": {
+        "id": "VGQj-B1Abioq"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "def translate_labels(inference_results: PredictionResult):\n",
+        "  \"\"\"\n",
+        "    Maps output values (0, 1 or 2) of the XGBoost Iris classification\n",
+        "    model to the names of the different Iris flowers.\n",
+        "    Args:\n",
+        "      inference_results: Array containing the outputs of the XGBoost Iris classification model\n",
+        "    \"\"\"\n",
+        "  return PredictionResult(\n",
+        "      inference_results.example,\n",
+        "      np.vectorize(['Setosa', 'Versicolour',\n",
+        "                    'Virginica'].__getitem__)(inference_results.inference))\n",
+        "\n",
+        "\n",
+        "class FlattenBatchPredictionResults(beam.DoFn):\n",
+        "  \"\"\"This function takes a batch (list) of\n",
+        "  PredictionResults as input and yield all elements\"\"\"\n",

Review Comment:
   ```suggestion
           "  \"\"\"This function takes a PredictionResult containing a batch (list) of\n",
           "  examples and predictions as input and yields all example/prediction pairs\"\"\"\n",
   ```



##########
examples/notebooks/beam-ml/run_inference_xgboost.ipynb:
##########
@@ -0,0 +1,374 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "cellView": "form",
+        "id": "XobBB6Sv8mB3"
+      },
+      "outputs": [],
+      "source": [
+        "# @title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "# Apache Beam RunInference for XGBoost\n",
+        "\n",
+        "<table align=\"left\">\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_xgboost.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/colab_32px.png\" />Run in Google Colab</a>\n",
+        "  </td>\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_xgboost.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/github_32px.png\" />View source on GitHub</a>\n",
+        "  </td>\n",
+        "</table>\n"
+      ],
+      "metadata": {
+        "id": "DUGbrRuv89CS"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "This notebook demonstrates the use of the RunInference transform for XGBoost. Apache Beam RunInference has implementations of the ModelHandler class prebuilt for XGBoost. For more information about the RunInference API, see the [Machine Learning section of the Apache Beam documentation](https://beam.apache.org/documentation/ml/overview/).\n",
+        "\n",
+        "You can choose the appropriate model handler based on your input data type:\n",
+        "\n",
+        "- NumPy model handler\n",
+        "- Pandas DataFrame model handler\n",
+        "- Datatable model handler\n",
+        "- SciPy model handler\n",
+        "\n",
+        "With RunInference, these model handlers manage batching, vectorization, and prediction optimization for your XGBoost pipeline or model.\n",
+        "\n",
+        "This notebook demonstrates the following common RunInference patterns:\n",
+        "\n",
+        "- Generate predictions\n",
+        "- Postprocess results after RunInference\n",
+        "- One model to showcase classification of Iris flowers\n",
+        "- One regression model to showcase prediction of housing prices"
+      ],
+      "metadata": {
+        "id": "6nh2h-sIOAOg"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Before you begin\n",
+        "Complete the following setup steps:\n",
+        "- Install dependencies for Apache Beam."
+      ],
+      "metadata": {
+        "id": "nRCJBcTUOq1k"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# !pip install apache-beam[gcp,dataframe] --quiet\n",
+        "!pip install git+https://github.com/apache/beam.git"
+      ],
+      "metadata": {
+        "id": "gbmH329jOuj1"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "import xgboost\n",
+        "import apache_beam as beam\n",
+        "from sklearn.datasets import fetch_california_housing\n",
+        "from sklearn.datasets import load_iris\n",
+        "from sklearn.model_selection import train_test_split\n",
+        "\n",
+        "from apache_beam.ml.inference import RunInference\n",
+        "from apache_beam.ml.inference.xgboost_inference import XGBoostModelHandlerNumpy\n",
+        "from apache_beam.options.pipeline_options import PipelineOptions"
+      ],
+      "metadata": {
+        "id": "_O0BN_XqOwp1"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "SEED = 999\n",
+        "CLASSIFICATION_MODEL_STATE = '/tmp/classification_model.json'\n",
+        "REGRESSION_MODEL_STATE = '/tmp/regression_model.json'"
+      ],
+      "metadata": {
+        "id": "ue_5a-oaO-Lz"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Load the data from scikit-learn and train XGBoost models\n",
+        "This section demonstrates the following steps:\n",
+        "1. Load the iris and Califorina Housing datasets from scikit-learn and create a classification and regression model.\n",
+        "2. Train the classification and regression model.\n",
+        "3. Save the models in a JSON file using `mode.save_model`. (https://xgboost.readthedocs.io/en/stable/tutorials/saving_model.html)\n",
+        "\n",
+        "In this example, you create two models, one to classify Iris flowers and one to predict housing prices in California."
+      ],
+      "metadata": {
+        "id": "74oE5pGgPE0M"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# Train the classification model\n",
+        "iris_dataset = load_iris()\n",
+        "x_train_classification, x_test_classification, y_train_classification, y_test_classification = train_test_split(\n",
+        "    iris_dataset['data'], iris_dataset['target'], test_size=.2, random_state=SEED)\n",
+        "booster = xgboost.XGBClassifier(\n",
+        "    n_estimators=2, max_depth=2, learning_rate=1, objective='binary:logistic')\n",
+        "booster.fit(x_train_classification, y_train_classification)\n",
+        "booster.save_model(CLASSIFICATION_MODEL_STATE)\n",
+        "\n",
+        "# Train the regression model\n",
+        "california_dataset = fetch_california_housing()\n",
+        "x_train_regression, x_test_regression, y_train_regression, y_test_regression = train_test_split(\n",
+        "    california_dataset['data'], california_dataset['target'], test_size=.2, random_state=SEED)\n",
+        "model = xgboost.XGBRegressor(\n",
+        "    n_estimators=1000,\n",
+        "    max_depth=8,\n",
+        "    eta=0.1,\n",
+        "    subsample=0.75,\n",
+        "    colsample_bytree=0.8)\n",
+        "model.fit(x_train_regression, y_train_regression)\n",
+        "model.save_model(REGRESSION_MODEL_STATE)\n",
+        "\n",
+        "\n",
+        "# Reshape the test data as XGBoost expects a batch instead of a single element\n",
+        "# More information: https://xgboost.readthedocs.io/en/stable/prediction.html\n",
+        "x_test_classification = x_test.reshape(5, 6, 4)\n",
+        "x_test_regression = x_test_regression.reshape(258, 16, 8)"
+      ],
+      "metadata": {
+        "id": "KVSKt3pFPBnj"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### Postprocessing helper functions"
+      ],
+      "metadata": {
+        "id": "VGQj-B1Abioq"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "def translate_labels(inference_results: PredictionResult):\n",
+        "  \"\"\"\n",
+        "    Maps output values (0, 1 or 2) of the XGBoost Iris classification\n",
+        "    model to the names of the different Iris flowers.\n",
+        "    Args:\n",
+        "      inference_results: Array containing the outputs of the XGBoost Iris classification model\n",
+        "    \"\"\"\n",
+        "  return PredictionResult(\n",
+        "      inference_results.example,\n",
+        "      np.vectorize(['Setosa', 'Versicolour',\n",
+        "                    'Virginica'].__getitem__)(inference_results.inference))\n",
+        "\n",
+        "\n",
+        "class FlattenBatchPredictionResults(beam.DoFn):\n",
+        "  \"\"\"This function takes a batch (list) of\n",
+        "  PredictionResults as input and yield all elements\"\"\"\n",
+        "  def process(self, batch_prediction_result: PredictionResult):\n",
+        "    for example, inference in zip(batch_prediction_result.example, batch_prediction_result.inference):\n",
+        "      yield PredictionResult(\n",
+        "          example, inference, batch_prediction_result.model_id)\n"

Review Comment:
   Instead of yielding the result, how about we just print it here (rather than having an extra map to do that everywhere)?



##########
examples/notebooks/beam-ml/run_inference_xgboost.ipynb:
##########
@@ -0,0 +1,374 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "cellView": "form",
+        "id": "XobBB6Sv8mB3"
+      },
+      "outputs": [],
+      "source": [
+        "# @title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "# Apache Beam RunInference for XGBoost\n",
+        "\n",
+        "<table align=\"left\">\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_xgboost.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/colab_32px.png\" />Run in Google Colab</a>\n",
+        "  </td>\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_xgboost.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/github_32px.png\" />View source on GitHub</a>\n",
+        "  </td>\n",
+        "</table>\n"
+      ],
+      "metadata": {
+        "id": "DUGbrRuv89CS"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "This notebook demonstrates the use of the RunInference transform for XGBoost. Apache Beam RunInference has implementations of the ModelHandler class prebuilt for XGBoost. For more information about the RunInference API, see the [Machine Learning section of the Apache Beam documentation](https://beam.apache.org/documentation/ml/overview/).\n",
+        "\n",
+        "You can choose the appropriate model handler based on your input data type:\n",
+        "\n",
+        "- NumPy model handler\n",
+        "- Pandas DataFrame model handler\n",
+        "- Datatable model handler\n",
+        "- SciPy model handler\n",
+        "\n",
+        "With RunInference, these model handlers manage batching, vectorization, and prediction optimization for your XGBoost pipeline or model.\n",
+        "\n",
+        "This notebook demonstrates the following common RunInference patterns:\n",
+        "\n",
+        "- Generate predictions\n",
+        "- Postprocess results after RunInference\n",
+        "- One model to showcase classification of Iris flowers\n",
+        "- One regression model to showcase prediction of housing prices"
+      ],
+      "metadata": {
+        "id": "6nh2h-sIOAOg"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Before you begin\n",
+        "Complete the following setup steps:\n",
+        "- Install dependencies for Apache Beam."
+      ],
+      "metadata": {
+        "id": "nRCJBcTUOq1k"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# !pip install apache-beam[gcp,dataframe] --quiet\n",
+        "!pip install git+https://github.com/apache/beam.git"

Review Comment:
   ```suggestion
           "!pip install git+https://github.com/apache/beam.git"
   ```
   
   The comment here doesn't add anything for a user. I filed https://github.com/apache/beam/issues/26077 to update once 2.47 is released



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on a diff in pull request #25904: Add XGBoost example notebook

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on code in PR #25904:
URL: https://github.com/apache/beam/pull/25904#discussion_r1149469575


##########
examples/notebooks/beam-ml/run_inference_xgboost.ipynb:
##########
@@ -0,0 +1,321 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "cellView": "form",
+        "id": "XobBB6Sv8mB3"
+      },
+      "outputs": [],
+      "source": [
+        "# @title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "# Apache Beam RunInference for XGBoost\n",
+        "\n",
+        "<table align=\"left\">\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_xgboost.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/colab_32px.png\" />Run in Google Colab</a>\n",
+        "  </td>\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_xgboost.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/github_32px.png\" />View source on GitHub</a>\n",
+        "  </td>\n",
+        "</table>\n"
+      ],
+      "metadata": {
+        "id": "DUGbrRuv89CS"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "This notebook demonstrates the use of the RunInference transform for XGBoost. Apache Beam RunInference has implementations of the ModelHandler class prebuilt for XGBoost. For more information about the RunInference API, see Machine Learning in the Apache Beam documentation.\n",

Review Comment:
   ```suggestion
           "This notebook demonstrates the use of the RunInference transform for XGBoost. Apache Beam RunInference has implementations of the ModelHandler class prebuilt for XGBoost. For more information about the RunInference API, see the [Machine Learning section of the Apache Beam documentation](https://beam.apache.org/documentation/ml/overview/).\n",
   ```



##########
examples/notebooks/beam-ml/run_inference_xgboost.ipynb:
##########
@@ -0,0 +1,321 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "cellView": "form",
+        "id": "XobBB6Sv8mB3"
+      },
+      "outputs": [],
+      "source": [
+        "# @title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "# Apache Beam RunInference for XGBoost\n",
+        "\n",
+        "<table align=\"left\">\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_xgboost.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/colab_32px.png\" />Run in Google Colab</a>\n",
+        "  </td>\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_xgboost.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/github_32px.png\" />View source on GitHub</a>\n",
+        "  </td>\n",
+        "</table>\n"
+      ],
+      "metadata": {
+        "id": "DUGbrRuv89CS"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "This notebook demonstrates the use of the RunInference transform for XGBoost. Apache Beam RunInference has implementations of the ModelHandler class prebuilt for XGBoost. For more information about the RunInference API, see Machine Learning in the Apache Beam documentation.\n",
+        "\n",
+        "You can choose the appropriate model handler based on your input data type:\n",
+        "\n",
+        "- NumPy model handler\n",
+        "- Pandas DataFrame model handler\n",
+        "- Datatable model handler\n",
+        "- SciPy model handler\n",
+        "\n",
+        "With RunInference, these model handlers manage batching, vectorization, and prediction optimization for your XGBoost pipeline or model.\n",
+        "\n",
+        "This notebook demonstrates the following common RunInference patterns:\n",
+        "\n",
+        "Generate predictions.\n",
+        "Postprocess results after RunInference.\n",
+        "One model to showcase classification of Iris flowers and one regression model to showcase prediction of housing prices"
+      ],
+      "metadata": {
+        "id": "6nh2h-sIOAOg"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Before you begin\n",
+        "Complete the following setup steps:\n",

Review Comment:
   ```suggestion
           "Before you begin, complete the following setup steps:\n",
   ```
   
   Not sure why, but this renders incorrectly in colab and GitHub (the trailing `\n` isn't respected).
   
   <img width="406" alt="image" src="https://user-images.githubusercontent.com/42773683/227998497-36143a83-85f6-4139-9ad0-a7ae1a806b30.png">
   



##########
examples/notebooks/beam-ml/run_inference_xgboost.ipynb:
##########
@@ -0,0 +1,321 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "cellView": "form",
+        "id": "XobBB6Sv8mB3"
+      },
+      "outputs": [],
+      "source": [
+        "# @title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "# Apache Beam RunInference for XGBoost\n",
+        "\n",
+        "<table align=\"left\">\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_xgboost.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/colab_32px.png\" />Run in Google Colab</a>\n",
+        "  </td>\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_xgboost.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/github_32px.png\" />View source on GitHub</a>\n",
+        "  </td>\n",
+        "</table>\n"
+      ],
+      "metadata": {
+        "id": "DUGbrRuv89CS"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "This notebook demonstrates the use of the RunInference transform for XGBoost. Apache Beam RunInference has implementations of the ModelHandler class prebuilt for XGBoost. For more information about the RunInference API, see Machine Learning in the Apache Beam documentation.\n",
+        "\n",
+        "You can choose the appropriate model handler based on your input data type:\n",
+        "\n",
+        "- NumPy model handler\n",
+        "- Pandas DataFrame model handler\n",
+        "- Datatable model handler\n",
+        "- SciPy model handler\n",
+        "\n",
+        "With RunInference, these model handlers manage batching, vectorization, and prediction optimization for your XGBoost pipeline or model.\n",
+        "\n",
+        "This notebook demonstrates the following common RunInference patterns:\n",
+        "\n",
+        "Generate predictions.\n",
+        "Postprocess results after RunInference.\n",
+        "One model to showcase classification of Iris flowers and one regression model to showcase prediction of housing prices"
+      ],
+      "metadata": {
+        "id": "6nh2h-sIOAOg"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Before you begin\n",
+        "Complete the following setup steps:\n",
+        "\n",
+        "- Install dependencies for Apache Beam."
+      ],
+      "metadata": {
+        "id": "nRCJBcTUOq1k"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "!pip install apache-beam[gcp,dataframe] --quiet"
+      ],
+      "metadata": {
+        "id": "gbmH329jOuj1"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "import xgboost\n",
+        "import apache_beam as beam\n",
+        "from sklearn.datasets import fetch_california_housing\n",
+        "from sklearn.datasets import load_iris\n",
+        "from sklearn.model_selection import train_test_split\n",
+        "\n",
+        "from apache_beam.ml.inference import RunInference\n",
+        "from apache_beam.ml.inference.xgboost_inference import XGBoostModelHandlerNumpy\n",
+        "from apache_beam.options.pipeline_options import PipelineOptions"
+      ],
+      "metadata": {
+        "id": "_O0BN_XqOwp1"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "SEED = 999\n",
+        "CLASSIFICATION_MODEL_STATE = '/tmp/classification_model.json'\n",
+        "REGRESSION_MODEL_STATE = '/tmp/regression_model.json'"
+      ],
+      "metadata": {
+        "id": "ue_5a-oaO-Lz"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Load the data from scikit-learn and train XGBoost models\n",
+        "This section demonstrates the following steps:\n",
+        "1. Load the iris and Califorina Housing datasets from scikit-learn and create a classification and regression model.\n",
+        "2. Train the classification and regression model.\n",
+        "3. Save the models in a JSON file using `mode.save_model`. (https://xgboost.readthedocs.io/en/stable/tutorials/saving_model.html)\n",
+        "\n",
+        "In this example, you create two models, one to classify Iris flowers and one to predict housing prices in California."
+      ],
+      "metadata": {
+        "id": "74oE5pGgPE0M"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# Train the classification model\n",
+        "iris_dataset = load_iris()\n",
+        "x_train_classification, x_test_classification, y_train_classification, y_test_classification = train_test_split(\n",
+        "    iris_dataset['data'], iris_dataset['target'], test_size=.2, random_state=SEED)\n",
+        "booster = xgboost.XGBClassifier(\n",
+        "    n_estimators=2, max_depth=2, learning_rate=1, objective='binary:logistic')\n",
+        "booster.fit(x_train_classification, y_train_classification)\n",
+        "booster.save_model(CLASSIFICATION_MODEL_STATE)\n",
+        "\n",
+        "# Train the regression model\n",
+        "california_dataset = fetch_california_housing()\n",
+        "x_train_regression, x_test_regression, y_train_regression, y_test_regression = train_test_split(\n",
+        "    california_dataset['data'], california_dataset['target'], test_size=.2, random_state=SEED)\n",
+        "model = xgboost.XGBRegressor(\n",
+        "    n_estimators=1000,\n",
+        "    max_depth=8,\n",
+        "    eta=0.1,\n",
+        "    subsample=0.75,\n",
+        "    colsample_bytree=0.8)\n",
+        "model.fit(x_train_regression, y_train_regression)\n",
+        "model.save_model(REGRESSION_MODEL_STATE)\n",
+        "\n",
+        "\n",
+        "# Reshape the test data as XGBoost expects a batch instead of a single element\n",
+        "# More information: https://xgboost.readthedocs.io/en/stable/prediction.html\n",
+        "x_test_classification = x_test.reshape(5, 6, 4)\n",
+        "x_test_regression = x_test_regression.reshape(258, 16, 8)"
+      ],
+      "metadata": {
+        "id": "KVSKt3pFPBnj"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### Create a scikit-learn RunInference pipeline\n",
+        "This section demonstrates how to do the following:\n",
+        "1. Define a XGBoost model handler that accepts an `numpy.ndarray` object as input.\n",
+        "2. Load the data from the datasets.\n",
+        "3. Use the XGBoost trained models and the XGBoost RunInference transform on unkeyed data."
+      ],
+      "metadata": {
+        "id": "ItuxdQoXSNTQ"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "xgboost_classification_model_handler = XGBoostModelHandlerNumpy(\n",
+        "    model_class=xgboost.XGBClassifier, model_state=CLASSIFICATION_MODEL_STATE)\n",
+        "\n",
+        "pipeline_options = PipelineOptions().from_dictionary({})\n",
+        "\n",
+        "with beam.Pipeline(options=pipeline_options) as p:\n",
+        "  (\n",
+        "      p\n",
+        "      | \"Load Data\" >> beam.Create(x_test_classification)\n",
+        "      | \"RunInferenceXGBoost\" >>\n",
+        "      RunInference(model_handler=xgboost_classification_model_handler)\n",
+        "      | beam.Map(print))"
+      ],
+      "metadata": {
+        "id": "SBdMq3-CSGqZ"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "xgboost_regression_model_handler = XGBoostModelHandlerNumpy(\n",
+        "    model_class=xgboost.XGBRegressor, model_state=REGRESSION_MODEL_STATE)\n",
+        "\n",
+        "pipeline_options = PipelineOptions().from_dictionary({})\n",
+        "\n",
+        "with beam.Pipeline(options=pipeline_options) as p:\n",
+        "  (\n",
+        "      p\n",
+        "      | \"Load Data\" >> beam.Create(x_test_regression)\n",
+        "      | \"RunInferenceSklearn\" >>\n",

Review Comment:
   ```suggestion
           "      | \"RunInferenceXGBoost\" >>\n",
   ```



##########
examples/notebooks/beam-ml/run_inference_xgboost.ipynb:
##########
@@ -0,0 +1,321 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "cellView": "form",
+        "id": "XobBB6Sv8mB3"
+      },
+      "outputs": [],
+      "source": [
+        "# @title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "# Apache Beam RunInference for XGBoost\n",
+        "\n",
+        "<table align=\"left\">\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_xgboost.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/colab_32px.png\" />Run in Google Colab</a>\n",
+        "  </td>\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_xgboost.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/github_32px.png\" />View source on GitHub</a>\n",
+        "  </td>\n",
+        "</table>\n"
+      ],
+      "metadata": {
+        "id": "DUGbrRuv89CS"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "This notebook demonstrates the use of the RunInference transform for XGBoost. Apache Beam RunInference has implementations of the ModelHandler class prebuilt for XGBoost. For more information about the RunInference API, see Machine Learning in the Apache Beam documentation.\n",
+        "\n",
+        "You can choose the appropriate model handler based on your input data type:\n",
+        "\n",
+        "- NumPy model handler\n",
+        "- Pandas DataFrame model handler\n",
+        "- Datatable model handler\n",
+        "- SciPy model handler\n",
+        "\n",
+        "With RunInference, these model handlers manage batching, vectorization, and prediction optimization for your XGBoost pipeline or model.\n",
+        "\n",
+        "This notebook demonstrates the following common RunInference patterns:\n",
+        "\n",
+        "Generate predictions.\n",
+        "Postprocess results after RunInference.\n",
+        "One model to showcase classification of Iris flowers and one regression model to showcase prediction of housing prices"
+      ],
+      "metadata": {
+        "id": "6nh2h-sIOAOg"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Before you begin\n",
+        "Complete the following setup steps:\n",
+        "\n",
+        "- Install dependencies for Apache Beam."
+      ],
+      "metadata": {
+        "id": "nRCJBcTUOq1k"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "!pip install apache-beam[gcp,dataframe] --quiet"
+      ],
+      "metadata": {
+        "id": "gbmH329jOuj1"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "import xgboost\n",
+        "import apache_beam as beam\n",
+        "from sklearn.datasets import fetch_california_housing\n",
+        "from sklearn.datasets import load_iris\n",
+        "from sklearn.model_selection import train_test_split\n",
+        "\n",
+        "from apache_beam.ml.inference import RunInference\n",
+        "from apache_beam.ml.inference.xgboost_inference import XGBoostModelHandlerNumpy\n",
+        "from apache_beam.options.pipeline_options import PipelineOptions"
+      ],
+      "metadata": {
+        "id": "_O0BN_XqOwp1"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "SEED = 999\n",
+        "CLASSIFICATION_MODEL_STATE = '/tmp/classification_model.json'\n",
+        "REGRESSION_MODEL_STATE = '/tmp/regression_model.json'"
+      ],
+      "metadata": {
+        "id": "ue_5a-oaO-Lz"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Load the data from scikit-learn and train XGBoost models\n",
+        "This section demonstrates the following steps:\n",
+        "1. Load the iris and Califorina Housing datasets from scikit-learn and create a classification and regression model.\n",
+        "2. Train the classification and regression model.\n",
+        "3. Save the models in a JSON file using `mode.save_model`. (https://xgboost.readthedocs.io/en/stable/tutorials/saving_model.html)\n",
+        "\n",
+        "In this example, you create two models, one to classify Iris flowers and one to predict housing prices in California."
+      ],
+      "metadata": {
+        "id": "74oE5pGgPE0M"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# Train the classification model\n",
+        "iris_dataset = load_iris()\n",
+        "x_train_classification, x_test_classification, y_train_classification, y_test_classification = train_test_split(\n",
+        "    iris_dataset['data'], iris_dataset['target'], test_size=.2, random_state=SEED)\n",
+        "booster = xgboost.XGBClassifier(\n",
+        "    n_estimators=2, max_depth=2, learning_rate=1, objective='binary:logistic')\n",
+        "booster.fit(x_train_classification, y_train_classification)\n",
+        "booster.save_model(CLASSIFICATION_MODEL_STATE)\n",
+        "\n",
+        "# Train the regression model\n",
+        "california_dataset = fetch_california_housing()\n",
+        "x_train_regression, x_test_regression, y_train_regression, y_test_regression = train_test_split(\n",
+        "    california_dataset['data'], california_dataset['target'], test_size=.2, random_state=SEED)\n",
+        "model = xgboost.XGBRegressor(\n",
+        "    n_estimators=1000,\n",
+        "    max_depth=8,\n",
+        "    eta=0.1,\n",
+        "    subsample=0.75,\n",
+        "    colsample_bytree=0.8)\n",
+        "model.fit(x_train_regression, y_train_regression)\n",
+        "model.save_model(REGRESSION_MODEL_STATE)\n",
+        "\n",
+        "\n",
+        "# Reshape the test data as XGBoost expects a batch instead of a single element\n",
+        "# More information: https://xgboost.readthedocs.io/en/stable/prediction.html\n",
+        "x_test_classification = x_test.reshape(5, 6, 4)\n",
+        "x_test_regression = x_test_regression.reshape(258, 16, 8)"
+      ],
+      "metadata": {
+        "id": "KVSKt3pFPBnj"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### Create a scikit-learn RunInference pipeline\n",
+        "This section demonstrates how to do the following:\n",
+        "1. Define a XGBoost model handler that accepts an `numpy.ndarray` object as input.\n",
+        "2. Load the data from the datasets.\n",
+        "3. Use the XGBoost trained models and the XGBoost RunInference transform on unkeyed data."
+      ],
+      "metadata": {
+        "id": "ItuxdQoXSNTQ"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "xgboost_classification_model_handler = XGBoostModelHandlerNumpy(\n",
+        "    model_class=xgboost.XGBClassifier, model_state=CLASSIFICATION_MODEL_STATE)\n",
+        "\n",
+        "pipeline_options = PipelineOptions().from_dictionary({})\n",
+        "\n",
+        "with beam.Pipeline(options=pipeline_options) as p:\n",
+        "  (\n",
+        "      p\n",
+        "      | \"Load Data\" >> beam.Create(x_test_classification)\n",
+        "      | \"RunInferenceXGBoost\" >>\n",
+        "      RunInference(model_handler=xgboost_classification_model_handler)\n",
+        "      | beam.Map(print))"
+      ],
+      "metadata": {
+        "id": "SBdMq3-CSGqZ"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "xgboost_regression_model_handler = XGBoostModelHandlerNumpy(\n",
+        "    model_class=xgboost.XGBRegressor, model_state=REGRESSION_MODEL_STATE)\n",
+        "\n",
+        "pipeline_options = PipelineOptions().from_dictionary({})\n",
+        "\n",
+        "with beam.Pipeline(options=pipeline_options) as p:\n",
+        "  (\n",
+        "      p\n",
+        "      | \"Load Data\" >> beam.Create(x_test_regression)\n",
+        "      | \"RunInferenceSklearn\" >>\n",
+        "      RunInference(model_handler=xgboost_regression_model_handler)\n",
+        "      | beam.Map(print))"
+      ],
+      "metadata": {
+        "id": "IYUXIJt7UIm6"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### Use XGBoost RunInference on keyed inputs\n",
+        "This section demonstrates how to do the following:\n",
+        "1. Wrap the `XGBoostHandlerNumpy` object around `KeyedModelHandler` to handle keyed data.\n",

Review Comment:
   ```suggestion
           "1. Wrap the `XGBoostHandlerNumpy` with a `KeyedModelHandler` to handle keyed data.\n",
   ```



##########
examples/notebooks/beam-ml/run_inference_xgboost.ipynb:
##########
@@ -0,0 +1,321 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "cellView": "form",
+        "id": "XobBB6Sv8mB3"
+      },
+      "outputs": [],
+      "source": [
+        "# @title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "# Apache Beam RunInference for XGBoost\n",
+        "\n",
+        "<table align=\"left\">\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_xgboost.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/colab_32px.png\" />Run in Google Colab</a>\n",
+        "  </td>\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_xgboost.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/github_32px.png\" />View source on GitHub</a>\n",
+        "  </td>\n",
+        "</table>\n"
+      ],
+      "metadata": {
+        "id": "DUGbrRuv89CS"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "This notebook demonstrates the use of the RunInference transform for XGBoost. Apache Beam RunInference has implementations of the ModelHandler class prebuilt for XGBoost. For more information about the RunInference API, see Machine Learning in the Apache Beam documentation.\n",
+        "\n",
+        "You can choose the appropriate model handler based on your input data type:\n",
+        "\n",
+        "- NumPy model handler\n",
+        "- Pandas DataFrame model handler\n",
+        "- Datatable model handler\n",
+        "- SciPy model handler\n",
+        "\n",
+        "With RunInference, these model handlers manage batching, vectorization, and prediction optimization for your XGBoost pipeline or model.\n",
+        "\n",
+        "This notebook demonstrates the following common RunInference patterns:\n",
+        "\n",
+        "Generate predictions.\n",
+        "Postprocess results after RunInference.\n",
+        "One model to showcase classification of Iris flowers and one regression model to showcase prediction of housing prices"

Review Comment:
   Actually, it looks like we don't really do any postprocessing anyways (which is fine, printing is enough). So maybe we can just cut this and say `This notebook uses RunInference to perform classification of Iris flowers and regression to predict housing prices`.



##########
examples/notebooks/beam-ml/run_inference_xgboost.ipynb:
##########
@@ -0,0 +1,321 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "cellView": "form",
+        "id": "XobBB6Sv8mB3"
+      },
+      "outputs": [],
+      "source": [
+        "# @title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "# Apache Beam RunInference for XGBoost\n",
+        "\n",
+        "<table align=\"left\">\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_xgboost.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/colab_32px.png\" />Run in Google Colab</a>\n",
+        "  </td>\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_xgboost.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/github_32px.png\" />View source on GitHub</a>\n",
+        "  </td>\n",
+        "</table>\n"
+      ],
+      "metadata": {
+        "id": "DUGbrRuv89CS"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "This notebook demonstrates the use of the RunInference transform for XGBoost. Apache Beam RunInference has implementations of the ModelHandler class prebuilt for XGBoost. For more information about the RunInference API, see Machine Learning in the Apache Beam documentation.\n",
+        "\n",
+        "You can choose the appropriate model handler based on your input data type:\n",
+        "\n",
+        "- NumPy model handler\n",
+        "- Pandas DataFrame model handler\n",
+        "- Datatable model handler\n",
+        "- SciPy model handler\n",
+        "\n",
+        "With RunInference, these model handlers manage batching, vectorization, and prediction optimization for your XGBoost pipeline or model.\n",
+        "\n",
+        "This notebook demonstrates the following common RunInference patterns:\n",
+        "\n",
+        "Generate predictions.\n",
+        "Postprocess results after RunInference.\n",
+        "One model to showcase classification of Iris flowers and one regression model to showcase prediction of housing prices"
+      ],
+      "metadata": {
+        "id": "6nh2h-sIOAOg"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Before you begin\n",
+        "Complete the following setup steps:\n",
+        "\n",
+        "- Install dependencies for Apache Beam."
+      ],
+      "metadata": {
+        "id": "nRCJBcTUOq1k"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "!pip install apache-beam[gcp,dataframe] --quiet"
+      ],
+      "metadata": {
+        "id": "gbmH329jOuj1"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "import xgboost\n",
+        "import apache_beam as beam\n",
+        "from sklearn.datasets import fetch_california_housing\n",
+        "from sklearn.datasets import load_iris\n",
+        "from sklearn.model_selection import train_test_split\n",
+        "\n",
+        "from apache_beam.ml.inference import RunInference\n",
+        "from apache_beam.ml.inference.xgboost_inference import XGBoostModelHandlerNumpy\n",
+        "from apache_beam.options.pipeline_options import PipelineOptions"
+      ],
+      "metadata": {
+        "id": "_O0BN_XqOwp1"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "SEED = 999\n",
+        "CLASSIFICATION_MODEL_STATE = '/tmp/classification_model.json'\n",
+        "REGRESSION_MODEL_STATE = '/tmp/regression_model.json'"
+      ],
+      "metadata": {
+        "id": "ue_5a-oaO-Lz"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Load the data from scikit-learn and train XGBoost models\n",
+        "This section demonstrates the following steps:\n",
+        "1. Load the iris and Califorina Housing datasets from scikit-learn and create a classification and regression model.\n",
+        "2. Train the classification and regression model.\n",
+        "3. Save the models in a JSON file using `mode.save_model`. (https://xgboost.readthedocs.io/en/stable/tutorials/saving_model.html)\n",
+        "\n",
+        "In this example, you create two models, one to classify Iris flowers and one to predict housing prices in California."
+      ],
+      "metadata": {
+        "id": "74oE5pGgPE0M"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# Train the classification model\n",
+        "iris_dataset = load_iris()\n",
+        "x_train_classification, x_test_classification, y_train_classification, y_test_classification = train_test_split(\n",
+        "    iris_dataset['data'], iris_dataset['target'], test_size=.2, random_state=SEED)\n",
+        "booster = xgboost.XGBClassifier(\n",
+        "    n_estimators=2, max_depth=2, learning_rate=1, objective='binary:logistic')\n",
+        "booster.fit(x_train_classification, y_train_classification)\n",
+        "booster.save_model(CLASSIFICATION_MODEL_STATE)\n",
+        "\n",
+        "# Train the regression model\n",
+        "california_dataset = fetch_california_housing()\n",
+        "x_train_regression, x_test_regression, y_train_regression, y_test_regression = train_test_split(\n",
+        "    california_dataset['data'], california_dataset['target'], test_size=.2, random_state=SEED)\n",
+        "model = xgboost.XGBRegressor(\n",
+        "    n_estimators=1000,\n",
+        "    max_depth=8,\n",
+        "    eta=0.1,\n",
+        "    subsample=0.75,\n",
+        "    colsample_bytree=0.8)\n",
+        "model.fit(x_train_regression, y_train_regression)\n",
+        "model.save_model(REGRESSION_MODEL_STATE)\n",
+        "\n",
+        "\n",
+        "# Reshape the test data as XGBoost expects a batch instead of a single element\n",
+        "# More information: https://xgboost.readthedocs.io/en/stable/prediction.html\n",
+        "x_test_classification = x_test.reshape(5, 6, 4)\n",
+        "x_test_regression = x_test_regression.reshape(258, 16, 8)"
+      ],
+      "metadata": {
+        "id": "KVSKt3pFPBnj"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### Create a scikit-learn RunInference pipeline\n",
+        "This section demonstrates how to do the following:\n",
+        "1. Define a XGBoost model handler that accepts an `numpy.ndarray` object as input.\n",
+        "2. Load the data from the datasets.\n",
+        "3. Use the XGBoost trained models and the XGBoost RunInference transform on unkeyed data."
+      ],
+      "metadata": {
+        "id": "ItuxdQoXSNTQ"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "xgboost_classification_model_handler = XGBoostModelHandlerNumpy(\n",
+        "    model_class=xgboost.XGBClassifier, model_state=CLASSIFICATION_MODEL_STATE)\n",
+        "\n",
+        "pipeline_options = PipelineOptions().from_dictionary({})\n",
+        "\n",
+        "with beam.Pipeline(options=pipeline_options) as p:\n",
+        "  (\n",
+        "      p\n",
+        "      | \"Load Data\" >> beam.Create(x_test_classification)\n",
+        "      | \"RunInferenceXGBoost\" >>\n",
+        "      RunInference(model_handler=xgboost_classification_model_handler)\n",
+        "      | beam.Map(print))"
+      ],
+      "metadata": {
+        "id": "SBdMq3-CSGqZ"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "xgboost_regression_model_handler = XGBoostModelHandlerNumpy(\n",
+        "    model_class=xgboost.XGBRegressor, model_state=REGRESSION_MODEL_STATE)\n",
+        "\n",
+        "pipeline_options = PipelineOptions().from_dictionary({})\n",
+        "\n",
+        "with beam.Pipeline(options=pipeline_options) as p:\n",
+        "  (\n",
+        "      p\n",
+        "      | \"Load Data\" >> beam.Create(x_test_regression)\n",
+        "      | \"RunInferenceSklearn\" >>\n",
+        "      RunInference(model_handler=xgboost_regression_model_handler)\n",
+        "      | beam.Map(print))"
+      ],
+      "metadata": {
+        "id": "IYUXIJt7UIm6"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### Use XGBoost RunInference on keyed inputs\n",
+        "This section demonstrates how to do the following:\n",
+        "1. Wrap the `XGBoostHandlerNumpy` object around `KeyedModelHandler` to handle keyed data.\n",
+        "2. Load the data from the datasets.\n",
+        "3. Use the XGBoost trained models and the XGBoost RunInference transform on the keyed data."
+      ],
+      "metadata": {
+        "id": "ptTZUGmqW4s2"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "x_test_classification = [(f'batch {i}', sample) for i, sample in enumerate(x_test_classification)]\n",
+        "x_test_regression = [(f'batch {i}', sample for i, sample in enumerate(x_test_regression)]"
+      ],
+      "metadata": {
+        "id": "MBSbY569W3zm"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "keyed_xgboost_regression_model_handler = KeyedModelHandler(xgboost_classification_model_handler)\n",
+        "\n",
+        "pipeline_options = PipelineOptions().from_dictionary({})\n",
+        "\n",
+        "with beam.Pipeline(options=pipeline_options) as p:\n",
+        "  (\n",
+        "      p\n",
+        "      | \"Load Data\" >> beam.Create(x_test_classification)\n",
+        "      | \"RunInferenceXGBoost\" >>\n",
+        "      RunInference(model_handler=keyed_xgboost_regression_model_handler)\n",
+        "      | beam.Map(print))"
+      ],
+      "metadata": {
+        "id": "8L7sU7a5YXrI"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "keyed_xgboost_regression_model_handler = KeyedModelHandler(xgboost_regression_model_handler)\n",
+        "\n",
+        "\n",

Review Comment:
   ```suggestion
           "keyed_xgboost_regression_model_handler = KeyedModelHandler(xgboost_regression_model_handler)\n",
           "\n",
   ```
   
   Nit, inconsistent spacing with other examples



##########
examples/notebooks/beam-ml/run_inference_xgboost.ipynb:
##########
@@ -0,0 +1,321 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "cellView": "form",
+        "id": "XobBB6Sv8mB3"
+      },
+      "outputs": [],
+      "source": [
+        "# @title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "# Apache Beam RunInference for XGBoost\n",
+        "\n",
+        "<table align=\"left\">\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_xgboost.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/colab_32px.png\" />Run in Google Colab</a>\n",
+        "  </td>\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_xgboost.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/github_32px.png\" />View source on GitHub</a>\n",
+        "  </td>\n",
+        "</table>\n"
+      ],
+      "metadata": {
+        "id": "DUGbrRuv89CS"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "This notebook demonstrates the use of the RunInference transform for XGBoost. Apache Beam RunInference has implementations of the ModelHandler class prebuilt for XGBoost. For more information about the RunInference API, see Machine Learning in the Apache Beam documentation.\n",
+        "\n",
+        "You can choose the appropriate model handler based on your input data type:\n",
+        "\n",
+        "- NumPy model handler\n",
+        "- Pandas DataFrame model handler\n",
+        "- Datatable model handler\n",
+        "- SciPy model handler\n",
+        "\n",
+        "With RunInference, these model handlers manage batching, vectorization, and prediction optimization for your XGBoost pipeline or model.\n",
+        "\n",
+        "This notebook demonstrates the following common RunInference patterns:\n",
+        "\n",
+        "Generate predictions.\n",
+        "Postprocess results after RunInference.\n",
+        "One model to showcase classification of Iris flowers and one regression model to showcase prediction of housing prices"
+      ],
+      "metadata": {
+        "id": "6nh2h-sIOAOg"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Before you begin\n",
+        "Complete the following setup steps:\n",
+        "\n",
+        "- Install dependencies for Apache Beam."
+      ],
+      "metadata": {
+        "id": "nRCJBcTUOq1k"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "!pip install apache-beam[gcp,dataframe] --quiet"
+      ],
+      "metadata": {
+        "id": "gbmH329jOuj1"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "import xgboost\n",
+        "import apache_beam as beam\n",
+        "from sklearn.datasets import fetch_california_housing\n",
+        "from sklearn.datasets import load_iris\n",
+        "from sklearn.model_selection import train_test_split\n",
+        "\n",
+        "from apache_beam.ml.inference import RunInference\n",
+        "from apache_beam.ml.inference.xgboost_inference import XGBoostModelHandlerNumpy\n",
+        "from apache_beam.options.pipeline_options import PipelineOptions"
+      ],
+      "metadata": {
+        "id": "_O0BN_XqOwp1"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "SEED = 999\n",
+        "CLASSIFICATION_MODEL_STATE = '/tmp/classification_model.json'\n",
+        "REGRESSION_MODEL_STATE = '/tmp/regression_model.json'"
+      ],
+      "metadata": {
+        "id": "ue_5a-oaO-Lz"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Load the data from scikit-learn and train XGBoost models\n",
+        "This section demonstrates the following steps:\n",
+        "1. Load the iris and Califorina Housing datasets from scikit-learn and create a classification and regression model.\n",
+        "2. Train the classification and regression model.\n",
+        "3. Save the models in a JSON file using `mode.save_model`. (https://xgboost.readthedocs.io/en/stable/tutorials/saving_model.html)\n",
+        "\n",
+        "In this example, you create two models, one to classify Iris flowers and one to predict housing prices in California."
+      ],
+      "metadata": {
+        "id": "74oE5pGgPE0M"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# Train the classification model\n",
+        "iris_dataset = load_iris()\n",
+        "x_train_classification, x_test_classification, y_train_classification, y_test_classification = train_test_split(\n",
+        "    iris_dataset['data'], iris_dataset['target'], test_size=.2, random_state=SEED)\n",
+        "booster = xgboost.XGBClassifier(\n",
+        "    n_estimators=2, max_depth=2, learning_rate=1, objective='binary:logistic')\n",
+        "booster.fit(x_train_classification, y_train_classification)\n",
+        "booster.save_model(CLASSIFICATION_MODEL_STATE)\n",
+        "\n",
+        "# Train the regression model\n",
+        "california_dataset = fetch_california_housing()\n",
+        "x_train_regression, x_test_regression, y_train_regression, y_test_regression = train_test_split(\n",
+        "    california_dataset['data'], california_dataset['target'], test_size=.2, random_state=SEED)\n",
+        "model = xgboost.XGBRegressor(\n",
+        "    n_estimators=1000,\n",
+        "    max_depth=8,\n",
+        "    eta=0.1,\n",
+        "    subsample=0.75,\n",
+        "    colsample_bytree=0.8)\n",
+        "model.fit(x_train_regression, y_train_regression)\n",
+        "model.save_model(REGRESSION_MODEL_STATE)\n",
+        "\n",
+        "\n",
+        "# Reshape the test data as XGBoost expects a batch instead of a single element\n",
+        "# More information: https://xgboost.readthedocs.io/en/stable/prediction.html\n",
+        "x_test_classification = x_test.reshape(5, 6, 4)\n",
+        "x_test_regression = x_test_regression.reshape(258, 16, 8)"
+      ],
+      "metadata": {
+        "id": "KVSKt3pFPBnj"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### Create a scikit-learn RunInference pipeline\n",

Review Comment:
   ```suggestion
           "### Create an XGBoost RunInference pipeline\n",
   ```



##########
examples/notebooks/beam-ml/run_inference_xgboost.ipynb:
##########
@@ -0,0 +1,321 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "cellView": "form",
+        "id": "XobBB6Sv8mB3"
+      },
+      "outputs": [],
+      "source": [
+        "# @title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "# Apache Beam RunInference for XGBoost\n",
+        "\n",
+        "<table align=\"left\">\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_xgboost.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/colab_32px.png\" />Run in Google Colab</a>\n",
+        "  </td>\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_xgboost.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/github_32px.png\" />View source on GitHub</a>\n",
+        "  </td>\n",
+        "</table>\n"
+      ],
+      "metadata": {
+        "id": "DUGbrRuv89CS"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "This notebook demonstrates the use of the RunInference transform for XGBoost. Apache Beam RunInference has implementations of the ModelHandler class prebuilt for XGBoost. For more information about the RunInference API, see Machine Learning in the Apache Beam documentation.\n",
+        "\n",
+        "You can choose the appropriate model handler based on your input data type:\n",
+        "\n",
+        "- NumPy model handler\n",
+        "- Pandas DataFrame model handler\n",
+        "- Datatable model handler\n",
+        "- SciPy model handler\n",
+        "\n",
+        "With RunInference, these model handlers manage batching, vectorization, and prediction optimization for your XGBoost pipeline or model.\n",
+        "\n",
+        "This notebook demonstrates the following common RunInference patterns:\n",
+        "\n",
+        "Generate predictions.\n",
+        "Postprocess results after RunInference.\n",
+        "One model to showcase classification of Iris flowers and one regression model to showcase prediction of housing prices"

Review Comment:
   ```suggestion
           "- Generate predictions\n",
           "- Postprocess results after RunInference\n",
           "- One model to showcase classification of Iris flowers\n",
           "- One regression model to showcase prediction of housing prices"
   ```
   
   Nit: I think this reads cleaner with bullet points (and will render correctly with the `\n` problem mentioned later)



##########
examples/notebooks/beam-ml/run_inference_xgboost.ipynb:
##########
@@ -0,0 +1,321 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "cellView": "form",
+        "id": "XobBB6Sv8mB3"
+      },
+      "outputs": [],
+      "source": [
+        "# @title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "# Apache Beam RunInference for XGBoost\n",
+        "\n",
+        "<table align=\"left\">\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_xgboost.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/colab_32px.png\" />Run in Google Colab</a>\n",
+        "  </td>\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_xgboost.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/github_32px.png\" />View source on GitHub</a>\n",
+        "  </td>\n",
+        "</table>\n"
+      ],
+      "metadata": {
+        "id": "DUGbrRuv89CS"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "This notebook demonstrates the use of the RunInference transform for XGBoost. Apache Beam RunInference has implementations of the ModelHandler class prebuilt for XGBoost. For more information about the RunInference API, see Machine Learning in the Apache Beam documentation.\n",
+        "\n",
+        "You can choose the appropriate model handler based on your input data type:\n",
+        "\n",
+        "- NumPy model handler\n",
+        "- Pandas DataFrame model handler\n",
+        "- Datatable model handler\n",
+        "- SciPy model handler\n",
+        "\n",
+        "With RunInference, these model handlers manage batching, vectorization, and prediction optimization for your XGBoost pipeline or model.\n",
+        "\n",
+        "This notebook demonstrates the following common RunInference patterns:\n",
+        "\n",
+        "Generate predictions.\n",
+        "Postprocess results after RunInference.\n",
+        "One model to showcase classification of Iris flowers and one regression model to showcase prediction of housing prices"
+      ],
+      "metadata": {
+        "id": "6nh2h-sIOAOg"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Before you begin\n",
+        "Complete the following setup steps:\n",
+        "\n",
+        "- Install dependencies for Apache Beam."
+      ],
+      "metadata": {
+        "id": "nRCJBcTUOq1k"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "!pip install apache-beam[gcp,dataframe] --quiet"
+      ],
+      "metadata": {
+        "id": "gbmH329jOuj1"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "import xgboost\n",
+        "import apache_beam as beam\n",
+        "from sklearn.datasets import fetch_california_housing\n",
+        "from sklearn.datasets import load_iris\n",
+        "from sklearn.model_selection import train_test_split\n",
+        "\n",
+        "from apache_beam.ml.inference import RunInference\n",
+        "from apache_beam.ml.inference.xgboost_inference import XGBoostModelHandlerNumpy\n",
+        "from apache_beam.options.pipeline_options import PipelineOptions"
+      ],
+      "metadata": {
+        "id": "_O0BN_XqOwp1"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "SEED = 999\n",
+        "CLASSIFICATION_MODEL_STATE = '/tmp/classification_model.json'\n",
+        "REGRESSION_MODEL_STATE = '/tmp/regression_model.json'"
+      ],
+      "metadata": {
+        "id": "ue_5a-oaO-Lz"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Load the data from scikit-learn and train XGBoost models\n",
+        "This section demonstrates the following steps:\n",
+        "1. Load the iris and Califorina Housing datasets from scikit-learn and create a classification and regression model.\n",
+        "2. Train the classification and regression model.\n",
+        "3. Save the models in a JSON file using `mode.save_model`. (https://xgboost.readthedocs.io/en/stable/tutorials/saving_model.html)\n",
+        "\n",
+        "In this example, you create two models, one to classify Iris flowers and one to predict housing prices in California."
+      ],
+      "metadata": {
+        "id": "74oE5pGgPE0M"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# Train the classification model\n",
+        "iris_dataset = load_iris()\n",
+        "x_train_classification, x_test_classification, y_train_classification, y_test_classification = train_test_split(\n",
+        "    iris_dataset['data'], iris_dataset['target'], test_size=.2, random_state=SEED)\n",
+        "booster = xgboost.XGBClassifier(\n",
+        "    n_estimators=2, max_depth=2, learning_rate=1, objective='binary:logistic')\n",
+        "booster.fit(x_train_classification, y_train_classification)\n",
+        "booster.save_model(CLASSIFICATION_MODEL_STATE)\n",
+        "\n",
+        "# Train the regression model\n",
+        "california_dataset = fetch_california_housing()\n",
+        "x_train_regression, x_test_regression, y_train_regression, y_test_regression = train_test_split(\n",
+        "    california_dataset['data'], california_dataset['target'], test_size=.2, random_state=SEED)\n",
+        "model = xgboost.XGBRegressor(\n",
+        "    n_estimators=1000,\n",
+        "    max_depth=8,\n",
+        "    eta=0.1,\n",
+        "    subsample=0.75,\n",
+        "    colsample_bytree=0.8)\n",
+        "model.fit(x_train_regression, y_train_regression)\n",
+        "model.save_model(REGRESSION_MODEL_STATE)\n",
+        "\n",
+        "\n",
+        "# Reshape the test data as XGBoost expects a batch instead of a single element\n",
+        "# More information: https://xgboost.readthedocs.io/en/stable/prediction.html\n",
+        "x_test_classification = x_test.reshape(5, 6, 4)\n",
+        "x_test_regression = x_test_regression.reshape(258, 16, 8)"
+      ],
+      "metadata": {
+        "id": "KVSKt3pFPBnj"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### Create a scikit-learn RunInference pipeline\n",
+        "This section demonstrates how to do the following:\n",
+        "1. Define a XGBoost model handler that accepts an `numpy.ndarray` object as input.\n",
+        "2. Load the data from the datasets.\n",
+        "3. Use the XGBoost trained models and the XGBoost RunInference transform on unkeyed data."
+      ],
+      "metadata": {
+        "id": "ItuxdQoXSNTQ"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "xgboost_classification_model_handler = XGBoostModelHandlerNumpy(\n",
+        "    model_class=xgboost.XGBClassifier, model_state=CLASSIFICATION_MODEL_STATE)\n",
+        "\n",
+        "pipeline_options = PipelineOptions().from_dictionary({})\n",
+        "\n",
+        "with beam.Pipeline(options=pipeline_options) as p:\n",
+        "  (\n",
+        "      p\n",
+        "      | \"Load Data\" >> beam.Create(x_test_classification)\n",
+        "      | \"RunInferenceXGBoost\" >>\n",
+        "      RunInference(model_handler=xgboost_classification_model_handler)\n",
+        "      | beam.Map(print))"
+      ],
+      "metadata": {
+        "id": "SBdMq3-CSGqZ"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "xgboost_regression_model_handler = XGBoostModelHandlerNumpy(\n",
+        "    model_class=xgboost.XGBRegressor, model_state=REGRESSION_MODEL_STATE)\n",
+        "\n",
+        "pipeline_options = PipelineOptions().from_dictionary({})\n",
+        "\n",
+        "with beam.Pipeline(options=pipeline_options) as p:\n",
+        "  (\n",
+        "      p\n",
+        "      | \"Load Data\" >> beam.Create(x_test_regression)\n",
+        "      | \"RunInferenceSklearn\" >>\n",
+        "      RunInference(model_handler=xgboost_regression_model_handler)\n",
+        "      | beam.Map(print))"
+      ],
+      "metadata": {
+        "id": "IYUXIJt7UIm6"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### Use XGBoost RunInference on keyed inputs\n",

Review Comment:
   Could you add a sentence motivating this section? Something like: 
   ```
   It is often useful to associate examples with a key before doing inference so that you can retain metadata about the example (e.g. the original url of a preprocessed image or a non-preprocessed input). RunInference allows you to do this using a `KeyedModelHandler`. This section demonstrates how to do the following with a KeyedModelHandler:
   ...
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm merged pull request #25904: Add XGBoost example notebook

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm merged PR #25904:
URL: https://github.com/apache/beam/pull/25904


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] jaxpr commented on a diff in pull request #25904: Add XGBoost example notebook

Posted by "jaxpr (via GitHub)" <gi...@apache.org>.
jaxpr commented on code in PR #25904:
URL: https://github.com/apache/beam/pull/25904#discussion_r1155892121


##########
examples/notebooks/beam-ml/run_inference_xgboost.ipynb:
##########
@@ -0,0 +1,321 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "cellView": "form",
+        "id": "XobBB6Sv8mB3"
+      },
+      "outputs": [],
+      "source": [
+        "# @title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "# Apache Beam RunInference for XGBoost\n",
+        "\n",
+        "<table align=\"left\">\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_xgboost.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/colab_32px.png\" />Run in Google Colab</a>\n",
+        "  </td>\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_xgboost.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/github_32px.png\" />View source on GitHub</a>\n",
+        "  </td>\n",
+        "</table>\n"
+      ],
+      "metadata": {
+        "id": "DUGbrRuv89CS"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "This notebook demonstrates the use of the RunInference transform for XGBoost. Apache Beam RunInference has implementations of the ModelHandler class prebuilt for XGBoost. For more information about the RunInference API, see Machine Learning in the Apache Beam documentation.\n",
+        "\n",
+        "You can choose the appropriate model handler based on your input data type:\n",
+        "\n",
+        "- NumPy model handler\n",
+        "- Pandas DataFrame model handler\n",
+        "- Datatable model handler\n",
+        "- SciPy model handler\n",
+        "\n",
+        "With RunInference, these model handlers manage batching, vectorization, and prediction optimization for your XGBoost pipeline or model.\n",
+        "\n",
+        "This notebook demonstrates the following common RunInference patterns:\n",
+        "\n",
+        "Generate predictions.\n",
+        "Postprocess results after RunInference.\n",
+        "One model to showcase classification of Iris flowers and one regression model to showcase prediction of housing prices"

Review Comment:
   I added some simple postprocessing to print the flowers' names and flatten outputted batches.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org