Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/11/11 20:17:07 UTC

[GitHub] [beam] rszper opened a new pull request, #24125: Editorial review of the ML notebooks.

rszper opened a new pull request, #24125:
URL: https://github.com/apache/beam/pull/24125

   In addition to doing a copy edit of the notebooks, I did the following:
   
   - Removed the Beam RunInference button that some notebooks have at the top and replaced it with an inline link. The button doesn't render correctly when the notebooks are imported to Devsite, and it's not clear where the link in the button takes you.
   - Tried to fix the formatting of the licenses to match the multi-model notebook's format, which renders correctly on Devsite.
   - Fixed the headings in some notebooks so that they render correctly.
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make the review process smoother](https://beam.apache.org/contribute/get-started-contributing/#make-the-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] rszper commented on a diff in pull request #24125: Editorial review of the ML notebooks.

rszper commented on code in PR #24125:
URL: https://github.com/apache/beam/pull/24125#discussion_r1022153365


##########
examples/notebooks/beam-ml/README.md:
##########
@@ -18,27 +18,27 @@
 -->
 # ML Sample Notebooks
 
-As of Beam 2.40 users now have access to a
+Starting with the Apache Beam SDK version 2.40, users have access to a
 [RunInference](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.RunInference)
 transform.
 
-This allows inferences or predictions of on data for
-popular ML frameworks like TensorFlow, PyTorch and
+This transform allows you to make inferences or predictions on data for
+popular machine learning frameworks like TensorFlow, PyTorch, and

Review Comment:
   Fixed




[GitHub] [beam] damccorm merged pull request #24125: Editorial review of the ML notebooks.

damccorm merged PR #24125:
URL: https://github.com/apache/beam/pull/24125



[GitHub] [beam] rszper commented on a diff in pull request #24125: Editorial review of the ML notebooks.

rszper commented on code in PR #24125:
URL: https://github.com/apache/beam/pull/24125#discussion_r1022141133


##########
examples/notebooks/beam-ml/custom_remote_inference.ipynb:
##########
@@ -279,15 +291,19 @@
       "source": [
         "### Batching\n",
         "\n",
-        "Before we can chain all the different steps together in a pipeline, there is one more thing we need to understand: batching. When running inference with your model (both in Beam itself or in an external API), you can batch your input together to allow for more efficient execution of your model. When using a custom DoFn, you need to take care of the batching yourself, in contrast with the RunInference API which takes care of this for you.\n",
+        "Before we can chain together the pipeline steps, we need to understand batching.\n",
+        "When running inference with your model, either in Apache Beam or in an external API, you can batch your input to increase the efficiency of the model execution.\n",
+        "When using a custom DoFn, as in this example, you need to manage the batching.\n",
         "\n",
-        "In order to achieve this in our pipeline: we will introduce one more step in our pipeline, a `BatchElements` transform that will group elements together to form a batch of the desired size.\n",
+        "To manage the batching in this pipeline, include a `BatchElements` transform to group elements together and form a batch of the desired size.\n",
         "\n",
-        "⚠️ If you have a streaming pipeline, you may considering using [GroupIntoBatches](https://beam.apache.org/documentation/transforms/python/aggregation/groupintobatches/) as `BatchElements` doesn't batch things across bundles. `GroupIntoBatches` requires choosing a key within which things are batched.\n",
+        "**Caution:**\n",

Review Comment:
   Happily removing the caution.




[GitHub] [beam] rszper commented on a diff in pull request #24125: Editorial review of the ML notebooks.

rszper commented on code in PR #24125:
URL: https://github.com/apache/beam/pull/24125#discussion_r1022154406


##########
examples/notebooks/beam-ml/README.md:
##########
@@ -18,27 +18,27 @@
 -->
 # ML Sample Notebooks
 
-As of Beam 2.40 users now have access to a
+Starting with the Apache Beam SDK version 2.40, users have access to a
 [RunInference](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.RunInference)
 transform.
 
-This allows inferences or predictions of on data for
-popular ML frameworks like TensorFlow, PyTorch and
+This transform allows you to make inferences or predictions on data for
+popular machine learning frameworks like TensorFlow, PyTorch, and
 scikit-learn.
 
 ## Using The Notebooks
 
-These notebooks illustrate usages of Beam's RunInference, as well as different
-usages of implementations of [ModelHandler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler).
-Beam comes with various implementations of ModelHandler.
+These notebooks illustrate ways to use Apache Beam's RunInference transforms, as well as different
+use cases for [ModelHandler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler) implementations.
+Beam comes with multiple ModelHandler implementations.

Review Comment:
   Added




[GitHub] [beam] github-actions[bot] commented on pull request #24125: Editorial review of the ML notebooks.

github-actions[bot] commented on PR #24125:
URL: https://github.com/apache/beam/pull/24125#issuecomment-1312160805

   Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control



[GitHub] [beam] rszper commented on pull request #24125: Editorial review of the ML notebooks.

rszper commented on PR #24125:
URL: https://github.com/apache/beam/pull/24125#issuecomment-1312160117

   R: @dusan-rychnovsky 
   R: @rezarokni 



[GitHub] [beam] rszper commented on pull request #24125: Editorial review of the ML notebooks.

rszper commented on PR #24125:
URL: https://github.com/apache/beam/pull/24125#issuecomment-1312160437

   R: @ryanthompson591 



[GitHub] [beam] rszper commented on a diff in pull request #24125: Editorial review of the ML notebooks.

rszper commented on code in PR #24125:
URL: https://github.com/apache/beam/pull/24125#discussion_r1022153551


##########
examples/notebooks/beam-ml/run_custom_inference.ipynb:
##########
@@ -37,21 +36,15 @@
         "id": "b6f8f3af-744e-4eaa-8a30-6d03e8e4d21e"
       },
       "source": [
-        "# Bring your own Machine Learning (ML) model to Beam RunInference\n",
+        "# Bring your own machine learning (ML) model to Beam RunInference\n",
         "\n",
-        "<button>\n",
-        "  <a href=\"https://beam.apache.org/documentation/sdks/python-machine-learning/\">\n",
-        "    <img src=\"https://beam.apache.org/images/favicon.ico\" alt=\"Open the docs\" height=\"16\"/>\n",
-        "    Beam RunInference\n",
-        "  </a>\n",
-        "</button>\n",
-        "\n",
-        "In this notebook, we walk through a simple example to show how to build your own ML model handler using\n",
+        "This notebook demonstrates how to build your own ML model handler using\n",
         "[ModelHandler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler).\n",

Review Comment:
   Updated. Thanks!




[GitHub] [beam] rszper commented on a diff in pull request #24125: Editorial review of the ML notebooks.

rszper commented on code in PR #24125:
URL: https://github.com/apache/beam/pull/24125#discussion_r1022130309


##########
examples/notebooks/beam-ml/custom_remote_inference.ipynb:
##########
@@ -34,13 +34,17 @@
         "id": "0UGzzndTBPWQ"
       },
       "source": [
-        "# Remote inference in Beam\n",
+        "# Remote inference in Apache Beam\n",
         "\n",
-        "The prefered way of running inference in Beam is by using the [RunInference API](https://beam.apache.org/documentation/sdks/python-machine-learning/). The RunInference API enables you to run your models as part of your pipeline in a way that is optimized for machine learning inference. It supports features such as batching, so that you do not need to take care of it yourself. For more info on the RunInference API you can check out the [RunInference notebook](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb), which demonstrates how you can implement model inference in pytorch, scikit-learn and tensorflow.\n",
+        "The prefered way to run inference in Apache Beam is by using the [RunInference API](https://beam.apache.org/documentation/sdks/python-machine-learning/). \n",
+        "The RunInference API enables you to run your models as part of your pipeline in a way that is optimized for machine learning inference. \n",
+        "To reduce the number of steps that you need to take, RunInference supports features like batching. For more infomation about the RunInference API, review the [RunInference notebook](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb), \n",

Review Comment:
   I also find this notebook confusing. I agree about the link. Will fix.




[GitHub] [beam] damccorm commented on a diff in pull request #24125: Editorial review of the ML notebooks.

damccorm commented on code in PR #24125:
URL: https://github.com/apache/beam/pull/24125#discussion_r1022063494


##########
examples/notebooks/beam-ml/run_inference_multi_model.ipynb:
##########
@@ -57,12 +57,12 @@
     {
       "cell_type": "markdown",
       "source": [
-        "A single machine learning  model may not always be the perfect solution for a give task. Oftentimes, machine learning model tasks involve aggregating mutliple models together to produce one optimal predictive model and boost performance. \n",
+        "A single machine learning model might not be the right solution for your task. Often, machine learning model tasks involve aggregating mutliple models together to produce one optimal predictive model and to boost performance. \n",
         " \n",
         "\n",
-        "In this notebook, we will shows you an example on how to implement a cascade model in Beam using the [RunInference API](https://beam.apache.org/documentation/sdks/python-machine-learning/). The RunInference API enables you to run your Beam transfroms as part of your pipeline for optimal machine learning inference in beam.     \n",
+        "This notebook shows how to implement a cascade model in Apache Beam using the [RunInference API](https://beam.apache.org/documentation/sdks/python-machine-learning/). The RunInference API enables you to run your Beam transfroms as part of your pipeline for optimal machine learning inference.\n",

Review Comment:
   ```suggestion
           "This notebook shows how to implement a cascade model in Apache Beam using the [RunInference API](https://beam.apache.org/documentation/sdks/python-machine-learning/). The RunInference API enables you to run your Beam transforms as part of your pipeline for optimal machine learning inference.\n",
   ```



##########
examples/notebooks/beam-ml/run_custom_inference.ipynb:
##########
@@ -37,21 +36,15 @@
         "id": "b6f8f3af-744e-4eaa-8a30-6d03e8e4d21e"
       },
       "source": [
-        "# Bring your own Machine Learning (ML) model to Beam RunInference\n",
+        "# Bring your own machine learning (ML) model to Beam RunInference\n",
         "\n",
-        "<button>\n",
-        "  <a href=\"https://beam.apache.org/documentation/sdks/python-machine-learning/\">\n",
-        "    <img src=\"https://beam.apache.org/images/favicon.ico\" alt=\"Open the docs\" height=\"16\"/>\n",
-        "    Beam RunInference\n",
-        "  </a>\n",
-        "</button>\n",
-        "\n",
-        "In this notebook, we walk through a simple example to show how to build your own ML model handler using\n",
+        "This notebook demonstrates how to build your own ML model handler using\n",
         "[ModelHandler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler).\n",

Review Comment:
   This sentence is kinda odd - maybe we could do something like "This notebook demonstrates how to run inference on your custom framework using the [ModelHandler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler) class"
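   As background for the wording discussion above, the interface being described can be sketched in plain Python. `FiveTimesModelHandler` and this `PredictionResult` are illustrative stand-ins only, not Beam's real classes (those live in `apache_beam.ml.inference.base`); the sketch just mirrors the two methods a custom handler implements:

   ```python
   from typing import Iterable, NamedTuple, Sequence


   class PredictionResult(NamedTuple):
       """Mirrors the (example, inference) pair that RunInference emits."""
       example: float
       inference: float


   class FiveTimesModelHandler:
       """Hypothetical handler for a toy 'multiply by five' model.

       Mirrors the shape of a Beam ModelHandler subclass: load_model()
       builds the model once per worker, and run_inference() maps a
       batch of examples to PredictionResults.
       """

       def load_model(self):
           # A real handler would load weights from disk; here the
           # "model" is just a function.
           return lambda x: 5.0 * x

       def run_inference(self, batch: Sequence[float], model) -> Iterable[PredictionResult]:
           return [PredictionResult(x, model(x)) for x in batch]


   handler = FiveTimesModelHandler()
   model = handler.load_model()
   results = list(handler.run_inference([1.0, 2.0, 3.0], model))
   # Each result pairs an input with its prediction, e.g. (1.0, 5.0).
   ```

   In a real pipeline, an instance of such a handler is what gets passed to the `RunInference` transform, which then owns calling these methods.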



##########
examples/notebooks/beam-ml/run_inference_tensorflow.ipynb:
##########
@@ -190,9 +189,9 @@
     {
       "cell_type": "markdown",
       "source": [
-        "## Create and Test a Simple Model\n",
+        "## Create and test a simple model\n",
         "\n",
-        "This creates a model that predicts the 5 times table."
+        "This steps creates a model that predicts the 5 times table."

Review Comment:
   ```suggestion
           "This step creates a model that predicts the 5 times table."
   ```



##########
examples/notebooks/beam-ml/run_inference_sklearn.ipynb:
##########
@@ -52,26 +51,18 @@
       },
       "source": [
         "# Apache Beam RunInference for scikit-learn\n",
-        "\n",
-        "<button>\n",
-        "  <a href=\"https://beam.apache.org/documentation/sdks/python-machine-learning/\">\n",
-        "    <img src=\"https://beam.apache.org/images/favicon.ico\" alt=\"Open the docs\" height=\"16\"/>\n",
-        "    Beam RunInference\n",
-        "  </a>\n",
-        "</button>\n",
-        "\n",
-        "In this notebook, we walk through the use of the RunInference transform for [scikit-learn](https://scikit-learn.org/) also called sklearn.\n",
-        "Beam [RunInference](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.RunInference) has implementations of [ModelHandler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler) prebuilt for scikit-learn.\n",
+        "This notebook demonstrates the use of the RunInference transform for [scikit-learn](https://scikit-learn.org/) also called sklearn.\n",
+        "Apache Beam [RunInference](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.RunInference) has implementations of [ModelHandler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler) prebuilt for scikit-learn. For more information about the RunInference API, see [Machine Learning](https://beam.apache.org/documentation/sdks/python-machine-learning) in the Apache Beam documentation.\n",

Review Comment:
   ```suggestion
           "Apache Beam [RunInference](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.RunInference) has implementations of the [ModelHandler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler) class prebuilt for scikit-learn. For more information about the RunInference API, see [Machine Learning](https://beam.apache.org/documentation/sdks/python-machine-learning) in the Apache Beam documentation.\n",
   ```



##########
examples/notebooks/beam-ml/run_inference_tensorflow.ipynb:
##########
@@ -248,7 +247,9 @@
     {
       "cell_type": "markdown",
       "source": [
-        "### Test the Model\n"
+        "### Test the model\n",
+        "\n",
+        "This steps tests the model that you created."

Review Comment:
   ```suggestion
           "This step tests the model that you created."
   ```



##########
examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb:
##########
@@ -946,9 +977,9 @@
       "source": [
         "input_strings_file = 'input_strings.tfrecord'\n",
         "\n",
-        "# Preprocess the input as RunInference is expecting a serialized tf.example as an input\n",
-        "# Write the processed input to a file \n",
-        "# One can also do it as a pipeline step by using beam.Map() \n",
+        "# Because RunInferene is expecting a serialized tf.example as an input, preprocess the input.\n",

Review Comment:
   ```suggestion
           "# Because RunInference is expecting a serialized tf.example as an input, preprocess the input.\n",
   ```



##########
examples/notebooks/beam-ml/custom_remote_inference.ipynb:
##########
@@ -34,13 +34,17 @@
         "id": "0UGzzndTBPWQ"
       },
       "source": [
-        "# Remote inference in Beam\n",
+        "# Remote inference in Apache Beam\n",
         "\n",
-        "The prefered way of running inference in Beam is by using the [RunInference API](https://beam.apache.org/documentation/sdks/python-machine-learning/). The RunInference API enables you to run your models as part of your pipeline in a way that is optimized for machine learning inference. It supports features such as batching, so that you do not need to take care of it yourself. For more info on the RunInference API you can check out the [RunInference notebook](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb), which demonstrates how you can implement model inference in pytorch, scikit-learn and tensorflow.\n",
+        "The prefered way to run inference in Apache Beam is by using the [RunInference API](https://beam.apache.org/documentation/sdks/python-machine-learning/). \n",
+        "The RunInference API enables you to run your models as part of your pipeline in a way that is optimized for machine learning inference. \n",
+        "To reduce the number of steps that you need to take, RunInference supports features like batching. For more infomation about the RunInference API, review the [RunInference notebook](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb), \n",
+        "which demonstrates how to implement model inference in PyTorch, scikit-learn, and TensorFlow.\n",
         "\n",
-        "As of now, RunInference API doesn't support making remote inference calls (e.g. Natural Language API, Cloud Vision API and others). Therefore, in order to use these remote APIs with Beam, one needs to write custom inference call. \n",
+        "Currently, the RunInference API doesn't support making remote inference calls using the Natural Language API, Cloud Vision API, and so on. \n",
+        "Therefore, to use these remote APIs with Apache Beam, you need to write custom inference calls.\n",
         "\n",
-        "This notebook shows how you can implement such a custom inference call in Beam. We are using Cloud Vision API for demonstration. \n"
+        "This notebook shows how to implement a custom inference call in Apache Beam. This example uses Cloud Vision API."

Review Comment:
   ```suggestion
           "This notebook shows how to implement a custom inference call in Apache Beam. This example uses the Google Cloud Vision API."
   ```
   
   I actually think not having `the` is valid based on https://cloud.google.com/vision, but we prefix with `the` elsewhere and it feels more natural to me. Consistency is the bigger thing though IMO.
   
   Same thing with `Google` - we use this elsewhere and introducing it here probably is less confusing



##########
examples/notebooks/beam-ml/run_inference_multi_model.ipynb:
##########
@@ -164,7 +164,9 @@
     {
       "cell_type": "markdown",
       "source": [
-        "The RunInference library is available in Apache Beam version **2.40** or later. "
+        "This section shows how to demonstrate the dependencies for this example.\n",

Review Comment:
   ```suggestion
           "This section shows how to install the dependencies for this example.\n",
   ```



##########
examples/notebooks/beam-ml/custom_remote_inference.ipynb:
##########
@@ -279,15 +291,19 @@
       "source": [
         "### Batching\n",
         "\n",
-        "Before we can chain all the different steps together in a pipeline, there is one more thing we need to understand: batching. When running inference with your model (both in Beam itself or in an external API), you can batch your input together to allow for more efficient execution of your model. When using a custom DoFn, you need to take care of the batching yourself, in contrast with the RunInference API which takes care of this for you.\n",
+        "Before we can chain together the pipeline steps, we need to understand batching.\n",
+        "When running inference with your model, either in Apache Beam or in an external API, you can batch your input to increase the efficiency of the model execution.\n",
+        "When using a custom DoFn, as in this example, you need to manage the batching.\n",
         "\n",
-        "In order to achieve this in our pipeline: we will introduce one more step in our pipeline, a `BatchElements` transform that will group elements together to form a batch of the desired size.\n",
+        "To manage the batching in this pipeline, include a `BatchElements` transform to group elements together and form a batch of the desired size.\n",
         "\n",
-        "⚠️ If you have a streaming pipeline, you may considering using [GroupIntoBatches](https://beam.apache.org/documentation/transforms/python/aggregation/groupintobatches/) as `BatchElements` doesn't batch things across bundles. `GroupIntoBatches` requires choosing a key within which things are batched.\n",
+        "**Caution:**\n",

Review Comment:
   I don't think this really merits a caution - it is an aside/alternate approach, but nothing bad will happen if you use BatchElements (it will just produce small batch sizes)
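   To make the aside concrete: the fixed-size grouping that `BatchElements` performs within a bundle can be sketched in plain Python. `batch_elements` here is a hypothetical helper for illustration only; Beam's real transform also tunes batch sizes dynamically, and `GroupIntoBatches` is the keyed, cross-bundle alternative mentioned in the diff.

   ```python
   from itertools import islice
   from typing import Iterable, Iterator, List


   def batch_elements(elements: Iterable[int], batch_size: int) -> Iterator[List[int]]:
       """Group a stream into fixed-size batches; the last batch may be
       smaller, which is the 'small batch sizes' effect noted above."""
       it = iter(elements)
       while True:
           batch = list(islice(it, batch_size))
           if not batch:
               return
           yield batch


   batches = list(batch_elements(range(7), batch_size=3))
   # -> [[0, 1, 2], [3, 4, 5], [6]]
   ```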



##########
examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb:
##########
@@ -1001,7 +1034,7 @@
         "saved_model_spec = model_spec_pb2.SavedModelSpec(model_path=model_dir)\n",
         "inference_spec_type = model_spec_pb2.InferenceSpecType(saved_model_spec=saved_model_spec)\n",
         "\n",
-        "#A Beam IO that reads a file of serialized tf.Examples\n",
+        "#A Beam I/O that reads a file of serialized tf.Examples\n",

Review Comment:
   ```suggestion
           "# A Beam I/O that reads a file of serialized tf.Examples\n",
   ```



##########
examples/notebooks/beam-ml/run_inference_multi_model.ipynb:
##########
@@ -103,17 +103,17 @@
     {
       "cell_type": "markdown",
       "source": [
-        "The steps needed to build this pipeline can be summarized as follows:\n",
+        "The steps to build this pipeline are as follows:\n",
         "* Read the images.\n",
         "* Preprocess the images for caption generation for inference with the BLIP model.\n",
-        "* Inference with BLIP to generate a list of caption candidates . \n",
+        "* Inference with BLIP to generate a list of caption candidates.\n",
         "* Aggregate the generated captions with their source image.\n",
         "* Preprocess the aggregated image-caption pair to rank them with CLIP.\n",
-        "* Inference wih CLIP to generated the caption ranking. \n",
-        "* Print the image names and the captions sorted according to their ranking\n",
+        "* Inference with CLIP to generated the caption ranking. \n",

Review Comment:
   ```suggestion
           "* Inference with CLIP to generate the caption ranking. \n",
   ```



##########
examples/notebooks/beam-ml/run_inference_tensorflow.ipynb:
##########
@@ -46,33 +49,29 @@
       "cell_type": "markdown",
       "source": [
         "# Apache Beam RunInference with TensorFlow\n",
-        "\n",
-        "<button>\n",
-        "  <a href=\"https://beam.apache.org/documentation/sdks/python-machine-learning/\">\n",
-        "    <img src=\"https://beam.apache.org/images/favicon.ico\" alt=\"Open the docs\" height=\"16\"/>\n",
-        "    Beam RunInference\n",
-        "  </a>\n",
-        "</button>\n",
+        "This notebook demonstrates the use of the RunInference transform for [TensorFlow](https://www.tensorflow.org/).\n",
+        "Beam [RunInference](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.RunInference) accepts ModelHandler generated from [`tfx-bsl`](https://github.com/tensorflow/tfx-bsl) via CreateModelHandler.\n",

Review Comment:
   ```suggestion
           "Beam [RunInference](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.RunInference) accepts a ModelHandler generated from [`tfx-bsl`](https://github.com/tensorflow/tfx-bsl) via CreateModelHandler.\n",
   ```




[GitHub] [beam] ryanthompson591 commented on a diff in pull request #24125: Editorial review of the ML notebooks.

ryanthompson591 commented on code in PR #24125:
URL: https://github.com/apache/beam/pull/24125#discussion_r1021997227


##########
examples/notebooks/beam-ml/custom_remote_inference.ipynb:
##########
@@ -34,13 +34,17 @@
         "id": "0UGzzndTBPWQ"
       },
       "source": [
-        "# Remote inference in Beam\n",
+        "# Remote inference in Apache Beam\n",
         "\n",
-        "The prefered way of running inference in Beam is by using the [RunInference API](https://beam.apache.org/documentation/sdks/python-machine-learning/). The RunInference API enables you to run your models as part of your pipeline in a way that is optimized for machine learning inference. It supports features such as batching, so that you do not need to take care of it yourself. For more info on the RunInference API you can check out the [RunInference notebook](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb), which demonstrates how you can implement model inference in pytorch, scikit-learn and tensorflow.\n",
+        "The prefered way to run inference in Apache Beam is by using the [RunInference API](https://beam.apache.org/documentation/sdks/python-machine-learning/). \n",
+        "The RunInference API enables you to run your models as part of your pipeline in a way that is optimized for machine learning inference. \n",
+        "To reduce the number of steps that you need to take, RunInference supports features like batching. For more infomation about the RunInference API, review the [RunInference notebook](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb), \n",

Review Comment:
   I didn't know about this notebook; at one point we decided to break the notebooks up into several (one per framework). I suppose we recombined them later.
   
   That aside, instead of pointing to the notebook, we should point to the API documentation here, if anything.
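   As context for the batching discussion above: the names below (ToyModelHandler, toy_run_inference) are illustrative stand-ins, not Beam's actual API. This framework-free sketch only shows the batching idea that RunInference handles for pipeline authors:

   ```python
   from typing import Any, Iterable, List, Sequence

   class ToyModelHandler:
       """Illustrative stand-in for Beam's ModelHandler; not the real API."""

       def load_model(self) -> Any:
           # A pretend "model" that doubles each input value.
           return lambda batch: [x * 2 for x in batch]

       def run_inference(self, batch: Sequence[int], model: Any) -> List[int]:
           return model(batch)

   def toy_run_inference(elements: Iterable[int],
                         handler: ToyModelHandler,
                         batch_size: int = 4) -> List[int]:
       """Groups elements into batches before invoking the model, sketching
       the batching that RunInference performs so users don't have to."""
       model = handler.load_model()
       results: List[int] = []
       buffer: List[int] = []
       for element in elements:
           buffer.append(element)
           if len(buffer) == batch_size:
               results.extend(handler.run_inference(buffer, model))
               buffer = []
       if buffer:  # flush the final partial batch
           results.extend(handler.run_inference(buffer, model))
       return results
   ```

   In the real API, RunInference applies this kind of batching transparently inside the pipeline.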





[GitHub] [beam] rezarokni commented on a diff in pull request #24125: Editorial review of the ML notebooks.

Posted by GitBox <gi...@apache.org>.
rezarokni commented on code in PR #24125:
URL: https://github.com/apache/beam/pull/24125#discussion_r1021940461


##########
examples/notebooks/beam-ml/README.md:
##########
@@ -18,27 +18,27 @@
 -->
 # ML Sample Notebooks
 
-As of Beam 2.40 users now have access to a
+Starting with the Apache Beam SDK version 2.40, users have access to a
 [RunInference](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.RunInference)
 transform.
 
-This allows inferences or predictions of on data for
-popular ML frameworks like TensorFlow, PyTorch and
+This transform allows you to make inferences or predictions on data for
+popular machine learning frameworks like TensorFlow, PyTorch, and

Review Comment:
   From:
   predictions on data for popular machine learning frameworks
   To:
   predictions and inference on data with ML models. The model handler abstracts the user from the configuration needed for specific frameworks, for example TensorFlow, PyTorch, and others. A full list is available here...



##########
examples/notebooks/beam-ml/README.md:
##########
@@ -18,27 +18,27 @@
 -->
 # ML Sample Notebooks
 
-As of Beam 2.40 users now have access to a
+Starting with the Apache Beam SDK version 2.40, users have access to a
 [RunInference](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.RunInference)
 transform.
 
-This allows inferences or predictions of on data for
-popular ML frameworks like TensorFlow, PyTorch and
+This transform allows you to make inferences or predictions on data for
+popular machine learning frameworks like TensorFlow, PyTorch, and
 scikit-learn.
 
 ## Using The Notebooks
 
-These notebooks illustrate usages of Beam's RunInference, as well as different
-usages of implementations of [ModelHandler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler).
-Beam comes with various implementations of ModelHandler.
+These notebooks illustrate ways to use Apache Beam's RunInference transforms, as well as different
+use cases for [ModelHandler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler) implementations.
+Beam comes with multiple ModelHandler implementations.

Review Comment:
   link

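   To illustrate the point about Beam shipping multiple ModelHandler implementations behind one interface, here is a toy sketch; all class and function names (ModelHandlerLike, ToySklearnHandler, ToyPytorchHandler, predict) are hypothetical and only mimic the shape of the real API:

   ```python
   from typing import Any, List, Protocol, Sequence

   class ModelHandlerLike(Protocol):
       """Toy interface echoing the role Beam's ModelHandler plays."""

       def load_model(self) -> Any: ...
       def run_inference(self, batch: Sequence[float], model: Any) -> List[float]: ...

   class ToySklearnHandler:
       """Stands in for a handler wrapping a pickled scikit-learn model."""

       def load_model(self) -> Any:
           return lambda xs: [x + 1.0 for x in xs]

       def run_inference(self, batch: Sequence[float], model: Any) -> List[float]:
           return model(batch)

   class ToyPytorchHandler:
       """Stands in for a handler wrapping a torch.nn.Module."""

       def load_model(self) -> Any:
           return lambda xs: [x * 10.0 for x in xs]

       def run_inference(self, batch: Sequence[float], model: Any) -> List[float]:
           return model(batch)

   def predict(handler: ModelHandlerLike, batch: Sequence[float]) -> List[float]:
       # The caller never touches framework-specific configuration;
       # swapping handlers swaps frameworks.
       return handler.run_inference(batch, handler.load_model())
   ```

   The design choice mirrors what the README describes: pipeline code depends only on the handler interface, so the framework behind it can change without changing the pipeline.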

