Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/11/14 21:22:08 UTC

[GitHub] [beam] damccorm commented on a diff in pull request #24125: Editorial review of the ML notebooks.

damccorm commented on code in PR #24125:
URL: https://github.com/apache/beam/pull/24125#discussion_r1022063494


##########
examples/notebooks/beam-ml/run_inference_multi_model.ipynb:
##########
@@ -57,12 +57,12 @@
     {
       "cell_type": "markdown",
       "source": [
-        "A single machine learning  model may not always be the perfect solution for a give task. Oftentimes, machine learning model tasks involve aggregating mutliple models together to produce one optimal predictive model and boost performance. \n",
+        "A single machine learning model might not be the right solution for your task. Often, machine learning model tasks involve aggregating mutliple models together to produce one optimal predictive model and to boost performance. \n",
         " \n",
         "\n",
-        "In this notebook, we will shows you an example on how to implement a cascade model in Beam using the [RunInference API](https://beam.apache.org/documentation/sdks/python-machine-learning/). The RunInference API enables you to run your Beam transfroms as part of your pipeline for optimal machine learning inference in beam.     \n",
+        "This notebook shows how to implement a cascade model in Apache Beam using the [RunInference API](https://beam.apache.org/documentation/sdks/python-machine-learning/). The RunInference API enables you to run your Beam transfroms as part of your pipeline for optimal machine learning inference.\n",

Review Comment:
   ```suggestion
           "This notebook shows how to implement a cascade model in Apache Beam using the [RunInference API](https://beam.apache.org/documentation/sdks/python-machine-learning/). The RunInference API enables you to run your Beam transforms as part of your pipeline for optimal machine learning inference.\n",
   ```
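
For reference, a minimal sketch of what a RunInference step looks like in a pipeline. The scikit-learn handler and the pickled-model path below are illustrative assumptions, not code from this PR:

```python
import apache_beam as beam
import numpy as np
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.sklearn_inference import SklearnModelHandlerNumpy

# Hypothetical path to a pickled scikit-learn model.
model_handler = SklearnModelHandlerNumpy(model_uri='/tmp/five_times_model.pkl')

with beam.Pipeline() as p:
    _ = (
        p
        | 'CreateInputs' >> beam.Create([np.array([10.0]), np.array([20.0])])
        | 'RunInference' >> RunInference(model_handler)
        | 'PrintResults' >> beam.Map(print))
```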



##########
examples/notebooks/beam-ml/run_custom_inference.ipynb:
##########
@@ -37,21 +36,15 @@
         "id": "b6f8f3af-744e-4eaa-8a30-6d03e8e4d21e"
       },
       "source": [
-        "# Bring your own Machine Learning (ML) model to Beam RunInference\n",
+        "# Bring your own machine learning (ML) model to Beam RunInference\n",
         "\n",
-        "<button>\n",
-        "  <a href=\"https://beam.apache.org/documentation/sdks/python-machine-learning/\">\n",
-        "    <img src=\"https://beam.apache.org/images/favicon.ico\" alt=\"Open the docs\" height=\"16\"/>\n",
-        "    Beam RunInference\n",
-        "  </a>\n",
-        "</button>\n",
-        "\n",
-        "In this notebook, we walk through a simple example to show how to build your own ML model handler using\n",
+        "This notebook demonstrates how to build your own ML model handler using\n",
         "[ModelHandler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler).\n",

Review Comment:
   This sentence is kinda odd - maybe we could do something like "This notebook demonstrates how to run inference on your custom framework using the [ModelHandler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler) class"
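
A rough sketch of the pattern the reworded sentence describes, subclassing ModelHandler to wrap a custom framework. The handler name and the toy model object are assumptions for illustration:

```python
from typing import Any, Dict, Iterable, Optional, Sequence

import numpy as np
from apache_beam.ml.inference.base import ModelHandler, PredictionResult


class FiveTimesModelHandler(ModelHandler[np.ndarray, PredictionResult, Any]):
    """Hypothetical handler for a framework without a prebuilt ModelHandler."""

    def load_model(self) -> Any:
        # Called once per worker; return the loaded model object.
        return lambda x: 5 * x  # stand-in for a real framework model

    def run_inference(
        self,
        batch: Sequence[np.ndarray],
        model: Any,
        inference_args: Optional[Dict[str, Any]] = None,
    ) -> Iterable[PredictionResult]:
        # Run the model on a batch and pair each input with its prediction.
        for example in batch:
            yield PredictionResult(example, model(example))
```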



##########
examples/notebooks/beam-ml/run_inference_tensorflow.ipynb:
##########
@@ -190,9 +189,9 @@
     {
       "cell_type": "markdown",
       "source": [
-        "## Create and Test a Simple Model\n",
+        "## Create and test a simple model\n",
         "\n",
-        "This creates a model that predicts the 5 times table."
+        "This steps creates a model that predicts the 5 times table."

Review Comment:
   ```suggestion
           "This step creates a model that predicts the 5 times table."
   ```
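
A sketch of what such a "5 times table" model could look like in Keras; the layer setup and training parameters here are assumptions rather than the notebook's exact code:

```python
import numpy as np
import tensorflow as tf

# Toy training data for the 5 times table: y = 5 * x.
x = np.arange(0, 100, dtype=np.float32).reshape(-1, 1)
y = 5 * x

inputs = tf.keras.Input(shape=(1,))
outputs = tf.keras.layers.Dense(1)(inputs)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(x, y, epochs=500, verbose=0)
```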



##########
examples/notebooks/beam-ml/run_inference_sklearn.ipynb:
##########
@@ -52,26 +51,18 @@
       },
       "source": [
         "# Apache Beam RunInference for scikit-learn\n",
-        "\n",
-        "<button>\n",
-        "  <a href=\"https://beam.apache.org/documentation/sdks/python-machine-learning/\">\n",
-        "    <img src=\"https://beam.apache.org/images/favicon.ico\" alt=\"Open the docs\" height=\"16\"/>\n",
-        "    Beam RunInference\n",
-        "  </a>\n",
-        "</button>\n",
-        "\n",
-        "In this notebook, we walk through the use of the RunInference transform for [scikit-learn](https://scikit-learn.org/) also called sklearn.\n",
-        "Beam [RunInference](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.RunInference) has implementations of [ModelHandler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler) prebuilt for scikit-learn.\n",
+        "This notebook demonstrates the use of the RunInference transform for [scikit-learn](https://scikit-learn.org/) also called sklearn.\n",
+        "Apache Beam [RunInference](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.RunInference) has implementations of [ModelHandler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler) prebuilt for scikit-learn. For more information about the RunInference API, see [Machine Learning](https://beam.apache.org/documentation/sdks/python-machine-learning) in the Apache Beam documentation.\n",

Review Comment:
   ```suggestion
           "Apache Beam [RunInference](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.RunInference) has implementations of the [ModelHandler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler) class prebuilt for scikit-learn. For more information about the RunInference API, see [Machine Learning](https://beam.apache.org/documentation/sdks/python-machine-learning) in the Apache Beam documentation.\n",
   ```
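
For the scikit-learn handler discussed above, the model file is typically a pickled estimator. A sketch of producing one, with the regression task chosen to match the toy examples elsewhere in these notebooks:

```python
import pickle

from sklearn import linear_model

# Toy data: y = 5 * x.
x = [[i] for i in range(100)]
y = [5 * i for i in range(100)]

model = linear_model.LinearRegression()
model.fit(x, y)

# SklearnModelHandlerNumpy loads the model from a file path (model_uri).
with open('/tmp/five_times_model.pkl', 'wb') as f:
    pickle.dump(model, f)
```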



##########
examples/notebooks/beam-ml/run_inference_tensorflow.ipynb:
##########
@@ -248,7 +247,9 @@
     {
       "cell_type": "markdown",
       "source": [
-        "### Test the Model\n"
+        "### Test the model\n",
+        "\n",
+        "This steps tests the model that you created."

Review Comment:
   ```suggestion
           "This step tests the model that you created."
   ```
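
Continuing the Keras sketch earlier in this thread, testing the trained model could look like the following (the input value is chosen arbitrarily):

```python
import numpy as np

# The trained model should output approximately 100 for an input of 20.
print(model.predict(np.array([[20.0]], dtype=np.float32)))
```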



##########
examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb:
##########
@@ -946,9 +977,9 @@
       "source": [
         "input_strings_file = 'input_strings.tfrecord'\n",
         "\n",
-        "# Preprocess the input as RunInference is expecting a serialized tf.example as an input\n",
-        "# Write the processed input to a file \n",
-        "# One can also do it as a pipeline step by using beam.Map() \n",
+        "# Because RunInferene is expecting a serialized tf.example as an input, preprocess the input.\n",

Review Comment:
   ```suggestion
           "# Because RunInference is expecting a serialized tf.example as an input, preprocess the input.\n",
   ```
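
A sketch of the preprocessing the corrected comment refers to: serializing tf.train.Example protos and writing them to the TFRecord file named in the cell. The feature key and the raw strings are assumptions:

```python
import tensorflow as tf

raw_inputs = ['some text', 'more text']  # hypothetical inputs

with tf.io.TFRecordWriter('input_strings.tfrecord') as writer:
    for s in raw_inputs:
        # RunInference (via tfx-bsl) expects serialized tf.train.Example protos.
        example = tf.train.Example(features=tf.train.Features(feature={
            'x': tf.train.Feature(
                bytes_list=tf.train.BytesList(value=[s.encode('utf-8')])),
        }))
        writer.write(example.SerializeToString())
```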



##########
examples/notebooks/beam-ml/custom_remote_inference.ipynb:
##########
@@ -34,13 +34,17 @@
         "id": "0UGzzndTBPWQ"
       },
       "source": [
-        "# Remote inference in Beam\n",
+        "# Remote inference in Apache Beam\n",
         "\n",
-        "The prefered way of running inference in Beam is by using the [RunInference API](https://beam.apache.org/documentation/sdks/python-machine-learning/). The RunInference API enables you to run your models as part of your pipeline in a way that is optimized for machine learning inference. It supports features such as batching, so that you do not need to take care of it yourself. For more info on the RunInference API you can check out the [RunInference notebook](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb), which demonstrates how you can implement model inference in pytorch, scikit-learn and tensorflow.\n",
+        "The prefered way to run inference in Apache Beam is by using the [RunInference API](https://beam.apache.org/documentation/sdks/python-machine-learning/). \n",
+        "The RunInference API enables you to run your models as part of your pipeline in a way that is optimized for machine learning inference. \n",
+        "To reduce the number of steps that you need to take, RunInference supports features like batching. For more infomation about the RunInference API, review the [RunInference notebook](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb), \n",
+        "which demonstrates how to implement model inference in PyTorch, scikit-learn, and TensorFlow.\n",
         "\n",
-        "As of now, RunInference API doesn't support making remote inference calls (e.g. Natural Language API, Cloud Vision API and others). Therefore, in order to use these remote APIs with Beam, one needs to write custom inference call. \n",
+        "Currently, the RunInference API doesn't support making remote inference calls using the Natural Language API, Cloud Vision API, and so on. \n",
+        "Therefore, to use these remote APIs with Apache Beam, you need to write custom inference calls.\n",
         "\n",
-        "This notebook shows how you can implement such a custom inference call in Beam. We are using Cloud Vision API for demonstration. \n"
+        "This notebook shows how to implement a custom inference call in Apache Beam. This example uses Cloud Vision API."

Review Comment:
   ```suggestion
           "This notebook shows how to implement a custom inference call in Apache Beam. This example uses the Google Cloud Vision API."
   ```
   
   I actually think not having `the` is valid based on https://cloud.google.com/vision, but we prefix with `the` elsewhere and it feels more natural to me. Consistency is the bigger thing though IMO.
   
   Same thing with `Google` - we use it elsewhere, so including it here is probably less confusing
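
A sketch of the kind of custom inference call the notebook describes, written as a DoFn against the Google Cloud Vision API; the DoFn name and the label-detection use case are illustrative assumptions:

```python
import apache_beam as beam
from google.cloud import vision


class RemoteLabelDetection(beam.DoFn):
    """Hypothetical custom inference call to the Google Cloud Vision API."""

    def setup(self):
        # Create the client once per DoFn instance rather than per element.
        self._client = vision.ImageAnnotatorClient()

    def process(self, image_bytes):
        response = self._client.label_detection(
            image=vision.Image(content=image_bytes))
        yield [label.description for label in response.label_annotations]
```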



##########
examples/notebooks/beam-ml/run_inference_multi_model.ipynb:
##########
@@ -164,7 +164,9 @@
     {
       "cell_type": "markdown",
       "source": [
-        "The RunInference library is available in Apache Beam version **2.40** or later. "
+        "This section shows how to demonstrate the dependencies for this example.\n",

Review Comment:
   ```suggestion
           "This section shows how to install the dependencies for this example.\n",
   ```
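
A sketch of the corresponding install cell; the package extras are an assumption, but the 2.40 version floor comes from the text this hunk replaces:

```python
# Notebook cell: RunInference is available in Apache Beam 2.40 or later.
!pip install 'apache_beam[gcp]>=2.40'
```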



##########
examples/notebooks/beam-ml/custom_remote_inference.ipynb:
##########
@@ -279,15 +291,19 @@
       "source": [
         "### Batching\n",
         "\n",
-        "Before we can chain all the different steps together in a pipeline, there is one more thing we need to understand: batching. When running inference with your model (both in Beam itself or in an external API), you can batch your input together to allow for more efficient execution of your model. When using a custom DoFn, you need to take care of the batching yourself, in contrast with the RunInference API which takes care of this for you.\n",
+        "Before we can chain together the pipeline steps, we need to understand batching.\n",
+        "When running inference with your model, either in Apache Beam or in an external API, you can batch your input to increase the efficiency of the model execution.\n",
+        "When using a custom DoFn, as in this example, you need to manage the batching.\n",
         "\n",
-        "In order to achieve this in our pipeline: we will introduce one more step in our pipeline, a `BatchElements` transform that will group elements together to form a batch of the desired size.\n",
+        "To manage the batching in this pipeline, include a `BatchElements` transform to group elements together and form a batch of the desired size.\n",
         "\n",
-        "⚠️ If you have a streaming pipeline, you may considering using [GroupIntoBatches](https://beam.apache.org/documentation/transforms/python/aggregation/groupintobatches/) as `BatchElements` doesn't batch things across bundles. `GroupIntoBatches` requires choosing a key within which things are batched.\n",
+        "**Caution:**\n",

Review Comment:
   I don't think this really merits a caution - it is an aside/alternate approach, but nothing bad will happen if you use BatchElements (it will just produce small batch sizes)
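
To make the BatchElements/GroupIntoBatches contrast concrete, a minimal sketch (batch sizes are arbitrary):

```python
import apache_beam as beam

with beam.Pipeline() as p:
    # BatchElements batches within a bundle; fine for batch pipelines,
    # but in streaming it can produce small batch sizes.
    batches = (
        p
        | 'Create' >> beam.Create(range(100))
        | 'Batch' >> beam.BatchElements(min_batch_size=8, max_batch_size=32))

    # GroupIntoBatches batches per key, which also works across bundles
    # in streaming pipelines; it requires keyed input.
    keyed_batches = (
        p
        | 'CreateKeyed' >> beam.Create([('key', i) for i in range(100)])
        | 'GroupBatch' >> beam.GroupIntoBatches(batch_size=32))
```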



##########
examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb:
##########
@@ -1001,7 +1034,7 @@
         "saved_model_spec = model_spec_pb2.SavedModelSpec(model_path=model_dir)\n",
         "inference_spec_type = model_spec_pb2.InferenceSpecType(saved_model_spec=saved_model_spec)\n",
         "\n",
-        "#A Beam IO that reads a file of serialized tf.Examples\n",
+        "#A Beam I/O that reads a file of serialized tf.Examples\n",

Review Comment:
   ```suggestion
           "# A Beam I/O that reads a file of serialized tf.Examples\n",
   ```
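
One such I/O is ReadFromTFRecord; a sketch of reading the serialized examples back, with the file name taken from the surrounding notebook code:

```python
import apache_beam as beam

# Each element is the serialized bytes of one tf.train.Example.
with beam.Pipeline() as p:
    _ = (
        p
        | 'ReadExamples' >> beam.io.ReadFromTFRecord('input_strings.tfrecord')
        | 'Print' >> beam.Map(print))
```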



##########
examples/notebooks/beam-ml/run_inference_multi_model.ipynb:
##########
@@ -103,17 +103,17 @@
     {
       "cell_type": "markdown",
       "source": [
-        "The steps needed to build this pipeline can be summarized as follows:\n",
+        "The steps to build this pipeline are as follows:\n",
         "* Read the images.\n",
         "* Preprocess the images for caption generation for inference with the BLIP model.\n",
-        "* Inference with BLIP to generate a list of caption candidates . \n",
+        "* Inference with BLIP to generate a list of caption candidates.\n",
         "* Aggregate the generated captions with their source image.\n",
         "* Preprocess the aggregated image-caption pair to rank them with CLIP.\n",
-        "* Inference wih CLIP to generated the caption ranking. \n",
-        "* Print the image names and the captions sorted according to their ranking\n",
+        "* Inference with CLIP to generated the caption ranking. \n",

Review Comment:
   ```suggestion
           "* Inference with CLIP to generate the caption ranking. \n",
   ```
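
A toy skeleton of the cascade shape those steps describe. The real notebook runs BLIP and CLIP through RunInference with configured model handlers; this sketch substitutes plain Maps as stand-ins:

```python
import apache_beam as beam


def generate_captions(image_name):
    # Stand-in for BLIP inference: produce candidate captions per image.
    return image_name, ['caption-a', 'caption-b']


def rank_captions(image_and_captions):
    # Stand-in for CLIP inference: order the candidates by score.
    image_name, captions = image_and_captions
    return image_name, sorted(captions)


with beam.Pipeline() as p:
    _ = (
        p
        | 'ReadImages' >> beam.Create(['image-1.jpg', 'image-2.jpg'])
        | 'GenerateCaptions' >> beam.Map(generate_captions)
        | 'RankCaptions' >> beam.Map(rank_captions)
        | 'Print' >> beam.Map(print))
```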



##########
examples/notebooks/beam-ml/run_inference_tensorflow.ipynb:
##########
@@ -46,33 +49,29 @@
       "cell_type": "markdown",
       "source": [
         "# Apache Beam RunInference with TensorFlow\n",
-        "\n",
-        "<button>\n",
-        "  <a href=\"https://beam.apache.org/documentation/sdks/python-machine-learning/\">\n",
-        "    <img src=\"https://beam.apache.org/images/favicon.ico\" alt=\"Open the docs\" height=\"16\"/>\n",
-        "    Beam RunInference\n",
-        "  </a>\n",
-        "</button>\n",
+        "This notebook demonstrates the use of the RunInference transform for [TensorFlow](https://www.tensorflow.org/).\n",
+        "Beam [RunInference](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.RunInference) accepts ModelHandler generated from [`tfx-bsl`](https://github.com/tensorflow/tfx-bsl) via CreateModelHandler.\n",

Review Comment:
   ```suggestion
           "Beam [RunInference](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.RunInference) accepts a ModelHandler generated from [`tfx-bsl`](https://github.com/tensorflow/tfx-bsl) via CreateModelHandler.\n",
   ```
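
A sketch of that handler construction, reusing the spec objects shown elsewhere in this diff; the SavedModel path is an illustrative assumption:

```python
from tfx_bsl.public.beam.run_inference import CreateModelHandler
from tfx_bsl.public.proto import model_spec_pb2

# Point the spec at a SavedModel directory (path is hypothetical).
saved_model_spec = model_spec_pb2.SavedModelSpec(model_path='/tmp/saved_model')
inference_spec_type = model_spec_pb2.InferenceSpecType(
    saved_model_spec=saved_model_spec)

# The resulting handler plugs into RunInference like any other ModelHandler.
model_handler = CreateModelHandler(inference_spec_type)
```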



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org