Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/27 21:05:54 UTC

[GitHub] [beam] rszper opened a new pull request, #22069: Reviewing the RunInference ReadMe file for clarity.

rszper opened a new pull request, #22069:
URL: https://github.com/apache/beam/pull/22069

   Updating the RunInference ReadMe
   
   - Typo fixes
   - Clarifying requirements for each section
   R: @yeandy 
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make the review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   




[GitHub] [beam] yeandy commented on a diff in pull request #22069: Reviewing the RunInference ReadMe file for clarity.

Posted by GitBox <gi...@apache.org>.
yeandy commented on code in PR #22069:
URL: https://github.com/apache/beam/pull/22069#discussion_r907886973


##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -37,17 +37,18 @@ The RunInference API supports the PyTorch framework. To use PyTorch locally, fir
 pip install torch==1.11.0
 ```
 
-If you are using pretrained models from Pytorch's `torchvision.models` [subpackage](https://pytorch.org/vision/0.12/models.html#models-and-pre-trained-weights), you may also need to install `torchvision`.
+If you are using pretrained models from Pytorch's `torchvision.models` [subpackage](https://pytorch.org/vision/0.12/models.html#models-and-pre-trained-weights), you might also need to install `torchvision`.

Review Comment:
   I was imagining that users could adapt these examples, which is why I worded it as "may". But for these particular examples, it should be a "will".
   
   @rszper Should we be phrasing this with the assumption that users will use these files "as is"? If so, then we will want to say that these dependencies are a must. Maybe we could add (unless it's implied?) that if they adapt or modify the file, then the installation requirements will change.





[GitHub] [beam] rszper commented on a diff in pull request #22069: Reviewing the RunInference ReadMe file for clarity.

Posted by GitBox <gi...@apache.org>.
rszper commented on code in PR #22069:
URL: https://github.com/apache/beam/pull/22069#discussion_r918438071


##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -30,57 +30,66 @@ because the `apache_beam.examples.inference` module was added in that release.
 pip install apache-beam==2.40.0
 ```
 
+**Note:** You cannot batch elements of different sizes, because [`torch.stack()` expects tensors of the same length](https://github.com/pytorch/nestedtensor). Either elements need to be a fixed size, or you need to disable batching. To disable batching, set the maximum batch size to one: `max_batch_size=1`.
+
 ### PyTorch dependencies
 
+The following installation requirements are for the files used in these examples.
+
 The RunInference API supports the PyTorch framework. To use PyTorch locally, first install `torch`.
 ```
 pip install torch==1.11.0

Review Comment:
   Done





[GitHub] [beam] yeandy commented on a diff in pull request #22069: Reviewing the RunInference ReadMe file for clarity.

Posted by GitBox <gi...@apache.org>.
yeandy commented on code in PR #22069:
URL: https://github.com/apache/beam/pull/22069#discussion_r918286143


##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -30,57 +30,66 @@ because the `apache_beam.examples.inference` module was added in that release.
 pip install apache-beam==2.40.0
 ```
 
+**Note:** You cannot batch elements of different sizes, because [`torch.stack()` expects tensors of the same length](https://github.com/pytorch/nestedtensor). Either elements need to be a fixed size, or you need to disable batching. To disable batching, set the maximum batch size to one: `max_batch_size=1`.
+
 ### PyTorch dependencies
 
+The following installation requirements are for the files used in these examples.
+
 The RunInference API supports the PyTorch framework. To use PyTorch locally, first install `torch`.
 ```
 pip install torch==1.11.0

Review Comment:
   Please bump this version down to 1.10.0



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -30,57 +30,66 @@ because the `apache_beam.examples.inference` module was added in that release.
 pip install apache-beam==2.40.0
 ```
 
+**Note:** You cannot batch elements of different sizes, because [`torch.stack()` expects tensors of the same length](https://github.com/pytorch/nestedtensor). Either elements need to be a fixed size, or you need to disable batching. To disable batching, set the maximum batch size to one: `max_batch_size=1`.
+
 ### PyTorch dependencies
 
+The following installation requirements are for the files used in these examples.
+
 The RunInference API supports the PyTorch framework. To use PyTorch locally, first install `torch`.
 ```
 pip install torch==1.11.0

Review Comment:
   @rszper Please bump this version down to 1.10.0





[GitHub] [beam] tvalentyn commented on pull request #22069: Reviewing the RunInference ReadMe file for clarity.

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on PR #22069:
URL: https://github.com/apache/beam/pull/22069#issuecomment-1171287754

   Just checking - is this ready to merge, or is some review feedback not yet addressed?




[GitHub] [beam] asf-ci commented on pull request #22069: Reviewing the RunInference ReadMe file for clarity.

Posted by GitBox <gi...@apache.org>.
asf-ci commented on PR #22069:
URL: https://github.com/apache/beam/pull/22069#issuecomment-1167898235

   Can one of the admins verify this patch?




[GitHub] [beam] rszper commented on a diff in pull request #22069: Reviewing the RunInference ReadMe file for clarity.

Posted by GitBox <gi...@apache.org>.
rszper commented on code in PR #22069:
URL: https://github.com/apache/beam/pull/22069#discussion_r907885286


##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -60,27 +61,28 @@ for details."
 ---
 ## Image classification
 
-[`pytorch_image_classification.py`](./pytorch_image_classification.py) contains an implementation for a RunInference pipeline that performs image classification using the mobilenet_v2 architecture.
+[`pytorch_image_classification.py`](./pytorch_image_classification.py) contains an implementation for a RunInference pipeline that performs image classification using the `mobilenet_v2` architecture.
 
-The pipeline reads the images, performs basic preprocessing, passes them to the PyTorch implementation of RunInference, and then writes the predictions to a text file.
+The pipeline reads the images, performs basic preprocessing, passes the images to the PyTorch implementation of RunInference, and then writes the predictions to a text file.
 
 ### Dataset and model for image classification
 
-You will need to create or download images, and place them into your `IMAGES_DIR` directory. One popular dataset is from [ImageNet](https://www.image-net.org/). Please follow their instructions to download the images.
-- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. Paths can be different types of URIs such as your local file system, a AWS S3 bucket or GCP Cloud Storage bucket. For example:
+Create a directory named `IMAGES_DIR`. Create or download images and put them in this directory. The directory is not required if image names in the input file `IMAGE_FILE_NAMES` (see below) have absolute paths.
+One popular dataset is from [ImageNet](https://www.image-net.org/). Follow their instructions to download the images.
+- **Required**: A path to a file named `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` that you want to use to run image classification. Paths can be different types of URIs such as your local file system, an AWS S3 bucket, or a GCP Cloud Storage bucket. For example:
 ```
 /absolute/path/to/image1.jpg
 /absolute/path/to/image2.jpg
 ```
-- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved parameters of the maskrcnn_resnet50_fpn model. You will need to download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html) model from Pytorch's repository of pretrained models. Note that this requires `torchvision` library.
+- **Required**: Download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html) model from Pytorch's repository of pretrained models. This model requires the torchvision library. To download this model, run the following commands:
 ```
 import torch
 from torchvision.models.detection import mobilenet_v2
 model = mobilenet_v2(pretrained=True)
 torch.save(model.state_dict(), 'mobilenet_v2.pth')
 ```
-- **Required**: A path to a file called `OUTPUT`, to which the pipeline will write the predictions.
-- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute paths.
+- **Required**: A path to a file named `MODEL_STATE_DICT` that contains the saved parameters of the `mobilenet_v2` model.

Review Comment:
   I broke these into two steps and changed the order so that users see them in the order in which they need to complete them. Presumably they download the model and then create the parameters file. In documentation, it's a best practice for each step to contain only one action.
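
   As an aside on the download snippet quoted in the hunk above: in torchvision, `mobilenet_v2` is exposed under `torchvision.models` (the `detection` subpackage holds models such as `maskrcnn_resnet50_fpn`), so a working version of that script would look like this:
   ```
   import torch
   from torchvision.models import mobilenet_v2

   model = mobilenet_v2(pretrained=True)  # downloads the pretrained weights
   torch.save(model.state_dict(), 'mobilenet_v2.pth')
   ```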





[GitHub] [beam] rszper commented on a diff in pull request #22069: Reviewing the RunInference ReadMe file for clarity.

Posted by GitBox <gi...@apache.org>.
rszper commented on code in PR #22069:
URL: https://github.com/apache/beam/pull/22069#discussion_r907884732


##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -60,27 +61,28 @@ for details."
 ---
 ## Image classification
 
-[`pytorch_image_classification.py`](./pytorch_image_classification.py) contains an implementation for a RunInference pipeline that performs image classification using the mobilenet_v2 architecture.
+[`pytorch_image_classification.py`](./pytorch_image_classification.py) contains an implementation for a RunInference pipeline that performs image classification using the `mobilenet_v2` architecture.
 
-The pipeline reads the images, performs basic preprocessing, passes them to the PyTorch implementation of RunInference, and then writes the predictions to a text file.
+The pipeline reads the images, performs basic preprocessing, passes the images to the PyTorch implementation of RunInference, and then writes the predictions to a text file.
 
 ### Dataset and model for image classification
 
-You will need to create or download images, and place them into your `IMAGES_DIR` directory. One popular dataset is from [ImageNet](https://www.image-net.org/). Please follow their instructions to download the images.
-- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. Paths can be different types of URIs such as your local file system, a AWS S3 bucket or GCP Cloud Storage bucket. For example:
+Create a directory named `IMAGES_DIR`. Create or download images and put them in this directory. The directory is not required if image names in the input file `IMAGE_FILE_NAMES` (see below) have absolute paths.
+One popular dataset is from [ImageNet](https://www.image-net.org/). Follow their instructions to download the images.
+- **Required**: A path to a file named `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` that you want to use to run image classification. Paths can be different types of URIs such as your local file system, an AWS S3 bucket, or a GCP Cloud Storage bucket. For example:
 ```
 /absolute/path/to/image1.jpg
 /absolute/path/to/image2.jpg
 ```
-- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved parameters of the maskrcnn_resnet50_fpn model. You will need to download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html) model from Pytorch's repository of pretrained models. Note that this requires `torchvision` library.
+- **Required**: Download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html) model from Pytorch's repository of pretrained models. This model requires the torchvision library. To download this model, run the following commands:
 ```
 import torch
 from torchvision.models.detection import mobilenet_v2
 model = mobilenet_v2(pretrained=True)
 torch.save(model.state_dict(), 'mobilenet_v2.pth')
 ```
-- **Required**: A path to a file called `OUTPUT`, to which the pipeline will write the predictions.
-- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute paths.

Review Comment:
   I reordered and consolidated these so they would be easier to follow.





[GitHub] [beam] asf-ci commented on pull request #22069: Reviewing the RunInference ReadMe file for clarity.

Posted by GitBox <gi...@apache.org>.
asf-ci commented on PR #22069:
URL: https://github.com/apache/beam/pull/22069#issuecomment-1167898231

   Can one of the admins verify this patch?




[GitHub] [beam] rszper commented on pull request #22069: Reviewing the RunInference ReadMe file for clarity.

Posted by GitBox <gi...@apache.org>.
rszper commented on PR #22069:
URL: https://github.com/apache/beam/pull/22069#issuecomment-1180774482

   R: @rezarokni 




[GitHub] [beam] yeandy commented on a diff in pull request #22069: Reviewing the RunInference ReadMe file for clarity.

Posted by GitBox <gi...@apache.org>.
yeandy commented on code in PR #22069:
URL: https://github.com/apache/beam/pull/22069#discussion_r922331873


##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -32,55 +32,70 @@ pip install apache-beam==2.40.0
 
 ### PyTorch dependencies
 
+The following installation requirements are for the files used in these examples.
+
 The RunInference API supports the PyTorch framework. To use PyTorch locally, first install `torch`.
 ```
-pip install torch==1.11.0
+pip install torch==1.10.0
 ```
 
-If you are using pretrained models from Pytorch's `torchvision.models` [subpackage](https://pytorch.org/vision/0.12/models.html#models-and-pre-trained-weights), you may also need to install `torchvision`.
+If you are using pretrained models from Pytorch's `torchvision.models` [subpackage](https://pytorch.org/vision/0.12/models.html#models-and-pre-trained-weights), you also need to install `torchvision`.
 ```
 pip install torchvision
 ```
 
-If you are using pretrained models from Hugging Face's `transformers` [package](https://huggingface.co/docs/transformers/index), you may also need to install `transformers`.
+If you are using pretrained models from Hugging Face's `transformers` [package](https://huggingface.co/docs/transformers/index), you also need to install `transformers`.
 ```
 pip install transformers
 ```
 
-For installation of the `torch` dependency on a distributed runner, like Dataflow, refer to these [instructions](https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/#pypi-dependencies).
+For installation of the `torch` dependency on a distributed runner such as Dataflow, refer to the 
+[PyPI dependency instructions](https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/#pypi-dependencies).
+
+RunInference uses dynamic batching. However, the RunInference API cannot batch tensor elements of different sizes, because `torch.stack()` expects tensors of the same length. If you provide images of different sizes or word embeddings of different lengths, errors might occur.
+
+To avoid this issue:
+
+1. Either use elements that have the same size, or resize image inputs and word embeddings to make them 
+the same size. Depending on the language model and encoding technique, this option might not be available. 
+2. Disable batching by overriding the `batch_elements_kwargs` function in your ModelHandler and setting the maximum batch size (`max_batch_size`) to one: `max_batch_size=1`. For more information, see BatchElements PTransforms.

Review Comment:
   Yeah. Not sure why this was posted. Disregard 😄 
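
   For reference, option 2 in the hunk above (disabling batching) can be sketched as a `ModelHandler` override. This is a minimal sketch, assuming the Beam 2.40 `PytorchModelHandlerTensor` API; the `batch_elements_kwargs` name comes from the README text itself:
   ```
   from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor

   class UnbatchedModelHandler(PytorchModelHandlerTensor):
       def batch_elements_kwargs(self):
           # Cap every batch at one element so torch.stack() never has to
           # combine tensors of different sizes.
           return {'max_batch_size': 1}
   ```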





[GitHub] [beam] tvalentyn commented on a diff in pull request #22069: Reviewing the RunInference ReadMe file for clarity.

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on code in PR #22069:
URL: https://github.com/apache/beam/pull/22069#discussion_r922416093


##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -218,16 +228,19 @@ is the word that the model predicts for the mask.
 The pipeline reads rows of pixels corresponding to a digit, performs basic preprocessing, passes the pixels to the Scikit-learn implementation of RunInference, and then writes the predictions to a text file.
 
 ### Dataset and model for language modeling
-- **Required**: A path to a file called `INPUT` that contains label and pixels to feed into the model. Each row should have elements that are comma-separated. The first element is the label. All subsuequent elements would be pixel values. It should look something like this:
+
+To use this transform, you need a dataset and model for language modeling.
+
+1. Create a file named `INPUT` that contains labels and pixels to feed into the model. Each row should have comma-separated elements. The first element is the label. All other elements are pixel values. The content of the file should be similar to the following example:
 ```
 1,0,0,0...
 0,0,0,0...
 1,0,0,0...
 4,0,0,0...
 ...
 ```
-- **Required**: A path to a file called `OUTPUT`, to which the pipeline will write the predictions.
-- **Required**: A path to a file called `MODEL_PATH` that contains the pickled file of a scikit-learn model trained on MNIST data. Please refer to this scikit-learn [documentation](https://scikit-learn.org/stable/model_persistence.html) on how to serialize models.
+2. Create a file named `OUTPUT`. This file is used by the pipeline to write the predictions.

Review Comment:
   Doesn't the pipeline create output files?
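
   As background for the `MODEL_PATH` requirement in the hunk above: serializing a scikit-learn model per the linked docs is a plain pickle dump. A minimal sketch follows; the classifier, dataset, and file name are illustrative stand-ins, not the exact MNIST setup:
   ```
   import pickle

   from sklearn import svm
   from sklearn.datasets import load_digits

   digits = load_digits()  # small bundled digits set, a stand-in for MNIST
   model = svm.SVC()
   model.fit(digits.data, digits.target)

   with open('mnist_model.pkl', 'wb') as f:
       pickle.dump(model, f)
   ```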





[GitHub] [beam] asf-ci commented on pull request #22069: Reviewing the RunInference ReadMe file for clarity.

Posted by GitBox <gi...@apache.org>.
asf-ci commented on PR #22069:
URL: https://github.com/apache/beam/pull/22069#issuecomment-1167898244

   Can one of the admins verify this patch?




[GitHub] [beam] codecov[bot] commented on pull request #22069: Reviewing the RunInference ReadMe file for clarity.

Posted by GitBox <gi...@apache.org>.
codecov[bot] commented on PR #22069:
URL: https://github.com/apache/beam/pull/22069#issuecomment-1167924198

   # [Codecov](https://codecov.io/gh/apache/beam/pull/22069?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#22069](https://codecov.io/gh/apache/beam/pull/22069?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (e321d0e) into [master](https://codecov.io/gh/apache/beam/commit/71f9dd609c61d77272852137426a7f5c896b2c85?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (71f9dd6) will **increase** coverage by `0.00%`.
   > The diff coverage is `n/a`.
   
   ```diff
   @@           Coverage Diff           @@
   ##           master   #22069   +/-   ##
   =======================================
     Coverage   73.99%   73.99%           
   =======================================
     Files         703      703           
     Lines       92936    92936           
   =======================================
   + Hits        68769    68771    +2     
   + Misses      22901    22899    -2     
     Partials     1266     1266           
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | python | `83.58% <ø> (+<0.01%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/22069?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [.../python/apache\_beam/testing/test\_stream\_service.py](https://codecov.io/gh/apache/beam/pull/22069/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy90ZXN0X3N0cmVhbV9zZXJ2aWNlLnB5) | `88.09% <0.00%> (-4.77%)` | :arrow_down: |
   | [sdks/python/apache\_beam/utils/interactive\_utils.py](https://codecov.io/gh/apache/beam/pull/22069/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdXRpbHMvaW50ZXJhY3RpdmVfdXRpbHMucHk=) | `95.12% <0.00%> (-2.44%)` | :arrow_down: |
   | [...che\_beam/runners/interactive/interactive\_runner.py](https://codecov.io/gh/apache/beam/pull/22069/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9pbnRlcmFjdGl2ZV9ydW5uZXIucHk=) | `90.06% <0.00%> (-1.33%)` | :arrow_down: |
   | [sdks/python/apache\_beam/transforms/combiners.py](https://codecov.io/gh/apache/beam/pull/22069/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9jb21iaW5lcnMucHk=) | `93.05% <0.00%> (-0.39%)` | :arrow_down: |
   | [...hon/apache\_beam/runners/worker/bundle\_processor.py](https://codecov.io/gh/apache/beam/pull/22069/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvYnVuZGxlX3Byb2Nlc3Nvci5weQ==) | `93.67% <0.00%> (+0.12%)` | :arrow_up: |
   | [...ks/python/apache\_beam/runners/worker/sdk\_worker.py](https://codecov.io/gh/apache/beam/pull/22069/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvc2RrX3dvcmtlci5weQ==) | `88.94% <0.00%> (+0.47%)` | :arrow_up: |
   | [...eam/runners/portability/fn\_api\_runner/execution.py](https://codecov.io/gh/apache/beam/pull/22069/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL2V4ZWN1dGlvbi5weQ==) | `93.08% <0.00%> (+0.64%)` | :arrow_up: |
   | [...python/apache\_beam/runners/worker/worker\_status.py](https://codecov.io/gh/apache/beam/pull/22069/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvd29ya2VyX3N0YXR1cy5weQ==) | `79.71% <0.00%> (+0.72%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/22069?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/22069?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [71f9dd6...e321d0e](https://codecov.io/gh/apache/beam/pull/22069?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   




[GitHub] [beam] rszper commented on pull request #22069: Reviewing the RunInference ReadMe file for clarity.

Posted by GitBox <gi...@apache.org>.
rszper commented on PR #22069:
URL: https://github.com/apache/beam/pull/22069#issuecomment-1167998407

   @yeandy Thank you for reviewing and for the suggestions. I've committed those changes.




[GitHub] [beam] TheNeuralBit commented on a diff in pull request #22069: Reviewing the RunInference ReadMe file for clarity.

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on code in PR #22069:
URL: https://github.com/apache/beam/pull/22069#discussion_r907877286


##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -60,27 +61,28 @@ for details."
 ---
 ## Image classification
 
-[`pytorch_image_classification.py`](./pytorch_image_classification.py) contains an implementation for a RunInference pipeline that performs image classification using the mobilenet_v2 architecture.
+[`pytorch_image_classification.py`](./pytorch_image_classification.py) contains an implementation for a RunInference pipeline that performs image classification using the `mobilenet_v2` architecture.
 
-The pipeline reads the images, performs basic preprocessing, passes them to the PyTorch implementation of RunInference, and then writes the predictions to a text file.
+The pipeline reads the images, performs basic preprocessing, passes the images to the PyTorch implementation of RunInference, and then writes the predictions to a text file.
 
 ### Dataset and model for image classification
 
-You will need to create or download images, and place them into your `IMAGES_DIR` directory. One popular dataset is from [ImageNet](https://www.image-net.org/). Please follow their instructions to download the images.
-- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. Paths can be different types of URIs such as your local file system, a AWS S3 bucket or GCP Cloud Storage bucket. For example:
+Create a directory named `IMAGES_DIR`. Create or download images and put them in this directory. The directory is not required if image names in the input file `IMAGE_FILE_NAMES` (see below) have absolute paths.
+One popular dataset is from [ImageNet](https://www.image-net.org/). Follow their instructions to download the images.
+- **Required**: A path to a file named `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` that you want to use to run image classification. Paths can be different types of URIs such as your local file system, an AWS S3 bucket, or a GCP Cloud Storage bucket. For example:
 ```
 /absolute/path/to/image1.jpg
 /absolute/path/to/image2.jpg
 ```
-- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved parameters of the maskrcnn_resnet50_fpn model. You will need to download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html) model from Pytorch's repository of pretrained models. Note that this requires `torchvision` library.
+- **Required**: Download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html) model from Pytorch's repository of pretrained models. This model requires the torchvision library. To download this model, run the following commands:

Review Comment:
   nit: this is a script, not commands
   ```suggestion
   - **Required**: Download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html) model from Pytorch's repository of pretrained models. This model requires the torchvision library. To download this model, run the following script:
   ```



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -60,27 +61,28 @@ for details."
 ---
 ## Image classification
 
-[`pytorch_image_classification.py`](./pytorch_image_classification.py) contains an implementation for a RunInference pipeline that performs image classification using the mobilenet_v2 architecture.
+[`pytorch_image_classification.py`](./pytorch_image_classification.py) contains an implementation for a RunInference pipeline that performs image classification using the `mobilenet_v2` architecture.
 
-The pipeline reads the images, performs basic preprocessing, passes them to the PyTorch implementation of RunInference, and then writes the predictions to a text file.
+The pipeline reads the images, performs basic preprocessing, passes the images to the PyTorch implementation of RunInference, and then writes the predictions to a text file.
 
 ### Dataset and model for image classification
 
-You will need to create or download images, and place them into your `IMAGES_DIR` directory. One popular dataset is from [ImageNet](https://www.image-net.org/). Please follow their instructions to download the images.
-- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. Paths can be different types of URIs such as your local file system, a AWS S3 bucket or GCP Cloud Storage bucket. For example:
+Create a directory named `IMAGES_DIR`. Create or download images and put them in this directory. The directory is not required if image names in the input file `IMAGE_FILE_NAMES` (see below) have absolute paths.
+One popular dataset is from [ImageNet](https://www.image-net.org/). Follow their instructions to download the images.
+- **Required**: A path to a file named `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` that you want to use to run image classification. Paths can be different types of URIs such as your local file system, an AWS S3 bucket, or a GCP Cloud Storage bucket. For example:
 ```
 /absolute/path/to/image1.jpg
 /absolute/path/to/image2.jpg
 ```
-- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved parameters of the maskrcnn_resnet50_fpn model. You will need to download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html) model from Pytorch's repository of pretrained models. Note that this requires `torchvision` library.
+- **Required**: Download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html) model from Pytorch's repository of pretrained models. This model requires the torchvision library. To download this model, run the following commands:
 ```
 import torch
 from torchvision.models.detection import mobilenet_v2
 model = mobilenet_v2(pretrained=True)
 torch.save(model.state_dict(), 'mobilenet_v2.pth')
 ```
-- **Required**: A path to a file called `OUTPUT`, to which the pipeline will write the predictions.
-- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute paths.
+- **Required**: A path to a file named `MODEL_STATE_DICT` that contains the saved parameters of the `mobilenet_v2` model.

Review Comment:
   Could we instead consolidate the above bullet with this one? That way each bullet corresponds to one of the inputs in "Running `pytorch_image_classification.py`." Something like this:
   
   ```suggestion
   - **Required**: A path to a file named `MODEL_STATE_DICT` that contains the saved parameters of the `mobilenet_v2` model. You can download the mobilenet_v2 model from ... use the following script:
   ```



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -157,21 +159,22 @@ Each line has data separated by a semicolon ";". The first item is the file name
 ---
 ## Language modeling
 
-[`pytorch_language_modeling.py`](./pytorch_language_modeling.py) contains an implementation for a RunInference pipeline that performs masked language modeling (i.e. decoding a masked token in a sentence) using the BertForMaskedLM architecture from Hugging Face.
+[`pytorch_language_modeling.py`](./pytorch_language_modeling.py) contains an implementation for a RunInference pipeline that performs masked language modeling (that is, decoding a masked token in a sentence) using the `BertForMaskedLM` architecture from Hugging Face.
 
 The pipeline reads sentences, performs basic preprocessing to convert the last word into a `[MASK]` token, passes the masked sentence to the PyTorch implementation of RunInference, and then writes the predictions to a text file.
 
 ### Dataset and model for language modeling
 
-- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved parameters of the BertForMaskedLM model. You will need to download the [BertForMaskedLM](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertForMaskedLM) model from Hugging Face's repository of pretrained models. Make sure you have installed `transformers` too.
+- **Required**: Download the [BertForMaskedLM](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertForMaskedLM) model from Hugging Face's repository of pretrained models. You must already have `transformers` installed. To download this model, run the following commands:
 ```
 import torch
 from transformers import BertForMaskedLM
 model = BertForMaskedLM.from_pretrained('bert-base-uncased', return_dict=True)
 torch.save(model.state_dict(), 'BertForMaskedLM.pth')
 ```
-- **Required**: A path to a file called `OUTPUT`, to which the pipeline will write the predictions.
-- **Optional**: A path to a file called `SENTENCES` that contains sentences to feed into the model. It should look something like this:
+- **Required**: A path to a file namedd `MODEL_STATE_DICT` that contains the saved parameters of the `BertForMaskedLM` model.

Review Comment:
   Similarly here



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -37,17 +37,18 @@ The RunInference API supports the PyTorch framework. To use PyTorch locally, fir
 pip install torch==1.11.0
 ```
 
-If you are using pretrained models from Pytorch's `torchvision.models` [subpackage](https://pytorch.org/vision/0.12/models.html#models-and-pre-trained-weights), you may also need to install `torchvision`.
+If you are using pretrained models from Pytorch's `torchvision.models` [subpackage](https://pytorch.org/vision/0.12/models.html#models-and-pre-trained-weights), you might also need to install `torchvision`.

Review Comment:
   @yeandy shouldn't these be "will" or "you must install"? Are there cases where users can use this without installing them?



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -60,27 +61,28 @@ for details."
 ---
 ## Image classification
 
-[`pytorch_image_classification.py`](./pytorch_image_classification.py) contains an implementation for a RunInference pipeline that performs image classification using the mobilenet_v2 architecture.
+[`pytorch_image_classification.py`](./pytorch_image_classification.py) contains an implementation for a RunInference pipeline that performs image classification using the `mobilenet_v2` architecture.
 
-The pipeline reads the images, performs basic preprocessing, passes them to the PyTorch implementation of RunInference, and then writes the predictions to a text file.
+The pipeline reads the images, performs basic preprocessing, passes the images to the PyTorch implementation of RunInference, and then writes the predictions to a text file.
 
 ### Dataset and model for image classification
 
-You will need to create or download images, and place them into your `IMAGES_DIR` directory. One popular dataset is from [ImageNet](https://www.image-net.org/). Please follow their instructions to download the images.
-- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. Paths can be different types of URIs such as your local file system, a AWS S3 bucket or GCP Cloud Storage bucket. For example:
+Create a directory named `IMAGES_DIR`. Create or download images and put them in this directory. The directory is not required if image names in the input file `IMAGE_FILE_NAMES` (see below) have absolute paths.
+One popular dataset is from [ImageNet](https://www.image-net.org/). Follow their instructions to download the images.
+- **Required**: A path to a file named `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` that you want to use to run image classification. Paths can be different types of URIs such as your local file system, an AWS S3 bucket, or a GCP Cloud Storage bucket. For example:
 ```
 /absolute/path/to/image1.jpg
 /absolute/path/to/image2.jpg
 ```
-- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved parameters of the maskrcnn_resnet50_fpn model. You will need to download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html) model from Pytorch's repository of pretrained models. Note that this requires `torchvision` library.
+- **Required**: Download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html) model from Pytorch's repository of pretrained models. This model requires the torchvision library. To download this model, run the following commands:
 ```
 import torch
 from torchvision.models.detection import mobilenet_v2
 model = mobilenet_v2(pretrained=True)
 torch.save(model.state_dict(), 'mobilenet_v2.pth')
 ```
-- **Required**: A path to a file called `OUTPUT`, to which the pipeline will write the predictions.
-- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute paths.

Review Comment:
   Was removing this intentional? It's still referenced below



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -157,21 +159,22 @@ Each line has data separated by a semicolon ";". The first item is the file name
 ---
 ## Language modeling
 
-[`pytorch_language_modeling.py`](./pytorch_language_modeling.py) contains an implementation for a RunInference pipeline that performs masked language modeling (i.e. decoding a masked token in a sentence) using the BertForMaskedLM architecture from Hugging Face.
+[`pytorch_language_modeling.py`](./pytorch_language_modeling.py) contains an implementation for a RunInference pipeline that performs masked language modeling (that is, decoding a masked token in a sentence) using the `BertForMaskedLM` architecture from Hugging Face.
 
 The pipeline reads sentences, performs basic preprocessing to convert the last word into a `[MASK]` token, passes the masked sentence to the PyTorch implementation of RunInference, and then writes the predictions to a text file.
 
 ### Dataset and model for language modeling
 
-- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved parameters of the BertForMaskedLM model. You will need to download the [BertForMaskedLM](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertForMaskedLM) model from Hugging Face's repository of pretrained models. Make sure you have installed `transformers` too.
+- **Required**: Download the [BertForMaskedLM](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertForMaskedLM) model from Hugging Face's repository of pretrained models. You must already have `transformers` installed. To download this model, run the following commands:
 ```
 import torch
 from transformers import BertForMaskedLM
 model = BertForMaskedLM.from_pretrained('bert-base-uncased', return_dict=True)
 torch.save(model.state_dict(), 'BertForMaskedLM.pth')
 ```
-- **Required**: A path to a file called `OUTPUT`, to which the pipeline will write the predictions.
-- **Optional**: A path to a file called `SENTENCES` that contains sentences to feed into the model. It should look something like this:
+- **Required**: A path to a file namedd `MODEL_STATE_DICT` that contains the saved parameters of the `BertForMaskedLM` model.

Review Comment:
   Also there's a small typo
   ```suggestion
   - **Required**: A path to a file named `MODEL_STATE_DICT` that contains the saved parameters of the `BertForMaskedLM` model.
   ```
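
   Stepping back from the wording nits: the saved `MODEL_STATE_DICT` file is what the example pipelines hand to RunInference. A minimal sketch, assuming the Beam 2.40 `PytorchModelHandlerTensor` API (argument names may differ in other releases):
   ```
   from apache_beam.ml.inference.base import RunInference
   from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
   from torchvision.models import mobilenet_v2

   model_handler = PytorchModelHandlerTensor(
       state_dict_path='mobilenet_v2.pth',  # the saved MODEL_STATE_DICT
       model_class=mobilenet_v2,
       model_params={})

   # In a pipeline: batches of image tensors in, PredictionResults out.
   # predictions = image_tensors | RunInference(model_handler)
   ```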





[GitHub] [beam] rszper commented on a diff in pull request #22069: Reviewing the RunInference ReadMe file for clarity.

Posted by GitBox <gi...@apache.org>.
rszper commented on code in PR #22069:
URL: https://github.com/apache/beam/pull/22069#discussion_r922427110


##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -218,16 +228,19 @@ is the word that the model predicts for the mask.
 The pipeline reads rows of pixels corresponding to a digit, performs basic preprocessing, passes the pixels to the Scikit-learn implementation of RunInference, and then writes the predictions to a text file.
 
 ### Dataset and model for language modeling
-- **Required**: A path to a file called `INPUT` that contains label and pixels to feed into the model. Each row should have elements that are comma-separated. The first element is the label. All subsuequent elements would be pixel values. It should look something like this:
+
+To use this transform, you need a dataset and model for language modeling.
+
+1. Create a file named `INPUT` that contains labels and pixels to feed into the model. Each row should have comma-separated elements. The first element is the label. All other elements are pixel values. The content of the file should be similar to the following example:
 ```
 1,0,0,0...
 0,0,0,0...
 1,0,0,0...
 4,0,0,0...
 ...
 ```
-- **Required**: A path to a file called `OUTPUT`, to which the pipeline will write the predictions.
-- **Required**: A path to a file called `MODEL_PATH` that contains the pickled file of a scikit-learn model trained on MNIST data. Please refer to this scikit-learn [documentation](https://scikit-learn.org/stable/model_persistence.html) on how to serialize models.
+2. Create a file named `OUTPUT`. This file is used by the pipeline to write the predictions.

Review Comment:
   I updated the wording to: 
   
   Note the path to the `OUTPUT` file created by the pipeline. This file is used by the pipeline to write the predictions.





[GitHub] [beam] asf-ci commented on pull request #22069: Reviewing the RunInference ReadMe file for clarity.

Posted by GitBox <gi...@apache.org>.
asf-ci commented on PR #22069:
URL: https://github.com/apache/beam/pull/22069#issuecomment-1167898241

   Can one of the admins verify this patch?




[GitHub] [beam] asf-ci commented on pull request #22069: Reviewing the RunInference ReadMe file for clarity.

Posted by GitBox <gi...@apache.org>.
asf-ci commented on PR #22069:
URL: https://github.com/apache/beam/pull/22069#issuecomment-1167898232

   Can one of the admins verify this patch?




[GitHub] [beam] yeandy commented on pull request #22069: Reviewing the RunInference ReadMe file for clarity.

Posted by GitBox <gi...@apache.org>.
yeandy commented on PR #22069:
URL: https://github.com/apache/beam/pull/22069#issuecomment-1171393994

   I think we're still doing some final review.
   
   > I think referring to all these inputs as "named" or "called" MODEL_STATE_DICT, OUTPUT is odd. These are just placeholders we're using to refer to the arguments. Why not just refer to the arguments directly? Something like:
   > - `--model_state_dict_path` (**Required**): ...
   > - `--images_dir` (**Optional**): ...

   Makes sense. I was originally thinking it would be more intuitive to read this way. But I'm fine with removing all of those `OUTPUT`, `MODEL_STATE_DICT`, etc. I'll defer to your judgement on this @rszper 
   
   Also, will we transfer these README changes to the documentation on the RunInference webpage?
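
   For concreteness, the argument-first style quoted above maps onto an invocation like the following — a sketch assuming the flag names used in the README's "Running `pytorch_image_classification.py`" section:
   ```
   python -m apache_beam.examples.inference.pytorch_image_classification \
     --input IMAGE_FILE_NAMES \
     --images_dir IMAGES_DIR \
     --output OUTPUT \
     --model_state_dict_path MODEL_STATE_DICT
   ```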




[GitHub] [beam] yeandy commented on a diff in pull request #22069: Reviewing the RunInference ReadMe file for clarity.

Posted by GitBox <gi...@apache.org>.
yeandy commented on code in PR #22069:
URL: https://github.com/apache/beam/pull/22069#discussion_r922421902


##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -218,16 +228,19 @@ is the word that the model predicts for the mask.
 The pipeline reads rows of pixels corresponding to a digit, performs basic preprocessing, passes the pixels to the Scikit-learn implementation of RunInference, and then writes the predictions to a text file.
 
 ### Dataset and model for language modeling
-- **Required**: A path to a file called `INPUT` that contains label and pixels to feed into the model. Each row should have elements that are comma-separated. The first element is the label. All subsuequent elements would be pixel values. It should look something like this:
+
+To use this transform, you need a dataset and model for language modeling.
+
+1. Create a file named `INPUT` that contains labels and pixels to feed into the model. Each row should have comma-separated elements. The first element is the label. All other elements are pixel values. The content of the file should be similar to the following example:
 ```
 1,0,0,0...
 0,0,0,0...
 1,0,0,0...
 4,0,0,0...
 ...
 ```
-- **Required**: A path to a file called `OUTPUT`, to which the pipeline will write the predictions.
-- **Required**: A path to a file called `MODEL_PATH` that contains the pickled file of a scikit-learn model trained on MNIST data. Please refer to this scikit-learn [documentation](https://scikit-learn.org/stable/model_persistence.html) on how to serialize models.
+2. Create a file named `OUTPUT`. This file is used by the pipeline to write the predictions.

Review Comment:
   Correct. I overlooked this change.
   
   Let's do something along the lines of what we had originally: "A path called `OUTPUT`, to which the pipeline will write the predictions." What do you think? @rszper





[GitHub] [beam] tvalentyn merged pull request #22069: Reviewing the RunInference ReadMe file for clarity.

Posted by GitBox <gi...@apache.org>.
tvalentyn merged PR #22069:
URL: https://github.com/apache/beam/pull/22069




[GitHub] [beam] github-actions[bot] commented on pull request #22069: Reviewing the RunInference ReadMe file for clarity.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #22069:
URL: https://github.com/apache/beam/pull/22069#issuecomment-1167904706

   Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control




[GitHub] [beam] yeandy commented on a diff in pull request #22069: Reviewing the RunInference ReadMe file for clarity.

Posted by GitBox <gi...@apache.org>.
yeandy commented on code in PR #22069:
URL: https://github.com/apache/beam/pull/22069#discussion_r909673903


##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -37,17 +37,18 @@ The RunInference API supports the PyTorch framework. To use PyTorch locally, fir
 pip install torch==1.11.0
 ```
 
-If you are using pretrained models from Pytorch's `torchvision.models` [subpackage](https://pytorch.org/vision/0.12/models.html#models-and-pre-trained-weights), you may also need to install `torchvision`.
+If you are using pretrained models from Pytorch's `torchvision.models` [subpackage](https://pytorch.org/vision/0.12/models.html#models-and-pre-trained-weights), you might also need to install `torchvision`.

Review Comment:
   Agreed





[GitHub] [beam] yeandy commented on a diff in pull request #22069: Reviewing the RunInference ReadMe file for clarity.

Posted by GitBox <gi...@apache.org>.
yeandy commented on code in PR #22069:
URL: https://github.com/apache/beam/pull/22069#discussion_r907842950


##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -108,27 +110,27 @@ This writes the output to the `predictions.csv` with contents like:
 ---
 ## Image segmentation
 
-[`pytorch_image_segmentation.py`](./pytorch_image_segmentation.py) contains an implementation for a RunInference pipeline that performs image segementation using the maskrcnn_resnet50_fpn architecture.
+[`pytorch_image_segmentation.py`](./pytorch_image_segmentation.py) contains an implementation for a RunInference pipeline that performs image segementation using the `maskrcnn_resnet50_fpn` architecture.
 
-The pipeline reads images, performs basic preprocessing, passes them to the PyTorch implementation of RunInference, and then writes the predictions to a text file.
+The pipeline reads images, performs basic preprocessing, passes the images to the PyTorch implementation of RunInference, and then writes predictions to a text file.
 
 ### Dataset and model for image segmentation
-You will need to create or download images, and place them into your `IMAGES_DIR` directory. Another popular dataset is from [Coco](https://cocodataset.org/#home). Please follow their instructions to download the images.
-- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. Paths can be different types of URIs such as your local file system, a AWS S3 bucket or GCP Cloud Storage bucket. For example:
+Create a directory named `IMAGES_DIR`. Create or download images and put them in this directory. The directory is not required if image names in the input file `IMAGE_FILE_NAMES` have absolute paths.

Review Comment:
   ```suggestion
   Create a directory named `IMAGES_DIR`. Create or download images and put them in this directory. The directory is not required if image names in the input file `IMAGE_FILE_NAMES` (see below) have absolute paths.
   ```
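
Relatedly, one way to generate the `IMAGE_FILE_NAMES` file from a local `IMAGES_DIR` is sketched below. The `*.jpg` glob and the file names are assumptions for illustration, not part of the example scripts.

```
import glob
import os

# Write one absolute image path per line, the format IMAGE_FILE_NAMES expects.
with open('IMAGE_FILE_NAMES', 'w') as f:
    for path in sorted(glob.glob(os.path.join('IMAGES_DIR', '*.jpg'))):
        f.write(os.path.abspath(path) + '\n')
```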



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -60,27 +61,28 @@ for details."
 ---
 ## Image classification
 
-[`pytorch_image_classification.py`](./pytorch_image_classification.py) contains an implementation for a RunInference pipeline that performs image classification using the mobilenet_v2 architecture.
+[`pytorch_image_classification.py`](./pytorch_image_classification.py) contains an implementation for a RunInference pipeline that performs image classification using the `mobilenet_v2` architecture.
 
-The pipeline reads the images, performs basic preprocessing, passes them to the PyTorch implementation of RunInference, and then writes the predictions to a text file.
+The pipeline reads the images, performs basic preprocessing, passes the images to the PyTorch implementation of RunInference, and then writes the predictions to a text file.
 
 ### Dataset and model for image classification
 
-You will need to create or download images, and place them into your `IMAGES_DIR` directory. One popular dataset is from [ImageNet](https://www.image-net.org/). Please follow their instructions to download the images.
-- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. Paths can be different types of URIs such as your local file system, a AWS S3 bucket or GCP Cloud Storage bucket. For example:
+Create a directory named `IMAGES_DIR`. Create or download images and put them in this directory. The directory is not required if image names in the input file `IMAGE_FILE_NAMES` have absolute paths.

Review Comment:
   ```suggestion
   Create a directory named `IMAGES_DIR`. Create or download images and put them in this directory. The directory is not required if image names in the input file `IMAGE_FILE_NAMES` (see below) have absolute paths.
   ```



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -108,27 +110,27 @@ This writes the output to the `predictions.csv` with contents like:
 ---
 ## Image segmentation
 
-[`pytorch_image_segmentation.py`](./pytorch_image_segmentation.py) contains an implementation for a RunInference pipeline that performs image segementation using the maskrcnn_resnet50_fpn architecture.
+[`pytorch_image_segmentation.py`](./pytorch_image_segmentation.py) contains an implementation for a RunInference pipeline that performs image segementation using the `maskrcnn_resnet50_fpn` architecture.
 
-The pipeline reads images, performs basic preprocessing, passes them to the PyTorch implementation of RunInference, and then writes the predictions to a text file.
+The pipeline reads images, performs basic preprocessing, passes the images to the PyTorch implementation of RunInference, and then writes predictions to a text file.
 
 ### Dataset and model for image segmentation
-You will need to create or download images, and place them into your `IMAGES_DIR` directory. Another popular dataset is from [Coco](https://cocodataset.org/#home). Please follow their instructions to download the images.
-- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. Paths can be different types of URIs such as your local file system, a AWS S3 bucket or GCP Cloud Storage bucket. For example:
+Create a directory named `IMAGES_DIR`. Create or download images and put them in this directory. The directory is not required if image names in the input file `IMAGE_FILE_NAMES` have absolute paths.
+A popular dataset is from [Coco](https://cocodataset.org/#home). Follow their instructions to download the images.
+- **Required**: A path to a file named `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` that you want to use to run image segmentation. Paths can be different types of URIs such as your local file system, an AWS S3 bucket, or a GCP Cloud Storage bucket. For example:
 ```
 /absolute/path/to/image1.jpg
 /absolute/path/to/image2.jpg
 ```
-- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved parameters of the maskrcnn_resnet50_fpn model. You will need to download the [maskrcnn_resnet50_fpn](https://pytorch.org/vision/0.12/models.html#id70)
-model from Pytorch's repository of pretrained models. Note that this requires `torchvision` library.
+- **Required**: Download the [maskrcnn_resnet50_fpn](https://pytorch.org/vision/0.12/models.html#id70) model from Pytorch's repository of pretrained models. This model requires the torchvision library. To download this model, run the following commands:

Review Comment:
   ```suggestion
   - **Required**: Download the [maskrcnn_resnet50_fpn](https://pytorch.org/vision/0.12/models.html#id70) model from Pytorch's repository of pretrained models. This model requires the `torchvision` library. To download this model, run the following commands:
   ```



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -157,21 +159,22 @@ Each line has data separated by a semicolon ";". The first item is the file name
 ---
 ## Language modeling
 
-[`pytorch_language_modeling.py`](./pytorch_language_modeling.py) contains an implementation for a RunInference pipeline that performs masked language modeling (i.e. decoding a masked token in a sentence) using the BertForMaskedLM architecture from Hugging Face.
+[`pytorch_language_modeling.py`](./pytorch_language_modeling.py) contains an implementation for a RunInference pipeline that performs masked language modeling (that is, decoding a masked token in a sentence) using the `BertForMaskedLM` architecture from Hugging Face.
 
 The pipeline reads sentences, performs basic preprocessing to convert the last word into a `[MASK]` token, passes the masked sentence to the PyTorch implementation of RunInference, and then writes the predictions to a text file.
 
 ### Dataset and model for language modeling
 
-- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved parameters of the BertForMaskedLM model. You will need to download the [BertForMaskedLM](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertForMaskedLM) model from Hugging Face's repository of pretrained models. Make sure you have installed `transformers` too.
+- **Required**: Download the [BertForMaskedLM](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertForMaskedLM) model from Hugging Face's repository of pretrained models. You must already have `transformers` installed.

Review Comment:
   ```suggestion
   - **Required**: Download the [BertForMaskedLM](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertForMaskedLM) model from Hugging Face's repository of pretrained models. You must already have `transformers` installed. To download this model, run the following commands:
   ```





[GitHub] [beam] yeandy commented on pull request #22069: Reviewing the RunInference ReadMe file for clarity.

Posted by GitBox <gi...@apache.org>.
yeandy commented on PR #22069:
URL: https://github.com/apache/beam/pull/22069#issuecomment-1168042771

   > This doesn't really discuss that the paths should be on cloud storage if using a distributed runner, and instead uses local filepaths everywhere. Should we provide directions that include staging the models on cloud storage, and use cloud storage file paths? (this is more a question for @yeandy, and could be left for another PR)
   
   Good point. I can make a separate PR to add this information, and then tag @rszper for review.
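
Until that follow-up PR lands, a hypothetical sketch of the cloud-storage variant: stage the saved weights in a bucket, then hand the model handler the `gs://` URI in place of the local path. The constructor arguments below assume Beam 2.40's `PytorchModelHandlerTensor` signature, and the bucket path is a placeholder.

```
from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
from torchvision.models.detection import maskrcnn_resnet50_fpn

# Model handlers read weights through Beam's FileSystems, so a staged
# gs:// URI can stand in for a local MODEL_STATE_DICT path.
model_handler = PytorchModelHandlerTensor(
    state_dict_path='gs://<your-bucket>/models/maskrcnn_resnet50_fpn.pth',
    model_class=maskrcnn_resnet50_fpn,
    model_params={'num_classes': 91})
```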




[GitHub] [beam] rszper commented on pull request #22069: Reviewing the RunInference ReadMe file for clarity.

Posted by GitBox <gi...@apache.org>.
rszper commented on PR #22069:
URL: https://github.com/apache/beam/pull/22069#issuecomment-1167903583

   R: @yeandy




[GitHub] [beam] yeandy commented on a diff in pull request #22069: Reviewing the RunInference ReadMe file for clarity.

Posted by GitBox <gi...@apache.org>.
yeandy commented on code in PR #22069:
URL: https://github.com/apache/beam/pull/22069#discussion_r918927434


##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -32,55 +32,70 @@ pip install apache-beam==2.40.0
 
 ### PyTorch dependencies
 
+The following installation requirements are for the files used in these examples.
+
 The RunInference API supports the PyTorch framework. To use PyTorch locally, first install `torch`.
 ```
-pip install torch==1.11.0
+pip install torch==1.10.0
 ```
 
-If you are using pretrained models from Pytorch's `torchvision.models` [subpackage](https://pytorch.org/vision/0.12/models.html#models-and-pre-trained-weights), you may also need to install `torchvision`.
+If you are using pretrained models from Pytorch's `torchvision.models` [subpackage](https://pytorch.org/vision/0.12/models.html#models-and-pre-trained-weights), you also need to install `torchvision`.
 ```
 pip install torchvision
 ```
 
-If you are using pretrained models from Hugging Face's `transformers` [package](https://huggingface.co/docs/transformers/index), you may also need to install `transformers`.
+If you are using pretrained models from Hugging Face's `transformers` [package](https://huggingface.co/docs/transformers/index), you also need to install `transformers`.
 ```
 pip install transformers
 ```
 
-For installation of the `torch` dependency on a distributed runner, like Dataflow, refer to these [instructions](https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/#pypi-dependencies).
+For installation of the `torch` dependency on a distributed runner such as Dataflow, refer to the 
+[PyPI dependency instructions](https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/#pypi-dependencies).
+
+RunInference uses dynamic batching. However, the RunInference API cannot batch tensor elements of different sizes, because `torch.stack()` expects tensors of the same length. If you provide images of different sizes or word embeddings of different lengths, errors might occur.
+
+To avoid this issue:
+
+1. Either use elements that have the same size, or resize image inputs and word embeddings to make them 
+the same size. Depending on the language model and encoding technique, this option might not be available. 
+2. Disable batching by overriding the `batch_elements_kwargs` function in your ModelHandler and setting the maximum batch size (`max_batch_size`) to one: `max_batch_size=1`. For more information, see BatchElements PTransforms.
 
 <!---
 TODO: Add link to full documentation on Beam website when it's published.
 
-i.e. "See the
-[documentation](https://beam.apache.org/documentation/dsls/dataframes/overview/#pre-requisites)
-for details."
+i.e. "For more information, see the
+[Machine Learning](https://beam.apache.org/documentation/sdks/python-machine-learning/) documentation."
+
+Also relevant: https://beam.apache.org/documentation/transforms/python/elementwise/runinference/
 -->
 
 ---
 ## Image classification
 
-[`pytorch_image_classification.py`](./pytorch_image_classification.py) contains an implementation for a RunInference pipeline that performs image classification using the mobilenet_v2 architecture.
+[`pytorch_image_classification.py`](./pytorch_image_classification.py) contains an implementation for a RunInference pipeline that performs image classification using the `mobilenet_v2` architecture.
 
-The pipeline reads the images, performs basic preprocessing, passes them to the PyTorch implementation of RunInference, and then writes the predictions to a text file.
+The pipeline reads the images, performs basic preprocessing, passes the images to the PyTorch implementation of RunInference, and then writes the predictions to a text file.
 
 ### Dataset and model for image classification
 
-You will need to create or download images, and place them into your `IMAGES_DIR` directory. One popular dataset is from [ImageNet](https://www.image-net.org/). Please follow their instructions to download the images.
-- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. Paths can be different types of URIs such as your local file system, a AWS S3 bucket or GCP Cloud Storage bucket. For example:
+To use this transform, you need a dataset and model for image classification.
+
+1. Create a directory named `IMAGES_DIR`. Create or download images and put them in this directory. The directory is not required if image names in the input file `IMAGE_FILE_NAMES` have absolute paths.
+One popular dataset is from [ImageNet](https://www.image-net.org/). Follow their instructions to download the images.
+2. Create a file named `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` that you want to use to run image classification. The path to the file can be different types of URIs such as your local file system, an AWS S3 bucket, or a GCP Cloud Storage bucket. For example:
 ```
 /absolute/path/to/image1.jpg
 /absolute/path/to/image2.jpg
 ```
-- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved parameters of the maskrcnn_resnet50_fpn model. You will need to download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html) model from Pytorch's repository of pretrained models. Note that this requires `torchvision` library.
+3. Download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html) model from Pytorch's repository of pretrained models. This model requires the torchvision library. To download this model, run the following commands:
 ```
 import torch
 from torchvision.models.detection import mobilenet_v2

Review Comment:
   I have a typo here. Sorry about that!
   ```suggestion
   from torchvision.models import mobilenet_v2
   ```
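
With that fix applied, the full download snippet would read (unchanged apart from the import):

```
import torch
from torchvision.models import mobilenet_v2

model = mobilenet_v2(pretrained=True)
torch.save(model.state_dict(), 'mobilenet_v2.pth')
```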



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -157,26 +163,30 @@ Each line has data separated by a semicolon ";". The first item is the file name
 ---
 ## Language modeling
 
-[`pytorch_language_modeling.py`](./pytorch_language_modeling.py) contains an implementation for a RunInference pipeline that performs masked language modeling (i.e. decoding a masked token in a sentence) using the BertForMaskedLM architecture from Hugging Face.
+[`pytorch_language_modeling.py`](./pytorch_language_modeling.py) contains an implementation for a RunInference pipeline that performs masked language modeling (that is, decoding a masked token in a sentence) using the `BertForMaskedLM` architecture from Hugging Face.
 
 The pipeline reads sentences, performs basic preprocessing to convert the last word into a `[MASK]` token, passes the masked sentence to the PyTorch implementation of RunInference, and then writes the predictions to a text file.
 
 ### Dataset and model for language modeling
 
-- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved parameters of the BertForMaskedLM model. You will need to download the [BertForMaskedLM](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertForMaskedLM) model from Hugging Face's repository of pretrained models. Make sure you have installed `transformers` too.
+To use this transform, you need a dataset and model for language modeling. 

Review Comment:
   ```suggestion
   To use this transform, you need a dataset and model for language modeling.
   ```



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -32,55 +32,57 @@ pip install apache-beam==2.40.0
 
 ### PyTorch dependencies
 
+The following installation requirements are for the files used in these examples.
+
 The RunInference API supports the PyTorch framework. To use PyTorch locally, first install `torch`.
 ```
-pip install torch==1.11.0
+pip install torch==1.10.0
 ```
 
-If you are using pretrained models from Pytorch's `torchvision.models` [subpackage](https://pytorch.org/vision/0.12/models.html#models-and-pre-trained-weights), you may also need to install `torchvision`.
+If you are using pretrained models from Pytorch's `torchvision.models` [subpackage](https://pytorch.org/vision/0.12/models.html#models-and-pre-trained-weights), you also need to install `torchvision`.
 ```
 pip install torchvision
 ```
 
-If you are using pretrained models from Hugging Face's `transformers` [package](https://huggingface.co/docs/transformers/index), you may also need to install `transformers`.
+If you are using pretrained models from Hugging Face's `transformers` [package](https://huggingface.co/docs/transformers/index), you also need to install `transformers`.
 ```
 pip install transformers
 ```
 
-For installation of the `torch` dependency on a distributed runner, like Dataflow, refer to these [instructions](https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/#pypi-dependencies).
-
-<!---
-TODO: Add link to full documentation on Beam website when it's published.
+For installation of the `torch` dependency on a distributed runner such as Dataflow, refer to the 

Review Comment:
   ```suggestion
   For installation of the `torch` dependency on a distributed runner such as Dataflow, refer to the
   ```



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -32,55 +32,70 @@ pip install apache-beam==2.40.0
 
 ### PyTorch dependencies
 
+The following installation requirements are for the files used in these examples.
+
 The RunInference API supports the PyTorch framework. To use PyTorch locally, first install `torch`.
 ```
-pip install torch==1.11.0
+pip install torch==1.10.0
 ```
 
-If you are using pretrained models from Pytorch's `torchvision.models` [subpackage](https://pytorch.org/vision/0.12/models.html#models-and-pre-trained-weights), you may also need to install `torchvision`.
+If you are using pretrained models from Pytorch's `torchvision.models` [subpackage](https://pytorch.org/vision/0.12/models.html#models-and-pre-trained-weights), you also need to install `torchvision`.
 ```
 pip install torchvision
 ```
 
-If you are using pretrained models from Hugging Face's `transformers` [package](https://huggingface.co/docs/transformers/index), you may also need to install `transformers`.
+If you are using pretrained models from Hugging Face's `transformers` [package](https://huggingface.co/docs/transformers/index), you also need to install `transformers`.
 ```
 pip install transformers
 ```
 
-For installation of the `torch` dependency on a distributed runner, like Dataflow, refer to these [instructions](https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/#pypi-dependencies).
+For installation of the `torch` dependency on a distributed runner such as Dataflow, refer to the 
+[PyPI dependency instructions](https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/#pypi-dependencies).
+
+RunInference uses dynamic batching. However, the RunInference API cannot batch tensor elements of different sizes, because `torch.stack()` expects tensors of the same length. If you provide images of different sizes or word embeddings of different lengths, errors might occur.
+
+To avoid this issue:
+
+1. Either use elements that have the same size, or resize image inputs and word embeddings to make them 
+the same size. Depending on the language model and encoding technique, this option might not be available. 
+2. Disable batching by overriding the `batch_elements_kwargs` function in your ModelHandler and setting the maximum batch size (`max_batch_size`) to one: `max_batch_size=1`. For more information, see BatchElements PTransforms.

Review Comment:
   This should have its own subheader, I think. It doesn't belong in the `Dependencies` section. Could we call it something like `Usage Notes` or `Notes`, and have it be at the same header level as `Prerequisites` (i.e. `## Usage Notes`)?
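
For concreteness, a minimal sketch of the override described in item 2 of the hunk above; the handler base class assumes Beam 2.40's PyTorch API, and the subclass name is made up for illustration.

```
from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor

class SingleElementBatchModelHandler(PytorchModelHandlerTensor):
  def batch_elements_kwargs(self):
    # Cap BatchElements at one element per batch so torch.stack() never
    # receives tensors of mismatched sizes.
    return {'max_batch_size': 1}
```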



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -108,27 +110,31 @@ This writes the output to the `predictions.csv` with contents like:
 ---
 ## Image segmentation
 
-[`pytorch_image_segmentation.py`](./pytorch_image_segmentation.py) contains an implementation for a RunInference pipeline that performs image segementation using the maskrcnn_resnet50_fpn architecture.
+[`pytorch_image_segmentation.py`](./pytorch_image_segmentation.py) contains an implementation for a RunInference pipeline that performs image segementation using the `maskrcnn_resnet50_fpn` architecture.
 
-The pipeline reads images, performs basic preprocessing, passes them to the PyTorch implementation of RunInference, and then writes the predictions to a text file.
+The pipeline reads images, performs basic preprocessing, passes the images to the PyTorch implementation of RunInference, and then writes predictions to a text file.
 
 ### Dataset and model for image segmentation
-You will need to create or download images, and place them into your `IMAGES_DIR` directory. Another popular dataset is from [Coco](https://cocodataset.org/#home). Please follow their instructions to download the images.
-- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. Paths can be different types of URIs such as your local file system, a AWS S3 bucket or GCP Cloud Storage bucket. For example:
+
+To use this transform, you need a dataset and model for image segmentation. 

Review Comment:
   ```suggestion
   To use this transform, you need a dataset and model for image segmentation.
   ```



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -108,27 +110,31 @@ This writes the output to the `predictions.csv` with contents like:
 ---
 ## Image segmentation
 
-[`pytorch_image_segmentation.py`](./pytorch_image_segmentation.py) contains an implementation for a RunInference pipeline that performs image segementation using the maskrcnn_resnet50_fpn architecture.
+[`pytorch_image_segmentation.py`](./pytorch_image_segmentation.py) contains an implementation for a RunInference pipeline that performs image segementation using the `maskrcnn_resnet50_fpn` architecture.
 
-The pipeline reads images, performs basic preprocessing, passes them to the PyTorch implementation of RunInference, and then writes the predictions to a text file.
+The pipeline reads images, performs basic preprocessing, passes the images to the PyTorch implementation of RunInference, and then writes predictions to a text file.
 
 ### Dataset and model for image segmentation
-You will need to create or download images, and place them into your `IMAGES_DIR` directory. Another popular dataset is from [Coco](https://cocodataset.org/#home). Please follow their instructions to download the images.
-- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. Paths can be different types of URIs such as your local file system, a AWS S3 bucket or GCP Cloud Storage bucket. For example:
+
+To use this transform, you need a dataset and model for image segmentation. 
+
+1. Create a directory named `IMAGES_DIR`. Create or download images and put them in this directory. The directory is not required if image names in the input file `IMAGE_FILE_NAMES` have absolute paths.
+A popular dataset is from [Coco](https://cocodataset.org/#home). Follow their instructions to download the images.
+2. Create a file named `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` that you want to use to run image segmentation. The path to the file can be different types of URIs such as your local file system, an AWS S3 bucket, or a GCP Cloud Storage bucket. For example:
 ```
 /absolute/path/to/image1.jpg
 /absolute/path/to/image2.jpg
 ```
-- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved parameters of the maskrcnn_resnet50_fpn model. You will need to download the [maskrcnn_resnet50_fpn](https://pytorch.org/vision/0.12/models.html#id70)
-model from Pytorch's repository of pretrained models. Note that this requires `torchvision` library.
+3. Download the [maskrcnn_resnet50_fpn](https://pytorch.org/vision/0.12/models.html#id70) model from Pytorch's repository of pretrained models. This model requires the torchvision library. To download this model, run the following commands:
 ```
 import torch
 from torchvision.models.detection import maskrcnn_resnet50_fpn
 model = maskrcnn_resnet50_fpn(pretrained=True)
 torch.save(model.state_dict(), 'maskrcnn_resnet50_fpn.pth')
 ```
-- **Required**: A path to a file called `OUTPUT`, to which the pipeline will write the predictions.
-- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute paths.
+4. Create a path to a file named `MODEL_STATE_DICT` that contains the saved parameters of the `maskrcnn_resnet50_fpn` model. 

Review Comment:
   ```suggestion
   4. Create a path to a file named `MODEL_STATE_DICT` that contains the saved parameters of the `maskrcnn_resnet50_fpn` model.
   ```



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -157,26 +163,30 @@ Each line has data separated by a semicolon ";". The first item is the file name
 ---
 ## Language modeling
 
-[`pytorch_language_modeling.py`](./pytorch_language_modeling.py) contains an implementation for a RunInference pipeline that performs masked language modeling (i.e. decoding a masked token in a sentence) using the BertForMaskedLM architecture from Hugging Face.
+[`pytorch_language_modeling.py`](./pytorch_language_modeling.py) contains an implementation for a RunInference pipeline that performs masked language modeling (that is, decoding a masked token in a sentence) using the `BertForMaskedLM` architecture from Hugging Face.
 
 The pipeline reads sentences, performs basic preprocessing to convert the last word into a `[MASK]` token, passes the masked sentence to the PyTorch implementation of RunInference, and then writes the predictions to a text file.
 
 ### Dataset and model for language modeling
 
-- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved parameters of the BertForMaskedLM model. You will need to download the [BertForMaskedLM](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertForMaskedLM) model from Hugging Face's repository of pretrained models. Make sure you have installed `transformers` too.
+To use this transform, you need a dataset and model for language modeling. 
+
+1. Download the [BertForMaskedLM](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertForMaskedLM) model from Hugging Face's repository of pretrained models. You must already have `transformers` installed.
 ```
 import torch
 from transformers import BertForMaskedLM
 model = BertForMaskedLM.from_pretrained('bert-base-uncased', return_dict=True)
 torch.save(model.state_dict(), 'BertForMaskedLM.pth')
 ```
-- **Required**: A path to a file called `OUTPUT`, to which the pipeline will write the predictions.
-- **Optional**: A path to a file called `SENTENCES` that contains sentences to feed into the model. It should look something like this:
+2. Create a file named `MODEL_STATE_DICT` that contains the saved parameters of the `BertForMaskedLM` model. 

Review Comment:
   ```suggestion
   2. Create a file named `MODEL_STATE_DICT` that contains the saved parameters of the `BertForMaskedLM` model.
   ```



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -218,16 +228,19 @@ is the word that the model predicts for the mask.
 The pipeline reads rows of pixels corresponding to a digit, performs basic preprocessing, passes the pixels to the Scikit-learn implementation of RunInference, and then writes the predictions to a text file.
 
 ### Dataset and model for language modeling
-- **Required**: A path to a file called `INPUT` that contains label and pixels to feed into the model. Each row should have elements that are comma-separated. The first element is the label. All subsuequent elements would be pixel values. It should look something like this:
+
+To use this transform, you need a dataset and model for language modeling. 

Review Comment:
   ```suggestion
   To use this transform, you need a dataset and model for language modeling.
   ```





[GitHub] [beam] rszper commented on a diff in pull request #22069: Reviewing the RunInference ReadMe file for clarity.

Posted by GitBox <gi...@apache.org>.
rszper commented on code in PR #22069:
URL: https://github.com/apache/beam/pull/22069#discussion_r917066375


##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -60,27 +61,28 @@ for details."
 ---
 ## Image classification
 
-[`pytorch_image_classification.py`](./pytorch_image_classification.py) contains an implementation for a RunInference pipeline that performs image classification using the mobilenet_v2 architecture.
+[`pytorch_image_classification.py`](./pytorch_image_classification.py) contains an implementation for a RunInference pipeline that performs image classification using the `mobilenet_v2` architecture.
 
-The pipeline reads the images, performs basic preprocessing, passes them to the PyTorch implementation of RunInference, and then writes the predictions to a text file.
+The pipeline reads the images, performs basic preprocessing, passes the images to the PyTorch implementation of RunInference, and then writes the predictions to a text file.
 
 ### Dataset and model for image classification
 
-You will need to create or download images, and place them into your `IMAGES_DIR` directory. One popular dataset is from [ImageNet](https://www.image-net.org/). Please follow their instructions to download the images.
-- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. Paths can be different types of URIs such as your local file system, a AWS S3 bucket or GCP Cloud Storage bucket. For example:
+Create a directory named `IMAGES_DIR`. Create or download images and put them in this directory. The directory is not required if image names in the input file `IMAGE_FILE_NAMES` (see below) have absolute paths.
+One popular dataset is from [ImageNet](https://www.image-net.org/). Follow their instructions to download the images.
+- **Required**: A path to a file named `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` that you want to use to run image classification. Paths can be different types of URIs such as your local file system, an AWS S3 bucket, or a GCP Cloud Storage bucket. For example:
 ```
 /absolute/path/to/image1.jpg
 /absolute/path/to/image2.jpg
 ```
-- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved parameters of the maskrcnn_resnet50_fpn model. You will need to download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html) model from Pytorch's repository of pretrained models. Note that this requires `torchvision` library.
+- **Required**: Download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html) model from Pytorch's repository of pretrained models. This model requires the torchvision library. To download this model, run the following commands:
 ```
 import torch
 from torchvision.models.detection import mobilenet_v2
 model = mobilenet_v2(pretrained=True)
 torch.save(model.state_dict(), 'mobilenet_v2.pth')
 ```
-- **Required**: A path to a file called `OUTPUT`, to which the pipeline will write the predictions.
-- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute paths.
+- **Required**: A path to a file named `MODEL_STATE_DICT` that contains the saved parameters of the `mobilenet_v2` model.

Review Comment:
   You're right. I only halfway converted these into steps. I'll update them to be proper steps instead of a list of requirements, because steps will be easier for users to follow.





[GitHub] [beam] TheNeuralBit commented on a diff in pull request #22069: Reviewing the RunInference ReadMe file for clarity.

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on code in PR #22069:
URL: https://github.com/apache/beam/pull/22069#discussion_r916989262


##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -60,27 +61,28 @@ for details."
 ---
 ## Image classification
 
-[`pytorch_image_classification.py`](./pytorch_image_classification.py) contains an implementation for a RunInference pipeline that performs image classification using the mobilenet_v2 architecture.
+[`pytorch_image_classification.py`](./pytorch_image_classification.py) contains an implementation for a RunInference pipeline that performs image classification using the `mobilenet_v2` architecture.
 
-The pipeline reads the images, performs basic preprocessing, passes them to the PyTorch implementation of RunInference, and then writes the predictions to a text file.
+The pipeline reads the images, performs basic preprocessing, passes the images to the PyTorch implementation of RunInference, and then writes the predictions to a text file.
 
 ### Dataset and model for image classification
 
-You will need to create or download images, and place them into your `IMAGES_DIR` directory. One popular dataset is from [ImageNet](https://www.image-net.org/). Please follow their instructions to download the images.
-- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. Paths can be different types of URIs such as your local file system, a AWS S3 bucket or GCP Cloud Storage bucket. For example:
+Create a directory named `IMAGES_DIR`. Create or download images and put them in this directory. The directory is not required if image names in the input file `IMAGE_FILE_NAMES` (see below) have absolute paths.
+One popular dataset is from [ImageNet](https://www.image-net.org/). Follow their instructions to download the images.
+- **Required**: A path to a file named `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` that you want to use to run image classification. Paths can be different types of URIs such as your local file system, an AWS S3 bucket, or a GCP Cloud Storage bucket. For example:
 ```
 /absolute/path/to/image1.jpg
 /absolute/path/to/image2.jpg
 ```
-- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved parameters of the maskrcnn_resnet50_fpn model. You will need to download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html) model from Pytorch's repository of pretrained models. Note that this requires `torchvision` library.
+- **Required**: Download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html) model from Pytorch's repository of pretrained models. This model requires the torchvision library. To download this model, run the following commands:
 ```
 import torch
 from torchvision.models.detection import mobilenet_v2
 model = mobilenet_v2(pretrained=True)
 torch.save(model.state_dict(), 'mobilenet_v2.pth')
 ```
-- **Required**: A path to a file called `OUTPUT`, to which the pipeline will write the predictions.
-- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute paths.
+- **Required**: A path to a file named `MODEL_STATE_DICT` that contains the saved parameters of the `mobilenet_v2` model.

Review Comment:
   That makes sense, but to me this looks like a bulleted list defining the four inputs to the script, not a list of directions/actions. For each input it identifies if they are Required/Optional, and explains how to populate them. It's odd to me to then have one separate bullet that represents an instruction.



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -60,27 +61,28 @@ for details."
 ---
 ## Image classification
 
-[`pytorch_image_classification.py`](./pytorch_image_classification.py) contains an implementation for a RunInference pipeline that performs image classification using the mobilenet_v2 architecture.
+[`pytorch_image_classification.py`](./pytorch_image_classification.py) contains an implementation for a RunInference pipeline that performs image classification using the `mobilenet_v2` architecture.
 
-The pipeline reads the images, performs basic preprocessing, passes them to the PyTorch implementation of RunInference, and then writes the predictions to a text file.
+The pipeline reads the images, performs basic preprocessing, passes the images to the PyTorch implementation of RunInference, and then writes the predictions to a text file.
 
 ### Dataset and model for image classification
 
-You will need to create or download images, and place them into your `IMAGES_DIR` directory. One popular dataset is from [ImageNet](https://www.image-net.org/). Please follow their instructions to download the images.
-- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. Paths can be different types of URIs such as your local file system, a AWS S3 bucket or GCP Cloud Storage bucket. For example:
+Create a directory named `IMAGES_DIR`. Create or download images and put them in this directory. The directory is not required if image names in the input file `IMAGE_FILE_NAMES` (see below) have absolute paths.
+One popular dataset is from [ImageNet](https://www.image-net.org/). Follow their instructions to download the images.
+- **Required**: A path to a file named `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` that you want to use to run image classification. Paths can be different types of URIs such as your local file system, an AWS S3 bucket, or a GCP Cloud Storage bucket. For example:
 ```
 /absolute/path/to/image1.jpg
 /absolute/path/to/image2.jpg
 ```
-- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved parameters of the maskrcnn_resnet50_fpn model. You will need to download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html) model from Pytorch's repository of pretrained models. Note that this requires `torchvision` library.
+- **Required**: Download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html) model from Pytorch's repository of pretrained models. This model requires the torchvision library. To download this model, run the following commands:
 ```
 import torch
 from torchvision.models.detection import mobilenet_v2
 model = mobilenet_v2(pretrained=True)
 torch.save(model.state_dict(), 'mobilenet_v2.pth')
 ```
-- **Required**: A path to a file called `OUTPUT`, to which the pipeline will write the predictions.
-- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute paths.
+- **Required**: A path to a file named `MODEL_STATE_DICT` that contains the saved parameters of the `mobilenet_v2` model.

Review Comment:
   That makes sense, but to me this looks like a bulleted list defining the four inputs to the script, not a list of directions/actions. For each input it identifies if they are Required/Optional, and explains how to populate them. It's odd to me to then have one separate bullet that represents an action.





[GitHub] [beam] rszper commented on a diff in pull request #22069: Reviewing the RunInference ReadMe file for clarity.

Posted by GitBox <gi...@apache.org>.
rszper commented on code in PR #22069:
URL: https://github.com/apache/beam/pull/22069#discussion_r922328517


##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -32,55 +32,70 @@ pip install apache-beam==2.40.0
 
 ### PyTorch dependencies
 
+The following installation requirements are for the files used in these examples.
+
 The RunInference API supports the PyTorch framework. To use PyTorch locally, first install `torch`.
 ```
-pip install torch==1.11.0
+pip install torch==1.10.0
 ```
 
-If you are using pretrained models from Pytorch's `torchvision.models` [subpackage](https://pytorch.org/vision/0.12/models.html#models-and-pre-trained-weights), you may also need to install `torchvision`.
+If you are using pretrained models from Pytorch's `torchvision.models` [subpackage](https://pytorch.org/vision/0.12/models.html#models-and-pre-trained-weights), you also need to install `torchvision`.
 ```
 pip install torchvision
 ```
 
-If you are using pretrained models from Hugging Face's `transformers` [package](https://huggingface.co/docs/transformers/index), you may also need to install `transformers`.
+If you are using pretrained models from Hugging Face's `transformers` [package](https://huggingface.co/docs/transformers/index), you also need to install `transformers`.
 ```
 pip install transformers
 ```
 
-For installation of the `torch` dependency on a distributed runner, like Dataflow, refer to these [instructions](https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/#pypi-dependencies).
+For installation of the `torch` dependency on a distributed runner such as Dataflow, refer to the 
+[PyPI dependency instructions](https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/#pypi-dependencies).
+
+RunInference uses dynamic batching. However, the RunInference API cannot batch tensor elements of different sizes, because `torch.stack()` expects tensors of the same length. If you provide images of different sizes or word embeddings of different lengths, errors might occur.
+
+To avoid this issue:
+
+1. Either use elements that have the same size, or resize image inputs and word embeddings to make them 
+the same size. Depending on the language model and encoding technique, this option might not be available. 
+2. Disable batching by overriding the `batch_elements_kwargs` function in your ModelHandler and setting the maximum batch size (`max_batch_size`) to one: `max_batch_size=1`. For more information, see BatchElements PTransforms.

Review Comment:
   I'm pretty sure I removed this.





[GitHub] [beam] rszper commented on a diff in pull request #22069: Reviewing the RunInference ReadMe file for clarity.

Posted by GitBox <gi...@apache.org>.
rszper commented on code in PR #22069:
URL: https://github.com/apache/beam/pull/22069#discussion_r908946773


##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -37,17 +37,18 @@ The RunInference API supports the PyTorch framework. To use PyTorch locally, fir
 pip install torch==1.11.0
 ```
 
-If you are using pretrained models from Pytorch's `torchvision.models` [subpackage](https://pytorch.org/vision/0.12/models.html#models-and-pre-trained-weights), you may also need to install `torchvision`.
+If you are using pretrained models from Pytorch's `torchvision.models` [subpackage](https://pytorch.org/vision/0.12/models.html#models-and-pre-trained-weights), you might also need to install `torchvision`.

Review Comment:
   If they need the dependencies for this example to work, we should say,
   
   "..., you must also install `torchvision`." or "..., you must also have `torchvision` installed."
   
   We can definitely add a line saying that these installation requirements are for the example files provided on this page.





[GitHub] [beam] yeandy commented on a diff in pull request #22069: Reviewing the RunInference ReadMe file for clarity.

Posted by GitBox <gi...@apache.org>.
yeandy commented on code in PR #22069:
URL: https://github.com/apache/beam/pull/22069#discussion_r907850829


##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -37,17 +37,18 @@ The RunInference API supports the PyTorch framework. To use PyTorch locally, fir
 pip install torch==1.11.0
 ```
 
-If you are using pretrained models from Pytorch's `torchvision.models` [subpackage](https://pytorch.org/vision/0.12/models.html#models-and-pre-trained-weights), you may also need to install `torchvision`.
+If you are using pretrained models from Pytorch's `torchvision.models` [subpackage](https://pytorch.org/vision/0.12/models.html#models-and-pre-trained-weights), you might also need to install `torchvision`.
 ```
 pip install torchvision
 ```
 
-If you are using pretrained models from Hugging Face's `transformers` [package](https://huggingface.co/docs/transformers/index), you may also need to install `transformers`.
+If you are using pretrained models from Hugging Face's `transformers` [package](https://huggingface.co/docs/transformers/index), you might also need to install `transformers`.
 ```
 pip install transformers
 ```
 
-For installation of the `torch` dependency on a distributed runner, like Dataflow, refer to these [instructions](https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/#pypi-dependencies).
+For installation of the `torch` dependency on a distributed runner such as Dataflow, refer to the 

Review Comment:
   Remove space due to `Trailing whitespace` error.
   ```suggestion
   For installation of the `torch` dependency on a distributed runner such as Dataflow, refer to the
   ```



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -157,21 +159,22 @@ Each line has data separated by a semicolon ";". The first item is the file name
 ---
 ## Language modeling
 
-[`pytorch_language_modeling.py`](./pytorch_language_modeling.py) contains an implementation for a RunInference pipeline that performs masked language modeling (i.e. decoding a masked token in a sentence) using the BertForMaskedLM architecture from Hugging Face.
+[`pytorch_language_modeling.py`](./pytorch_language_modeling.py) contains an implementation for a RunInference pipeline that performs masked language modeling (that is, decoding a masked token in a sentence) using the `BertForMaskedLM` architecture from Hugging Face.
 
 The pipeline reads sentences, performs basic preprocessing to convert the last word into a `[MASK]` token, passes the masked sentence to the PyTorch implementation of RunInference, and then writes the predictions to a text file.
 
 ### Dataset and model for language modeling
 
-- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved parameters of the BertForMaskedLM model. You will need to download the [BertForMaskedLM](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertForMaskedLM) model from Hugging Face's repository of pretrained models. Make sure you have installed `transformers` too.
+- **Required**: Download the [BertForMaskedLM](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertForMaskedLM) model from Hugging Face's repository of pretrained models. You must already have `transformers` installed.
 ```
 import torch
 from transformers import BertForMaskedLM
 model = BertForMaskedLM.from_pretrained('bert-base-uncased', return_dict=True)
 torch.save(model.state_dict(), 'BertForMaskedLM.pth')
 ```
-- **Required**: A path to a file called `OUTPUT`, to which the pipeline will write the predictions.
-- **Optional**: A path to a file called `SENTENCES` that contains sentences to feed into the model. It should look something like this:
+- **Required**: A path to a file namedd `MODEL_STATE_DICT` that contains the saved parameters of the `BertForMaskedLM` model. 

Review Comment:
   Remove space due to `Trailing whitespace` error.
   ```suggestion
   - **Required**: A path to a file namedd `MODEL_STATE_DICT` that contains the saved parameters of the `BertForMaskedLM` model.
   ```
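
As a usage note on `MODEL_STATE_DICT`: the pipeline's model handler essentially reverses the save shown in the hunk above. A sketch of that load step outside Beam, for anyone verifying the saved file (assumed usage, not code from the example script):

```
import torch
from transformers import BertForMaskedLM

# Rebuild the architecture, then restore the saved parameters.
model = BertForMaskedLM.from_pretrained('bert-base-uncased', return_dict=True)
model.load_state_dict(torch.load('BertForMaskedLM.pth'))
model.eval()
```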



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -108,27 +110,27 @@ This writes the output to the `predictions.csv` with contents like:
 ---
 ## Image segmentation
 
-[`pytorch_image_segmentation.py`](./pytorch_image_segmentation.py) contains an implementation for a RunInference pipeline that performs image segementation using the maskrcnn_resnet50_fpn architecture.
+[`pytorch_image_segmentation.py`](./pytorch_image_segmentation.py) contains an implementation for a RunInference pipeline that performs image segementation using the `maskrcnn_resnet50_fpn` architecture.
 
-The pipeline reads images, performs basic preprocessing, passes them to the PyTorch implementation of RunInference, and then writes the predictions to a text file.
+The pipeline reads images, performs basic preprocessing, passes the images to the PyTorch implementation of RunInference, and then writes predictions to a text file.
 
 ### Dataset and model for image segmentation
-You will need to create or download images, and place them into your `IMAGES_DIR` directory. Another popular dataset is from [Coco](https://cocodataset.org/#home). Please follow their instructions to download the images.
-- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. Paths can be different types of URIs such as your local file system, a AWS S3 bucket or GCP Cloud Storage bucket. For example:
+Create a directory named `IMAGES_DIR`. Create or download images and put them in this directory. The directory is not required if image names in the input file `IMAGE_FILE_NAMES` have absolute paths.
+A popular dataset is from [Coco](https://cocodataset.org/#home). Follow their instructions to download the images.
+- **Required**: A path to a file named `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` that you want to use to run image segmentation. Paths can be different types of URIs such as your local file system, an AWS S3 bucket, or a GCP Cloud Storage bucket. For example:
 ```
 /absolute/path/to/image1.jpg
 /absolute/path/to/image2.jpg
 ```
-- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved parameters of the maskrcnn_resnet50_fpn model. You will need to download the [maskrcnn_resnet50_fpn](https://pytorch.org/vision/0.12/models.html#id70)
-model from Pytorch's repository of pretrained models. Note that this requires `torchvision` library.
+- **Required**: Download the [maskrcnn_resnet50_fpn](https://pytorch.org/vision/0.12/models.html#id70) model from PyTorch's repository of pretrained models. This model requires the `torchvision` library. To download the model, run the following Python code:
 ```
 import torch
 from torchvision.models.detection import maskrcnn_resnet50_fpn
 model = maskrcnn_resnet50_fpn(pretrained=True)
 torch.save(model.state_dict(), 'maskrcnn_resnet50_fpn.pth')
 ```
-- **Required**: A path to a file called `OUTPUT`, to which the pipeline will write the predictions.
-- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute paths.
+- **Required**: A path to a file named `MODEL_STATE_DICT` that contains the saved parameters of the `maskrcnn_resnet50_fpn` model. 

Review Comment:
   Remove space due to `Trailing whitespace` error.
   ```suggestion
   - **Required**: A path to a file named `MODEL_STATE_DICT` that contains the saved parameters of the `maskrcnn_resnet50_fpn` model.
   ```
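A minimal sketch of verifying the saved `maskrcnn_resnet50_fpn.pth` file before wiring it into the pipeline; the dummy input shape is an illustrative assumption.
```
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

# Rebuild the architecture, load the weights saved above, and smoke-test them.
model = maskrcnn_resnet50_fpn(pretrained=False)
model.load_state_dict(torch.load('maskrcnn_resnet50_fpn.pth'))
model.eval()
with torch.no_grad():
    predictions = model([torch.rand(3, 224, 224)])  # one dummy RGB image
print(predictions[0].keys())  # expect boxes, labels, scores, masks
```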



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] yeandy commented on a diff in pull request #22069: Reviewing the RunInference ReadMe file for clarity.

Posted by GitBox <gi...@apache.org>.
yeandy commented on code in PR #22069:
URL: https://github.com/apache/beam/pull/22069#discussion_r907884959


##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -60,27 +61,28 @@ for details."
 ---
 ## Image classification
 
-[`pytorch_image_classification.py`](./pytorch_image_classification.py) contains an implementation for a RunInference pipeline that performs image classification using the mobilenet_v2 architecture.
+[`pytorch_image_classification.py`](./pytorch_image_classification.py) contains an implementation for a RunInference pipeline that performs image classification using the `mobilenet_v2` architecture.
 
-The pipeline reads the images, performs basic preprocessing, passes them to the PyTorch implementation of RunInference, and then writes the predictions to a text file.
+The pipeline reads the images, performs basic preprocessing, passes the images to the PyTorch implementation of RunInference, and then writes the predictions to a text file.
 
 ### Dataset and model for image classification
 
-You will need to create or download images, and place them into your `IMAGES_DIR` directory. One popular dataset is from [ImageNet](https://www.image-net.org/). Please follow their instructions to download the images.
-- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. Paths can be different types of URIs such as your local file system, a AWS S3 bucket or GCP Cloud Storage bucket. For example:
+Create a directory named `IMAGES_DIR`. Create or download images and put them in this directory. The directory is not required if the image names in the input file `IMAGE_FILE_NAMES` (see below) are absolute paths.
+One popular dataset is [ImageNet](https://www.image-net.org/). Follow their instructions to download the images.
+- **Required**: A path to a file named `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` that you want to use to run image classification. Paths can be different types of URIs such as your local file system, an AWS S3 bucket, or a GCP Cloud Storage bucket. For example:
 ```
 /absolute/path/to/image1.jpg
 /absolute/path/to/image2.jpg
 ```
-- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved parameters of the maskrcnn_resnet50_fpn model. You will need to download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html) model from Pytorch's repository of pretrained models. Note that this requires `torchvision` library.
+- **Required**: Download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html) model from PyTorch's repository of pretrained models. This model requires the `torchvision` library. To download the model, run the following Python code:
 ```
 import torch
 from torchvision.models import mobilenet_v2
 model = mobilenet_v2(pretrained=True)
 torch.save(model.state_dict(), 'mobilenet_v2.pth')
 ```
-- **Required**: A path to a file called `OUTPUT`, to which the pipeline will write the predictions.
-- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute paths.

Review Comment:
   It was moved to the beginning of this section.
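Both image examples expect `IMAGE_FILE_NAMES` to list absolute paths, one per line. A minimal sketch of generating such a file for local images; the directory path and output filename here are illustrative assumptions, not names from the README.
```
import glob
import os

images_dir = '/path/to/IMAGES_DIR'  # hypothetical local images directory
with open('image_file_names.txt', 'w') as f:  # hypothetical output file
    for path in sorted(glob.glob(os.path.join(images_dir, '*.jpg'))):
        f.write(os.path.abspath(path) + '\n')
```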



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org