You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/14 21:50:15 UTC

[GitHub] [beam] yeandy opened a new pull request, #21871: Modify README for 3 pytorch examples

yeandy opened a new pull request, #21871:
URL: https://github.com/apache/beam/pull/21871

   Modify README for the 3 examples. They instruct users on how to set up data and models for the pipeline.
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] asf-ci commented on pull request #21871: Modify README for 3 pytorch examples

Posted by GitBox <gi...@apache.org>.
asf-ci commented on PR #21871:
URL: https://github.com/apache/beam/pull/21871#issuecomment-1155744964

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] asf-ci commented on pull request #21871: Modify README for 3 pytorch examples

Posted by GitBox <gi...@apache.org>.
asf-ci commented on PR #21871:
URL: https://github.com/apache/beam/pull/21871#issuecomment-1155744965

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] asf-ci commented on pull request #21871: Modify README for 3 pytorch examples

Posted by GitBox <gi...@apache.org>.
asf-ci commented on PR #21871:
URL: https://github.com/apache/beam/pull/21871#issuecomment-1155744963

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] tvalentyn merged pull request #21871: Modify README for 3 pytorch examples

Posted by GitBox <gi...@apache.org>.
tvalentyn merged PR #21871:
URL: https://github.com/apache/beam/pull/21871


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] pcoet commented on a diff in pull request #21871: Modify README for 3 pytorch examples

Posted by GitBox <gi...@apache.org>.
pcoet commented on code in PR #21871:
URL: https://github.com/apache/beam/pull/21871#discussion_r897404144


##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -48,77 +58,181 @@ i.e. "See the
 for details."
 -->
 
-### Datasets and models for RunInference
-
-The RunInference example pipelines read example data from `gs://apache-beam-ml/`.
-You can view the data
-[here](https://console.cloud.google.com/storage/browser/apache-beam-ml). You can
-also list the example data using the
-[gsutil tool](https://cloud.google.com/storage/docs/gsutil#gettingstarted).
-
-```
-gsutil ls gs://apache-beam-ml
-```
-
 ---
-## Image classification with ImageNet dataset
+## Image classification
 
 [`pytorch_image_classification.py`](./pytorch_image_classification.py) contains
-an implementation for a RunInference pipeline that peforms image classification
-on the [ImageNet dataset](https://www.image-net.org/) using the MobileNetV2
+an implementation for a RunInference pipeline thatpeforms image classification using the mobilenet_v2
 architecture.
 
 The pipeline reads the images, performs basic preprocessing, passes them to the
 PyTorch implementation of RunInference, and then writes the predictions
-to a text file in GCS.
+to a text file.
 
 ### Dataset and model for image classification
 
-The image classification pipeline uses the following data:
-<!---
-TODO: Add once benchmark test is released
-- `gs://apache-beam-ml/testing/inputs/imagenet_validation_inputs.txt`:
-  text file containing the GCS paths of the images of all 5000 imagenet validation data
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-    - ...
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00050000.JPEG
--->
-- `gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt`:
-  Text file containing the GCS paths of the images of a subset of ImageNet
-  validation data. The following example command shows how to view contents of
-  the file:
-
-  ```
-  $ gsutil cat gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-  ...
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000015.JPEG
-  ```
-
-- `gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_*.JPEG`:
-  JPEG images for the entire validation dataset.
-
-- `gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth`: Path to
-  the location of the saved `state_dict` of the pretrained `mobilenet_v2` model
-  from the `torchvision.models` subdirectory.
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. One popular dataset is from [ImageNet](https://www.image-net.org/). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
+```
+import torch
+from torchvision.models.detection import mobilenet_v2
+model = mobilenet_v2(pretrained=True)
+torch.save(model.state_dict(), 'mobilenet_v2.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.

Review Comment:
   It looks like this should be deleted: "that contains the images you want to feed into your model. "



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -48,77 +58,181 @@ i.e. "See the
 for details."
 -->
 
-### Datasets and models for RunInference
-
-The RunInference example pipelines read example data from `gs://apache-beam-ml/`.
-You can view the data
-[here](https://console.cloud.google.com/storage/browser/apache-beam-ml). You can
-also list the example data using the
-[gsutil tool](https://cloud.google.com/storage/docs/gsutil#gettingstarted).
-
-```
-gsutil ls gs://apache-beam-ml
-```
-
 ---
-## Image classification with ImageNet dataset
+## Image classification
 
 [`pytorch_image_classification.py`](./pytorch_image_classification.py) contains
-an implementation for a RunInference pipeline that peforms image classification
-on the [ImageNet dataset](https://www.image-net.org/) using the MobileNetV2
+an implementation for a RunInference pipeline thatpeforms image classification using the mobilenet_v2
 architecture.
 
 The pipeline reads the images, performs basic preprocessing, passes them to the
 PyTorch implementation of RunInference, and then writes the predictions
-to a text file in GCS.
+to a text file.
 
 ### Dataset and model for image classification
 
-The image classification pipeline uses the following data:
-<!---
-TODO: Add once benchmark test is released
-- `gs://apache-beam-ml/testing/inputs/imagenet_validation_inputs.txt`:
-  text file containing the GCS paths of the images of all 5000 imagenet validation data
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-    - ...
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00050000.JPEG
--->
-- `gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt`:
-  Text file containing the GCS paths of the images of a subset of ImageNet
-  validation data. The following example command shows how to view contents of
-  the file:
-
-  ```
-  $ gsutil cat gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-  ...
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000015.JPEG
-  ```
-
-- `gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_*.JPEG`:
-  JPEG images for the entire validation dataset.
-
-- `gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth`: Path to
-  the location of the saved `state_dict` of the pretrained `mobilenet_v2` model
-  from the `torchvision.models` subdirectory.
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. One popular dataset is from [ImageNet](https://www.image-net.org/). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
+```
+import torch
+from torchvision.models.detection import mobilenet_v2
+model = mobilenet_v2(pretrained=True)
+torch.save(model.state_dict(), 'mobilenet_v2.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.
 
 ### Running `pytorch_image_classification.py`
 
 To run the image classification pipeline locally, use the following command:
 ```sh
 python -m apache_beam.examples.inference.pytorch_image_classification \
-  --input gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt \
+  --input IMAGE_FILE_NAMES \
+  --images_dir IMAGES_DIR \
+  --output OUTPUT \
+  --model_state_dict_path MODEL_STATE_DICT
+```
+For example:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_classification \
+  --input image_file_names.txt \
   --output predictions.csv \
-  --model_state_dict_path gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth
+  --model_state_dict_path maskrcnn_resnet50_fpn.pth
+```
+This writes the output to the `predictions.csv` with contents like:
 ```
+/absolute/path/to/image1.jpg;1
+/absolute/path/to/image2.jpg;333
+...
+```
+---
+## Image segmentation
+
+[`pytorch_image_segmentation.py`](./pytorch_image_segmentation.py) contains
+an implementation for a RunInference pipeline that peforms image segementation using the
+maskrcnn_resnet50_fpn architecture.
 
-This writes the output to `predictions.csv` with contents like:
+The pipeline reads images, performs basic preprocessing, passes them to the
+PyTorch implementation of RunInference, and then writes the predictions
+to a text file.
+
+### Dataset and model for image segmentation
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. Another popular dataset is from [Coco](https://cocodataset.org/#home). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [maskrcnn_resnet50_fpn](https://pytorch.org/vision/0.12/models.html#id70)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
 ```
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005002.JPEG,333
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005003.JPEG,711
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005004.JPEG,286
+import torch
+from torchvision.models.detection import maskrcnn_resnet50_fpn
+model = maskrcnn_resnet50_fpn(pretrained=True)
+torch.save(model.state_dict(), 'maskrcnn_resnet50_fpn.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.

Review Comment:
   Looks like this should be deleted: "that contains the images you want to feed into your model. "



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -48,77 +58,181 @@ i.e. "See the
 for details."
 -->
 
-### Datasets and models for RunInference
-
-The RunInference example pipelines read example data from `gs://apache-beam-ml/`.
-You can view the data
-[here](https://console.cloud.google.com/storage/browser/apache-beam-ml). You can
-also list the example data using the
-[gsutil tool](https://cloud.google.com/storage/docs/gsutil#gettingstarted).
-
-```
-gsutil ls gs://apache-beam-ml
-```
-
 ---
-## Image classification with ImageNet dataset
+## Image classification
 
 [`pytorch_image_classification.py`](./pytorch_image_classification.py) contains
-an implementation for a RunInference pipeline that peforms image classification
-on the [ImageNet dataset](https://www.image-net.org/) using the MobileNetV2
+an implementation for a RunInference pipeline thatpeforms image classification using the mobilenet_v2
 architecture.
 
 The pipeline reads the images, performs basic preprocessing, passes them to the
 PyTorch implementation of RunInference, and then writes the predictions
-to a text file in GCS.
+to a text file.
 
 ### Dataset and model for image classification
 
-The image classification pipeline uses the following data:
-<!---
-TODO: Add once benchmark test is released
-- `gs://apache-beam-ml/testing/inputs/imagenet_validation_inputs.txt`:
-  text file containing the GCS paths of the images of all 5000 imagenet validation data
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-    - ...
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00050000.JPEG
--->
-- `gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt`:
-  Text file containing the GCS paths of the images of a subset of ImageNet
-  validation data. The following example command shows how to view contents of
-  the file:
-
-  ```
-  $ gsutil cat gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-  ...
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000015.JPEG
-  ```
-
-- `gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_*.JPEG`:
-  JPEG images for the entire validation dataset.
-
-- `gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth`: Path to
-  the location of the saved `state_dict` of the pretrained `mobilenet_v2` model
-  from the `torchvision.models` subdirectory.
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. One popular dataset is from [ImageNet](https://www.image-net.org/). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
+```
+import torch
+from torchvision.models.detection import mobilenet_v2
+model = mobilenet_v2(pretrained=True)
+torch.save(model.state_dict(), 'mobilenet_v2.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.

Review Comment:
   Also, "absolute path" -> "absolute paths"



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -48,77 +58,181 @@ i.e. "See the
 for details."
 -->
 
-### Datasets and models for RunInference
-
-The RunInference example pipelines read example data from `gs://apache-beam-ml/`.
-You can view the data
-[here](https://console.cloud.google.com/storage/browser/apache-beam-ml). You can
-also list the example data using the
-[gsutil tool](https://cloud.google.com/storage/docs/gsutil#gettingstarted).
-
-```
-gsutil ls gs://apache-beam-ml
-```
-
 ---
-## Image classification with ImageNet dataset
+## Image classification
 
 [`pytorch_image_classification.py`](./pytorch_image_classification.py) contains
-an implementation for a RunInference pipeline that peforms image classification
-on the [ImageNet dataset](https://www.image-net.org/) using the MobileNetV2
+an implementation for a RunInference pipeline thatpeforms image classification using the mobilenet_v2
 architecture.
 
 The pipeline reads the images, performs basic preprocessing, passes them to the
 PyTorch implementation of RunInference, and then writes the predictions
-to a text file in GCS.
+to a text file.
 
 ### Dataset and model for image classification
 
-The image classification pipeline uses the following data:
-<!---
-TODO: Add once benchmark test is released
-- `gs://apache-beam-ml/testing/inputs/imagenet_validation_inputs.txt`:
-  text file containing the GCS paths of the images of all 5000 imagenet validation data
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-    - ...
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00050000.JPEG
--->
-- `gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt`:
-  Text file containing the GCS paths of the images of a subset of ImageNet
-  validation data. The following example command shows how to view contents of
-  the file:
-
-  ```
-  $ gsutil cat gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-  ...
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000015.JPEG
-  ```
-
-- `gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_*.JPEG`:
-  JPEG images for the entire validation dataset.
-
-- `gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth`: Path to
-  the location of the saved `state_dict` of the pretrained `mobilenet_v2` model
-  from the `torchvision.models` subdirectory.
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. One popular dataset is from [ImageNet](https://www.image-net.org/). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
+```
+import torch
+from torchvision.models.detection import mobilenet_v2
+model = mobilenet_v2(pretrained=True)
+torch.save(model.state_dict(), 'mobilenet_v2.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.
 
 ### Running `pytorch_image_classification.py`
 
 To run the image classification pipeline locally, use the following command:
 ```sh
 python -m apache_beam.examples.inference.pytorch_image_classification \
-  --input gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt \
+  --input IMAGE_FILE_NAMES \
+  --images_dir IMAGES_DIR \
+  --output OUTPUT \
+  --model_state_dict_path MODEL_STATE_DICT
+```
+For example:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_classification \
+  --input image_file_names.txt \
   --output predictions.csv \
-  --model_state_dict_path gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth
+  --model_state_dict_path maskrcnn_resnet50_fpn.pth
+```
+This writes the output to the `predictions.csv` with contents like:
 ```
+/absolute/path/to/image1.jpg;1
+/absolute/path/to/image2.jpg;333
+...
+```
+---
+## Image segmentation
+
+[`pytorch_image_segmentation.py`](./pytorch_image_segmentation.py) contains
+an implementation for a RunInference pipeline that peforms image segementation using the
+maskrcnn_resnet50_fpn architecture.
 
-This writes the output to `predictions.csv` with contents like:
+The pipeline reads images, performs basic preprocessing, passes them to the
+PyTorch implementation of RunInference, and then writes the predictions
+to a text file.
+
+### Dataset and model for image segmentation
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. Another popular dataset is from [Coco](https://cocodataset.org/#home). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [maskrcnn_resnet50_fpn](https://pytorch.org/vision/0.12/models.html#id70)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
 ```
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005002.JPEG,333
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005003.JPEG,711
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005004.JPEG,286
+import torch
+from torchvision.models.detection import maskrcnn_resnet50_fpn
+model = maskrcnn_resnet50_fpn(pretrained=True)
+torch.save(model.state_dict(), 'maskrcnn_resnet50_fpn.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.
+### Running `pytorch_image_segmentation.py`
+
+To run the image segmentation pipeline locally, use the following command:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_segmentation \
+  --input IMAGE_FILE_NAMES \
+  --images_dir IMAGES_DIR \
+  --output OUTPUT \
+  --model_state_dict_path MODEL_STATE_DICT
+```
+For example:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_segmentation \
+  --input image_file_names.txt \
+  --output predictions.csv \
+  --model_state_dict_path maskrcnn_resnet50_fpn.pth
+```
+This writes the output to the `predictions.csv` with contents like:
+```
+/absolute/path/to/image1.jpg;['parking meter', 'bottle', 'person', 'traffic light', 'traffic light', 'traffic light']
+/absolute/path/to/image2.jpg;['bottle', 'person', 'person']
 ...
 ```
+Each line has data separated by a semicolon ";".
+The first item is the file name. The second item are a list of predicted instances.

Review Comment:
   "item are" -> "item is"



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -48,77 +58,181 @@ i.e. "See the
 for details."
 -->
 
-### Datasets and models for RunInference
-
-The RunInference example pipelines read example data from `gs://apache-beam-ml/`.
-You can view the data
-[here](https://console.cloud.google.com/storage/browser/apache-beam-ml). You can
-also list the example data using the
-[gsutil tool](https://cloud.google.com/storage/docs/gsutil#gettingstarted).
-
-```
-gsutil ls gs://apache-beam-ml
-```
-
 ---
-## Image classification with ImageNet dataset
+## Image classification
 
 [`pytorch_image_classification.py`](./pytorch_image_classification.py) contains
-an implementation for a RunInference pipeline that peforms image classification
-on the [ImageNet dataset](https://www.image-net.org/) using the MobileNetV2
+an implementation for a RunInference pipeline thatpeforms image classification using the mobilenet_v2
 architecture.
 
 The pipeline reads the images, performs basic preprocessing, passes them to the
 PyTorch implementation of RunInference, and then writes the predictions
-to a text file in GCS.
+to a text file.
 
 ### Dataset and model for image classification
 
-The image classification pipeline uses the following data:
-<!---
-TODO: Add once benchmark test is released
-- `gs://apache-beam-ml/testing/inputs/imagenet_validation_inputs.txt`:
-  text file containing the GCS paths of the images of all 5000 imagenet validation data
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-    - ...
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00050000.JPEG
--->
-- `gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt`:
-  Text file containing the GCS paths of the images of a subset of ImageNet
-  validation data. The following example command shows how to view contents of
-  the file:
-
-  ```
-  $ gsutil cat gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-  ...
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000015.JPEG
-  ```
-
-- `gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_*.JPEG`:
-  JPEG images for the entire validation dataset.
-
-- `gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth`: Path to
-  the location of the saved `state_dict` of the pretrained `mobilenet_v2` model
-  from the `torchvision.models` subdirectory.
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. One popular dataset is from [ImageNet](https://www.image-net.org/). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
+```
+import torch
+from torchvision.models.detection import mobilenet_v2
+model = mobilenet_v2(pretrained=True)
+torch.save(model.state_dict(), 'mobilenet_v2.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.
 
 ### Running `pytorch_image_classification.py`
 
 To run the image classification pipeline locally, use the following command:
 ```sh
 python -m apache_beam.examples.inference.pytorch_image_classification \
-  --input gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt \
+  --input IMAGE_FILE_NAMES \
+  --images_dir IMAGES_DIR \
+  --output OUTPUT \
+  --model_state_dict_path MODEL_STATE_DICT
+```
+For example:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_classification \
+  --input image_file_names.txt \
   --output predictions.csv \
-  --model_state_dict_path gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth
+  --model_state_dict_path maskrcnn_resnet50_fpn.pth
+```
+This writes the output to the `predictions.csv` with contents like:
 ```
+/absolute/path/to/image1.jpg;1
+/absolute/path/to/image2.jpg;333
+...
+```
+---
+## Image segmentation
+
+[`pytorch_image_segmentation.py`](./pytorch_image_segmentation.py) contains
+an implementation for a RunInference pipeline that peforms image segementation using the
+maskrcnn_resnet50_fpn architecture.
 
-This writes the output to `predictions.csv` with contents like:
+The pipeline reads images, performs basic preprocessing, passes them to the
+PyTorch implementation of RunInference, and then writes the predictions
+to a text file.
+
+### Dataset and model for image segmentation
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. Another popular dataset is from [Coco](https://cocodataset.org/#home). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [maskrcnn_resnet50_fpn](https://pytorch.org/vision/0.12/models.html#id70)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
 ```
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005002.JPEG,333
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005003.JPEG,711
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005004.JPEG,286
+import torch
+from torchvision.models.detection import maskrcnn_resnet50_fpn
+model = maskrcnn_resnet50_fpn(pretrained=True)
+torch.save(model.state_dict(), 'maskrcnn_resnet50_fpn.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.
+### Running `pytorch_image_segmentation.py`
+
+To run the image segmentation pipeline locally, use the following command:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_segmentation \
+  --input IMAGE_FILE_NAMES \
+  --images_dir IMAGES_DIR \
+  --output OUTPUT \
+  --model_state_dict_path MODEL_STATE_DICT
+```
+For example:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_segmentation \
+  --input image_file_names.txt \
+  --output predictions.csv \
+  --model_state_dict_path maskrcnn_resnet50_fpn.pth
+```
+This writes the output to the `predictions.csv` with contents like:
+```
+/absolute/path/to/image1.jpg;['parking meter', 'bottle', 'person', 'traffic light', 'traffic light', 'traffic light']
+/absolute/path/to/image2.jpg;['bottle', 'person', 'person']
 ...
 ```
+Each line has data separated by a semicolon ";".
+The first item is the file name. The second item are a list of predicted instances.
 
-The second item in each line is the integer representing the predicted class of the
-image.
+---
+## Language modeling
+
+[`pytorch_language_modeling.py`](./pytorch_language_modeling.py) contains
+an implementation for a RunInference pipeline that peforms masked language
+modeling (i.e. decoding a masked token in a sentence) using the BertForMaskedLM
+architecture from Hugging Face.
+
+The pipeline reads sentences, performs basic preprocessing to conver the last
+word into a `[MASK]` token, passes the masked sentence to PyTorch
+implementation of RunInference, and then writes the predictions to a text file.
+
+### Dataset and model for language modeling
+
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the BertForMaskedLM model. You will need to download the [BertForMaskedLM](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertForMaskedLM)
+model from Hugging Face's repository of pretrained models.
+Make sure you have installed `transformers` too.
+```
+import torch
+from transformers import BertForMaskedLM
+model = BertForMaskedLM.from_pretrained('bert-base-uncased', return_dict=True)
+torch.save(model.state_dict(), 'BertForMaskedLM.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: A path to a file called `SENTENCES` that contains sentences to
+feed into the model. It should look something like this:
+```
+The capital of France is Paris .
+It is raining cats and dogs .
+...
+```
+### Running `pytorch_language_modeling.py`
+
+To run the language modeling pipeline locally, use the following command:
+```sh
+python -m apache_beam.examples.inference.pytorch_language_modeling \
+  --input SENTENCES \
+  --output OUTPUT \
+  --model_state_dict_path MODEL_STATE_DICT
+```
+For example:
+```sh
+python -m apache_beam.examples.inference.pytorch_language_modeling \
+  --input sentences.txt \
+  --output predictions.csv \
+  --model_state_dict_path BertForMaskedLM.pth
+```
+If you don't provide a sentences file, it will run the pipeline with some
+example sentences we created.

Review Comment:
   You can probably remove "we created".



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -48,77 +58,181 @@ i.e. "See the
 for details."
 -->
 
-### Datasets and models for RunInference
-
-The RunInference example pipelines read example data from `gs://apache-beam-ml/`.
-You can view the data
-[here](https://console.cloud.google.com/storage/browser/apache-beam-ml). You can
-also list the example data using the
-[gsutil tool](https://cloud.google.com/storage/docs/gsutil#gettingstarted).
-
-```
-gsutil ls gs://apache-beam-ml
-```
-
 ---
-## Image classification with ImageNet dataset
+## Image classification
 
 [`pytorch_image_classification.py`](./pytorch_image_classification.py) contains
-an implementation for a RunInference pipeline that peforms image classification
-on the [ImageNet dataset](https://www.image-net.org/) using the MobileNetV2
+an implementation for a RunInference pipeline thatpeforms image classification using the mobilenet_v2
 architecture.
 
 The pipeline reads the images, performs basic preprocessing, passes them to the
 PyTorch implementation of RunInference, and then writes the predictions
-to a text file in GCS.
+to a text file.
 
 ### Dataset and model for image classification
 
-The image classification pipeline uses the following data:
-<!---
-TODO: Add once benchmark test is released
-- `gs://apache-beam-ml/testing/inputs/imagenet_validation_inputs.txt`:
-  text file containing the GCS paths of the images of all 5000 imagenet validation data
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-    - ...
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00050000.JPEG
--->
-- `gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt`:
-  Text file containing the GCS paths of the images of a subset of ImageNet
-  validation data. The following example command shows how to view contents of
-  the file:
-
-  ```
-  $ gsutil cat gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-  ...
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000015.JPEG
-  ```
-
-- `gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_*.JPEG`:
-  JPEG images for the entire validation dataset.
-
-- `gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth`: Path to
-  the location of the saved `state_dict` of the pretrained `mobilenet_v2` model
-  from the `torchvision.models` subdirectory.
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. One popular dataset is from [ImageNet](https://www.image-net.org/). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
+```
+import torch
+from torchvision.models.detection import mobilenet_v2
+model = mobilenet_v2(pretrained=True)
+torch.save(model.state_dict(), 'mobilenet_v2.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.
 
 ### Running `pytorch_image_classification.py`
 
 To run the image classification pipeline locally, use the following command:
 ```sh
 python -m apache_beam.examples.inference.pytorch_image_classification \
-  --input gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt \
+  --input IMAGE_FILE_NAMES \
+  --images_dir IMAGES_DIR \
+  --output OUTPUT \
+  --model_state_dict_path MODEL_STATE_DICT
+```
+For example:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_classification \
+  --input image_file_names.txt \
   --output predictions.csv \
-  --model_state_dict_path gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth
+  --model_state_dict_path maskrcnn_resnet50_fpn.pth
+```
+This writes the output to the `predictions.csv` with contents like:
 ```
+/absolute/path/to/image1.jpg;1
+/absolute/path/to/image2.jpg;333
+...
+```
+---
+## Image segmentation
+
+[`pytorch_image_segmentation.py`](./pytorch_image_segmentation.py) contains
+an implementation for a RunInference pipeline that peforms image segementation using the
+maskrcnn_resnet50_fpn architecture.
 
-This writes the output to `predictions.csv` with contents like:
+The pipeline reads images, performs basic preprocessing, passes them to the
+PyTorch implementation of RunInference, and then writes the predictions
+to a text file.
+
+### Dataset and model for image segmentation
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. Another popular dataset is from [Coco](https://cocodataset.org/#home). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [maskrcnn_resnet50_fpn](https://pytorch.org/vision/0.12/models.html#id70)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
 ```
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005002.JPEG,333
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005003.JPEG,711
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005004.JPEG,286
+import torch
+from torchvision.models.detection import maskrcnn_resnet50_fpn
+model = maskrcnn_resnet50_fpn(pretrained=True)
+torch.save(model.state_dict(), 'maskrcnn_resnet50_fpn.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.
+### Running `pytorch_image_segmentation.py`
+
+To run the image segmentation pipeline locally, use the following command:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_segmentation \
+  --input IMAGE_FILE_NAMES \
+  --images_dir IMAGES_DIR \
+  --output OUTPUT \
+  --model_state_dict_path MODEL_STATE_DICT
+```
+For example:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_segmentation \
+  --input image_file_names.txt \
+  --output predictions.csv \
+  --model_state_dict_path maskrcnn_resnet50_fpn.pth
+```
+This writes the output to the `predictions.csv` with contents like:
+```
+/absolute/path/to/image1.jpg;['parking meter', 'bottle', 'person', 'traffic light', 'traffic light', 'traffic light']
+/absolute/path/to/image2.jpg;['bottle', 'person', 'person']
 ...
 ```
+Each line has data separated by a semicolon ";".
+The first item is the file name. The second item are a list of predicted instances.
 
-The second item in each line is the integer representing the predicted class of the
-image.
+---
+## Language modeling
+
+[`pytorch_language_modeling.py`](./pytorch_language_modeling.py) contains
+an implementation for a RunInference pipeline that peforms masked language
+modeling (i.e. decoding a masked token in a sentence) using the BertForMaskedLM
+architecture from Hugging Face.
+
+The pipeline reads sentences, performs basic preprocessing to conver the last

Review Comment:
   "conver" -> "convert"



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -48,77 +58,181 @@ i.e. "See the
 for details."
 -->
 
-### Datasets and models for RunInference
-
-The RunInference example pipelines read example data from `gs://apache-beam-ml/`.
-You can view the data
-[here](https://console.cloud.google.com/storage/browser/apache-beam-ml). You can
-also list the example data using the
-[gsutil tool](https://cloud.google.com/storage/docs/gsutil#gettingstarted).
-
-```
-gsutil ls gs://apache-beam-ml
-```
-
 ---
-## Image classification with ImageNet dataset
+## Image classification
 
 [`pytorch_image_classification.py`](./pytorch_image_classification.py) contains
-an implementation for a RunInference pipeline that peforms image classification
-on the [ImageNet dataset](https://www.image-net.org/) using the MobileNetV2
+an implementation for a RunInference pipeline thatpeforms image classification using the mobilenet_v2
 architecture.
 
 The pipeline reads the images, performs basic preprocessing, passes them to the
 PyTorch implementation of RunInference, and then writes the predictions
-to a text file in GCS.
+to a text file.
 
 ### Dataset and model for image classification
 
-The image classification pipeline uses the following data:
-<!---
-TODO: Add once benchmark test is released
-- `gs://apache-beam-ml/testing/inputs/imagenet_validation_inputs.txt`:
-  text file containing the GCS paths of the images of all 5000 imagenet validation data
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-    - ...
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00050000.JPEG
--->
-- `gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt`:
-  Text file containing the GCS paths of the images of a subset of ImageNet
-  validation data. The following example command shows how to view contents of
-  the file:
-
-  ```
-  $ gsutil cat gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-  ...
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000015.JPEG
-  ```
-
-- `gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_*.JPEG`:
-  JPEG images for the entire validation dataset.
-
-- `gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth`: Path to
-  the location of the saved `state_dict` of the pretrained `mobilenet_v2` model
-  from the `torchvision.models` subdirectory.
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. One popular dataset is from [ImageNet](https://www.image-net.org/). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
+```
+import torch
+from torchvision.models.detection import mobilenet_v2
+model = mobilenet_v2(pretrained=True)
+torch.save(model.state_dict(), 'mobilenet_v2.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.
 
 ### Running `pytorch_image_classification.py`
 
 To run the image classification pipeline locally, use the following command:
 ```sh
 python -m apache_beam.examples.inference.pytorch_image_classification \
-  --input gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt \
+  --input IMAGE_FILE_NAMES \
+  --images_dir IMAGES_DIR \
+  --output OUTPUT \
+  --model_state_dict_path MODEL_STATE_DICT
+```
+For example:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_classification \
+  --input image_file_names.txt \
   --output predictions.csv \
-  --model_state_dict_path gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth
+  --model_state_dict_path maskrcnn_resnet50_fpn.pth
+```
+This writes the output to the `predictions.csv` with contents like:
 ```
+/absolute/path/to/image1.jpg;1
+/absolute/path/to/image2.jpg;333
+...
+```
+---
+## Image segmentation
+
+[`pytorch_image_segmentation.py`](./pytorch_image_segmentation.py) contains
+an implementation for a RunInference pipeline that peforms image segementation using the
+maskrcnn_resnet50_fpn architecture.
 
-This writes the output to `predictions.csv` with contents like:
+The pipeline reads images, performs basic preprocessing, passes them to the
+PyTorch implementation of RunInference, and then writes the predictions
+to a text file.
+
+### Dataset and model for image segmentation
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. Another popular dataset is from [Coco](https://cocodataset.org/#home). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [maskrcnn_resnet50_fpn](https://pytorch.org/vision/0.12/models.html#id70)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
 ```
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005002.JPEG,333
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005003.JPEG,711
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005004.JPEG,286
+import torch
+from torchvision.models.detection import maskrcnn_resnet50_fpn
+model = maskrcnn_resnet50_fpn(pretrained=True)
+torch.save(model.state_dict(), 'maskrcnn_resnet50_fpn.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.

Review Comment:
   Also, "absolute path" -> "absolute paths"



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -48,77 +58,181 @@ i.e. "See the
 for details."
 -->
 
-### Datasets and models for RunInference
-
-The RunInference example pipelines read example data from `gs://apache-beam-ml/`.
-You can view the data
-[here](https://console.cloud.google.com/storage/browser/apache-beam-ml). You can
-also list the example data using the
-[gsutil tool](https://cloud.google.com/storage/docs/gsutil#gettingstarted).
-
-```
-gsutil ls gs://apache-beam-ml
-```
-
 ---
-## Image classification with ImageNet dataset
+## Image classification
 
 [`pytorch_image_classification.py`](./pytorch_image_classification.py) contains
-an implementation for a RunInference pipeline that peforms image classification
-on the [ImageNet dataset](https://www.image-net.org/) using the MobileNetV2
+an implementation for a RunInference pipeline thatpeforms image classification using the mobilenet_v2
 architecture.
 
 The pipeline reads the images, performs basic preprocessing, passes them to the
 PyTorch implementation of RunInference, and then writes the predictions
-to a text file in GCS.
+to a text file.
 
 ### Dataset and model for image classification
 
-The image classification pipeline uses the following data:
-<!---
-TODO: Add once benchmark test is released
-- `gs://apache-beam-ml/testing/inputs/imagenet_validation_inputs.txt`:
-  text file containing the GCS paths of the images of all 5000 imagenet validation data
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-    - ...
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00050000.JPEG
--->
-- `gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt`:
-  Text file containing the GCS paths of the images of a subset of ImageNet
-  validation data. The following example command shows how to view contents of
-  the file:
-
-  ```
-  $ gsutil cat gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-  ...
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000015.JPEG
-  ```
-
-- `gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_*.JPEG`:
-  JPEG images for the entire validation dataset.
-
-- `gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth`: Path to
-  the location of the saved `state_dict` of the pretrained `mobilenet_v2` model
-  from the `torchvision.models` subdirectory.
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. One popular dataset is from [ImageNet](https://www.image-net.org/). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
+```
+import torch
+from torchvision.models.detection import mobilenet_v2
+model = mobilenet_v2(pretrained=True)
+torch.save(model.state_dict(), 'mobilenet_v2.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.
 
 ### Running `pytorch_image_classification.py`
 
 To run the image classification pipeline locally, use the following command:
 ```sh
 python -m apache_beam.examples.inference.pytorch_image_classification \
-  --input gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt \
+  --input IMAGE_FILE_NAMES \
+  --images_dir IMAGES_DIR \
+  --output OUTPUT \
+  --model_state_dict_path MODEL_STATE_DICT
+```
+For example:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_classification \
+  --input image_file_names.txt \
   --output predictions.csv \
-  --model_state_dict_path gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth
+  --model_state_dict_path maskrcnn_resnet50_fpn.pth
+```
+This writes the output to the `predictions.csv` with contents like:
 ```
+/absolute/path/to/image1.jpg;1
+/absolute/path/to/image2.jpg;333
+...
+```
+---
+## Image segmentation
+
+[`pytorch_image_segmentation.py`](./pytorch_image_segmentation.py) contains
+an implementation for a RunInference pipeline that peforms image segementation using the
+maskrcnn_resnet50_fpn architecture.
 
-This writes the output to `predictions.csv` with contents like:
+The pipeline reads images, performs basic preprocessing, passes them to the
+PyTorch implementation of RunInference, and then writes the predictions
+to a text file.
+
+### Dataset and model for image segmentation
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. Another popular dataset is from [Coco](https://cocodataset.org/#home). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [maskrcnn_resnet50_fpn](https://pytorch.org/vision/0.12/models.html#id70)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
 ```
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005002.JPEG,333
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005003.JPEG,711
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005004.JPEG,286
+import torch
+from torchvision.models.detection import maskrcnn_resnet50_fpn
+model = maskrcnn_resnet50_fpn(pretrained=True)
+torch.save(model.state_dict(), 'maskrcnn_resnet50_fpn.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.
+### Running `pytorch_image_segmentation.py`
+
+To run the image segmentation pipeline locally, use the following command:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_segmentation \
+  --input IMAGE_FILE_NAMES \
+  --images_dir IMAGES_DIR \
+  --output OUTPUT \
+  --model_state_dict_path MODEL_STATE_DICT
+```
+For example:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_segmentation \
+  --input image_file_names.txt \
+  --output predictions.csv \
+  --model_state_dict_path maskrcnn_resnet50_fpn.pth
+```
+This writes the output to the `predictions.csv` with contents like:
+```
+/absolute/path/to/image1.jpg;['parking meter', 'bottle', 'person', 'traffic light', 'traffic light', 'traffic light']
+/absolute/path/to/image2.jpg;['bottle', 'person', 'person']
 ...
 ```
+Each line has data separated by a semicolon ";".
+The first item is the file name. The second item are a list of predicted instances.
 
-The second item in each line is the integer representing the predicted class of the
-image.
+---
+## Language modeling
+
+[`pytorch_language_modeling.py`](./pytorch_language_modeling.py) contains
+an implementation for a RunInference pipeline that peforms masked language
+modeling (i.e. decoding a masked token in a sentence) using the BertForMaskedLM
+architecture from Hugging Face.
+
+The pipeline reads sentences, performs basic preprocessing to conver the last
+word into a `[MASK]` token, passes the masked sentence to PyTorch

Review Comment:
   "to PyTorch" -> "to the PyTorch"



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] yeandy commented on a diff in pull request #21871: Modify README for 3 pytorch examples

Posted by GitBox <gi...@apache.org>.
yeandy commented on code in PR #21871:
URL: https://github.com/apache/beam/pull/21871#discussion_r897420513


##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -48,77 +58,181 @@ i.e. "See the
 for details."
 -->
 
-### Datasets and models for RunInference
-
-The RunInference example pipelines read example data from `gs://apache-beam-ml/`.
-You can view the data
-[here](https://console.cloud.google.com/storage/browser/apache-beam-ml). You can
-also list the example data using the
-[gsutil tool](https://cloud.google.com/storage/docs/gsutil#gettingstarted).
-
-```
-gsutil ls gs://apache-beam-ml
-```
-
 ---
-## Image classification with ImageNet dataset
+## Image classification
 
 [`pytorch_image_classification.py`](./pytorch_image_classification.py) contains
-an implementation for a RunInference pipeline that peforms image classification
-on the [ImageNet dataset](https://www.image-net.org/) using the MobileNetV2
+an implementation for a RunInference pipeline thatpeforms image classification using the mobilenet_v2
 architecture.
 
 The pipeline reads the images, performs basic preprocessing, passes them to the
 PyTorch implementation of RunInference, and then writes the predictions
-to a text file in GCS.
+to a text file.
 
 ### Dataset and model for image classification
 
-The image classification pipeline uses the following data:
-<!---
-TODO: Add once benchmark test is released
-- `gs://apache-beam-ml/testing/inputs/imagenet_validation_inputs.txt`:
-  text file containing the GCS paths of the images of all 5000 imagenet validation data
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-    - ...
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00050000.JPEG
--->
-- `gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt`:
-  Text file containing the GCS paths of the images of a subset of ImageNet
-  validation data. The following example command shows how to view contents of
-  the file:
-
-  ```
-  $ gsutil cat gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-  ...
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000015.JPEG
-  ```
-
-- `gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_*.JPEG`:
-  JPEG images for the entire validation dataset.
-
-- `gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth`: Path to
-  the location of the saved `state_dict` of the pretrained `mobilenet_v2` model
-  from the `torchvision.models` subdirectory.
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. One popular dataset is from [ImageNet](https://www.image-net.org/). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
+```
+import torch
+from torchvision.models.detection import mobilenet_v2
+model = mobilenet_v2(pretrained=True)
+torch.save(model.state_dict(), 'mobilenet_v2.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.
 
 ### Running `pytorch_image_classification.py`
 
 To run the image classification pipeline locally, use the following command:
 ```sh
 python -m apache_beam.examples.inference.pytorch_image_classification \
-  --input gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt \
+  --input IMAGE_FILE_NAMES \
+  --images_dir IMAGES_DIR \
+  --output OUTPUT \
+  --model_state_dict_path MODEL_STATE_DICT
+```
+For example:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_classification \
+  --input image_file_names.txt \
   --output predictions.csv \
-  --model_state_dict_path gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth
+  --model_state_dict_path maskrcnn_resnet50_fpn.pth
+```
+This writes the output to the `predictions.csv` with contents like:
 ```
+/absolute/path/to/image1.jpg;1
+/absolute/path/to/image2.jpg;333
+...
+```
+---
+## Image segmentation
+
+[`pytorch_image_segmentation.py`](./pytorch_image_segmentation.py) contains
+an implementation for a RunInference pipeline that peforms image segementation using the
+maskrcnn_resnet50_fpn architecture.
 
-This writes the output to `predictions.csv` with contents like:
+The pipeline reads images, performs basic preprocessing, passes them to the
+PyTorch implementation of RunInference, and then writes the predictions
+to a text file.
+
+### Dataset and model for image segmentation
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. Another popular dataset is from [Coco](https://cocodataset.org/#home). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [maskrcnn_resnet50_fpn](https://pytorch.org/vision/0.12/models.html#id70)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
 ```
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005002.JPEG,333
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005003.JPEG,711
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005004.JPEG,286
+import torch
+from torchvision.models.detection import maskrcnn_resnet50_fpn
+model = maskrcnn_resnet50_fpn(pretrained=True)
+torch.save(model.state_dict(), 'maskrcnn_resnet50_fpn.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.

Review Comment:
   Done.



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -48,77 +58,181 @@ i.e. "See the
 for details."
 -->
 
-### Datasets and models for RunInference
-
-The RunInference example pipelines read example data from `gs://apache-beam-ml/`.
-You can view the data
-[here](https://console.cloud.google.com/storage/browser/apache-beam-ml). You can
-also list the example data using the
-[gsutil tool](https://cloud.google.com/storage/docs/gsutil#gettingstarted).
-
-```
-gsutil ls gs://apache-beam-ml
-```
-
 ---
-## Image classification with ImageNet dataset
+## Image classification
 
 [`pytorch_image_classification.py`](./pytorch_image_classification.py) contains
-an implementation for a RunInference pipeline that peforms image classification
-on the [ImageNet dataset](https://www.image-net.org/) using the MobileNetV2
+an implementation for a RunInference pipeline thatpeforms image classification using the mobilenet_v2
 architecture.
 
 The pipeline reads the images, performs basic preprocessing, passes them to the
 PyTorch implementation of RunInference, and then writes the predictions
-to a text file in GCS.
+to a text file.
 
 ### Dataset and model for image classification
 
-The image classification pipeline uses the following data:
-<!---
-TODO: Add once benchmark test is released
-- `gs://apache-beam-ml/testing/inputs/imagenet_validation_inputs.txt`:
-  text file containing the GCS paths of the images of all 5000 imagenet validation data
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-    - ...
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00050000.JPEG
--->
-- `gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt`:
-  Text file containing the GCS paths of the images of a subset of ImageNet
-  validation data. The following example command shows how to view contents of
-  the file:
-
-  ```
-  $ gsutil cat gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-  ...
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000015.JPEG
-  ```
-
-- `gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_*.JPEG`:
-  JPEG images for the entire validation dataset.
-
-- `gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth`: Path to
-  the location of the saved `state_dict` of the pretrained `mobilenet_v2` model
-  from the `torchvision.models` subdirectory.
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. One popular dataset is from [ImageNet](https://www.image-net.org/). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
+```
+import torch
+from torchvision.models.detection import mobilenet_v2
+model = mobilenet_v2(pretrained=True)
+torch.save(model.state_dict(), 'mobilenet_v2.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.

Review Comment:
   Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] yeandy commented on a diff in pull request #21871: Modify README for 3 pytorch examples

Posted by GitBox <gi...@apache.org>.
yeandy commented on code in PR #21871:
URL: https://github.com/apache/beam/pull/21871#discussion_r897420411


##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -48,77 +58,181 @@ i.e. "See the
 for details."
 -->
 
-### Datasets and models for RunInference
-
-The RunInference example pipelines read example data from `gs://apache-beam-ml/`.
-You can view the data
-[here](https://console.cloud.google.com/storage/browser/apache-beam-ml). You can
-also list the example data using the
-[gsutil tool](https://cloud.google.com/storage/docs/gsutil#gettingstarted).
-
-```
-gsutil ls gs://apache-beam-ml
-```
-
 ---
-## Image classification with ImageNet dataset
+## Image classification
 
 [`pytorch_image_classification.py`](./pytorch_image_classification.py) contains
-an implementation for a RunInference pipeline that peforms image classification
-on the [ImageNet dataset](https://www.image-net.org/) using the MobileNetV2
+an implementation for a RunInference pipeline thatpeforms image classification using the mobilenet_v2
 architecture.
 
 The pipeline reads the images, performs basic preprocessing, passes them to the
 PyTorch implementation of RunInference, and then writes the predictions
-to a text file in GCS.
+to a text file.
 
 ### Dataset and model for image classification
 
-The image classification pipeline uses the following data:
-<!---
-TODO: Add once benchmark test is released
-- `gs://apache-beam-ml/testing/inputs/imagenet_validation_inputs.txt`:
-  text file containing the GCS paths of the images of all 5000 imagenet validation data
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-    - ...
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00050000.JPEG
--->
-- `gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt`:
-  Text file containing the GCS paths of the images of a subset of ImageNet
-  validation data. The following example command shows how to view contents of
-  the file:
-
-  ```
-  $ gsutil cat gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-  ...
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000015.JPEG
-  ```
-
-- `gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_*.JPEG`:
-  JPEG images for the entire validation dataset.
-
-- `gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth`: Path to
-  the location of the saved `state_dict` of the pretrained `mobilenet_v2` model
-  from the `torchvision.models` subdirectory.
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. One popular dataset is from [ImageNet](https://www.image-net.org/). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
+```
+import torch
+from torchvision.models.detection import mobilenet_v2
+model = mobilenet_v2(pretrained=True)
+torch.save(model.state_dict(), 'mobilenet_v2.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.
 
 ### Running `pytorch_image_classification.py`
 
 To run the image classification pipeline locally, use the following command:
 ```sh
 python -m apache_beam.examples.inference.pytorch_image_classification \
-  --input gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt \
+  --input IMAGE_FILE_NAMES \
+  --images_dir IMAGES_DIR \
+  --output OUTPUT \
+  --model_state_dict_path MODEL_STATE_DICT
+```
+For example:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_classification \
+  --input image_file_names.txt \
   --output predictions.csv \
-  --model_state_dict_path gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth
+  --model_state_dict_path maskrcnn_resnet50_fpn.pth
+```
+This writes the output to the `predictions.csv` with contents like:
 ```
+/absolute/path/to/image1.jpg;1
+/absolute/path/to/image2.jpg;333
+...
+```
+---
+## Image segmentation
+
+[`pytorch_image_segmentation.py`](./pytorch_image_segmentation.py) contains
+an implementation for a RunInference pipeline that peforms image segementation using the
+maskrcnn_resnet50_fpn architecture.
 
-This writes the output to `predictions.csv` with contents like:
+The pipeline reads images, performs basic preprocessing, passes them to the
+PyTorch implementation of RunInference, and then writes the predictions
+to a text file.
+
+### Dataset and model for image segmentation
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. Another popular dataset is from [Coco](https://cocodataset.org/#home). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [maskrcnn_resnet50_fpn](https://pytorch.org/vision/0.12/models.html#id70)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
 ```
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005002.JPEG,333
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005003.JPEG,711
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005004.JPEG,286
+import torch
+from torchvision.models.detection import maskrcnn_resnet50_fpn
+model = maskrcnn_resnet50_fpn(pretrained=True)
+torch.save(model.state_dict(), 'maskrcnn_resnet50_fpn.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.
+### Running `pytorch_image_segmentation.py`
+
+To run the image segmentation pipeline locally, use the following command:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_segmentation \
+  --input IMAGE_FILE_NAMES \
+  --images_dir IMAGES_DIR \
+  --output OUTPUT \
+  --model_state_dict_path MODEL_STATE_DICT
+```
+For example:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_segmentation \
+  --input image_file_names.txt \
+  --output predictions.csv \
+  --model_state_dict_path maskrcnn_resnet50_fpn.pth
+```
+This writes the output to the `predictions.csv` with contents like:
+```
+/absolute/path/to/image1.jpg;['parking meter', 'bottle', 'person', 'traffic light', 'traffic light', 'traffic light']
+/absolute/path/to/image2.jpg;['bottle', 'person', 'person']
 ...
 ```
+Each line has data separated by a semicolon ";".
+The first item is the file name. The second item are a list of predicted instances.
 
-The second item in each line is the integer representing the predicted class of the
-image.
+---
+## Language modeling
+
+[`pytorch_language_modeling.py`](./pytorch_language_modeling.py) contains
+an implementation for a RunInference pipeline that peforms masked language
+modeling (i.e. decoding a masked token in a sentence) using the BertForMaskedLM
+architecture from Hugging Face.
+
+The pipeline reads sentences, performs basic preprocessing to conver the last
+word into a `[MASK]` token, passes the masked sentence to PyTorch
+implementation of RunInference, and then writes the predictions to a text file.
+
+### Dataset and model for language modeling
+
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the BertForMaskedLM model. You will need to download the [BertForMaskedLM](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertForMaskedLM)
+model from Hugging Face's repository of pretrained models.
+Make sure you have installed `transformers` too.
+```
+import torch
+from transformers import BertForMaskedLM
+model = BertForMaskedLM.from_pretrained('bert-base-uncased', return_dict=True)
+torch.save(model.state_dict(), 'BertForMaskedLM.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: A path to a file called `SENTENCES` that contains sentences to
+feed into the model. It should look something like this:
+```
+The capital of France is Paris .
+It is raining cats and dogs .
+...
+```
+### Running `pytorch_language_modeling.py`
+
+To run the language modeling pipeline locally, use the following command:
+```sh
+python -m apache_beam.examples.inference.pytorch_language_modeling \
+  --input SENTENCES \
+  --output OUTPUT \
+  --model_state_dict_path MODEL_STATE_DICT
+```
+For example:
+```sh
+python -m apache_beam.examples.inference.pytorch_language_modeling \
+  --input sentences.txt \
+  --output predictions.csv \
+  --model_state_dict_path BertForMaskedLM.pth
+```
+If you don't provide a sentences file, it will run the pipeline with some
+example sentences we created.

Review Comment:
   Done.



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -48,77 +58,181 @@ i.e. "See the
 for details."
 -->
 
-### Datasets and models for RunInference
-
-The RunInference example pipelines read example data from `gs://apache-beam-ml/`.
-You can view the data
-[here](https://console.cloud.google.com/storage/browser/apache-beam-ml). You can
-also list the example data using the
-[gsutil tool](https://cloud.google.com/storage/docs/gsutil#gettingstarted).
-
-```
-gsutil ls gs://apache-beam-ml
-```
-
 ---
-## Image classification with ImageNet dataset
+## Image classification
 
 [`pytorch_image_classification.py`](./pytorch_image_classification.py) contains
-an implementation for a RunInference pipeline that peforms image classification
-on the [ImageNet dataset](https://www.image-net.org/) using the MobileNetV2
+an implementation for a RunInference pipeline thatpeforms image classification using the mobilenet_v2
 architecture.
 
 The pipeline reads the images, performs basic preprocessing, passes them to the
 PyTorch implementation of RunInference, and then writes the predictions
-to a text file in GCS.
+to a text file.
 
 ### Dataset and model for image classification
 
-The image classification pipeline uses the following data:
-<!---
-TODO: Add once benchmark test is released
-- `gs://apache-beam-ml/testing/inputs/imagenet_validation_inputs.txt`:
-  text file containing the GCS paths of the images of all 5000 imagenet validation data
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-    - ...
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00050000.JPEG
--->
-- `gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt`:
-  Text file containing the GCS paths of the images of a subset of ImageNet
-  validation data. The following example command shows how to view contents of
-  the file:
-
-  ```
-  $ gsutil cat gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-  ...
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000015.JPEG
-  ```
-
-- `gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_*.JPEG`:
-  JPEG images for the entire validation dataset.
-
-- `gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth`: Path to
-  the location of the saved `state_dict` of the pretrained `mobilenet_v2` model
-  from the `torchvision.models` subdirectory.
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. One popular dataset is from [ImageNet](https://www.image-net.org/). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
+```
+import torch
+from torchvision.models.detection import mobilenet_v2
+model = mobilenet_v2(pretrained=True)
+torch.save(model.state_dict(), 'mobilenet_v2.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.
 
 ### Running `pytorch_image_classification.py`
 
 To run the image classification pipeline locally, use the following command:
 ```sh
 python -m apache_beam.examples.inference.pytorch_image_classification \
-  --input gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt \
+  --input IMAGE_FILE_NAMES \
+  --images_dir IMAGES_DIR \
+  --output OUTPUT \
+  --model_state_dict_path MODEL_STATE_DICT
+```
+For example:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_classification \
+  --input image_file_names.txt \
   --output predictions.csv \
-  --model_state_dict_path gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth
+  --model_state_dict_path maskrcnn_resnet50_fpn.pth
+```
+This writes the output to the `predictions.csv` with contents like:
 ```
+/absolute/path/to/image1.jpg;1
+/absolute/path/to/image2.jpg;333
+...
+```
+---
+## Image segmentation
+
+[`pytorch_image_segmentation.py`](./pytorch_image_segmentation.py) contains
+an implementation for a RunInference pipeline that peforms image segementation using the
+maskrcnn_resnet50_fpn architecture.
 
-This writes the output to `predictions.csv` with contents like:
+The pipeline reads images, performs basic preprocessing, passes them to the
+PyTorch implementation of RunInference, and then writes the predictions
+to a text file.
+
+### Dataset and model for image segmentation
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. Another popular dataset is from [Coco](https://cocodataset.org/#home). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [maskrcnn_resnet50_fpn](https://pytorch.org/vision/0.12/models.html#id70)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
 ```
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005002.JPEG,333
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005003.JPEG,711
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005004.JPEG,286
+import torch
+from torchvision.models.detection import maskrcnn_resnet50_fpn
+model = maskrcnn_resnet50_fpn(pretrained=True)
+torch.save(model.state_dict(), 'maskrcnn_resnet50_fpn.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.
+### Running `pytorch_image_segmentation.py`
+
+To run the image segmentation pipeline locally, use the following command:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_segmentation \
+  --input IMAGE_FILE_NAMES \
+  --images_dir IMAGES_DIR \
+  --output OUTPUT \
+  --model_state_dict_path MODEL_STATE_DICT
+```
+For example:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_segmentation \
+  --input image_file_names.txt \
+  --output predictions.csv \
+  --model_state_dict_path maskrcnn_resnet50_fpn.pth
+```
+This writes the output to the `predictions.csv` with contents like:
+```
+/absolute/path/to/image1.jpg;['parking meter', 'bottle', 'person', 'traffic light', 'traffic light', 'traffic light']
+/absolute/path/to/image2.jpg;['bottle', 'person', 'person']
 ...
 ```
+Each line has data separated by a semicolon ";".
+The first item is the file name. The second item are a list of predicted instances.
 
-The second item in each line is the integer representing the predicted class of the
-image.
+---
+## Language modeling
+
+[`pytorch_language_modeling.py`](./pytorch_language_modeling.py) contains
+an implementation for a RunInference pipeline that peforms masked language
+modeling (i.e. decoding a masked token in a sentence) using the BertForMaskedLM
+architecture from Hugging Face.
+
+The pipeline reads sentences, performs basic preprocessing to conver the last
+word into a `[MASK]` token, passes the masked sentence to PyTorch

Review Comment:
   Done.



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -48,77 +58,181 @@ i.e. "See the
 for details."
 -->
 
-### Datasets and models for RunInference
-
-The RunInference example pipelines read example data from `gs://apache-beam-ml/`.
-You can view the data
-[here](https://console.cloud.google.com/storage/browser/apache-beam-ml). You can
-also list the example data using the
-[gsutil tool](https://cloud.google.com/storage/docs/gsutil#gettingstarted).
-
-```
-gsutil ls gs://apache-beam-ml
-```
-
 ---
-## Image classification with ImageNet dataset
+## Image classification
 
 [`pytorch_image_classification.py`](./pytorch_image_classification.py) contains
-an implementation for a RunInference pipeline that peforms image classification
-on the [ImageNet dataset](https://www.image-net.org/) using the MobileNetV2
+an implementation for a RunInference pipeline thatpeforms image classification using the mobilenet_v2
 architecture.
 
 The pipeline reads the images, performs basic preprocessing, passes them to the
 PyTorch implementation of RunInference, and then writes the predictions
-to a text file in GCS.
+to a text file.
 
 ### Dataset and model for image classification
 
-The image classification pipeline uses the following data:
-<!---
-TODO: Add once benchmark test is released
-- `gs://apache-beam-ml/testing/inputs/imagenet_validation_inputs.txt`:
-  text file containing the GCS paths of the images of all 5000 imagenet validation data
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-    - ...
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00050000.JPEG
--->
-- `gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt`:
-  Text file containing the GCS paths of the images of a subset of ImageNet
-  validation data. The following example command shows how to view contents of
-  the file:
-
-  ```
-  $ gsutil cat gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-  ...
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000015.JPEG
-  ```
-
-- `gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_*.JPEG`:
-  JPEG images for the entire validation dataset.
-
-- `gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth`: Path to
-  the location of the saved `state_dict` of the pretrained `mobilenet_v2` model
-  from the `torchvision.models` subdirectory.
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. One popular dataset is from [ImageNet](https://www.image-net.org/). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
+```
+import torch
+from torchvision.models.detection import mobilenet_v2
+model = mobilenet_v2(pretrained=True)
+torch.save(model.state_dict(), 'mobilenet_v2.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.
 
 ### Running `pytorch_image_classification.py`
 
 To run the image classification pipeline locally, use the following command:
 ```sh
 python -m apache_beam.examples.inference.pytorch_image_classification \
-  --input gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt \
+  --input IMAGE_FILE_NAMES \
+  --images_dir IMAGES_DIR \
+  --output OUTPUT \
+  --model_state_dict_path MODEL_STATE_DICT
+```
+For example:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_classification \
+  --input image_file_names.txt \
   --output predictions.csv \
-  --model_state_dict_path gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth
+  --model_state_dict_path maskrcnn_resnet50_fpn.pth
+```
+This writes the output to the `predictions.csv` with contents like:
 ```
+/absolute/path/to/image1.jpg;1
+/absolute/path/to/image2.jpg;333
+...
+```
+---
+## Image segmentation
+
+[`pytorch_image_segmentation.py`](./pytorch_image_segmentation.py) contains
+an implementation for a RunInference pipeline that peforms image segementation using the
+maskrcnn_resnet50_fpn architecture.
 
-This writes the output to `predictions.csv` with contents like:
+The pipeline reads images, performs basic preprocessing, passes them to the
+PyTorch implementation of RunInference, and then writes the predictions
+to a text file.
+
+### Dataset and model for image segmentation
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. Another popular dataset is from [Coco](https://cocodataset.org/#home). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [maskrcnn_resnet50_fpn](https://pytorch.org/vision/0.12/models.html#id70)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
 ```
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005002.JPEG,333
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005003.JPEG,711
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005004.JPEG,286
+import torch
+from torchvision.models.detection import maskrcnn_resnet50_fpn
+model = maskrcnn_resnet50_fpn(pretrained=True)
+torch.save(model.state_dict(), 'maskrcnn_resnet50_fpn.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.
+### Running `pytorch_image_segmentation.py`
+
+To run the image segmentation pipeline locally, use the following command:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_segmentation \
+  --input IMAGE_FILE_NAMES \
+  --images_dir IMAGES_DIR \
+  --output OUTPUT \
+  --model_state_dict_path MODEL_STATE_DICT
+```
+For example:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_segmentation \
+  --input image_file_names.txt \
+  --output predictions.csv \
+  --model_state_dict_path maskrcnn_resnet50_fpn.pth
+```
+This writes the output to the `predictions.csv` with contents like:
+```
+/absolute/path/to/image1.jpg;['parking meter', 'bottle', 'person', 'traffic light', 'traffic light', 'traffic light']
+/absolute/path/to/image2.jpg;['bottle', 'person', 'person']
 ...
 ```
+Each line has data separated by a semicolon ";".
+The first item is the file name. The second item are a list of predicted instances.
 
-The second item in each line is the integer representing the predicted class of the
-image.
+---
+## Language modeling
+
+[`pytorch_language_modeling.py`](./pytorch_language_modeling.py) contains
+an implementation for a RunInference pipeline that peforms masked language
+modeling (i.e. decoding a masked token in a sentence) using the BertForMaskedLM
+architecture from Hugging Face.
+
+The pipeline reads sentences, performs basic preprocessing to conver the last

Review Comment:
   Done.



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -48,77 +58,181 @@ i.e. "See the
 for details."
 -->
 
-### Datasets and models for RunInference
-
-The RunInference example pipelines read example data from `gs://apache-beam-ml/`.
-You can view the data
-[here](https://console.cloud.google.com/storage/browser/apache-beam-ml). You can
-also list the example data using the
-[gsutil tool](https://cloud.google.com/storage/docs/gsutil#gettingstarted).
-
-```
-gsutil ls gs://apache-beam-ml
-```
-
 ---
-## Image classification with ImageNet dataset
+## Image classification
 
 [`pytorch_image_classification.py`](./pytorch_image_classification.py) contains
-an implementation for a RunInference pipeline that peforms image classification
-on the [ImageNet dataset](https://www.image-net.org/) using the MobileNetV2
+an implementation for a RunInference pipeline thatpeforms image classification using the mobilenet_v2
 architecture.
 
 The pipeline reads the images, performs basic preprocessing, passes them to the
 PyTorch implementation of RunInference, and then writes the predictions
-to a text file in GCS.
+to a text file.
 
 ### Dataset and model for image classification
 
-The image classification pipeline uses the following data:
-<!---
-TODO: Add once benchmark test is released
-- `gs://apache-beam-ml/testing/inputs/imagenet_validation_inputs.txt`:
-  text file containing the GCS paths of the images of all 5000 imagenet validation data
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-    - ...
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00050000.JPEG
--->
-- `gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt`:
-  Text file containing the GCS paths of the images of a subset of ImageNet
-  validation data. The following example command shows how to view contents of
-  the file:
-
-  ```
-  $ gsutil cat gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-  ...
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000015.JPEG
-  ```
-
-- `gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_*.JPEG`:
-  JPEG images for the entire validation dataset.
-
-- `gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth`: Path to
-  the location of the saved `state_dict` of the pretrained `mobilenet_v2` model
-  from the `torchvision.models` subdirectory.
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. One popular dataset is from [ImageNet](https://www.image-net.org/). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
+```
+import torch
+from torchvision.models.detection import mobilenet_v2
+model = mobilenet_v2(pretrained=True)
+torch.save(model.state_dict(), 'mobilenet_v2.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.
 
 ### Running `pytorch_image_classification.py`
 
 To run the image classification pipeline locally, use the following command:
 ```sh
 python -m apache_beam.examples.inference.pytorch_image_classification \
-  --input gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt \
+  --input IMAGE_FILE_NAMES \
+  --images_dir IMAGES_DIR \
+  --output OUTPUT \
+  --model_state_dict_path MODEL_STATE_DICT
+```
+For example:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_classification \
+  --input image_file_names.txt \
   --output predictions.csv \
-  --model_state_dict_path gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth
+  --model_state_dict_path maskrcnn_resnet50_fpn.pth
+```
+This writes the output to the `predictions.csv` with contents like:
 ```
+/absolute/path/to/image1.jpg;1
+/absolute/path/to/image2.jpg;333
+...
+```
+---
+## Image segmentation
+
+[`pytorch_image_segmentation.py`](./pytorch_image_segmentation.py) contains
+an implementation for a RunInference pipeline that peforms image segementation using the
+maskrcnn_resnet50_fpn architecture.
 
-This writes the output to `predictions.csv` with contents like:
+The pipeline reads images, performs basic preprocessing, passes them to the
+PyTorch implementation of RunInference, and then writes the predictions
+to a text file.
+
+### Dataset and model for image segmentation
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. Another popular dataset is from [Coco](https://cocodataset.org/#home). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [maskrcnn_resnet50_fpn](https://pytorch.org/vision/0.12/models.html#id70)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.
 ```
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005002.JPEG,333
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005003.JPEG,711
-gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00005004.JPEG,286
+import torch
+from torchvision.models.detection import maskrcnn_resnet50_fpn
+model = maskrcnn_resnet50_fpn(pretrained=True)
+torch.save(model.state_dict(), 'maskrcnn_resnet50_fpn.pth')
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Optional**: `IMAGES_DIR`, which is the path to the directory where images are stored. that contains the images you want to feed into your model. Not required if image names in the input file `IMAGE_FILE_NAMES` have absolute path.
+### Running `pytorch_image_segmentation.py`
+
+To run the image segmentation pipeline locally, use the following command:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_segmentation \
+  --input IMAGE_FILE_NAMES \
+  --images_dir IMAGES_DIR \
+  --output OUTPUT \
+  --model_state_dict_path MODEL_STATE_DICT
+```
+For example:
+```sh
+python -m apache_beam.examples.inference.pytorch_image_segmentation \
+  --input image_file_names.txt \
+  --output predictions.csv \
+  --model_state_dict_path maskrcnn_resnet50_fpn.pth
+```
+This writes the output to the `predictions.csv` with contents like:
+```
+/absolute/path/to/image1.jpg;['parking meter', 'bottle', 'person', 'traffic light', 'traffic light', 'traffic light']
+/absolute/path/to/image2.jpg;['bottle', 'person', 'person']
 ...
 ```
+Each line has data separated by a semicolon ";".
+The first item is the file name. The second item are a list of predicted instances.

Review Comment:
   Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] yeandy commented on pull request #21871: Modify README for 3 pytorch examples

Posted by GitBox <gi...@apache.org>.
yeandy commented on PR #21871:
URL: https://github.com/apache/beam/pull/21871#issuecomment-1155745570

   R: @AnandInguva @ryanthompson591 @tvalentyn @TheNeuralBit @rezarokni @pcoet 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] asf-ci commented on pull request #21871: Modify README for 3 pytorch examples

Posted by GitBox <gi...@apache.org>.
asf-ci commented on PR #21871:
URL: https://github.com/apache/beam/pull/21871#issuecomment-1155744966

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] yeandy commented on a diff in pull request #21871: Modify README for 3 pytorch examples

Posted by GitBox <gi...@apache.org>.
yeandy commented on code in PR #21871:
URL: https://github.com/apache/beam/pull/21871#discussion_r897405701


##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -48,77 +58,181 @@ i.e. "See the
 for details."
 -->
 
-### Datasets and models for RunInference
-
-The RunInference example pipelines read example data from `gs://apache-beam-ml/`.
-You can view the data
-[here](https://console.cloud.google.com/storage/browser/apache-beam-ml). You can
-also list the example data using the
-[gsutil tool](https://cloud.google.com/storage/docs/gsutil#gettingstarted).
-
-```
-gsutil ls gs://apache-beam-ml
-```
-
 ---
-## Image classification with ImageNet dataset
+## Image classification
 
 [`pytorch_image_classification.py`](./pytorch_image_classification.py) contains
-an implementation for a RunInference pipeline that peforms image classification
-on the [ImageNet dataset](https://www.image-net.org/) using the MobileNetV2
+an implementation for a RunInference pipeline thatpeforms image classification using the mobilenet_v2
 architecture.
 
 The pipeline reads the images, performs basic preprocessing, passes them to the
 PyTorch implementation of RunInference, and then writes the predictions
-to a text file in GCS.
+to a text file.
 
 ### Dataset and model for image classification
 
-The image classification pipeline uses the following data:
-<!---
-TODO: Add once benchmark test is released
-- `gs://apache-beam-ml/testing/inputs/imagenet_validation_inputs.txt`:
-  text file containing the GCS paths of the images of all 5000 imagenet validation data
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-    - ...
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00050000.JPEG
--->
-- `gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt`:
-  Text file containing the GCS paths of the images of a subset of ImageNet
-  validation data. The following example command shows how to view contents of
-  the file:
-
-  ```
-  $ gsutil cat gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-  ...
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000015.JPEG
-  ```
-
-- `gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_*.JPEG`:
-  JPEG images for the entire validation dataset.
-
-- `gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth`: Path to
-  the location of the saved `state_dict` of the pretrained `mobilenet_v2` model
-  from the `torchvision.models` subdirectory.
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. One popular dataset is from [ImageNet](https://www.image-net.org/). Please follow their instructions to download the images.

Review Comment:
   Fixed



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -48,77 +58,181 @@ i.e. "See the
 for details."
 -->
 
-### Datasets and models for RunInference
-
-The RunInference example pipelines read example data from `gs://apache-beam-ml/`.
-You can view the data
-[here](https://console.cloud.google.com/storage/browser/apache-beam-ml). You can
-also list the example data using the
-[gsutil tool](https://cloud.google.com/storage/docs/gsutil#gettingstarted).
-
-```
-gsutil ls gs://apache-beam-ml
-```
-
 ---
-## Image classification with ImageNet dataset
+## Image classification
 
 [`pytorch_image_classification.py`](./pytorch_image_classification.py) contains
-an implementation for a RunInference pipeline that peforms image classification
-on the [ImageNet dataset](https://www.image-net.org/) using the MobileNetV2
+an implementation for a RunInference pipeline thatpeforms image classification using the mobilenet_v2

Review Comment:
   Fixed



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -48,77 +58,181 @@ i.e. "See the
 for details."
 -->
 
-### Datasets and models for RunInference
-
-The RunInference example pipelines read example data from `gs://apache-beam-ml/`.
-You can view the data
-[here](https://console.cloud.google.com/storage/browser/apache-beam-ml). You can
-also list the example data using the
-[gsutil tool](https://cloud.google.com/storage/docs/gsutil#gettingstarted).
-
-```
-gsutil ls gs://apache-beam-ml
-```
-
 ---
-## Image classification with ImageNet dataset
+## Image classification
 
 [`pytorch_image_classification.py`](./pytorch_image_classification.py) contains
-an implementation for a RunInference pipeline that peforms image classification
-on the [ImageNet dataset](https://www.image-net.org/) using the MobileNetV2
+an implementation for a RunInference pipeline thatpeforms image classification using the mobilenet_v2
 architecture.
 
 The pipeline reads the images, performs basic preprocessing, passes them to the
 PyTorch implementation of RunInference, and then writes the predictions
-to a text file in GCS.
+to a text file.
 
 ### Dataset and model for image classification
 
-The image classification pipeline uses the following data:
-<!---
-TODO: Add once benchmark test is released
-- `gs://apache-beam-ml/testing/inputs/imagenet_validation_inputs.txt`:
-  text file containing the GCS paths of the images of all 5000 imagenet validation data
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-    - ...
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00050000.JPEG
--->
-- `gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt`:
-  Text file containing the GCS paths of the images of a subset of ImageNet
-  validation data. The following example command shows how to view contents of
-  the file:
-
-  ```
-  $ gsutil cat gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-  ...
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000015.JPEG
-  ```
-
-- `gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_*.JPEG`:
-  JPEG images for the entire validation dataset.
-
-- `gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth`: Path to
-  the location of the saved `state_dict` of the pretrained `mobilenet_v2` model
-  from the `torchvision.models` subdirectory.
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. One popular dataset is from [ImageNet](https://www.image-net.org/). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.

Review Comment:
   Fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] codecov[bot] commented on pull request #21871: Modify README for 3 pytorch examples

Posted by GitBox <gi...@apache.org>.
codecov[bot] commented on PR #21871:
URL: https://github.com/apache/beam/pull/21871#issuecomment-1155804275

   # [Codecov](https://codecov.io/gh/apache/beam/pull/21871?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#21871](https://codecov.io/gh/apache/beam/pull/21871?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (7a84887) into [master](https://codecov.io/gh/apache/beam/commit/18ab78c8c42a83816a43a3d5252f6d843a82b36c?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (18ab78c) will **decrease** coverage by `0.01%`.
   > The diff coverage is `n/a`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #21871      +/-   ##
   ==========================================
   - Coverage   74.07%   74.06%   -0.02%     
   ==========================================
     Files         698      698              
     Lines       92574    92600      +26     
   ==========================================
   + Hits        68577    68581       +4     
   - Misses      22742    22764      +22     
     Partials     1255     1255              
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | python | `83.73% <ø> (-0.03%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/21871?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [.../python/apache\_beam/testing/test\_stream\_service.py](https://codecov.io/gh/apache/beam/pull/21871/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdGVzdGluZy90ZXN0X3N0cmVhbV9zZXJ2aWNlLnB5) | `88.09% <0.00%> (-4.77%)` | :arrow_down: |
   | [...che\_beam/runners/interactive/interactive\_runner.py](https://codecov.io/gh/apache/beam/pull/21871/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9pbnRlcmFjdGl2ZV9ydW5uZXIucHk=) | `90.06% <0.00%> (-1.33%)` | :arrow_down: |
   | [...eam/runners/portability/fn\_api\_runner/execution.py](https://codecov.io/gh/apache/beam/pull/21871/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL2V4ZWN1dGlvbi5weQ==) | `92.44% <0.00%> (-0.65%)` | :arrow_down: |
   | [...hon/apache\_beam/runners/worker/bundle\_processor.py](https://codecov.io/gh/apache/beam/pull/21871/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvYnVuZGxlX3Byb2Nlc3Nvci5weQ==) | `93.30% <0.00%> (-0.25%)` | :arrow_down: |
   | [sdks/python/apache\_beam/pipeline.py](https://codecov.io/gh/apache/beam/pull/21871/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcGlwZWxpbmUucHk=) | `91.80% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/utils/urns.py](https://codecov.io/gh/apache/beam/pull/21871/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdXRpbHMvdXJucy5weQ==) | `88.70% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/coders/coders.py](https://codecov.io/gh/apache/beam/pull/21871/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vY29kZXJzL2NvZGVycy5weQ==) | `88.22% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/runners/common.py](https://codecov.io/gh/apache/beam/pull/21871/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9jb21tb24ucHk=) | `88.68% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/utils/counters.py](https://codecov.io/gh/apache/beam/pull/21871/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdXRpbHMvY291bnRlcnMucHk=) | `85.39% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/io/gcp/bigquery.py](https://codecov.io/gh/apache/beam/pull/21871/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5LnB5) | `70.07% <0.00%> (ø)` | |
   | ... and [36 more](https://codecov.io/gh/apache/beam/pull/21871/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/21871?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/21871?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [18ab78c...7a84887](https://codecov.io/gh/apache/beam/pull/21871?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] tvalentyn commented on pull request #21871: Modify README for 3 pytorch examples

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on PR #21871:
URL: https://github.com/apache/beam/pull/21871#issuecomment-1156321674

   Run Whitespace PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] asf-ci commented on pull request #21871: Modify README for 3 pytorch examples

Posted by GitBox <gi...@apache.org>.
asf-ci commented on PR #21871:
URL: https://github.com/apache/beam/pull/21871#issuecomment-1155744967

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] rezarokni commented on a diff in pull request #21871: Modify README for 3 pytorch examples

Posted by GitBox <gi...@apache.org>.
rezarokni commented on code in PR #21871:
URL: https://github.com/apache/beam/pull/21871#discussion_r897368194


##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -48,77 +58,181 @@ i.e. "See the
 for details."
 -->
 
-### Datasets and models for RunInference
-
-The RunInference example pipelines read example data from `gs://apache-beam-ml/`.
-You can view the data
-[here](https://console.cloud.google.com/storage/browser/apache-beam-ml). You can
-also list the example data using the
-[gsutil tool](https://cloud.google.com/storage/docs/gsutil#gettingstarted).
-
-```
-gsutil ls gs://apache-beam-ml
-```
-
 ---
-## Image classification with ImageNet dataset
+## Image classification
 
 [`pytorch_image_classification.py`](./pytorch_image_classification.py) contains
-an implementation for a RunInference pipeline that peforms image classification
-on the [ImageNet dataset](https://www.image-net.org/) using the MobileNetV2
+an implementation for a RunInference pipeline thatpeforms image classification using the mobilenet_v2
 architecture.
 
 The pipeline reads the images, performs basic preprocessing, passes them to the
 PyTorch implementation of RunInference, and then writes the predictions
-to a text file in GCS.
+to a text file.
 
 ### Dataset and model for image classification
 
-The image classification pipeline uses the following data:
-<!---
-TODO: Add once benchmark test is released
-- `gs://apache-beam-ml/testing/inputs/imagenet_validation_inputs.txt`:
-  text file containing the GCS paths of the images of all 5000 imagenet validation data
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-    - ...
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00050000.JPEG
--->
-- `gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt`:
-  Text file containing the GCS paths of the images of a subset of ImageNet
-  validation data. The following example command shows how to view contents of
-  the file:
-
-  ```
-  $ gsutil cat gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-  ...
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000015.JPEG
-  ```
-
-- `gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_*.JPEG`:
-  JPEG images for the entire validation dataset.
-
-- `gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth`: Path to
-  the location of the saved `state_dict` of the pretrained `mobilenet_v2` model
-  from the `torchvision.models` subdirectory.
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. One popular dataset is from [ImageNet](https://www.image-net.org/). Please follow their instructions to download the images.

Review Comment:
   change 
   place them into this directory called `IMAGES_DIR`. 
   
   place them into your `IMAGES_DIR`  directory. 



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -48,77 +58,181 @@ i.e. "See the
 for details."
 -->
 
-### Datasets and models for RunInference
-
-The RunInference example pipelines read example data from `gs://apache-beam-ml/`.
-You can view the data
-[here](https://console.cloud.google.com/storage/browser/apache-beam-ml). You can
-also list the example data using the
-[gsutil tool](https://cloud.google.com/storage/docs/gsutil#gettingstarted).
-
-```
-gsutil ls gs://apache-beam-ml
-```
-
 ---
-## Image classification with ImageNet dataset
+## Image classification
 
 [`pytorch_image_classification.py`](./pytorch_image_classification.py) contains
-an implementation for a RunInference pipeline that peforms image classification
-on the [ImageNet dataset](https://www.image-net.org/) using the MobileNetV2
+an implementation for a RunInference pipeline thatpeforms image classification using the mobilenet_v2

Review Comment:
   c/thatpeforms/ that performs 



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -48,77 +58,181 @@ i.e. "See the
 for details."
 -->
 
-### Datasets and models for RunInference
-
-The RunInference example pipelines read example data from `gs://apache-beam-ml/`.
-You can view the data
-[here](https://console.cloud.google.com/storage/browser/apache-beam-ml). You can
-also list the example data using the
-[gsutil tool](https://cloud.google.com/storage/docs/gsutil#gettingstarted).
-
-```
-gsutil ls gs://apache-beam-ml
-```
-
 ---
-## Image classification with ImageNet dataset
+## Image classification
 
 [`pytorch_image_classification.py`](./pytorch_image_classification.py) contains
-an implementation for a RunInference pipeline that peforms image classification
-on the [ImageNet dataset](https://www.image-net.org/) using the MobileNetV2
+an implementation for a RunInference pipeline thatpeforms image classification using the mobilenet_v2
 architecture.
 
 The pipeline reads the images, performs basic preprocessing, passes them to the
 PyTorch implementation of RunInference, and then writes the predictions
-to a text file in GCS.
+to a text file.
 
 ### Dataset and model for image classification
 
-The image classification pipeline uses the following data:
-<!---
-TODO: Add once benchmark test is released
-- `gs://apache-beam-ml/testing/inputs/imagenet_validation_inputs.txt`:
-  text file containing the GCS paths of the images of all 5000 imagenet validation data
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-    - ...
-    - gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00050000.JPEG
--->
-- `gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt`:
-  Text file containing the GCS paths of the images of a subset of ImageNet
-  validation data. The following example command shows how to view contents of
-  the file:
-
-  ```
-  $ gsutil cat gs://apache-beam-ml/testing/inputs/it_imagenet_validation_inputs.txt
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000001.JPEG
-  ...
-  gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_00000015.JPEG
-  ```
-
-- `gs://apache-beam-ml/datasets/imagenet/raw-data/validation/ILSVRC2012_val_*.JPEG`:
-  JPEG images for the entire validation dataset.
-
-- `gs://apache-beam-ml/models/torchvision.models.mobilenet_v2.pth`: Path to
-  the location of the saved `state_dict` of the pretrained `mobilenet_v2` model
-  from the `torchvision.models` subdirectory.
+You will need to create or download images, and place them into this directory called `IMAGES_DIR`. One popular dataset is from [ImageNet](https://www.image-net.org/). Please follow their instructions to download the images.
+- **Required**: A path to a file called `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` on which you want to run image segmentation. For example:
+```
+/absolute/path/to/image1.jpg
+/absolute/path/to/image2.jpg
+```
+- **Required**: A path to a file called `MODEL_STATE_DICT` that contains the saved
+parameters of the maskrcnn_resnet50_fpn model. You will need to download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html)
+model from Pytorch's repository of pretrained models. Make sure you have installed `torchvision` too.

Review Comment:
   Note this requires `torchvision` library 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org