Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/09/06 12:17:55 UTC

[GitHub] [beam] yeandy commented on a diff in pull request #23018: Clarify inference example docs

yeandy commented on code in PR #23018:
URL: https://github.com/apache/beam/pull/23018#discussion_r963621805


##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -67,22 +67,20 @@ The pipeline reads the images, performs basic preprocessing, passes the images t
 
 To use this transform, you need a dataset and model for image classification.
 
-1. Create a directory named `IMAGES_DIR`. Create or download images and put them in this directory. The directory is not required if image names in the input file `IMAGE_FILE_NAMES` have absolute paths.
+1. Create a directory named `IMAGES_DIR`. Create or download images and put them in this directory. The directory is not required if image names in the input file `IMAGE_FILE_NAMES.txt` you create in step 2 have absolute paths.
 One popular dataset is from [ImageNet](https://www.image-net.org/). Follow their instructions to download the images.
-2. Create a file named `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` that you want to use to run image classification. The path to the file can be different types of URIs such as your local file system, an AWS S3 bucket, or a GCP Cloud Storage bucket. For example:
+2. Create a file named `IMAGE_FILE_NAMES.txt` that contains the absolute paths of each of the images in `IMAGES_DIR` that you want to use to run image classification. The path to the file can be different types of URIs such as your local file system, an AWS S3 bucket, or a GCP Cloud Storage bucket. For example:
 ```
 /absolute/path/to/image1.jpg
 /absolute/path/to/image2.jpg
 ```
-3. Download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html) model from Pytorch's repository of pretrained models. This model requires the torchvision library. To download this model, run the following commands:
+3. Download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html) model from Pytorch's repository of pretrained models. This model requires the torchvision library. To download this model, run the following commands from a python shell:

Review Comment:
   ```suggestion
   3. Download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html) model from Pytorch's repository of pretrained models. This model requires the torchvision library. To download this model, run the following commands from a Python shell:
   ```



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -118,22 +121,21 @@ The pipeline reads images, performs basic preprocessing, passes the images to th
 
 To use this transform, you need a dataset and model for image segmentation.
 
-1. Create a directory named `IMAGES_DIR`. Create or download images and put them in this directory. The directory is not required if image names in the input file `IMAGE_FILE_NAMES` have absolute paths.
+1. Create a directory named `IMAGES_DIR`. Create or download images and put them in this directory. The directory is not required if image names in the input file `IMAGE_FILE_NAMES.txt` you create in step 2 have absolute paths.
 A popular dataset is from [Coco](https://cocodataset.org/#home). Follow their instructions to download the images.
-2. Create a file named `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` that you want to use to run image segmentation. The path to the file can be different types of URIs such as your local file system, an AWS S3 bucket, or a GCP Cloud Storage bucket. For example:
+2. Create a file named `IMAGE_FILE_NAMES.txt` that contains the absolute paths of each of the images in `IMAGES_DIR` that you want to use to run image segmentation. The path to the file can be different types of URIs such as your local file system, an AWS S3 bucket, or a GCP Cloud Storage bucket. For example:
 ```
 /absolute/path/to/image1.jpg
 /absolute/path/to/image2.jpg
 ```
-3. Download the [maskrcnn_resnet50_fpn](https://pytorch.org/vision/0.12/models.html#id70) model from Pytorch's repository of pretrained models. This model requires the torchvision library. To download this model, run the following commands:
+3. Download the [maskrcnn_resnet50_fpn](https://pytorch.org/vision/0.12/models.html#id70) model from Pytorch's repository of pretrained models. This model requires the torchvision library. To download this model, run the following commands from a python shell:

Review Comment:
   ```suggestion
   3. Download the [maskrcnn_resnet50_fpn](https://pytorch.org/vision/0.12/models.html#id70) model from Pytorch's repository of pretrained models. This model requires the torchvision library. To download this model, run the following commands from a Python shell:
   ```



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -118,22 +121,21 @@ The pipeline reads images, performs basic preprocessing, passes the images to th
 
 To use this transform, you need a dataset and model for image segmentation.
 
-1. Create a directory named `IMAGES_DIR`. Create or download images and put them in this directory. The directory is not required if image names in the input file `IMAGE_FILE_NAMES` have absolute paths.
+1. Create a directory named `IMAGES_DIR`. Create or download images and put them in this directory. The directory is not required if image names in the input file `IMAGE_FILE_NAMES.txt` you create in step 2 have absolute paths.
 A popular dataset is from [Coco](https://cocodataset.org/#home). Follow their instructions to download the images.
-2. Create a file named `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` that you want to use to run image segmentation. The path to the file can be different types of URIs such as your local file system, an AWS S3 bucket, or a GCP Cloud Storage bucket. For example:
+2. Create a file named `IMAGE_FILE_NAMES.txt` that contains the absolute paths of each of the images in `IMAGES_DIR` that you want to use to run image segmentation. The path to the file can be different types of URIs such as your local file system, an AWS S3 bucket, or a GCP Cloud Storage bucket. For example:
 ```
 /absolute/path/to/image1.jpg
 /absolute/path/to/image2.jpg
 ```
-3. Download the [maskrcnn_resnet50_fpn](https://pytorch.org/vision/0.12/models.html#id70) model from Pytorch's repository of pretrained models. This model requires the torchvision library. To download this model, run the following commands:
+3. Download the [maskrcnn_resnet50_fpn](https://pytorch.org/vision/0.12/models.html#id70) model from Pytorch's repository of pretrained models. This model requires the torchvision library. To download this model, run the following commands from a python shell:
 ```
 import torch
 from torchvision.models.detection import maskrcnn_resnet50_fpn
 model = maskrcnn_resnet50_fpn(pretrained=True)
-torch.save(model.state_dict(), 'maskrcnn_resnet50_fpn.pth')
+torch.save(model.state_dict(), 'maskrcnn_resnet50_fpn.pth') # You can replace maskrcnn_resnet50_fpn.pth with your preferred file name for your model state dictionary.
 ```
-4. Create a path to a file named `MODEL_STATE_DICT` that contains the saved parameters of the `maskrcnn_resnet50_fpn` model.

Review Comment:
   Why did you remove the step about `MODEL_STATE_DICT`?
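
For background on what this step tied together: the `.pth` file written by the `torch.save(...)` call above is the file that `MODEL_STATE_DICT` names, and its path is what the pipeline's `--model_state_dict_path` flag points at. A minimal sketch of loading it back, assuming standard PyTorch usage (an illustration, not code from this PR):

```
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

# Instantiate the architecture, then restore the saved parameters.
# 'maskrcnn_resnet50_fpn.pth' is the file written by torch.save above;
# its path is what MODEL_STATE_DICT refers to.
model = maskrcnn_resnet50_fpn()
model.load_state_dict(torch.load('maskrcnn_resnet50_fpn.pth'))
model.eval()
```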



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -67,22 +67,20 @@ The pipeline reads the images, performs basic preprocessing, passes the images t
 
 To use this transform, you need a dataset and model for image classification.
 
-1. Create a directory named `IMAGES_DIR`. Create or download images and put them in this directory. The directory is not required if image names in the input file `IMAGE_FILE_NAMES` have absolute paths.
+1. Create a directory named `IMAGES_DIR`. Create or download images and put them in this directory. The directory is not required if image names in the input file `IMAGE_FILE_NAMES.txt` you create in step 2 have absolute paths.
 One popular dataset is from [ImageNet](https://www.image-net.org/). Follow their instructions to download the images.
-2. Create a file named `IMAGE_FILE_NAMES` that contains the absolute paths of each of the images in `IMAGES_DIR` that you want to use to run image classification. The path to the file can be different types of URIs such as your local file system, an AWS S3 bucket, or a GCP Cloud Storage bucket. For example:
+2. Create a file named `IMAGE_FILE_NAMES.txt` that contains the absolute paths of each of the images in `IMAGES_DIR` that you want to use to run image classification. The path to the file can be different types of URIs such as your local file system, an AWS S3 bucket, or a GCP Cloud Storage bucket. For example:
 ```
 /absolute/path/to/image1.jpg
 /absolute/path/to/image2.jpg
 ```
-3. Download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html) model from Pytorch's repository of pretrained models. This model requires the torchvision library. To download this model, run the following commands:
+3. Download the [mobilenet_v2](https://pytorch.org/vision/stable/_modules/torchvision/models/mobilenetv2.html) model from Pytorch's repository of pretrained models. This model requires the torchvision library. To download this model, run the following commands from a python shell:
 ```
 import torch
 from torchvision.models import mobilenet_v2
 model = mobilenet_v2(pretrained=True)
-torch.save(model.state_dict(), 'mobilenet_v2.pth')
+torch.save(model.state_dict(), 'mobilenet_v2.pth') # You can replace mobilenet_v2.pth with your preferred file name for your model state dictionary.
 ```
-4. Create a file named `MODEL_STATE_DICT` that contains the saved parameters of the `mobilenet_v2` model.
-5. Note the path to the `OUTPUT` file. This file is used by the pipeline to write the predictions.

Review Comment:
   Why did you remove the steps about `MODEL_STATE_DICT` and `OUTPUT`?



##########
sdks/python/apache_beam/examples/inference/pytorch_image_classification.py:
##########
@@ -136,8 +142,8 @@ def run(
 
   filename_value_pair = (
       pipeline
-      | 'ReadImageNames' >> beam.io.ReadFromText(
-          known_args.input, skip_header_lines=1)
+      | 'ReadImageNames' >> beam.io.ReadFromText(known_args.input)
+      | 'RemoveEmptyLines' >> beam.ParDo(filter_empty_lines)

Review Comment:
   ```suggestion
         | 'FilterEmptyLines' >> beam.ParDo(filter_empty_lines)
   ```
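
For context, `filter_empty_lines` is defined elsewhere in the example and is not visible in this hunk. A minimal sketch of what such a helper could look like, assuming the plain generator-function pattern that `beam.ParDo` accepts (an illustration, not the PR's actual code):

```
from typing import Iterator

def filter_empty_lines(text: str) -> Iterator[str]:
  # Yield the line only when it has non-whitespace content, so blank
  # entries in the input file never reach preprocessing or inference.
  if len(text.strip()) > 0:
    yield text
```

Because the function yields zero or one element per input, dropping a blank line simply means yielding nothing.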



##########
sdks/python/apache_beam/examples/inference/pytorch_image_segmentation.py:
##########
@@ -225,8 +231,8 @@ def run(
 
   filename_value_pair = (
       pipeline
-      | 'ReadImageNames' >> beam.io.ReadFromText(
-          known_args.input, skip_header_lines=1)
+      | 'ReadImageNames' >> beam.io.ReadFromText(known_args.input)
+      | 'RemoveEmptyLines' >> beam.ParDo(filter_empty_lines)

Review Comment:
   ```suggestion
         | 'FilterEmptyLines' >> beam.ParDo(filter_empty_lines)
   ```
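
The `filter_empty_lines` sketch after the previous hunk applies here as well; the segmentation pipeline references the same helper name.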



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -171,16 +175,14 @@ The pipeline reads sentences, performs basic preprocessing to convert the last w
 
 To use this transform, you need a dataset and model for language modeling.
 
-1. Download the [BertForMaskedLM](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertForMaskedLM) model from Hugging Face's repository of pretrained models. You must already have `transformers` installed.
+1. Download the [BertForMaskedLM](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertForMaskedLM) model from Hugging Face's repository of pretrained models. You must already have `transformers` installed, then from a python shell run:

Review Comment:
   ```suggestion
   1. Download the [BertForMaskedLM](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertForMaskedLM) model from Hugging Face's repository of pretrained models. You must already have `transformers` installed, then from a Python shell run:
   ```



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -94,10 +92,12 @@ python -m apache_beam.examples.inference.pytorch_image_classification \
   --output OUTPUT \
   --model_state_dict_path MODEL_STATE_DICT
 ```
-For example:
+`images_dir` is only needed if your `IMAGE_FILE_NAMES` file contains relative paths (they will be relative from `IMAGES_DIR`).

Review Comment:
   ```suggestion
   `images_dir` is only needed if your `IMAGE_FILE_NAMES.txt` file contains relative paths (they will be relative from `IMAGES_DIR`).
   ```
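
To make the relative-path case concrete, a hypothetical invocation (the `--images_dir` flag name is assumed from the `images_dir` parameter discussed above; `IMAGE_FILE_NAMES.txt` would then list entries like `image1.jpg` rather than absolute paths):

```
python -m apache_beam.examples.inference.pytorch_image_classification \
  --input IMAGE_FILE_NAMES.txt \
  --images_dir IMAGES_DIR \
  --output OUTPUT \
  --model_state_dict_path MODEL_STATE_DICT
```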



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -145,10 +147,12 @@ python -m apache_beam.examples.inference.pytorch_image_segmentation \
   --output OUTPUT \
   --model_state_dict_path MODEL_STATE_DICT
 ```
-For example:
+`images_dir` is only needed if your `IMAGE_FILE_NAMES` file contains relative paths (they will be relative from `IMAGES_DIR`).

Review Comment:
   ```suggestion
   `images_dir` is only needed if your `IMAGE_FILE_NAMES.txt` file contains relative paths (they will be relative from `IMAGES_DIR`).
   ```
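
The same relative-path invocation sketched after the previous hunk applies here, with `pytorch_image_segmentation` as the module.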



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -171,16 +175,14 @@ The pipeline reads sentences, performs basic preprocessing to convert the last w
 
 To use this transform, you need a dataset and model for language modeling.
 
-1. Download the [BertForMaskedLM](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertForMaskedLM) model from Hugging Face's repository of pretrained models. You must already have `transformers` installed.
+1. Download the [BertForMaskedLM](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertForMaskedLM) model from Hugging Face's repository of pretrained models. You must already have `transformers` installed, then from a python shell run:
 ```
 import torch
 from transformers import BertForMaskedLM
 model = BertForMaskedLM.from_pretrained('bert-base-uncased', return_dict=True)
-torch.save(model.state_dict(), 'BertForMaskedLM.pth')
+torch.save(model.state_dict(), 'BertForMaskedLM.pth') # You can replace BertForMaskedLM.pth with your preferred file name for your model state dictionary.
 ```
-2. Create a file named `MODEL_STATE_DICT` that contains the saved parameters of the `BertForMaskedLM` model.

Review Comment:
   Why did you remove the steps about `MODEL_STATE_DICT` and `OUTPUT`?


