Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/07/15 18:19:24 UTC

[GitHub] [beam] tvalentyn commented on a diff in pull request #22069: Reviewing the RunInference ReadMe file for clarity.

tvalentyn commented on code in PR #22069:
URL: https://github.com/apache/beam/pull/22069#discussion_r922416093


##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -218,16 +228,19 @@ is the word that the model predicts for the mask.
 The pipeline reads rows of pixels corresponding to a digit, performs basic preprocessing, passes the pixels to the Scikit-learn implementation of RunInference, and then writes the predictions to a text file.
 
 ### Dataset and model for language modeling
-- **Required**: A path to a file called `INPUT` that contains label and pixels to feed into the model. Each row should have elements that are comma-separated. The first element is the label. All subsuequent elements would be pixel values. It should look something like this:
+
+To use this transform, you need a dataset and model for language modeling.
+
+1. Create a file named `INPUT` that contains labels and pixels to feed into the model. Each row should have comma-separated elements. The first element is the label. All other elements are pixel values. The content of the file should be similar to the following example:
 ```
 1,0,0,0...
 0,0,0,0...
 1,0,0,0...
 4,0,0,0...
 ...
 ```
-- **Required**: A path to a file called `OUTPUT`, to which the pipeline will write the predictions.
-- **Required**: A path to a file called `MODEL_PATH` that contains the pickled file of a scikit-learn model trained on MNIST data. Please refer to this scikit-learn [documentation](https://scikit-learn.org/stable/model_persistence.html) on how to serialize models.
+2. Create a file named `OUTPUT`. This file is used by the pipeline to write the predictions.

Review Comment:
   Doesn't the pipeline create the output files itself?
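
To illustrate the question, here is a minimal standard-library sketch, not the example pipeline from the README. It assumes rows shaped like the `INPUT` sample in the diff (`label,pixel,pixel,...`) and shows that writing to an `OUTPUT` path creates the file, so nothing needs to exist beforehand. The helper names `parse_row` and `write_predictions` are hypothetical, and the "prediction" is a placeholder that echoes the label. Beam's text sink is expected to behave similarly, creating its own output files from the path it is given, but that should be confirmed against the actual example code.

```python
import os
import tempfile


def parse_row(line):
    """Split one 'label,pixel,pixel,...' row into (label, pixels)."""
    values = [int(v) for v in line.strip().split(",")]
    return values[0], values[1:]


def write_predictions(rows, output_path):
    """Write one 'label,prediction' line per row (hypothetical helper)."""
    # Opening in "w" mode creates the file when it does not exist yet.
    with open(output_path, "w") as f:
        for label, _pixels in rows:
            # Placeholder prediction: echo the label. A real pipeline
            # would run the model on the pixel values here.
            f.write(f"{label},{label}\n")


# Use a fresh temporary directory so OUTPUT cannot exist beforehand.
workdir = tempfile.mkdtemp()
output_path = os.path.join(workdir, "OUTPUT")
existed_before = os.path.exists(output_path)

rows = [parse_row(line) for line in ["1,0,0,0", "4,0,0,0"]]
write_predictions(rows, output_path)
```

If the real pipeline behaves the same way, step 2 could ask for an output path instead of asking the user to create the file.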


