You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/15 15:07:52 UTC

[GitHub] [beam] yeandy opened a new pull request, #21887: Add README documentation for scikit-learn MNIST example

yeandy opened a new pull request, #21887:
URL: https://github.com/apache/beam/pull/21887

   Add README for scikit-learn example
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] AnandInguva commented on a diff in pull request #21887: Add README documentation for scikit-learn MNIST example

Posted by GitBox <gi...@apache.org>.
AnandInguva commented on code in PR #21887:
URL: https://github.com/apache/beam/pull/21887#discussion_r898117591


##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -235,3 +235,54 @@ He looked up and saw the sun and stars .;moon
 Each line has data separated by a semicolon ";".
 The first item is the sentence with the last word masked. The second item
 is the word that the model predicts for the mask.
+
+---
+## MNITST digit classification
+[`sklearn_mnist_classification.py`](./sklearn_mnist_classification.py) contains
+an implementation for a RunInference pipeline that performs image classification on handwritten digits from the [MNIST](https://en.wikipedia.org/wiki/MNIST_database) database.
+
+The pipeline reads rows of pixels corresponding to a digit, performs basic preprocessing, passes the pixels to the Scikit-learn implementation of RunInference, and then writes the predictions to a text file.
+
+### Dataset and model for language modeling
+- **Required**: A path to a file called `INPUT` that contains label and pixels to
+feed into the model. Each row should have elements that are comma-separated. The first element is the label. All subsuequent values are pixels from pixel0 to pixel784. It should look something like this:

Review Comment:
   It can be of any length.  the one you mentioned is of 28 x 28 image flattened. It could be 32 x 32 or 64 x 64 flattened as well. 
   
   We can say
   
   `All subsuequent elements would be pixel values.`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] asf-ci commented on pull request #21887: Add README documentation for scikit-learn MNIST example

Posted by GitBox <gi...@apache.org>.
asf-ci commented on PR #21887:
URL: https://github.com/apache/beam/pull/21887#issuecomment-1156594264

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] asf-ci commented on pull request #21887: Add README documentation for scikit-learn MNIST example

Posted by GitBox <gi...@apache.org>.
asf-ci commented on PR #21887:
URL: https://github.com/apache/beam/pull/21887#issuecomment-1156594260

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] codecov[bot] commented on pull request #21887: Add README documentation for scikit-learn MNIST example

Posted by GitBox <gi...@apache.org>.
codecov[bot] commented on PR #21887:
URL: https://github.com/apache/beam/pull/21887#issuecomment-1156640193

   # [Codecov](https://codecov.io/gh/apache/beam/pull/21887?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#21887](https://codecov.io/gh/apache/beam/pull/21887?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (076ca93) into [master](https://codecov.io/gh/apache/beam/commit/b1a313e03807882e1bb178c689e4f935246d5534?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (b1a313e) will **decrease** coverage by `0.00%`.
   > The diff coverage is `n/a`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #21887      +/-   ##
   ==========================================
   - Coverage   74.01%   74.00%   -0.01%     
   ==========================================
     Files         699      699              
     Lines       92675    92675              
   ==========================================
   - Hits        68592    68587       -5     
   - Misses      22828    22833       +5     
     Partials     1255     1255              
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | python | `83.65% <ø> (-0.01%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/21887?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/io/localfilesystem.py](https://codecov.io/gh/apache/beam/pull/21887/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vbG9jYWxmaWxlc3lzdGVtLnB5) | `90.97% <0.00%> (-0.76%)` | :arrow_down: |
   | [...eam/runners/portability/fn\_api\_runner/execution.py](https://codecov.io/gh/apache/beam/pull/21887/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9wb3J0YWJpbGl0eS9mbl9hcGlfcnVubmVyL2V4ZWN1dGlvbi5weQ==) | `92.44% <0.00%> (-0.65%)` | :arrow_down: |
   | [...ks/python/apache\_beam/runners/worker/sdk\_worker.py](https://codecov.io/gh/apache/beam/pull/21887/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvc2RrX3dvcmtlci5weQ==) | `88.94% <0.00%> (-0.16%)` | :arrow_down: |
   | [sdks/python/apache\_beam/transforms/util.py](https://codecov.io/gh/apache/beam/pull/21887/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy91dGlsLnB5) | `96.06% <0.00%> (-0.16%)` | :arrow_down: |
   | [...examples/inference/pytorch\_image\_classification.py](https://codecov.io/gh/apache/beam/pull/21887/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvaW5mZXJlbmNlL3B5dG9yY2hfaW1hZ2VfY2xhc3NpZmljYXRpb24ucHk=) | `0.00% <0.00%> (ø)` | |
   | [...python/apache\_beam/runners/worker/worker\_status.py](https://codecov.io/gh/apache/beam/pull/21887/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvd29ya2VyX3N0YXR1cy5weQ==) | `79.71% <0.00%> (+1.44%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/21887?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/21887?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [b1a313e...076ca93](https://codecov.io/gh/apache/beam/pull/21887?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] yeandy commented on pull request #21887: Add README documentation for scikit-learn MNIST example

Posted by GitBox <gi...@apache.org>.
yeandy commented on PR #21887:
URL: https://github.com/apache/beam/pull/21887#issuecomment-1156594389

   R: @AnandInguva 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] asf-ci commented on pull request #21887: Add README documentation for scikit-learn MNIST example

Posted by GitBox <gi...@apache.org>.
asf-ci commented on PR #21887:
URL: https://github.com/apache/beam/pull/21887#issuecomment-1156594267

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] yeandy commented on a diff in pull request #21887: Add README documentation for scikit-learn MNIST example

Posted by GitBox <gi...@apache.org>.
yeandy commented on code in PR #21887:
URL: https://github.com/apache/beam/pull/21887#discussion_r898180924


##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -235,3 +235,54 @@ He looked up and saw the sun and stars .;moon
 Each line has data separated by a semicolon ";".
 The first item is the sentence with the last word masked. The second item
 is the word that the model predicts for the mask.
+
+---
+## MNITST digit classification
+[`sklearn_mnist_classification.py`](./sklearn_mnist_classification.py) contains
+an implementation for a RunInference pipeline that performs image classification on handwritten digits from the [MNIST](https://en.wikipedia.org/wiki/MNIST_database) database.
+
+The pipeline reads rows of pixels corresponding to a digit, performs basic preprocessing, passes the pixels to the Scikit-learn implementation of RunInference, and then writes the predictions to a text file.
+
+### Dataset and model for language modeling
+- **Required**: A path to a file called `INPUT` that contains label and pixels to
+feed into the model. Each row should have elements that are comma-separated. The first element is the label. All subsuequent values are pixels from pixel0 to pixel784. It should look something like this:

Review Comment:
   Done.



##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -235,3 +235,54 @@ He looked up and saw the sun and stars .;moon
 Each line has data separated by a semicolon ";".
 The first item is the sentence with the last word masked. The second item
 is the word that the model predicts for the mask.
+
+---
+## MNITST digit classification
+[`sklearn_mnist_classification.py`](./sklearn_mnist_classification.py) contains
+an implementation for a RunInference pipeline that performs image classification on handwritten digits from the [MNIST](https://en.wikipedia.org/wiki/MNIST_database) database.
+
+The pipeline reads rows of pixels corresponding to a digit, performs basic preprocessing, passes the pixels to the Scikit-learn implementation of RunInference, and then writes the predictions to a text file.
+
+### Dataset and model for language modeling
+- **Required**: A path to a file called `INPUT` that contains label and pixels to
+feed into the model. Each row should have elements that are comma-separated. The first element is the label. All subsuequent values are pixels from pixel0 to pixel784. It should look something like this:
+```
+1,0,0,0...
+0,0,0,0...
+1,0,0,0...
+4,0,0,0...
+...
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Required**: A path to a file called `MODEL_PATH` that contains the pickled file of a scikit-learn model trained on MNIST data. Please refer to this scikit-learn [documentation](https://scikit-learn.org/stable/model_persistence.html) on how to serialize models.
+
+
+### Running `sklearn_mnist_classification.py`
+
+To run the MNIST classification pipeline locally, use the following command:
+```sh
+python -m apache_beam.examples.inference.sklearn_mnist_classification.py \
+  --input_file INPUT \
+  --output OUTPUT \
+  --model_path MODEL_PATH
+```
+For example:
+```sh
+python -m apache_beam.examples.inference.sklearn_mnist_classification.py \
+  --input_file mnist_data.csv \
+  --output predictions.csv \
+  --model_path mnist_model_svm.pickle
+```
+
+This writes the output to the `predictions.csv` with contents like:

Review Comment:
   Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] yeandy commented on pull request #21887: Add README documentation for scikit-learn MNIST example

Posted by GitBox <gi...@apache.org>.
yeandy commented on PR #21887:
URL: https://github.com/apache/beam/pull/21887#issuecomment-1156603400

   R: @ryanthompson591 @tvalentyn @rezarokni @pcoet


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] asf-ci commented on pull request #21887: Add README documentation for scikit-learn MNIST example

Posted by GitBox <gi...@apache.org>.
asf-ci commented on PR #21887:
URL: https://github.com/apache/beam/pull/21887#issuecomment-1156594266

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] asf-ci commented on pull request #21887: Add README documentation for scikit-learn MNIST example

Posted by GitBox <gi...@apache.org>.
asf-ci commented on PR #21887:
URL: https://github.com/apache/beam/pull/21887#issuecomment-1156594262

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] AnandInguva commented on a diff in pull request #21887: Add README documentation for scikit-learn MNIST example

Posted by GitBox <gi...@apache.org>.
AnandInguva commented on code in PR #21887:
URL: https://github.com/apache/beam/pull/21887#discussion_r898119092


##########
sdks/python/apache_beam/examples/inference/README.md:
##########
@@ -235,3 +235,54 @@ He looked up and saw the sun and stars .;moon
 Each line has data separated by a semicolon ";".
 The first item is the sentence with the last word masked. The second item
 is the word that the model predicts for the mask.
+
+---
+## MNITST digit classification
+[`sklearn_mnist_classification.py`](./sklearn_mnist_classification.py) contains
+an implementation for a RunInference pipeline that performs image classification on handwritten digits from the [MNIST](https://en.wikipedia.org/wiki/MNIST_database) database.
+
+The pipeline reads rows of pixels corresponding to a digit, performs basic preprocessing, passes the pixels to the Scikit-learn implementation of RunInference, and then writes the predictions to a text file.
+
+### Dataset and model for language modeling
+- **Required**: A path to a file called `INPUT` that contains label and pixels to
+feed into the model. Each row should have elements that are comma-separated. The first element is the label. All subsuequent values are pixels from pixel0 to pixel784. It should look something like this:
+```
+1,0,0,0...
+0,0,0,0...
+1,0,0,0...
+4,0,0,0...
+...
+```
+- **Required**: A path to a file called `OUTPUT`, to which the pipeline will
+write the predictions.
+- **Required**: A path to a file called `MODEL_PATH` that contains the pickled file of a scikit-learn model trained on MNIST data. Please refer to this scikit-learn [documentation](https://scikit-learn.org/stable/model_persistence.html) on how to serialize models.
+
+
+### Running `sklearn_mnist_classification.py`
+
+To run the MNIST classification pipeline locally, use the following command:
+```sh
+python -m apache_beam.examples.inference.sklearn_mnist_classification.py \
+  --input_file INPUT \
+  --output OUTPUT \
+  --model_path MODEL_PATH
+```
+For example:
+```sh
+python -m apache_beam.examples.inference.sklearn_mnist_classification.py \
+  --input_file mnist_data.csv \
+  --output predictions.csv \
+  --model_path mnist_model_svm.pickle
+```
+
+This writes the output to the `predictions.csv` with contents like:

Review Comment:
   ```suggestion
   This writes the output to the `predictions.txt` with contents like:
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] AnandInguva commented on pull request #21887: Add README documentation for scikit-learn MNIST example

Posted by GitBox <gi...@apache.org>.
AnandInguva commented on PR #21887:
URL: https://github.com/apache/beam/pull/21887#issuecomment-1156810429

   LGTM! Thanks Andy.  
   
   Reference: Example PR https://github.com/apache/beam/pull/21781


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] yeandy commented on pull request #21887: Add README documentation for scikit-learn MNIST example

Posted by GitBox <gi...@apache.org>.
yeandy commented on PR #21887:
URL: https://github.com/apache/beam/pull/21887#issuecomment-1156811322

   Unrelated (go licenses) failures. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] tvalentyn merged pull request #21887: Add README documentation for scikit-learn MNIST example

Posted by GitBox <gi...@apache.org>.
tvalentyn merged PR #21887:
URL: https://github.com/apache/beam/pull/21887


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org