You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/04/26 18:45:26 UTC

[GitHub] [beam] robertwb opened a new pull request, #17465: Add element weighting parameter to BatchElements.

robertwb opened a new pull request, #17465:
URL: https://github.com/apache/beam/pull/17465

   This is useful when various elements have largely disparate weights.
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] robertwb merged pull request #17465: Add element weighting parameter to BatchElements.

Posted by GitBox <gi...@apache.org>.
robertwb merged PR #17465:
URL: https://github.com/apache/beam/pull/17465


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] codecov[bot] commented on pull request #17465: Add element weighting parameter to BatchElements.

Posted by GitBox <gi...@apache.org>.
codecov[bot] commented on PR #17465:
URL: https://github.com/apache/beam/pull/17465#issuecomment-1110161079

   # [Codecov](https://codecov.io/gh/apache/beam/pull/17465?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#17465](https://codecov.io/gh/apache/beam/pull/17465?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (24da9fe) into [master](https://codecov.io/gh/apache/beam/commit/a936ef889859f0c93682c2480a68487ae53c03fe?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (a936ef8) will **increase** coverage by `0.00%`.
   > The diff coverage is `91.17%`.
   
   ```diff
   @@           Coverage Diff           @@
   ##           master   #17465   +/-   ##
   =======================================
     Coverage   73.92%   73.92%           
   =======================================
     Files         689      689           
     Lines       90397    90409   +12     
   =======================================
   + Hits        66825    66839   +14     
   + Misses      22388    22386    -2     
     Partials     1184     1184           
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | python | `83.65% <91.17%> (+<0.01%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/17465?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/transforms/util.py](https://codecov.io/gh/apache/beam/pull/17465/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy91dGlsLnB5) | `96.06% <91.17%> (+0.08%)` | :arrow_up: |
   | [.../python/apache\_beam/transforms/periodicsequence.py](https://codecov.io/gh/apache/beam/pull/17465/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9wZXJpb2RpY3NlcXVlbmNlLnB5) | `96.72% <0.00%> (-1.64%)` | :arrow_down: |
   | [sdks/python/apache\_beam/ml/inference/api.py](https://codecov.io/gh/apache/beam/pull/17465/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vbWwvaW5mZXJlbmNlL2FwaS5weQ==) | `90.00% <0.00%> (-0.48%)` | :arrow_down: |
   | [...hon/apache\_beam/runners/worker/bundle\_processor.py](https://codecov.io/gh/apache/beam/pull/17465/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvYnVuZGxlX3Byb2Nlc3Nvci5weQ==) | `93.51% <0.00%> (-0.13%)` | :arrow_down: |
   | [...eam/runners/interactive/interactive\_environment.py](https://codecov.io/gh/apache/beam/pull/17465/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9pbnRlcmFjdGl2ZV9lbnZpcm9ubWVudC5weQ==) | `90.49% <0.00%> (+0.30%)` | :arrow_up: |
   | [sdks/python/apache\_beam/internal/metrics/metric.py](https://codecov.io/gh/apache/beam/pull/17465/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW50ZXJuYWwvbWV0cmljcy9tZXRyaWMucHk=) | `93.00% <0.00%> (+1.00%)` | :arrow_up: |
   | [sdks/python/apache\_beam/utils/interactive\_utils.py](https://codecov.io/gh/apache/beam/pull/17465/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdXRpbHMvaW50ZXJhY3RpdmVfdXRpbHMucHk=) | `92.68% <0.00%> (+4.87%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/17465?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/17465?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [a936ef8...24da9fe](https://codecov.io/gh/apache/beam/pull/17465?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] robertwb commented on pull request #17465: Add element weighting parameter to BatchElements.

Posted by GitBox <gi...@apache.org>.
robertwb commented on PR #17465:
URL: https://github.com/apache/beam/pull/17465#issuecomment-1111562685

   Run Python PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] robertwb commented on pull request #17465: Add element weighting parameter to BatchElements.

Posted by GitBox <gi...@apache.org>.
robertwb commented on PR #17465:
URL: https://github.com/apache/beam/pull/17465#issuecomment-1111562356

   Irrelevant failure in apache_beam.transforms.userstate_test.StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] robertwb commented on a diff in pull request #17465: Add element weighting parameter to BatchElements.

Posted by GitBox <gi...@apache.org>.
robertwb commented on code in PR #17465:
URL: https://github.com/apache/beam/pull/17465#discussion_r861397527


##########
sdks/python/apache_beam/transforms/util.py:
##########
@@ -632,6 +649,8 @@ def __init__(
       max_batch_size=10000,
       target_batch_overhead=.05,
       target_batch_duration_secs=1,
+      *,

Review Comment:
   Forces these later ones to be keyword arguments (so their order doesn't matter).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] robertwb commented on pull request #17465: Add element weighting parameter to BatchElements.

Posted by GitBox <gi...@apache.org>.
robertwb commented on PR #17465:
URL: https://github.com/apache/beam/pull/17465#issuecomment-1110133891

   R: @ryanthompson591


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ryanthompson591 commented on a diff in pull request #17465: Add element weighting parameter to BatchElements.

Posted by GitBox <gi...@apache.org>.
ryanthompson591 commented on code in PR #17465:
URL: https://github.com/apache/beam/pull/17465#discussion_r860311591


##########
sdks/python/apache_beam/transforms/util.py:
##########
@@ -632,6 +649,8 @@ def __init__(
       max_batch_size=10000,
       target_batch_overhead=.05,
       target_batch_duration_secs=1,
+      *,

Review Comment:
   why * here?



##########
sdks/python/apache_beam/transforms/util_test.py:
##########
@@ -228,6 +228,16 @@ def test_windowed_batches(self):
               7,  # elements in [30, 47)
           ]))
 
+  def test_sized_batches(self):

Review Comment:
   I think when someone looks at this in the future and says, "why is this here", you could illustrate it better with a test like this (I'm not sure if this works but you get the point).
   
   def test_element_size_fn()
   """ Tests passing different sized arrays as elements and then weighting them by how large the arrays are. Batch sizes will depend on the size of the arrays within."""
   ...
   | beam.Create([[1]*1, [1] *1, [10] *10, [6] * 6 ...], reshuffle=False)
   | util.BatchElements(min_batch_size =10, max_batch_size=10, element_size_fn=lambda x: len(x))
   | beam.Map(len)
   assert_that(res, equal_to([12, 11, 10]))
   ...
   
   I think that might just make it clearer to the future reader as to why this feature might exist.  Feel free to take or ignore this suggestion.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] robertwb commented on a diff in pull request #17465: Add element weighting parameter to BatchElements.

Posted by GitBox <gi...@apache.org>.
robertwb commented on code in PR #17465:
URL: https://github.com/apache/beam/pull/17465#discussion_r861397729


##########
sdks/python/apache_beam/transforms/util_test.py:
##########
@@ -228,6 +228,16 @@ def test_windowed_batches(self):
               7,  # elements in [30, 47)
           ]))
 
+  def test_sized_batches(self):

Review Comment:
   Good idea. The mix of `[]`'s and `*`'s are a bit hard for me to read, so I'm using strings.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org