You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/07/08 16:21:19 UTC
[GitHub] [beam] TheNeuralBit opened a new pull request, #22198: Update element_type inference (default_type_hints) for batched DoFns with yields_batches/yields_elements
TheNeuralBit opened a new pull request, #22198:
URL: https://github.com/apache/beam/pull/22198
Fixes #22197
This updates `DoFn.default_type_hints` to inspect the output type annotation for the appropriate method (process, or process_batch) depending on the yields_batches/yields_elements decorators.
GitHub Actions Tests Status (on master branch)
------------------------------------------------------------------------------------------------
[![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
[![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
[![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on pull request #22198: Update element_type inference (default_type_hints) for batched DoFns with yields_batches/yields_elements
Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on PR #22198:
URL: https://github.com/apache/beam/pull/22198#issuecomment-1179314506
Run Python PreCommit
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on pull request #22198: Update element_type inference (default_type_hints) for batched DoFns with yields_batches/yields_elements
Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on PR #22198:
URL: https://github.com/apache/beam/pull/22198#issuecomment-1192885732
assign to next reviewer
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] codecov[bot] commented on pull request #22198: Update element_type inference (default_type_hints) for batched DoFns with yields_batches/yields_elements
Posted by GitBox <gi...@apache.org>.
codecov[bot] commented on PR #22198:
URL: https://github.com/apache/beam/pull/22198#issuecomment-1179198123
# [Codecov](https://codecov.io/gh/apache/beam/pull/22198?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#22198](https://codecov.io/gh/apache/beam/pull/22198?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (5d146d5) into [master](https://codecov.io/gh/apache/beam/commit/d44c0440bc91f8fd63dcd082c2acf50b40e7af1b?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (d44c044) will **decrease** coverage by `0.00%`.
> The diff coverage is `92.85%`.
```diff
@@ Coverage Diff @@
## master #22198 +/- ##
==========================================
- Coverage 74.22% 74.22% -0.01%
==========================================
Files 702 702
Lines 92937 92951 +14
==========================================
+ Hits 68985 68990 +5
- Misses 22685 22694 +9
Partials 1267 1267
```
| Flag | Coverage Δ | |
|---|---|---|
| python | `83.59% <92.85%> (-0.02%)` | :arrow_down: |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/beam/pull/22198?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [sdks/python/apache\_beam/transforms/core.py](https://codecov.io/gh/apache/beam/pull/22198/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy9jb3JlLnB5) | `92.62% <90.00%> (+0.05%)` | :arrow_up: |
| [sdks/python/apache\_beam/typehints/decorators.py](https://codecov.io/gh/apache/beam/pull/22198/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHlwZWhpbnRzL2RlY29yYXRvcnMucHk=) | `92.81% <100.00%> (+0.08%)` | :arrow_up: |
| [sdks/python/apache\_beam/io/source\_test\_utils.py](https://codecov.io/gh/apache/beam/pull/22198/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vc291cmNlX3Rlc3RfdXRpbHMucHk=) | `88.01% <0.00%> (-1.39%)` | :arrow_down: |
| [...python/apache\_beam/runners/worker/worker\_status.py](https://codecov.io/gh/apache/beam/pull/22198/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvd29ya2VyX3N0YXR1cy5weQ==) | `77.53% <0.00%> (-0.73%)` | :arrow_down: |
| [...ks/python/apache\_beam/runners/worker/sdk\_worker.py](https://codecov.io/gh/apache/beam/pull/22198/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvc2RrX3dvcmtlci5weQ==) | `88.46% <0.00%> (-0.48%)` | :arrow_down: |
| [...hon/apache\_beam/runners/worker/bundle\_processor.py](https://codecov.io/gh/apache/beam/pull/22198/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvYnVuZGxlX3Byb2Nlc3Nvci5weQ==) | `93.30% <0.00%> (-0.38%)` | :arrow_down: |
| [...eam/runners/interactive/interactive\_environment.py](https://codecov.io/gh/apache/beam/pull/22198/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9pbnRlcmFjdGl2ZV9lbnZpcm9ubWVudC5weQ==) | `92.02% <0.00%> (+0.30%)` | :arrow_up: |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/22198?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/22198?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [d44c044...5d146d5](https://codecov.io/gh/apache/beam/pull/22198?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] TheNeuralBit merged pull request #22198: Update element_type inference (default_type_hints) for batched DoFns with yields_batches/yields_elements
Posted by GitBox <gi...@apache.org>.
TheNeuralBit merged PR #22198:
URL: https://github.com/apache/beam/pull/22198
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] ryanthompson591 commented on a diff in pull request #22198: Update element_type inference (default_type_hints) for batched DoFns with yields_batches/yields_elements
Posted by GitBox <gi...@apache.org>.
ryanthompson591 commented on code in PR #22198:
URL: https://github.com/apache/beam/pull/22198#discussion_r918404835
##########
sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py:
##########
@@ -357,13 +357,15 @@ def infer_output_type(self, input_type):
return np.int64
with self.create_pipeline() as p:
- res = (
+ pc = (
p
| beam.Create(np.array([1, 2, 3], dtype=np.int64)).with_output_types(
np.int64)
- | beam.ParDo(ArrayProduceDoFn())
- | beam.ParDo(ArrayMultiplyDoFn())
- | beam.Map(lambda x: x * 3))
+ | beam.ParDo(ArrayProduceDoFn()))
+
+ self.assertEqual(pc.element_type, np.int64)
Review Comment:
This is up to you, my rule of thumb is smaller concise tests for unit tests.
Like:
def test_pardo_maintains_type_hint():
with self.create_pipeline() as p:
pc = (
p
| beam.Create(np.array([1, 2, 3], dtype=np.int64)).with_output_types(
np.int64)
| beam.ParDo(ArrayProduceDoFn()))
self.assertEqual(pc.element_type, np.int64)
And then end the test there.
##########
sdks/python/apache_beam/transforms/batch_dofn_test.py:
##########
@@ -170,6 +173,29 @@ def process_batch(self, batch, *args, **kwargs):
yield [element * 2 for element in batch]
+class MismatchedBatchProducingDoFn(beam.DoFn):
+ """A DoFn that produces batches from both process and process_batch, with
+ mismatched types. Should yield a construction time error when applied."""
Review Comment:
took me three looks to realize the mismatch was one produces floats and the other ints. Maybe just slightly amend the comment like this:
...with mismatched types (one has floats the other ints)....
##########
sdks/python/apache_beam/transforms/core.py:
##########
@@ -708,6 +708,36 @@ def get_function_arguments(self, func):
def default_type_hints(self):
fn_type_hints = typehints.decorators.IOTypeHints.from_callable(self.process)
+ batch_fn_type_hints = typehints.decorators.IOTypeHints.from_callable(
+ self.process_batch)
+
+ if fn_type_hints is not None:
Review Comment:
This nested logic is hard to read and a little messy and hard to follow.
For example here and below we have "if fn_type_hints is not None:"
I had to write this a little to really wrap my head around it. I don't know if any parts of my rewrite are useful.
```
# default type hints can come from one of two places. The process type hints or the process batch type hints.
# also the process might be set up to yield batches or to yield elements
has_fn_type_hints = fn_type_hints is not None
has_batch_type_hints = batch_fn_type_hints is not None
yields_batches = self._process_yields_batches
yeilds_elements = self._process_yields_elements
if has_fn_type_hints:
if yields_elements and has_batch_type_hints:
# because both batch and element type hints are defined, raise an exception if they are not equal.
self.validate_batch_and_fn_typehints_same()
fn_type_hints = fn_type_hints.with_output_types_from(
batch_fn_type_hints) # TODO: comment why this is done.
elif yields_batches:
# Because the process method produces batches, don't use its output typehints.
fn_type_hints = fn_type_hints.with_output_types_from(
typehints.decorators.IOTypeHints.empty())
elif yields_elements and has_batch_type_hints:
# process_batch produces elements, grab its output typehint
fn_type_hints = batch_fn_type_hints.with_input_types_from(
typehints.decorators.IOTypeHints.empty()) # TODO comment why does this use with_input_types_from
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] github-actions[bot] commented on pull request #22198: Update element_type inference (default_type_hints) for batched DoFns with yields_batches/yields_elements
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #22198:
URL: https://github.com/apache/beam/pull/22198#issuecomment-1188977387
Reminder, please take a look at this pr: @ryanthompson591
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] tvalentyn commented on pull request #22198: Update element_type inference (default_type_hints) for batched DoFns with yields_batches/yields_elements
Posted by GitBox <gi...@apache.org>.
tvalentyn commented on PR #22198:
URL: https://github.com/apache/beam/pull/22198#issuecomment-1192921702
I checked that `get_type_hints(object).with_defaults(beam.typehints.decorators.IOTypeHints.empty().strip_iterable())` is the same as `get_type_hints(object).with_defaults(None)`. Looks like this doesn't change evaluation for non-batching DoFn's, so shouldn't be a breaking change. But you can also try to import and run TGP if you think this needs more coverage.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] github-actions[bot] commented on pull request #22198: Update element_type inference (default_type_hints) for batched DoFns with yields_batches/yields_elements
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #22198:
URL: https://github.com/apache/beam/pull/22198#issuecomment-1192895489
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on pull request #22198: Update element_type inference (default_type_hints) for batched DoFns with yields_batches/yields_elements
Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on PR #22198:
URL: https://github.com/apache/beam/pull/22198#issuecomment-1179752377
Run Python PreCommit
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] github-actions[bot] commented on pull request #22198: Update element_type inference (default_type_hints) for batched DoFns with yields_batches/yields_elements
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #22198:
URL: https://github.com/apache/beam/pull/22198#issuecomment-1179266307
Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`:
R: @ryanthompson591 for label python.
Available commands:
- `stop reviewer notifications` - opt out of the automated review tooling
- `remind me after tests pass` - tag the comment author after tests pass
- `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
The PR bot will only process comments in the main thread (not review comments).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on a diff in pull request #22198: Update element_type inference (default_type_hints) for batched DoFns with yields_batches/yields_elements
Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on code in PR #22198:
URL: https://github.com/apache/beam/pull/22198#discussion_r926132674
##########
sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py:
##########
@@ -357,13 +357,15 @@ def infer_output_type(self, input_type):
return np.int64
with self.create_pipeline() as p:
- res = (
+ pc = (
p
| beam.Create(np.array([1, 2, 3], dtype=np.int64)).with_output_types(
np.int64)
- | beam.ParDo(ArrayProduceDoFn())
- | beam.ParDo(ArrayMultiplyDoFn())
- | beam.Map(lambda x: x * 3))
+ | beam.ParDo(ArrayProduceDoFn()))
+
+ self.assertEqual(pc.element_type, np.int64)
Review Comment:
Fair point, I initially added the test here for expediency, but now I have the concise test in `batch_dofn_test`. I'll drop this change.
##########
sdks/python/apache_beam/transforms/core.py:
##########
@@ -708,6 +708,36 @@ def get_function_arguments(self, func):
def default_type_hints(self):
fn_type_hints = typehints.decorators.IOTypeHints.from_callable(self.process)
+ batch_fn_type_hints = typehints.decorators.IOTypeHints.from_callable(
+ self.process_batch)
+
+ if fn_type_hints is not None:
Review Comment:
yeah agreed this logic was not pretty. I think part of the issue is that it's has to deal with both `None`, and the possibility of `IOTypeHints.empty()` in both sets of typehints. I rewrote it to collapse those two possibilities, and also separated concerns a bit, first we deal with process_yields_batches, then deal with process_batch_yields_elements. WDYT?
Can you clarify the "TODO comment why does this use with_input_types_from"?
##########
sdks/python/apache_beam/transforms/batch_dofn_test.py:
##########
@@ -170,6 +173,29 @@ def process_batch(self, batch, *args, **kwargs):
yield [element * 2 for element in batch]
+class MismatchedBatchProducingDoFn(beam.DoFn):
+ """A DoFn that produces batches from both process and process_batch, with
+ mismatched types. Should yield a construction time error when applied."""
Review Comment:
Thanks, done
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on a diff in pull request #22198: Update element_type inference (default_type_hints) for batched DoFns with yields_batches/yields_elements
Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on code in PR #22198:
URL: https://github.com/apache/beam/pull/22198#discussion_r927864870
##########
sdks/python/apache_beam/transforms/core.py:
##########
@@ -708,6 +708,36 @@ def get_function_arguments(self, func):
def default_type_hints(self):
fn_type_hints = typehints.decorators.IOTypeHints.from_callable(self.process)
+ batch_fn_type_hints = typehints.decorators.IOTypeHints.from_callable(
+ self.process_batch)
+
+ if fn_type_hints is not None:
Review Comment:
I was a little concerned rewriting this way could break some esoteric use-case, but I ran all our internal tests and it looks good.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on pull request #22198: Update element_type inference (default_type_hints) for batched DoFns with yields_batches/yields_elements
Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on PR #22198:
URL: https://github.com/apache/beam/pull/22198#issuecomment-1193024431
Thanks for checking on that! I ran a TGP and everything looks good.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on pull request #22198: Update element_type inference (default_type_hints) for batched DoFns with yields_batches/yields_elements
Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on PR #22198:
URL: https://github.com/apache/beam/pull/22198#issuecomment-1192894757
R: @tvalentyn could you take a look? It looks like Ryan will be out through next week and I'd like to get this in for the release cut.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] TheNeuralBit commented on pull request #22198: Update element_type inference (default_type_hints) for batched DoFns with yields_batches/yields_elements
Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on PR #22198:
URL: https://github.com/apache/beam/pull/22198#issuecomment-1179169076
FYI @robertwb this fixes a bug in Batched DoFns. I'm OOO next week and would like to get this in before the release cut the following Wednesday.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org