Posted to issues@beam.apache.org by "Beam JIRA Bot (Jira)" <ji...@apache.org> on 2020/09/01 17:07:06 UTC
[jira] [Commented] (BEAM-4132) Element type inference doesn't work for multi-output DoFns
[ https://issues.apache.org/jira/browse/BEAM-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17188672#comment-17188672 ]
Beam JIRA Bot commented on BEAM-4132:
-------------------------------------
This issue was marked "stale-P2" and has not received a public comment in 14 days. It is now automatically moved to P3. If you are still affected by it, you can comment and move it back to P2.
> Element type inference doesn't work for multi-output DoFns
> ----------------------------------------------------------
>
> Key: BEAM-4132
> URL: https://issues.apache.org/jira/browse/BEAM-4132
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Affects Versions: 2.4.0
> Reporter: Chuan Yu Foo
> Priority: P3
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> TLDR: if you have a multi-output DoFn, the non-main PCollections will incorrectly have their element types set to None. This affects type checking for pipelines involving these PCollections.
> Minimal example:
> {code}
> import apache_beam as beam
>
> class TripleDoFn(beam.DoFn):
>   def process(self, elem):
>     # Untagged values go to the main output.
>     yield elem
>     if elem % 2 == 0:
>       yield beam.pvalue.TaggedOutput('ten_times', elem * 10)
>     if elem % 3 == 0:
>       yield beam.pvalue.TaggedOutput('hundred_times', elem * 100)
>
> @beam.typehints.with_input_types(int)
> @beam.typehints.with_output_types(int)
> class MultiplyBy(beam.DoFn):
>   def __init__(self, multiplier):
>     self._multiplier = multiplier
>   def process(self, elem):
>     # process must produce an iterable, so yield rather than return.
>     yield elem * self._multiplier
>
> def main():
>   with beam.Pipeline() as p:
>     x, a, b = (
>         p
>         | 'Create' >> beam.Create([1, 2, 3])
>         | 'TripleDo' >> beam.ParDo(TripleDoFn()).with_outputs(
>             'ten_times', 'hundred_times', main='main_output'))
>     _ = a | 'MultiplyBy2' >> beam.ParDo(MultiplyBy(2))
>
> if __name__ == '__main__':
>   main()
> {code}
> Running this yields the following error:
> {noformat}
> apache_beam.typehints.decorators.TypeCheckError: Type hint violation for 'MultiplyBy2': requires <type 'int'> but got None for elem
> {noformat}
> Replacing {{a}} with {{b}} yields the same error. Replacing {{a}} with {{x}} instead yields the following error:
> {noformat}
> apache_beam.typehints.decorators.TypeCheckError: Type hint violation for 'MultiplyBy2': requires <type 'int'> but got Union[TaggedOutput, int] for elem
> {noformat}
> I would expect Beam to correctly infer that {{a}} and {{b}} have element types of {{int}} rather than {{None}}, and I would also expect Beam to correctly figure out that the element types of {{x}} are compatible with {{int}}.
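To make the expectation above concrete, here is a rough plain-Python sketch of how the values emitted by TripleDoFn split across the three outputs for the input [1, 2, 3]. It does not use Beam and says nothing about how Beam's type inference is implemented: TaggedOutput here is a hypothetical namedtuple standing in for beam.pvalue.TaggedOutput, and triple_process, route_outputs and MAIN_TAG are illustrative names that do not appear in the report.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for beam.pvalue.TaggedOutput so the sketch runs without Beam.
TaggedOutput = namedtuple('TaggedOutput', ['tag', 'value'])

# Assumed name of the main output, matching main='main_output' in the report.
MAIN_TAG = 'main_output'


def triple_process(elem):
    """Mirrors the body of TripleDoFn.process from the report."""
    yield elem
    if elem % 2 == 0:
        yield TaggedOutput('ten_times', elem * 10)
    if elem % 3 == 0:
        yield TaggedOutput('hundred_times', elem * 100)


def route_outputs(elements):
    """Groups emitted values by output tag, unwrapping tagged values."""
    outputs = defaultdict(list)
    for elem in elements:
        for result in triple_process(elem):
            if isinstance(result, TaggedOutput):
                outputs[result.tag].append(result.value)
            else:
                outputs[MAIN_TAG].append(result)
    return dict(outputs)


outputs = route_outputs([1, 2, 3])

# Runtime types actually observed on each output.
element_types = {tag: {type(value).__name__ for value in values}
                 for tag, values in outputs.items()}

print(outputs)
print(element_types)
```

In this simulation every output, tagged or not, carries only int values once the tagged wrapper is removed, which is the element type the report expects for {{x}}, {{a}} and {{b}}.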