Posted to issues@beam.apache.org by "Valentyn Tymofieiev (Jira)" <ji...@apache.org> on 2022/03/22 06:59:00 UTC

[jira] [Comment Edited] (BEAM-4132) Element type inference doesn't work for multi-output DoFns

    [ https://issues.apache.org/jira/browse/BEAM-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510241#comment-17510241 ] 

Valentyn Tymofieiev edited comment on BEAM-4132 at 3/22/22, 6:58 AM:
---------------------------------------------------------------------

This is not completely resolved. I am observing that the return type inferred for DoFns that yield to multiple outputs includes beam.pvalue.TaggedOutput instead of the type of the actual values.

This causes errors when, for example, the ParDo produces multiple PCollections of [K, V] pairs and there is a GroupByKey (GBK) downstream. Since beam.pvalue.TaggedOutput cannot be coerced into [K, V], type checking fails unless the user manually adds a type hint, for example:

def process(self, element) -> Iterator[Tuple[str, Any]]:
  ...  

Case in point: https://stackoverflow.com/questions/64563914/specify-type-of-a-taggedoutput-to-pass-through-groupbykey-as-a-part-of-combinep
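
As a rough illustration of what the annotation above declares, here is a minimal standard-library sketch. It assumes the usual reading of such a hint: the annotation names an iterable, and the element type is the type argument inside it. The TaggedOutput and SplitByParity classes below are hypothetical stand-ins written for this example; they are not Beam's classes, and the code does not reproduce Beam's own inference logic.

```python
from typing import Any, Iterator, Tuple, get_args, get_type_hints


class TaggedOutput:
    """Hypothetical stand-in for beam.pvalue.TaggedOutput (not Beam's class)."""

    def __init__(self, tag, value):
        self.tag = tag
        self.value = value


class SplitByParity:
    """Hypothetical DoFn-like class; it does not subclass beam.DoFn."""

    def process(self, element) -> Iterator[Tuple[str, Any]]:
        key = 'even' if element % 2 == 0 else 'odd'
        # Main output: a (key, value) pair, matching the annotation.
        yield (key, element)
        if element % 2 == 0:
            # Tagged output: the wrapper object is what the generator yields,
            # which is why naive inference can pick up TaggedOutput itself.
            yield TaggedOutput('evens_only', (key, element))


# The declared return type is the iterator; its single type argument is the
# element type that a downstream GroupByKey would need to see as [K, V].
declared = get_type_hints(SplitByParity.process)['return']
element_type = get_args(declared)[0]
outputs = list(SplitByParity().process(2))
```

The explicit hint describes only the plain values; the tagged wrapper is still yielded at runtime, so the hint sidesteps inference rather than fixing it.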


was (Author: tvalentyn):
This is not completely resolved. I am observing that the return type inferred for DoFns that yield to multiple outputs includes beam.pvalue.TaggedOutput instead of the type of the actual values.

This causes errors when, for example, the ParDo produces multiple PCollections of [K, V] pairs and there is a GroupByKey (GBK) downstream. Since beam.pvalue.TaggedOutput cannot be coerced into [K, V], type checking fails unless the user manually adds a type hint, for example:

def process(self) -> Iterator[Tuple[str, Any]]:
  ...  

Case in point: https://stackoverflow.com/questions/64563914/specify-type-of-a-taggedoutput-to-pass-through-groupbykey-as-a-part-of-combinep

> Element type inference doesn't work for multi-output DoFns
> ----------------------------------------------------------
>
>                 Key: BEAM-4132
>                 URL: https://issues.apache.org/jira/browse/BEAM-4132
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>    Affects Versions: 2.4.0
>            Reporter: Chuan Yu Foo
>            Priority: P3
>              Labels: types
>          Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> TLDR: if you have a multi-output DoFn, then the non-main PCollections will incorrectly have their element types set to None. This affects type checking for pipelines involving these PCollections.
> Minimal example:
> {code}
> import apache_beam as beam
> class TripleDoFn(beam.DoFn):
>   def process(self, elem):
>     yield elem
>     if elem % 2 == 0:
>       yield beam.pvalue.TaggedOutput('ten_times', elem * 10)
>     if elem % 3 == 0:
>       yield beam.pvalue.TaggedOutput('hundred_times', elem * 100)
>       
> @beam.typehints.with_input_types(int)
> @beam.typehints.with_output_types(int)
> class MultiplyBy(beam.DoFn):
>   def __init__(self, multiplier):
>     self._multiplier = multiplier
>   def process(self, elem):
>     yield elem * self._multiplier
>   
> def main():
>   with beam.Pipeline() as p:
>     x, a, b = (
>       p
>       | 'Create' >> beam.Create([1, 2, 3])
>       | 'TripleDo' >> beam.ParDo(TripleDoFn()).with_outputs(
>         'ten_times', 'hundred_times', main='main_output'))
>     _ = a | 'MultiplyBy2' >> beam.ParDo(MultiplyBy(2))
> if __name__ == '__main__':
>   main()    
> {code}
> Running this yields the following error:
> {noformat}
> apache_beam.typehints.decorators.TypeCheckError: Type hint violation for 'MultiplyBy2': requires <type 'int'> but got None for elem
> {noformat}
> Replacing {{a}} with {{b}} yields the same error. Replacing {{a}} with {{x}} instead yields the following error:
> {noformat}
> apache_beam.typehints.decorators.TypeCheckError: Type hint violation for 'MultiplyBy2': requires <type 'int'> but got Union[TaggedOutput, int] for elem
> {noformat}
> I would expect Beam to correctly infer that {{a}} and {{b}} have element types of {{int}} rather than {{None}}, and I would also expect Beam to correctly figure out that the element types of {{x}} are compatible with {{int}}.
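
To make the expectation above concrete, the following sketch groups the values produced by a generator shaped like TripleDoFn by output tag and records the type of each unwrapped value. It is only an illustration: the TaggedOutput class and the observed_types_by_output helper are hypothetical names written for this example, and the helper observes types at runtime, whereas Beam's type checking happens at pipeline construction. It does not describe how Beam infers types.

```python
from collections import defaultdict


class TaggedOutput:
    """Hypothetical stand-in for beam.pvalue.TaggedOutput (not Beam's class)."""

    def __init__(self, tag, value):
        self.tag = tag
        self.value = value


def triple(elem):
    """Mirrors the yields of TripleDoFn.process from the minimal example."""
    yield elem
    if elem % 2 == 0:
        yield TaggedOutput('ten_times', elem * 10)
    if elem % 3 == 0:
        yield TaggedOutput('hundred_times', elem * 100)


def observed_types_by_output(fn, inputs, main='main_output'):
    """Hypothetical helper: collect the types seen on each output.

    Tagged values are unwrapped, so the wrapper type never appears in the
    result; untagged values are attributed to the main output.
    """
    types = defaultdict(set)
    for elem in inputs:
        for out in fn(elem):
            if isinstance(out, TaggedOutput):
                types[out.tag].add(type(out.value))
            else:
                types[main].add(type(out))
    return dict(types)


result = observed_types_by_output(triple, [1, 2, 3])
```

With the inputs from the example pipeline, every output, main and tagged alike, carries only int values, which is the per-output element type the report expects to be inferred.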


