Posted to user@beam.apache.org by "Joshua B. Harrison" <jo...@gmail.com> on 2020/03/27 05:05:02 UTC
Type hints do not work on multi-output PTransforms?
Hello all,
I am working on adding type hints to my pipeline, and ran into an issue
with PTransforms that produce multiple, tagged outputs.
My class looks like this:
    @with_input_types(mytype.Data)
    @with_output_types(mytype.KeyedData)
    class DenormalizeData(ptransform.PTransform):
      MAIN = 'denormalized'
      SKIPPED = functions.DenormalizeData.SKIPPED

      def expand(self, pcol: mytype.Data) -> mytype.KeyedPriceData:
        return (pcol
                | 'Denormalize PriceData' >> core.ParDo(
                    functions.DenormalizeData()).with_outputs(
                        self.SKIPPED, main=self.MAIN))
Here, functions.DenormalizeData is a core.DoFn. From what I can tell, the
type checking code at
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/ptransform.py#L429
attempts to access pvalue_.element_type. But in this case, the pvalue is a
DoOutputsTuple
(https://github.com/apache/beam/blob/master/sdks/python/apache_beam/pvalue.py#L239),
which overrides __getattr__ to look up tag names. Since element_type is not
a valid tag, I get the following partial stack trace:
      File "apache_beam_2_17_0/apache_beam/transforms/ptransform.py", line 401, in type_check_inputs_or_outputs
        if pvalue_.element_type is None:
      File "apache_beam_2_17_0/apache_beam/pvalue.py", line 241, in __getattr__
        return self[tag]
      File "apache_beam_2_17_0/apache_beam/pvalue.py", line 256, in __getitem__
        tag, self._main_tag, self._tags))
    ValueError: Tag 'element_type' is neither the main tag 'denormalized' nor any of the tags ('skipped',)
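The failure mode in the trace above can be reproduced without Beam. The sketch below is only an illustration: TaggedOutputs and probe_element_type are hypothetical stand-ins, not Beam's DoOutputsTuple or its type checker. It assumes, as the trace suggests, that unknown attribute names are forwarded to a tag lookup that raises ValueError for unknown tags.

```python
class TaggedOutputs:
    """Hypothetical stand-in for a tagged multi-output result (not a Beam class)."""

    def __init__(self, main_tag, tags):
        self._main_tag = main_tag
        self._tags = tuple(tags)

    def __getattr__(self, tag):
        # __getattr__ runs only when normal attribute lookup fails, so every
        # unknown attribute name ends up being treated as an output tag.
        if tag.startswith('__'):
            raise AttributeError(tag)
        return self[tag]

    def __getitem__(self, tag):
        if tag != self._main_tag and tag not in self._tags:
            raise ValueError(
                "Tag '%s' is neither the main tag '%s' nor any of the tags %s"
                % (tag, self._main_tag, self._tags))
        return 'output for %s' % tag


def probe_element_type(pvalue_):
    """Hypothetical checker that assumes every value exposes element_type."""
    try:
        return pvalue_.element_type
    except ValueError as exc:
        # The attribute access was interpreted as a tag lookup and failed.
        return str(exc)
```

Calling probe_element_type(TaggedOutputs('denormalized', ('skipped',))) returns the same message as the ValueError in the trace, because 'element_type' is looked up as if it were a tag.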
Is my diagnosis correct? Is this a known issue? Can type hints exist on
DoOutputsTuples?
Thank you for your time and help.
Best,
Joshua
--
Joshua Harrison | Software Engineer | joshharrison@gmail.com
<jo...@google.com> | 404-433-0242
Re: Type hints do not work on multi-output PTransforms?
Posted by "Joshua B. Harrison" <jo...@gmail.com>.
Sounds good - thank you.
On Mon, Mar 30, 2020 at 11:54 AM Robert Bradshaw <ro...@google.com>
wrote:
> On Mon, Mar 30, 2020 at 10:40 AM Joshua B. Harrison <
> josh.harrison@gmail.com> wrote:
>
>> Thank you for getting back to me. I would be happy to help contribute -
>> has there been any discussion around this issue before?
>>
>
> Udi has been pushing the type annotation work forward lately, though I
> don't know that he's looked into the multi-output much, if at all. It'd be
> great if you could contribute!
>
>
>> At the least, I think it would be preferable to raise a not-implemented
>> error in Python when encountering this case.
>>
>
> Agreed.
>
>
>> It seems like multi-input for CoGroupByKey is represented as a Union of
>> all the component collection types. Would it make sense to do the same for
>> the output types? Is this a better discussion for the dev group?
>>
>
> +1 to taking this to the dev group.
Re: Type hints do not work on multi-output PTransforms?
Posted by Robert Bradshaw <ro...@google.com>.
On Mon, Mar 30, 2020 at 10:40 AM Joshua B. Harrison <jo...@gmail.com>
wrote:
> Thank you for getting back to me. I would be happy to help contribute -
> has there been any discussion around this issue before?
>
Udi has been pushing the type annotation work forward lately, though I
don't know that he's looked into the multi-output much, if at all. It'd be
great if you could contribute!
> At the least, I think it would be preferable to raise a not-implemented
> error in Python when encountering this case.
>
Agreed.
> It seems like multi-input for CoGroupByKey is represented as a Union of
> all the component collection types. Would it make sense to do the same for
> the output types? Is this a better discussion for the dev group?
>
+1 to taking this to the dev group.
Re: Type hints do not work on multi-output PTransforms?
Posted by "Joshua B. Harrison" <jo...@gmail.com>.
Thank you for getting back to me. I would be happy to help contribute - has
there been any discussion around this issue before?
At the least, I think it would be preferable to raise a not-implemented
error in Python when encountering this case.
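A minimal sketch of what such a guard could look like, under stated assumptions: SingleOutput, MultiOutput, and check_declared_output_type are hypothetical names invented for illustration and do not correspond to Beam's classes or to where Beam performs its checks. The point is only that a clear NotImplementedError is raised up front instead of an unrelated ValueError about tags.

```python
class SingleOutput:
    """Hypothetical single-output value that carries one element type."""

    def __init__(self, element_type=None):
        self.element_type = element_type


class MultiOutput:
    """Hypothetical tagged multi-output value; it has no single element type."""

    def __init__(self, **outputs_by_tag):
        self.outputs_by_tag = outputs_by_tag


def check_declared_output_type(result, declared_type):
    """Returns the result's element type, or raises for multi-output results."""
    if declared_type is None:
        return None  # No output hint was declared, so there is nothing to check.
    if isinstance(result, MultiOutput):
        # Fail early with an explicit message rather than probing attributes.
        raise NotImplementedError(
            'Output type hints are not supported for multi-output results '
            '(tags: %s)' % sorted(result.outputs_by_tag))
    return result.element_type
```

With this shape, a transform that declares an output hint but returns tagged outputs fails with a message that names the unsupported case and lists the tags involved.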
It seems like multi-input for CoGroupByKey is represented as a Union of all
the component collection types. Would it make sense to do the same for the
output types? Is this a better discussion for the dev group?
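To make the Union idea concrete, here is a small sketch using only the standard typing module. The helper name union_of_output_types and the per-tag dictionary are assumptions made for illustration; nothing here reflects how Beam actually represents CoGroupByKey inputs or tagged outputs.

```python
import typing


def union_of_output_types(types_by_tag):
    """Combines hypothetical per-tag element types into a single Union."""
    # typing.Union accepts a tuple of types when subscripted.
    return typing.Union[tuple(types_by_tag.values())]


# Hypothetical per-tag element types for a two-output transform.
combined = union_of_output_types({'denormalized': int, 'skipped': str})
```

One trade-off of this representation is that the Union no longer records which tag produces which type, so a checker could only verify that each output's elements belong to the combined set.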
Thanks again for your time and help.
Best,
Joshua
On Mon, Mar 30, 2020 at 11:22 AM Robert Bradshaw <ro...@google.com>
wrote:
> That is correct, type hints unfortunately are not yet supported for
> multiple-output PTransforms.
Re: Type hints do not work on multi-output PTransforms?
Posted by Robert Bradshaw <ro...@google.com>.
That is correct, type hints unfortunately are not yet supported for
multiple-output PTransforms.