Posted to dev@beam.apache.org by Stephen Dewey <st...@gmail.com> on 2021/02/24 22:27:32 UTC

inconsistency found in DirectRunner API (arg should be _UnwindowedValues but is not)

Hi, I am reporting a minor bug.

Based on this answer by Pablo: https://stackoverflow.com/a/42283279/783314

It appears that you want to always have an _UnwindowedValues in
DirectRunner whenever it exists in DataflowRunner, to provide consistency
between the two.

What I have noticed is that if you subclass beam.CombineFn in Python, the
accumulators argument received by the merge_accumulators method will be an
_UnwindowedValues in DataflowRunner, but not in DirectRunner. As a result,
passing that value to, say, len() raises an error on DataflowRunner that
does not appear on DirectRunner. The error will be:
TypeError: object of type '_UnwindowedValues' has no len()

Hope this helps!
Stephen
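
To illustrate the report above, here is a minimal plain-Python sketch that does not import apache_beam. IterableOnly is a hypothetical stand-in for an iterable-only container such as _UnwindowedValues, and MeanCombineFn is a hypothetical class that only mirrors the CombineFn method names. It shows a merge_accumulators that relies on iteration alone, so it should behave the same whether it receives a list or an iterable-only object.

```python
class IterableOnly:
    """Hypothetical stand-in for an iterable-only container.

    It defines __iter__ but not __len__, which is the property the
    report attributes to _UnwindowedValues.
    """

    def __init__(self, values):
        self._values = list(values)

    def __iter__(self):
        return iter(self._values)


class MeanCombineFn:
    """Hypothetical sketch mirroring the CombineFn method names.

    A real pipeline would subclass beam.CombineFn instead.
    """

    def create_accumulator(self):
        return (0.0, 0)  # (running total, element count)

    def add_input(self, accumulator, element):
        total, count = accumulator
        return (total + element, count + 1)

    def merge_accumulators(self, accumulators):
        # Only iterate: no len(), no indexing, no second pass.
        total, count = 0.0, 0
        for part_total, part_count in accumulators:
            total += part_total
            count += part_count
        return (total, count)

    def extract_output(self, accumulator):
        total, count = accumulator
        return total / count if count else float("nan")


fn = MeanCombineFn()
parts = [(3.0, 2), (5.0, 2)]

# The same merge works for a list and for an iterable-only object.
merged_from_list = fn.merge_accumulators(parts)
merged_from_iterable = fn.merge_accumulators(IterableOnly(parts))

# Calling len() on the iterable-only object fails with a TypeError.
try:
    len(IterableOnly(parts))
    len_error = None
except TypeError as exc:
    len_error = str(exc)
```

If a length really is needed inside merge_accumulators, one option is to copy the argument with list(accumulators) first, at the cost of materializing all accumulators in memory.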

Re: inconsistency found in DirectRunner API (arg should be _UnwindowedValues but is not)

Posted by Robert Bradshaw <ro...@google.com>.
Thanks. I've filed https://issues.apache.org/jira/browse/BEAM-11882 .

If you want to take a stab at fixing it, you could try replacing the
argument passed to merge_accumulators at
https://github.com/apache/beam/blob/release-2.28.0/sdks/python/apache_beam/transforms/combiners.py#L963
with a new object whose __iter__ method returns iter(accumulators), and
then create a pull request.
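
The suggestion above could be sketched roughly as follows. This is not the actual Beam code or the eventual fix: IterOnlyView and buggy_merge are hypothetical names used only for illustration, and the block does not import apache_beam. The idea is that wrapping the list hides len() and indexing, so code that depends on them would fail on the local runner as well.

```python
class IterOnlyView:
    """Hypothetical wrapper that exposes only iteration over its input."""

    def __init__(self, accumulators):
        self._accumulators = accumulators

    def __iter__(self):
        # Matches the suggestion: __iter__ returns iter(accumulators).
        return iter(self._accumulators)


def buggy_merge(accumulators):
    """Hypothetical user code that wrongly assumes a sized container."""
    return sum(accumulators) / len(accumulators)


raw = [1, 2, 3]

# With a plain list, the len() call goes unnoticed.
plain_result = buggy_merge(raw)

# With the wrapper, the same code fails early with a TypeError.
try:
    buggy_merge(IterOnlyView(raw))
    wrapped_error = None
except TypeError as exc:
    wrapped_error = str(exc)

# Code that only iterates is unaffected by the wrapper.
iterated_total = sum(IterOnlyView(raw))
```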


Re: inconsistency found in DirectRunner API (arg should be _UnwindowedValues but is not)

Posted by Stephen Dewey <st...@gmail.com>.
Oh, I forgot to mention that I am using SDK 2.27.0 and Python 3.8
