Posted to dev@beam.apache.org by Pablo Estrada <pa...@google.com> on 2019/04/19 20:58:13 UTC

Possible bug in accumulating triggers Python DirectRunner?

Hello all,
I've been slowly learning a bit about life in streaming, with state,
timers, triggers, etc.

The other day, I tried out a trigger pipeline that did not have the
behavior that I was expecting, and I am looking for feedback on whether I'm
missing something, or this is a bug.

Please take a look at this unit test:

https://github.com/apache/beam/pull/8364/files#diff-38fb631ae11ed485e2b99507e96ff9ffR451

Is the check correct that we would expect range [1, 6) to appear twice?
i.e. concat([1, 6), [1, 10]) ?

I have not tested this in other runners.
Thanks
-P.

Re: Possible bug in accumulating triggers Python DirectRunner?

Posted by Kenneth Knowles <ke...@apache.org>.
For example, my immediate suspicion, with rather little to go on, would be
a ">" versus ">=" issue in firing processing-time triggers. Coincidentally,
such issues are still showing up, as in
https://github.com/apache/beam/pull/8366
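
To make the suspected class of bug concrete, here is a toy sketch in plain Python. It is not taken from Beam or from the linked pull request; the function names and the clock values are hypothetical, and the only assumption is that a timer is compared against a clock that stops advancing exactly at the timer's fire time.

```python
def timer_should_fire_inclusive(now, fire_time):
    """Fire as soon as the clock reaches the fire time."""
    return now >= fire_time


def timer_should_fire_strict(now, fire_time):
    """Fire only once the clock has moved past the fire time."""
    return now > fire_time


# Hypothetical scenario: the timer is set for t=10 and the clock's last
# observed value is exactly 10.
fire_time = 10
clock_ticks = [9, 10]

inclusive_fired = any(
    timer_should_fire_inclusive(t, fire_time) for t in clock_ticks)
strict_fired = any(
    timer_should_fire_strict(t, fire_time) for t in clock_ticks)

print(inclusive_fired, strict_fired)
```

With these made-up values the inclusive comparison fires and the strict one never does, which is the kind of boundary difference that could make an expected firing go missing.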

If we had a portable runner with TestStream support, I would suggest using
it.

Kenn

On Fri, Apr 19, 2019 at 3:45 PM Kenneth Knowles <ke...@apache.org> wrote:

> What is the behavior you are seeing?
>
> Kenn

Re: Possible bug in accumulating triggers Python DirectRunner?

Posted by Ahmet Altay <al...@google.com>.
I missed the lack of GBK.

assert_that would be the PAssert equivalent, but it has known issues in
streaming mode.

On Fri, Apr 19, 2019 at 4:12 PM Kenneth Knowles <ke...@apache.org> wrote:

> Oh, wait I didn't even read the pipeline well. You don't have a GBK so
> triggers don't do anything. They only apply to aggregations. Since it is
> just a ParDo the elements flow right through and your results are expected.
> If you did have a GBK then you would have this:
>
> Expected: [ ['1', '2', '3', '4', '5'], ['1', '2', '3', '4', '5', '6', '7',
> '8', '9', '10'] ]
> Actual: [ ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10'] ]
>
> Where both outer lists are PCollections, hence could be reordered, and
> both inner lists are also in an undefined order. They have a pane index
> that says their logical order but they can be reordered. It is unusual and
> runner-dependent but best to check PCollection contents without
> order-dependence.
>
> Do you have PAssert for these sorts of checks?
>
> Kenn

Re: Possible bug in accumulating triggers Python DirectRunner?

Posted by Pablo Estrada <pa...@google.com>.
Aah that makes more sense... I'll try that out. Thanks!

On Fri, Apr 19, 2019 at 4:12 PM Kenneth Knowles <ke...@apache.org> wrote:

> Oh, wait I didn't even read the pipeline well. You don't have a GBK so
> triggers don't do anything. They only apply to aggregations. Since it is
> just a ParDo the elements flow right through and your results are expected.
> If you did have a GBK then you would have this:
>
> Expected: [ ['1', '2', '3', '4', '5'], ['1', '2', '3', '4', '5', '6', '7',
> '8', '9', '10'] ]
> Actual: [ ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10'] ]
>
> Where both outer lists are PCollections, hence could be reordered, and
> both inner lists are also in an undefined order. They have a pane index
> that says their logical order but they can be reordered. It is unusual and
> runner-dependent but best to check PCollection contents without
> order-dependence.
>
> Do you have PAssert for these sorts of checks?
>
> Kenn

Re: Possible bug in accumulating triggers Python DirectRunner?

Posted by Kenneth Knowles <ke...@apache.org>.
Oh, wait, I didn't even read the pipeline well. You don't have a GBK
(GroupByKey), so triggers don't do anything: they only apply to
aggregations. Since it is just a ParDo, the elements flow right through and
your results are expected. If you did have a GBK, then you would have this:

Expected: [ ['1', '2', '3', '4', '5'], ['1', '2', '3', '4', '5', '6', '7',
'8', '9', '10'] ]
Actual: [ ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10'] ]
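
As a rough illustration of the difference described above, here is a toy model in plain Python. It does not use the Beam API: `simulate_passthrough`, `simulate_accumulating_gbk`, and the `fire_after` parameter are hypothetical names made up for this sketch, and it assumes the trigger fires once after the fifth element and once more at the end of input.

```python
def simulate_passthrough(elements):
    """Model a ParDo-only pipeline: each element is emitted exactly once."""
    return list(elements)


def simulate_accumulating_gbk(elements, fire_after):
    """Model an aggregation with an accumulating trigger.

    A pane is emitted when `fire_after` elements have been buffered and
    again at end of input if anything new arrived since the last firing.
    Because the mode is accumulating, each pane holds everything seen so far.
    """
    buffered = []
    panes = []
    last_fired_size = 0
    for element in elements:
        buffered.append(element)
        if len(buffered) == fire_after:
            panes.append(list(buffered))
            last_fired_size = len(buffered)
    if len(buffered) > last_fired_size:
        panes.append(list(buffered))
    return panes


records = [str(i) for i in range(1, 11)]
passthrough = simulate_passthrough(records)
panes = simulate_accumulating_gbk(records, fire_after=5)

print(passthrough)  # ten elements, each once
print(panes)        # two panes: '1'..'5', then '1'..'10'
```

In this model the repeated '1'..'5' only shows up when there is an aggregation step for the trigger to act on; the pass-through case emits each element once.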

Both outer lists are PCollections, so their elements could be reordered,
and the inner lists are also in an undefined order. Panes have a pane index
that gives their logical order, but they can still be reordered. Reordering
is unusual and runner-dependent, but it is best to check PCollection
contents without order-dependence.
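
One way to follow that advice, sketched in plain Python rather than with Beam's own testing utilities: compare the panes as a multiset of sorted tuples, so that neither the order of panes nor the order of elements within a pane matters. The helper names `normalize_panes` and `panes_match` are hypothetical.

```python
from collections import Counter


def normalize_panes(panes):
    """Return a multiset of panes, each pane reduced to a sorted tuple."""
    return Counter(tuple(sorted(pane)) for pane in panes)


def panes_match(actual, expected):
    """Compare two collections of panes, ignoring outer and inner order."""
    return normalize_panes(actual) == normalize_panes(expected)


expected = [
    ['1', '2', '3', '4', '5'],
    [str(i) for i in range(1, 11)],
]
# Same panes, but with the outer order swapped and the inner order shuffled.
reordered = [
    list(reversed(expected[1])),
    ['3', '1', '2', '5', '4'],
]

print(panes_match(reordered, expected))      # True
print(panes_match([expected[1]], expected))  # False: the early pane is missing
```

Sorting each pane keeps duplicate elements, and `Counter` keeps duplicate panes, so the check still notices a missing or extra firing.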

Do you have PAssert for these sorts of checks?

Kenn

On Fri, Apr 19, 2019 at 4:02 PM Pablo Estrada <pa...@google.com> wrote:

> created https://jira.apache.org/jira/browse/BEAM-7122
> Best
> -P.

Re: Possible bug in accumulating triggers Python DirectRunner?

Posted by Pablo Estrada <pa...@google.com>.
created https://jira.apache.org/jira/browse/BEAM-7122
Best
-P.

On Fri, Apr 19, 2019 at 3:50 PM Pablo Estrada <pa...@google.com> wrote:

> Ah, sorry for the lack of clarity. Each element appears only once in
> the final output. The failure is:
>
> ======================================================================
>> FAIL: test_multiple_accumulating_firings
>> (apache_beam.transforms.trigger_test.TriggerPipelineTest)
>> ----------------------------------------------------------------------
>> Traceback (most recent call last):
>>   File "apache_beam/transforms/trigger_test.py", line 491, in
>> test_multiple_accumulating_firings
>>     TriggerPipelineTest.all_records)
>> AssertionError: Lists differ: ['1', '2', '3', '4', '5', '1',... != ['1',
>> '2', '3', '4', '5', '6',...
>>
> [...other output...]
>
> (expected is:)
>
>> - ['1', '2', '3', '4', '5', '1', '2', '3', '4', '5', '6', '7', '8', '9',
>> '10']
>> ?                           -------------------------
>>
> (actual is:)
>
>> + ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
>> ----------------------------------------------------------------------

Re: Possible bug in accumulating triggers Python DirectRunner?

Posted by Pablo Estrada <pa...@google.com>.
Ah, sorry for the lack of clarity. Each element appears only once in
the final output. The failure is:

    ======================================================================
    FAIL: test_multiple_accumulating_firings (apache_beam.transforms.trigger_test.TriggerPipelineTest)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "apache_beam/transforms/trigger_test.py", line 491, in test_multiple_accumulating_firings
        TriggerPipelineTest.all_records)
    AssertionError: Lists differ: ['1', '2', '3', '4', '5', '1',... != ['1', '2', '3', '4', '5', '6',...

[...other output...]

(expected is:)

    - ['1', '2', '3', '4', '5', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
    ?                           -------------------------

(actual is:)

    + ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
    ----------------------------------------------------------------------




On Fri, Apr 19, 2019 at 3:45 PM Kenneth Knowles <ke...@apache.org> wrote:

> What is the behavior you are seeing?
>
> Kenn

Re: Possible bug in accumulating triggers Python DirectRunner?

Posted by Kenneth Knowles <ke...@apache.org>.
What is the behavior you are seeing?

Kenn

On Fri, Apr 19, 2019 at 3:14 PM Ahmet Altay <al...@google.com> wrote:

>
>
> On Fri, Apr 19, 2019 at 1:58 PM Pablo Estrada <pa...@google.com> wrote:
>
>> Hello all,
>> I've been slowly learning a bit about life in streaming, with state,
>> timers, triggers, etc.
>>
>> The other day, I tried out a trigger pipeline that did not have the
>> behavior that I was expecting, and I am looking for feedback on whether I'm
>> missing something, or this is a bug.
>>
>> Please take a look at this unit test:
>>
>>
>> https://github.com/apache/beam/pull/8364/files#diff-38fb631ae11ed485e2b99507e96ff9ffR451
>>
>> Is the check correct that we would expect range [1, 6) to appear twice?
>> i.e. concat([1, 6), [1, 10]) ?
>>
>
> This is what I would expect. Your test code looks good to me. Could you
> file an issue?
>
>
>>
>> I have not tested this in other runners.
>> Thanks
>> -P.
>>
>

Re: Possible bug in accumulating triggers Python DirectRunner?

Posted by Ahmet Altay <al...@google.com>.
On Fri, Apr 19, 2019 at 1:58 PM Pablo Estrada <pa...@google.com> wrote:

> Hello all,
> I've been slowly learning a bit about life in streaming, with state,
> timers, triggers, etc.
>
> The other day, I tried out a trigger pipeline that did not have the
> behavior that I was expecting, and I am looking for feedback on whether I'm
> missing something, or this is a bug.
>
> Please take a look at this unit test:
>
>
> https://github.com/apache/beam/pull/8364/files#diff-38fb631ae11ed485e2b99507e96ff9ffR451
>
> Is the check correct that we would expect range [1, 6) to appear twice?
> i.e. concat([1, 6), [1, 10]) ?
>

This is what I would expect. Your test code looks good to me. Could you
file an issue?


>
> I have not tested this in other runners.
> Thanks
> -P.
>