Posted to user@beam.apache.org by Vilhelm von Ehrenheim <vo...@gmail.com> on 2018/02/20 18:13:37 UTC

Missing watermark?

Hi all!
I have a somewhat complicated stateful DoFn that I would like to add an
event time timer to. My goal with the timer is to not output anything until
a sufficient amount of state has been built up in a Global window.

In doing this I realized that the watermark doesn’t seem to progress at all
(regardless of the timer) and in Google Dataflow the displayed watermark is
just “-” when clicking on the ParDo(DoFn) node.

The DoFn is reading flattened input from a TextIO.watchForNewFiles and
KafkaIO. The Flatten element had a watermark set.

I have written tests for my DoFn that all pass using TestStream, but since I
explicitly set the watermark progression there, all is fine.

What can I do to find out why there is no watermark progression for a
specific PTransform?

Regards,
Vilhelm von Ehrenheim

Re: Missing watermark?

Posted by Vilhelm von Ehrenheim <vo...@gmail.com>.
Ok, that might explain it. I have a topic with very few messages so it
might have empty partitions actually. Thanks!

I’m really curious about the TextIO watermark as well though. I can’t find
the implementation either, so if anyone knows where to look I’d love to get a
pointer.

// Vilhelm

On Tue, 20 Feb 2018 at 19:31, Raghu Angadi <ra...@google.com> wrote:

> Both sources have to provide a watermark.
>
> - KafkaIO: Its watermark can get stuck if you don't have any records in
> one of the partitions. That is a bug. The management of timestamps and
> watermarks in KafkaIO is being updated in
> https://github.com/apache/beam/pull/4680.
> - TextIO.watchForNewFiles(): I am not sure how the watermark is handled
> by TextIO. I didn't notice any mention of it in the implementation.

Re: Missing watermark?

Posted by Raghu Angadi <ra...@google.com>.
Both sources have to provide a watermark.

- KafkaIO: Its watermark can get stuck if you don't have any records in
one of the partitions. That is a bug. The management of timestamps and
watermarks in KafkaIO is being updated in
https://github.com/apache/beam/pull/4680.
- TextIO.watchForNewFiles(): I am not sure how the watermark is handled by
TextIO. I didn't notice any mention of it in the implementation.
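
To illustrate why one quiet partition can hold everything back, here is a
rough sketch in plain Python (no Beam dependency). It is a simplified model,
not KafkaIO's or Flatten's actual implementation: it assumes a source's
watermark is the minimum over its partitions, that a partition with no
records reports no watermark at all (None here), and that a flattened
PCollection's watermark is the minimum of its inputs. The function names and
the integer timestamps are made up for the example.

```python
from typing import Dict, Optional


def source_watermark(partitions: Dict[int, Optional[int]]) -> Optional[int]:
    """Model of a partitioned source's watermark.

    Each value is the watermark reported by one partition, or None if that
    partition has not seen any records yet (assumption for this sketch).
    """
    if not partitions:
        return None
    values = list(partitions.values())
    # A single partition without a watermark holds back the whole source.
    if any(v is None for v in values):
        return None
    return min(values)


def flatten_watermark(*inputs: Optional[int]) -> Optional[int]:
    """Model of the watermark after flattening several inputs.

    The combined watermark cannot be ahead of the slowest input.
    """
    if not inputs or any(w is None for w in inputs):
        return None
    return min(inputs)


# Partition 1 is empty, so the modelled Kafka watermark is missing,
# and so is the watermark after the flatten, whatever the file source says.
kafka = source_watermark({0: 1000, 1: None})
files = 2000
stuck = flatten_watermark(kafka, files)

# Once partition 1 sees a record, the watermark can move again.
kafka_later = source_watermark({0: 1000, 1: 900})
moving = flatten_watermark(kafka_later, files)
```

In this model, an event time timer downstream of the flatten would never fire
in the first case, because the watermark never reaches the timer's timestamp.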