You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@beam.apache.org by Vilhelm von Ehrenheim <vo...@gmail.com> on 2018/02/21 15:29:31 UTC

TextIO.watchForNewFiles and Watermarks

Hi!
I have some problems with watermarks using the watchForNewFiles feature in
TextIO to read . The watermark is not progressing even though no new files
are added to the location. Does anyone know what the logic is behind this
watermark, and if it is possible to supply your own heuristic? I can't find
it anywhere in the source.

Regards,
Vilhelm von Ehrenheim

Re: TextIO.watchForNewFiles and Watermarks

Posted by Vilhelm von Ehrenheim <vo...@gmail.com>.
Ok thanks! I think that makes sense. I just got into some strange problems
when mixing this with KafkaIO. I'll checkout the Watch transform.

// Vilhelm

On Wed, Feb 21, 2018 at 6:12 PM, Eugene Kirpichov <ki...@google.com>
wrote:

> Hi! Currently it's not configurable - the watermark is min(timestamp of
> pending files) where timestamp is simply time when the file was seen.
> However, it's implemented as a very thin wrapper on top of the Watch
> transform, see implementation of FileIO.matchAll(). You can use the Watch
> transform directly and write a PollFn that uses FileSystems.match() to list
> the files and computes file timestamps and watermark in the way appropriate
> to your use case.
>
> On Wed, Feb 21, 2018 at 7:29 AM Vilhelm von Ehrenheim <
> vonehrenheim@gmail.com> wrote:
>
>> Hi!
>> I have some problems with watermarks using the watchForNewFiles feature
>> in TextIO to read . The watermark is not progressing even though no new
>> files are added to the location. Does anyone know what the logic is behind
>> this watermark, and if it is possible to supply your own heuristic? I can't
>> find it anywhere in the source.
>>
>> Regards,
>> Vilhelm von Ehrenheim
>>
>>
>>

Re: TextIO.watchForNewFiles and Watermarks

Posted by Eugene Kirpichov <ki...@google.com>.
Hi! Currently it's not configurable - the watermark is min(timestamp of
pending files) where timestamp is simply time when the file was seen.
However, it's implemented as a very thin wrapper on top of the Watch
transform, see implementation of FileIO.matchAll(). You can use the Watch
transform directly and write a PollFn that uses FileSystems.match() to list
the files and computes file timestamps and watermark in the way appropriate
to your use case.

On Wed, Feb 21, 2018 at 7:29 AM Vilhelm von Ehrenheim <
vonehrenheim@gmail.com> wrote:

> Hi!
> I have some problems with watermarks using the watchForNewFiles feature in
> TextIO to read . The watermark is not progressing even though no new files
> are added to the location. Does anyone know what the logic is behind this
> watermark, and if it is possible to supply your own heuristic? I can't find
> it anywhere in the source.
>
> Regards,
> Vilhelm von Ehrenheim
>
>
>