Posted to user@flink.apache.org by Fra <pe...@gmail.com> on 2021/08/19 09:26:03 UTC

Theory question on PROCESS_CONTINUOUSLY processing mode and watermarks

Hello, while developing a Flink streaming platform on my own, I found
something that perplexes me.

Using FileProcessingMode.PROCESS_CONTINUOUSLY in a streaming job that uses
tumbling windows and watermarks causes my streaming process to stop at the
file-reading phase.

Meanwhile, if I delete my window and watermark declarations, the program
works as expected.

Is there a reason for this behaviour? My theory is that
PROCESS_CONTINUOUSLY re-reads the file, and that this conflicts with the
watermarks created during the first read of the files, causing the job to
stop.



Re: Theory question on PROCESS_CONTINUOUSLY processing mode and watermarks

Posted by Arvid Heise <ar...@apache.org>.
I think what you are seeing is that the files contain records with similar
timestamps. That means that after reading file1, your watermark has already
advanced to the end of your time range. When Flink picks up file2, all of its
records are considered late and no windows fire anymore.

See [1] for a possible solution on DataStream. The Table API deals much
better with this if you use upserts [2].

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/datastream/operators/windows/#allowed-lateness
[2]
https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/concepts/versioned_tables/
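
To make the effect concrete, here is a rough toy model in plain Java. It is not Flink code: the class name, the method name and the sample timestamps are all made up for this sketch, and it assumes non-negative millisecond timestamps and a watermark that simply tracks the largest timestamp seen so far. A record counts as late once the watermark has reached the last timestamp of its tumbling window plus the allowed lateness.

```java
/** Toy model of late records in event-time tumbling windows; not Flink code. */
public class LateRecordSketch {

    /**
     * Feeds timestamps (ms, non-negative) in arrival order and returns how
     * many records are dropped as late. The watermark here is simply the
     * largest timestamp seen so far (an assumption of this sketch).
     */
    static int countDropped(long[] timestamps, long windowSize, long allowedLateness) {
        long watermark = Long.MIN_VALUE;
        int dropped = 0;
        for (long ts : timestamps) {
            // End (exclusive) of the tumbling window this record belongs to.
            long windowEnd = ts - (ts % windowSize) + windowSize;
            // Late: the watermark already reached the window's last
            // timestamp plus the allowed lateness.
            if (windowEnd - 1 + allowedLateness <= watermark) {
                dropped++;
            }
            watermark = Math.max(watermark, ts);
        }
        return dropped;
    }

    public static void main(String[] args) {
        // First three values stand for file1, last three for file2.
        long[] arrivals = {1000, 2000, 10000, 1500, 2500, 8500};
        System.out.println("dropped without lateness: "
                + countDropped(arrivals, 5000, 0));
        System.out.println("dropped with 10 s lateness: "
                + countDropped(arrivals, 5000, 10000));
    }
}
```

In this model every record of the second "file" is dropped when no lateness is allowed, and none are dropped once the allowed lateness covers the time range of the first file. In a real DataStream job the corresponding knob is the allowed lateness setting on the windowed stream described in [1].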

On Thu, Aug 19, 2021 at 11:48 AM Caizhi Weng <ts...@gmail.com> wrote:

> Hi!
>
> FileProcessingMode.PROCESS_CONTINUOUSLY means that the source continuously
> scans the file for updates; it should have nothing to do with stopping the
> streaming job.
>
> I suspect that the column on which you defined the watermark contains some
> data that exceeds Long.MAX_VALUE. A Long.MAX_VALUE watermark tells the
> job to stop. You might also want to share your code on the mailing list so
> others can look into this problem more deeply.

Re: Theory question on PROCESS_CONTINUOUSLY processing mode and watermarks

Posted by Caizhi Weng <ts...@gmail.com>.
Hi!

FileProcessingMode.PROCESS_CONTINUOUSLY means that the source continuously
scans the file for updates; it should have nothing to do with stopping the
streaming job.

I suspect that the column on which you defined the watermark contains some
data that exceeds Long.MAX_VALUE. A Long.MAX_VALUE watermark tells the
job to stop. You might also want to share your code on the mailing list so
others can look into this problem more deeply.
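
One way to check this suspicion before sharing the full job might be to scan the timestamp column for implausible values. The sketch below is plain Java and not part of Flink; the class name, the method name, the sample values and the "plausible range" (calendar year 2021 in UTC, as epoch milliseconds) are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for spotting implausible event timestamps; not part of Flink. */
public class TimestampSanityCheck {

    /** Returns the indices of timestamps (epoch ms) outside [minMillis, maxMillis]. */
    static List<Integer> suspiciousIndices(long[] timestamps, long minMillis, long maxMillis) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < timestamps.length; i++) {
            if (timestamps[i] < minMillis || timestamps[i] > maxMillis) {
                bad.add(i);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Made-up sample: the middle value mimics a broken timestamp.
        long[] sample = {1629365163000L, Long.MAX_VALUE, 1629365164000L};
        long min = 1609459200000L; // 2021-01-01T00:00:00Z
        long max = 1640995199000L; // 2021-12-31T23:59:59Z
        System.out.println("suspicious indices: " + suspiciousIndices(sample, min, max));
    }
}
```

If a check like this flags any rows, those records are worth inspecting before looking further into the window or watermark configuration.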