You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Cam Mach <ca...@gmail.com> on 2020/01/13 19:34:50 UTC

Understanding watermark

Hello Flink expert,

We have a pipeline that read both bounded and unbounded sources and our
understanding is that when the bounded sources complete they should get a
watermark of +inf and then we should be able to take a savepoint and safely
restart the pipeline. However, we have source that never get watermarks and
we are confused as to what we are seeing and what we should expect


Cam Mach
Software Engineer
E-mail: cammach84@gmail.com
Tel: 206 972 2768

Re: Understanding watermark

Posted by Guowei Ma <gu...@gmail.com>.
>>What I understand from you, one operator has two watermarks? If so, one
operator's output watermark would be an input watermark of the next
operator? Does it sounds redundant?
There are no two watermarks for an operator. What I want to say is
"watermark metrics".

>>Or do you mean the Web UI only show the input watermarks of every
operator, but since the source doesn't have input watermark show it show
"No Watermark" ? And we should have output watermark for source?
Yes. But the web UI only shows the task level watermarks metrics, not the
operator level. Yout could find more detail information about metrics in
the link[1].

>>And, yes we want to understand when we should expect to see watermarks
for our "combined" sources (bounded and un-bounded) for our pipeline?
Do you try a topology with only Kinesis source and the web UI shows the
Watermark of source?  Actually, I think it might not be related to the
"combined" source.

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.9/monitoring/metrics.html
Best,
Guowei


Cam Mach <ca...@gmail.com> 于2020年1月15日周三 下午3:53写道:

> Hi Guowei,
>
> Thanks for your response.
>
> What I understand from you, one operator has two watermarks? If so, one
> operator's output watermark would be an input watermark of the next
> operator? Does it sounds redundant?
>
> Or do you mean the Web UI only show the input watermarks of every
> operator, but since the source doesn't have input watermark show it show
> "No Watermark" ? And we should have output watermark for source?
>
> And, yes we want to understand when we should expect to see watermarks for
> our "combined" sources (bounded and un-bounded) for our pipeline?
>
> If you can be more directly, it would be very helpful.
>
> Thanks,
>
> On Tue, Jan 14, 2020 at 5:42 PM Guowei Ma <gu...@gmail.com> wrote:
>
>> Hi, Cam,
>> I think you might want to know why the web page does not show the
>> watermark of the source.
>> Currently, the web only shows the "input" watermark. The source only
>> outputs the watermark so the web shows you that there is "No Watermark".
>>  Actually Flink has "output" watermark metrics. I think Flink should also
>> show this information on the web. Would you mind open a Jira to track this?
>>
>>
>> Best,
>> Guowei
>>
>>
>> Cam Mach <ca...@gmail.com> 于2020年1月15日周三 上午4:05写道:
>>
>>> Hi Till,
>>>
>>> Thanks for your response.
>>>
>>> Our sources are S3 and Kinesis. We have run several tests, and we are
>>> able to take savepoint/checkpoint, but only when S3 complete reading. And
>>> at that point, our pipeline has watermarks for other operators, but not the
>>> source operator. We are not running `PROCESS_CONTINUOUSLY`, so we should
>>> have watermark for the source as well, right?
>>>
>>>  Attached is snapshot of our pipeline.
>>>
>>> [image: image.png]
>>>
>>> Thanks
>>>
>>>
>>>
>>> On Tue, Jan 14, 2020 at 10:43 AM Till Rohrmann <tr...@apache.org>
>>> wrote:
>>>
>>>> Hi Cam,
>>>>
>>>> could you share a bit more details about your job (e.g. which sources
>>>> are you using, what are your settings, etc.). Ideally you can provide a
>>>> minimal example in order to better understand the program.
>>>>
>>>> From a high level perspective, there might be different problems: First
>>>> of all, Flink does not support checkpointing/taking a savepoint if some of
>>>> the job's operator have already terminated iirc. But your description
>>>> points rather into the direction that your bounded source does not
>>>> terminate. So maybe you are reading a file via
>>>> StreamExecutionEnvironment.createFileInput
>>>> with FileProcessingMode.PROCESS_CONTINUOUSLY. But these things are hard to
>>>> tell without a better understanding of your job.
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Mon, Jan 13, 2020 at 8:35 PM Cam Mach <ca...@gmail.com> wrote:
>>>>
>>>>> Hello Flink expert,
>>>>>
>>>>> We have a pipeline that read both bounded and unbounded sources and
>>>>> our understanding is that when the bounded sources complete they should get
>>>>> a watermark of +inf and then we should be able to take a savepoint and
>>>>> safely restart the pipeline. However, we have source that never get
>>>>> watermarks and we are confused as to what we are seeing and what we should
>>>>> expect
>>>>>
>>>>>
>>>>> Cam Mach
>>>>> Software Engineer
>>>>> E-mail: cammach84@gmail.com
>>>>> Tel: 206 972 2768
>>>>>
>>>>>

Re: Understanding watermark

Posted by Guowei Ma <gu...@gmail.com>.
>>What I understand from you, one operator has two watermarks? If so, one
operator's output watermark would be an input watermark of the next
operator? Does it sounds redundant?
There are no two watermarks for an operator. What I want to say is
"watermark metrics".

>>Or do you mean the Web UI only show the input watermarks of every
operator, but since the source doesn't have input watermark show it show
"No Watermark" ? And we should have output watermark for source?
Yes. But the web UI only shows the task level watermarks metrics, not the
operator level. Yout could find more detail information about metrics in
the link[1].

>>And, yes we want to understand when we should expect to see watermarks
for our "combined" sources (bounded and un-bounded) for our pipeline?
Do you try a topology with only Kinesis source and the web UI shows the
Watermark of source?  Actually, I think it might not be related to the
"combined" source.

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.9/monitoring/metrics.html
Best,
Guowei


Cam Mach <ca...@gmail.com> 于2020年1月15日周三 下午3:53写道:

> Hi Guowei,
>
> Thanks for your response.
>
> What I understand from you, one operator has two watermarks? If so, one
> operator's output watermark would be an input watermark of the next
> operator? Does it sounds redundant?
>
> Or do you mean the Web UI only show the input watermarks of every
> operator, but since the source doesn't have input watermark show it show
> "No Watermark" ? And we should have output watermark for source?
>
> And, yes we want to understand when we should expect to see watermarks for
> our "combined" sources (bounded and un-bounded) for our pipeline?
>
> If you can be more directly, it would be very helpful.
>
> Thanks,
>
> On Tue, Jan 14, 2020 at 5:42 PM Guowei Ma <gu...@gmail.com> wrote:
>
>> Hi, Cam,
>> I think you might want to know why the web page does not show the
>> watermark of the source.
>> Currently, the web only shows the "input" watermark. The source only
>> outputs the watermark so the web shows you that there is "No Watermark".
>>  Actually Flink has "output" watermark metrics. I think Flink should also
>> show this information on the web. Would you mind open a Jira to track this?
>>
>>
>> Best,
>> Guowei
>>
>>
>> Cam Mach <ca...@gmail.com> 于2020年1月15日周三 上午4:05写道:
>>
>>> Hi Till,
>>>
>>> Thanks for your response.
>>>
>>> Our sources are S3 and Kinesis. We have run several tests, and we are
>>> able to take savepoint/checkpoint, but only when S3 complete reading. And
>>> at that point, our pipeline has watermarks for other operators, but not the
>>> source operator. We are not running `PROCESS_CONTINUOUSLY`, so we should
>>> have watermark for the source as well, right?
>>>
>>>  Attached is snapshot of our pipeline.
>>>
>>> [image: image.png]
>>>
>>> Thanks
>>>
>>>
>>>
>>> On Tue, Jan 14, 2020 at 10:43 AM Till Rohrmann <tr...@apache.org>
>>> wrote:
>>>
>>>> Hi Cam,
>>>>
>>>> could you share a bit more details about your job (e.g. which sources
>>>> are you using, what are your settings, etc.). Ideally you can provide a
>>>> minimal example in order to better understand the program.
>>>>
>>>> From a high level perspective, there might be different problems: First
>>>> of all, Flink does not support checkpointing/taking a savepoint if some of
>>>> the job's operator have already terminated iirc. But your description
>>>> points rather into the direction that your bounded source does not
>>>> terminate. So maybe you are reading a file via
>>>> StreamExecutionEnvironment.createFileInput
>>>> with FileProcessingMode.PROCESS_CONTINUOUSLY. But these things are hard to
>>>> tell without a better understanding of your job.
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Mon, Jan 13, 2020 at 8:35 PM Cam Mach <ca...@gmail.com> wrote:
>>>>
>>>>> Hello Flink expert,
>>>>>
>>>>> We have a pipeline that read both bounded and unbounded sources and
>>>>> our understanding is that when the bounded sources complete they should get
>>>>> a watermark of +inf and then we should be able to take a savepoint and
>>>>> safely restart the pipeline. However, we have source that never get
>>>>> watermarks and we are confused as to what we are seeing and what we should
>>>>> expect
>>>>>
>>>>>
>>>>> Cam Mach
>>>>> Software Engineer
>>>>> E-mail: cammach84@gmail.com
>>>>> Tel: 206 972 2768
>>>>>
>>>>>

Re: Understanding watermark

Posted by Cam Mach <ca...@gmail.com>.
Hi Guowei,

Thanks for your response.

What I understand from you, one operator has two watermarks? If so, one
operator's output watermark would be an input watermark of the next
operator? Does it sounds redundant?

Or do you mean the Web UI only show the input watermarks of every operator,
but since the source doesn't have input watermark show it show "No
Watermark" ? And we should have output watermark for source?

And, yes we want to understand when we should expect to see watermarks for
our "combined" sources (bounded and un-bounded) for our pipeline?

If you can be more directly, it would be very helpful.

Thanks,

On Tue, Jan 14, 2020 at 5:42 PM Guowei Ma <gu...@gmail.com> wrote:

> Hi, Cam,
> I think you might want to know why the web page does not show the
> watermark of the source.
> Currently, the web only shows the "input" watermark. The source only
> outputs the watermark so the web shows you that there is "No Watermark".
>  Actually Flink has "output" watermark metrics. I think Flink should also
> show this information on the web. Would you mind open a Jira to track this?
>
>
> Best,
> Guowei
>
>
> Cam Mach <ca...@gmail.com> 于2020年1月15日周三 上午4:05写道:
>
>> Hi Till,
>>
>> Thanks for your response.
>>
>> Our sources are S3 and Kinesis. We have run several tests, and we are
>> able to take savepoint/checkpoint, but only when S3 complete reading. And
>> at that point, our pipeline has watermarks for other operators, but not the
>> source operator. We are not running `PROCESS_CONTINUOUSLY`, so we should
>> have watermark for the source as well, right?
>>
>>  Attached is snapshot of our pipeline.
>>
>> [image: image.png]
>>
>> Thanks
>>
>>
>>
>> On Tue, Jan 14, 2020 at 10:43 AM Till Rohrmann <tr...@apache.org>
>> wrote:
>>
>>> Hi Cam,
>>>
>>> could you share a bit more details about your job (e.g. which sources
>>> are you using, what are your settings, etc.). Ideally you can provide a
>>> minimal example in order to better understand the program.
>>>
>>> From a high level perspective, there might be different problems: First
>>> of all, Flink does not support checkpointing/taking a savepoint if some of
>>> the job's operator have already terminated iirc. But your description
>>> points rather into the direction that your bounded source does not
>>> terminate. So maybe you are reading a file via
>>> StreamExecutionEnvironment.createFileInput
>>> with FileProcessingMode.PROCESS_CONTINUOUSLY. But these things are hard to
>>> tell without a better understanding of your job.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Mon, Jan 13, 2020 at 8:35 PM Cam Mach <ca...@gmail.com> wrote:
>>>
>>>> Hello Flink expert,
>>>>
>>>> We have a pipeline that read both bounded and unbounded sources and our
>>>> understanding is that when the bounded sources complete they should get a
>>>> watermark of +inf and then we should be able to take a savepoint and safely
>>>> restart the pipeline. However, we have source that never get watermarks and
>>>> we are confused as to what we are seeing and what we should expect
>>>>
>>>>
>>>> Cam Mach
>>>> Software Engineer
>>>> E-mail: cammach84@gmail.com
>>>> Tel: 206 972 2768
>>>>
>>>>

Re: Understanding watermark

Posted by Cam Mach <ca...@gmail.com>.
Hi Guowei,

Thanks for your response.

What I understand from you, one operator has two watermarks? If so, one
operator's output watermark would be an input watermark of the next
operator? Does it sounds redundant?

Or do you mean the Web UI only show the input watermarks of every operator,
but since the source doesn't have input watermark show it show "No
Watermark" ? And we should have output watermark for source?

And, yes we want to understand when we should expect to see watermarks for
our "combined" sources (bounded and un-bounded) for our pipeline?

If you can be more directly, it would be very helpful.

Thanks,

On Tue, Jan 14, 2020 at 5:42 PM Guowei Ma <gu...@gmail.com> wrote:

> Hi, Cam,
> I think you might want to know why the web page does not show the
> watermark of the source.
> Currently, the web only shows the "input" watermark. The source only
> outputs the watermark so the web shows you that there is "No Watermark".
>  Actually Flink has "output" watermark metrics. I think Flink should also
> show this information on the web. Would you mind open a Jira to track this?
>
>
> Best,
> Guowei
>
>
> Cam Mach <ca...@gmail.com> 于2020年1月15日周三 上午4:05写道:
>
>> Hi Till,
>>
>> Thanks for your response.
>>
>> Our sources are S3 and Kinesis. We have run several tests, and we are
>> able to take savepoint/checkpoint, but only when S3 complete reading. And
>> at that point, our pipeline has watermarks for other operators, but not the
>> source operator. We are not running `PROCESS_CONTINUOUSLY`, so we should
>> have watermark for the source as well, right?
>>
>>  Attached is snapshot of our pipeline.
>>
>> [image: image.png]
>>
>> Thanks
>>
>>
>>
>> On Tue, Jan 14, 2020 at 10:43 AM Till Rohrmann <tr...@apache.org>
>> wrote:
>>
>>> Hi Cam,
>>>
>>> could you share a bit more details about your job (e.g. which sources
>>> are you using, what are your settings, etc.). Ideally you can provide a
>>> minimal example in order to better understand the program.
>>>
>>> From a high level perspective, there might be different problems: First
>>> of all, Flink does not support checkpointing/taking a savepoint if some of
>>> the job's operator have already terminated iirc. But your description
>>> points rather into the direction that your bounded source does not
>>> terminate. So maybe you are reading a file via
>>> StreamExecutionEnvironment.createFileInput
>>> with FileProcessingMode.PROCESS_CONTINUOUSLY. But these things are hard to
>>> tell without a better understanding of your job.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Mon, Jan 13, 2020 at 8:35 PM Cam Mach <ca...@gmail.com> wrote:
>>>
>>>> Hello Flink expert,
>>>>
>>>> We have a pipeline that read both bounded and unbounded sources and our
>>>> understanding is that when the bounded sources complete they should get a
>>>> watermark of +inf and then we should be able to take a savepoint and safely
>>>> restart the pipeline. However, we have source that never get watermarks and
>>>> we are confused as to what we are seeing and what we should expect
>>>>
>>>>
>>>> Cam Mach
>>>> Software Engineer
>>>> E-mail: cammach84@gmail.com
>>>> Tel: 206 972 2768
>>>>
>>>>

Re: Understanding watermark

Posted by Guowei Ma <gu...@gmail.com>.
Hi, Cam,
I think you might want to know why the web page does not show the watermark
of the source.
Currently, the web only shows the "input" watermark. The source only
outputs the watermark so the web shows you that there is "No Watermark".
 Actually Flink has "output" watermark metrics. I think Flink should also
show this information on the web. Would you mind open a Jira to track this?


Best,
Guowei


Cam Mach <ca...@gmail.com> 于2020年1月15日周三 上午4:05写道:

> Hi Till,
>
> Thanks for your response.
>
> Our sources are S3 and Kinesis. We have run several tests, and we are able
> to take savepoint/checkpoint, but only when S3 complete reading. And at
> that point, our pipeline has watermarks for other operators, but not the
> source operator. We are not running `PROCESS_CONTINUOUSLY`, so we should
> have watermark for the source as well, right?
>
>  Attached is snapshot of our pipeline.
>
> [image: image.png]
>
> Thanks
>
>
>
> On Tue, Jan 14, 2020 at 10:43 AM Till Rohrmann <tr...@apache.org>
> wrote:
>
>> Hi Cam,
>>
>> could you share a bit more details about your job (e.g. which sources are
>> you using, what are your settings, etc.). Ideally you can provide a minimal
>> example in order to better understand the program.
>>
>> From a high level perspective, there might be different problems: First
>> of all, Flink does not support checkpointing/taking a savepoint if some of
>> the job's operator have already terminated iirc. But your description
>> points rather into the direction that your bounded source does not
>> terminate. So maybe you are reading a file via
>> StreamExecutionEnvironment.createFileInput
>> with FileProcessingMode.PROCESS_CONTINUOUSLY. But these things are hard to
>> tell without a better understanding of your job.
>>
>> Cheers,
>> Till
>>
>> On Mon, Jan 13, 2020 at 8:35 PM Cam Mach <ca...@gmail.com> wrote:
>>
>>> Hello Flink expert,
>>>
>>> We have a pipeline that read both bounded and unbounded sources and our
>>> understanding is that when the bounded sources complete they should get a
>>> watermark of +inf and then we should be able to take a savepoint and safely
>>> restart the pipeline. However, we have source that never get watermarks and
>>> we are confused as to what we are seeing and what we should expect
>>>
>>>
>>> Cam Mach
>>> Software Engineer
>>> E-mail: cammach84@gmail.com
>>> Tel: 206 972 2768
>>>
>>>

Re: Understanding watermark

Posted by Guowei Ma <gu...@gmail.com>.
Hi, Cam,
I think you might want to know why the web page does not show the watermark
of the source.
Currently, the web only shows the "input" watermark. The source only
outputs the watermark so the web shows you that there is "No Watermark".
 Actually Flink has "output" watermark metrics. I think Flink should also
show this information on the web. Would you mind open a Jira to track this?


Best,
Guowei


Cam Mach <ca...@gmail.com> 于2020年1月15日周三 上午4:05写道:

> Hi Till,
>
> Thanks for your response.
>
> Our sources are S3 and Kinesis. We have run several tests, and we are able
> to take savepoint/checkpoint, but only when S3 complete reading. And at
> that point, our pipeline has watermarks for other operators, but not the
> source operator. We are not running `PROCESS_CONTINUOUSLY`, so we should
> have watermark for the source as well, right?
>
>  Attached is snapshot of our pipeline.
>
> [image: image.png]
>
> Thanks
>
>
>
> On Tue, Jan 14, 2020 at 10:43 AM Till Rohrmann <tr...@apache.org>
> wrote:
>
>> Hi Cam,
>>
>> could you share a bit more details about your job (e.g. which sources are
>> you using, what are your settings, etc.). Ideally you can provide a minimal
>> example in order to better understand the program.
>>
>> From a high level perspective, there might be different problems: First
>> of all, Flink does not support checkpointing/taking a savepoint if some of
>> the job's operator have already terminated iirc. But your description
>> points rather into the direction that your bounded source does not
>> terminate. So maybe you are reading a file via
>> StreamExecutionEnvironment.createFileInput
>> with FileProcessingMode.PROCESS_CONTINUOUSLY. But these things are hard to
>> tell without a better understanding of your job.
>>
>> Cheers,
>> Till
>>
>> On Mon, Jan 13, 2020 at 8:35 PM Cam Mach <ca...@gmail.com> wrote:
>>
>>> Hello Flink expert,
>>>
>>> We have a pipeline that read both bounded and unbounded sources and our
>>> understanding is that when the bounded sources complete they should get a
>>> watermark of +inf and then we should be able to take a savepoint and safely
>>> restart the pipeline. However, we have source that never get watermarks and
>>> we are confused as to what we are seeing and what we should expect
>>>
>>>
>>> Cam Mach
>>> Software Engineer
>>> E-mail: cammach84@gmail.com
>>> Tel: 206 972 2768
>>>
>>>

Re: Understanding watermark

Posted by Cam Mach <ca...@gmail.com>.
Hi Till,

Thanks for your response.

Our sources are S3 and Kinesis. We have run several tests, and we are able
to take savepoint/checkpoint, but only when S3 complete reading. And at
that point, our pipeline has watermarks for other operators, but not the
source operator. We are not running `PROCESS_CONTINUOUSLY`, so we should
have watermark for the source as well, right?

 Attached is snapshot of our pipeline.

[image: image.png]

Thanks



On Tue, Jan 14, 2020 at 10:43 AM Till Rohrmann <tr...@apache.org> wrote:

> Hi Cam,
>
> could you share a bit more details about your job (e.g. which sources are
> you using, what are your settings, etc.). Ideally you can provide a minimal
> example in order to better understand the program.
>
> From a high level perspective, there might be different problems: First of
> all, Flink does not support checkpointing/taking a savepoint if some of the
> job's operator have already terminated iirc. But your description points
> rather into the direction that your bounded source does not terminate. So
> maybe you are reading a file via StreamExecutionEnvironment.createFileInput
> with FileProcessingMode.PROCESS_CONTINUOUSLY. But these things are hard to
> tell without a better understanding of your job.
>
> Cheers,
> Till
>
> On Mon, Jan 13, 2020 at 8:35 PM Cam Mach <ca...@gmail.com> wrote:
>
>> Hello Flink expert,
>>
>> We have a pipeline that read both bounded and unbounded sources and our
>> understanding is that when the bounded sources complete they should get a
>> watermark of +inf and then we should be able to take a savepoint and safely
>> restart the pipeline. However, we have source that never get watermarks and
>> we are confused as to what we are seeing and what we should expect
>>
>>
>> Cam Mach
>> Software Engineer
>> E-mail: cammach84@gmail.com
>> Tel: 206 972 2768
>>
>>

Re: Understanding watermark

Posted by Cam Mach <ca...@gmail.com>.
Hi Till,

Thanks for your response.

Our sources are S3 and Kinesis. We have run several tests, and we are able
to take savepoint/checkpoint, but only when S3 complete reading. And at
that point, our pipeline has watermarks for other operators, but not the
source operator. We are not running `PROCESS_CONTINUOUSLY`, so we should
have watermark for the source as well, right?

 Attached is snapshot of our pipeline.

[image: image.png]

Thanks



On Tue, Jan 14, 2020 at 10:43 AM Till Rohrmann <tr...@apache.org> wrote:

> Hi Cam,
>
> could you share a bit more details about your job (e.g. which sources are
> you using, what are your settings, etc.). Ideally you can provide a minimal
> example in order to better understand the program.
>
> From a high level perspective, there might be different problems: First of
> all, Flink does not support checkpointing/taking a savepoint if some of the
> job's operator have already terminated iirc. But your description points
> rather into the direction that your bounded source does not terminate. So
> maybe you are reading a file via StreamExecutionEnvironment.createFileInput
> with FileProcessingMode.PROCESS_CONTINUOUSLY. But these things are hard to
> tell without a better understanding of your job.
>
> Cheers,
> Till
>
> On Mon, Jan 13, 2020 at 8:35 PM Cam Mach <ca...@gmail.com> wrote:
>
>> Hello Flink expert,
>>
>> We have a pipeline that read both bounded and unbounded sources and our
>> understanding is that when the bounded sources complete they should get a
>> watermark of +inf and then we should be able to take a savepoint and safely
>> restart the pipeline. However, we have source that never get watermarks and
>> we are confused as to what we are seeing and what we should expect
>>
>>
>> Cam Mach
>> Software Engineer
>> E-mail: cammach84@gmail.com
>> Tel: 206 972 2768
>>
>>

Re: Understanding watermark

Posted by Till Rohrmann <tr...@apache.org>.
Hi Cam,

could you share a bit more details about your job (e.g. which sources are
you using, what are your settings, etc.). Ideally you can provide a minimal
example in order to better understand the program.

From a high level perspective, there might be different problems: First of
all, Flink does not support checkpointing/taking a savepoint if some of the
job's operator have already terminated iirc. But your description points
rather into the direction that your bounded source does not terminate. So
maybe you are reading a file via StreamExecutionEnvironment.createFileInput
with FileProcessingMode.PROCESS_CONTINUOUSLY. But these things are hard to
tell without a better understanding of your job.

Cheers,
Till

On Mon, Jan 13, 2020 at 8:35 PM Cam Mach <ca...@gmail.com> wrote:

> Hello Flink expert,
>
> We have a pipeline that read both bounded and unbounded sources and our
> understanding is that when the bounded sources complete they should get a
> watermark of +inf and then we should be able to take a savepoint and safely
> restart the pipeline. However, we have source that never get watermarks and
> we are confused as to what we are seeing and what we should expect
>
>
> Cam Mach
> Software Engineer
> E-mail: cammach84@gmail.com
> Tel: 206 972 2768
>
>

Re: Understanding watermark

Posted by Till Rohrmann <tr...@apache.org>.
Hi Cam,

could you share a bit more details about your job (e.g. which sources are
you using, what are your settings, etc.). Ideally you can provide a minimal
example in order to better understand the program.

From a high level perspective, there might be different problems: First of
all, Flink does not support checkpointing/taking a savepoint if some of the
job's operator have already terminated iirc. But your description points
rather into the direction that your bounded source does not terminate. So
maybe you are reading a file via StreamExecutionEnvironment.createFileInput
with FileProcessingMode.PROCESS_CONTINUOUSLY. But these things are hard to
tell without a better understanding of your job.

Cheers,
Till

On Mon, Jan 13, 2020 at 8:35 PM Cam Mach <ca...@gmail.com> wrote:

> Hello Flink expert,
>
> We have a pipeline that read both bounded and unbounded sources and our
> understanding is that when the bounded sources complete they should get a
> watermark of +inf and then we should be able to take a savepoint and safely
> restart the pipeline. However, we have source that never get watermarks and
> we are confused as to what we are seeing and what we should expect
>
>
> Cam Mach
> Software Engineer
> E-mail: cammach84@gmail.com
> Tel: 206 972 2768
>
>