Posted to user@flink.apache.org by Aeden Jameson <ae...@gmail.com> on 2021/02/24 23:49:21 UTC

BackPressure in RowTime Task of FlinkSql Job

    I have a job made up of a few FlinkSQL statements using a
statement set. In my job graph, viewed through the Flink UI, a few of
the tasks/statements are preceded by this task

rowtime field: (#11: event_time TIME ATTRIBUTE(ROWTIME))

that has an upstream Kafka source/sink task.
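
(For illustration only: a minimal sketch of this kind of setup,
assuming a Kafka-backed source table with event_time declared as the
rowtime attribute. Apart from event_time, every table, topic and
column name here is hypothetical, as are the 5-second watermark delay
and the statement-set syntax, which varies with the Flink version.)

    -- Hypothetical Kafka source with event_time as a rowtime attribute.
    CREATE TABLE input_events (
        user_id    STRING,
        payload    STRING,
        event_time TIMESTAMP(3),
        WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'input-topic',
        'properties.bootstrap.servers' = 'kafka:9092',
        'format' = 'json'
    );

    -- Placeholder sinks so the sketch is self-contained.
    CREATE TABLE sink_a (user_id STRING, cnt BIGINT)
        WITH ('connector' = 'blackhole');
    CREATE TABLE sink_b (user_id STRING, payload STRING, event_time TIMESTAMP(3))
        WITH ('connector' = 'blackhole');

    -- Several INSERT statements grouped into a single statement set,
    -- so they run as one job (newer SQL-client syntax shown).
    EXECUTE STATEMENT SET
    BEGIN
        INSERT INTO sink_a
        SELECT user_id, COUNT(*) AS cnt
        FROM input_events
        GROUP BY user_id, TUMBLE(event_time, INTERVAL '1' MINUTE);

        INSERT INTO sink_b
        SELECT user_id, payload, event_time
        FROM input_events
        WHERE payload IS NOT NULL;
    END;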

    Occasionally, some of the rowtime tasks appear back pressured,
meaning they have high OutPool buffer usage, yet all of the downstream
SQL tasks have low InPool and OutPool usage. CPU and memory usage are
also at acceptable levels as far as I can tell, and there are no OOM
errors. Another symptom I notice during these episodes is high
consumer fetch latency with Kafka, but I haven't been able to put my
finger on the direction of the causal arrow. What are some causes of
this behavior and what are the best metrics to look at?

Thank you,
Aeden

Re: BackPressure in RowTime Task of FlinkSql Job

Posted by Aeden Jameson <ae...@gmail.com>.
>> Is it possible that you are generating too many watermarks that need
>> to be sent to all downstream tasks?

This was basically it. I had unexpected flooding on specific keys,
which I'm guessing created intermittently hot partitions that were
back pressuring the rowtime task.

I do have another question: how do the delay and buffering specified by
the watermark work? Is the rowtime task doing that, or the downstream
tasks?
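
(As a point of reference for that question, a minimal sketch assuming
a hypothetical clicks table: the delay is expressed in the WATERMARK
clause on the source table, and the resulting watermarks are what
downstream time-based operators, such as the windowed aggregation
below, use to decide how long to buffer rows before emitting results.)

    -- Hypothetical source: watermarks trail the largest event_time
    -- seen so far by 10 seconds.
    CREATE TABLE clicks (
        click_id   STRING,
        event_time TIMESTAMP(3),
        WATERMARK FOR event_time AS event_time - INTERVAL '10' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'clicks',
        'properties.bootstrap.servers' = 'kafka:9092',
        'format' = 'json'
    );

    -- The window operator buffers rows per one-minute window and only
    -- emits a result once the watermark passes the end of the window,
    -- i.e. roughly 10 seconds of event time after the window closes.
    SELECT
        TUMBLE_END(event_time, INTERVAL '1' MINUTE) AS window_end,
        COUNT(*) AS cnt
    FROM clicks
    GROUP BY TUMBLE(event_time, INTERVAL '1' MINUTE);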

Thank you

On Fri, Feb 26, 2021 at 1:19 AM Timo Walther <tw...@apache.org> wrote:

> Hi Aeden,
>
> the rowtime task is actually just a simple map function that extracts
> the event-time timestamp into a field of the row for the next operator.
> It should not be the problem. Can you share a screenshot of your
> pipeline? What is your watermarking strategy? Is it possible that you
> are generating too many watermarks that need to be sent to all downstream
> tasks? Do all operators have the same parallelism?
>
> Regards,
> Timo

Re: BackPressure in RowTime Task of FlinkSql Job

Posted by Timo Walther <tw...@apache.org>.
Hi Aeden,

the rowtime task is actually just a simple map function that extracts 
the event-time timestamp into a field of the row for the next operator. 
It should not be the problem. Can you share a screenshot of your 
pipeline? What is your watermarking strategy? Is it possible that you 
are generating too many watermarks that need to be sent to all downstream
tasks? Do all operators have the same parallelism?

Regards,
Timo
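
(A hedged aside on the watermark-frequency point above: watermarks
defined in Flink SQL DDL are generated periodically, and the emission
interval is controlled by the pipeline.auto-watermark-interval option.
The SET syntax below is the SQL-client form and may differ slightly
between Flink versions.)

    -- Watermarks are broadcast to every downstream channel, so emitting
    -- them less often than the 200 ms default reduces that traffic.
    SET 'pipeline.auto-watermark-interval' = '1 s';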

