Posted to user@flink.apache.org by Sayat Satybaldiyev <st...@gmail.com> on 2018/12/03 19:07:16 UTC

High Job BackPressure

Dear Flink community,

Could anyone suggest how to debug a job that shows high backpressure at
the Kafka source? We have a Flink job that reads from two Kafka topics and
joins the two streams via a process function, using the RocksDB state
backend. The job is lagging significantly, by about 8 hours, and produces
incorrect results.

The Flink UI indicates that the source functions (recommendation stream and
custom source) are backpressured, while the recommendation-click join is fine.

I've looked into the JM and TM logs and nothing looks strange to me, except
for "Kafka error sending fetch request", which happens during a checkpoint.
Checkpoints run once every 20 minutes and use almost all of the network
interface's bandwidth.

Please find the UI screenshots and Flink logs at the links below.

https://drive.google.com/file/d/14h8zwC_49wxt5uNPYtM3LN6WhJ7lyeVS/view?usp=sharing
https://drive.google.com/file/d/1s6I___S7u0pBJyWdnmYaH0e_MwGr3CgY/view?usp=sharing

Re: High Job BackPressure

Posted by sayat <st...@gmail.com>.
I forgot to mention that the job was recently moved from a cluster with
SSD disks to one with SATA and SSD disks. On the old hardware, everything
worked fine. The Flink version is 1.6.2. RocksDB was using the
flash-optimized settings; I've changed them to SPINNING_DISK_OPTIMIZED,
and it had no effect.

Old servers:
https://www.hetzner.de/dedicated-rootserver/px91-ssd

New servers:
https://www.hetzner.de/dedicated-rootserver/ax60-ssd
