Posted to user@flink.apache.org by "Kumar Bolar, Harshith" <hk...@arity.com> on 2019/06/14 16:19:43 UTC

Effect of increasing parallelism on throughput

Hi all,

I ran a job first with parallelism 1 and then with parallelism 3. With parallelism 1, the Kafka source was reading ~500 records per second. With parallelism 3, the throughput was divided among the three source subtasks, each reading ~150 records per second. Note that records are being published to Kafka at a much higher rate (~1000 records per second).
Is this expected? I would expect throughput to increase with parallelism, but it stays the same. I checked the backpressure status on the source, and it was High.
Screenshots for reference:
Parallelism 1:
[image001.png: screenshot not included in this plain text version]
Parallelism 3:
[image002.png: screenshot not included in this plain text version]

Thank you,
Harshith



Re: Effect of increasing parallelism on throughput

Posted by zhijiang <wa...@aliyun.com>.
Hi Harshith,

I guess the throughput is limited by the slowest vertex in the topology, which is what causes the backpressure. The downstream task can only consume at a certain rate, and that rate is shared roughly evenly among all the upstream source tasks. Under backpressure, a source task that could otherwise read faster is blocked from producing more records. Without backpressure, I think throughput would increase with parallelism.
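
As a rough illustration of that arithmetic, here is a minimal Python sketch of a bottleneck model. It is not Flink code, and the numbers are assumptions chosen to resemble the figures in the question: each source subtask is assumed able to read 1000 records per second on its own, and the slowest downstream vertex is assumed to be capped at 500 records per second no matter how many source subtasks feed it. The function name steady_state_rates is hypothetical.

```python
def steady_state_rates(source_parallelism, per_source_capacity, downstream_capacity):
    """Return (total, per_subtask) records/second under a simple bottleneck model.

    Assumption: the job runs at the rate of whichever side is slower,
    and that rate is split evenly across the source subtasks.
    """
    source_capacity = source_parallelism * per_source_capacity
    total = min(source_capacity, downstream_capacity)
    return total, total / source_parallelism


# Assumed capacities (not measured): 1000 rec/s per source subtask,
# 500 rec/s for the slowest downstream vertex.
for p in (1, 3):
    total, per_subtask = steady_state_rates(p, 1000.0, 500.0)
    print(f"parallelism={p}: total={total:.0f} rec/s, "
          f"per source subtask={per_subtask:.0f} rec/s")
```

Under these assumptions the total stays at 500 records per second in both runs, and each of the three source subtasks gets roughly a third of it, which is close to the ~150 records per second reported. If the downstream capacity were higher than the combined source capacity, the same model would show total throughput growing with parallelism.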

Best,
Zhijiang