Posted to user@flink.apache.org by Vijayendra Yadav <co...@gmail.com> on 2020/08/13 21:24:22 UTC

Performance Flink streaming kafka consumer sink to s3

Hi Team,

I am trying to increase the throughput of my Flink streaming job, which
reads from a Kafka source and sinks to S3. It currently runs fine for
small event records, but records with large payloads are processed
extremely slowly, at a rate of about 2 TPS.

Could you share some best practices for tuning?
Also, can we increase parallel processing beyond the number of Kafka
partitions we have, without causing any overhead?

Regards,
Vijay

Re: Performance Flink streaming kafka consumer sink to s3

Posted by Vijayendra Yadav <co...@gmail.com>.
Hi, Do you think there could be an issue with Flink's performance at
record payload sizes of 400 KB up to 1 MB? My Spark Streaming job seems
to be doing better. Are there any recommended configurations, or ways
of increasing parallelism, to improve Flink streaming with the Flink
Kafka connector?

Regards,
Vijay

Re: Performance Flink streaming kafka consumer sink to s3

Posted by Vijayendra Yadav <co...@gmail.com>.
Hi Robert,

Thanks for the information. The payloads so far are 400 KB per record.
To achieve high parallelism at the downstream operators, do I rebalance
the Kafka stream? Could you give me an example, please?

Regards,
Vijay

Re: Performance Flink streaming kafka consumer sink to s3

Posted by Robert Metzger <rm...@apache.org>.
Hi,

> Also, can we increase parallel processing beyond the number of Kafka
> partitions we have, without causing any overhead?


Yes. Kafka source subtasks beyond the partition count just sit idle,
so they produce only a tiny bit of overhead, and the potential benefit
of running the downstream operators at a higher parallelism might be
much bigger.
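
To make that concrete, here is a minimal DataStream sketch of the
pattern (the topic name, broker address, and parallelism values below
are placeholders for illustration):

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class RebalanceSketch {

  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env =
        StreamExecutionEnvironment.getExecutionEnvironment();

    Properties kafkaProps = new Properties();
    kafkaProps.setProperty("bootstrap.servers", "broker:9092"); // placeholder
    kafkaProps.setProperty("group.id", "perf-test");            // placeholder

    env.addSource(new FlinkKafkaConsumer<>(
            "events", new SimpleStringSchema(), kafkaProps))
        // Source subtasks beyond the partition count would sit idle,
        // so pin the source to the number of partitions (say, 8).
        .setParallelism(8)
        // rebalance() redistributes records round-robin across all
        // subtasks of the next operator, lifting the partition cap.
        .rebalance()
        .map(record -> record.toUpperCase()) // stand-in for the real work
        .setParallelism(32)
        .print(); // stand-in for the real S3 sink

    env.execute("rebalance-sketch");
  }
}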

How large is a large payload in your case?

Best practices:
Try to understand what's causing the performance slowdown: Kafka or S3?
You can do a test where you read from Kafka and write the records into
a discarding sink. Likewise, use a data-generator source and write into
S3, as in the sketch below.
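
A minimal sketch of those two isolation tests, assuming the DataStream
API (broker, topic, bucket path, and record count are placeholders;
run one test at a time):

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.DiscardingSink;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class IsolationTests {

  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env =
        StreamExecutionEnvironment.getExecutionEnvironment();
    // StreamingFileSink only finalizes files on checkpoints.
    env.enableCheckpointing(60_000);

    Properties kafkaProps = new Properties();
    kafkaProps.setProperty("bootstrap.servers", "broker:9092"); // placeholder

    // Test A: Kafka in isolation -- read the topic, discard every record.
    // If this is already slow, the problem is on the consumer side.
    env.addSource(new FlinkKafkaConsumer<>(
            "events", new SimpleStringSchema(), kafkaProps))
        .addSink(new DiscardingSink<>());

    // Test B: S3 in isolation -- synthetic records of production size
    // (~400 KB), no Kafka involved. Comment out Test A when running this.
    final String payload = new String(new char[400 * 1024]).replace('\0', 'x');
    env.generateSequence(0, 100_000)
        .map(i -> payload)
        .addSink(StreamingFileSink
            .forRowFormat(new Path("s3://my-bucket/perf-test/"), // placeholder
                new SimpleStringEncoder<String>("UTF-8"))
            .build());

    env.execute("isolation-tests");
  }
}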

Do the math on your job: what are its theoretical limits? See:
https://www.ververica.com/blog/how-to-size-your-apache-flink-cluster-general-guidelines
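
For instance, with the 400 KB records mentioned in this thread (the
target rate is purely illustrative):

  400 KB/record x     2 records/s  =   0.8 MB/s  (the rate observed)
  400 KB/record x 1,000 records/s  = ~400 MB/s   (a hypothetical target)

0.8 MB/s is far below what a single Kafka partition or a single S3
upload can sustain, which suggests the bottleneck is configuration
(parallelism, checkpointing, the sink's rolling policy) rather than a
hard throughput limit.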

Hope this helps,
Robert