Posted to dev@streampipes.apache.org by Muhammad Faizan <m....@eu.denso.com> on 2021/06/17 10:18:25 UTC

Performance Issues

Hi,

I have been comparing the performance of pipelines created through StreamPipes with pipelines built manually using Flink and Kafka. I have noticed an unexpected performance degradation with StreamPipes.

I have two setups:

  1.  Pipeline created manually using Flink and Kafka (without StreamPipes)
  2.  Pipeline created using StreamPipes.
     *   The main data source is added to StreamPipes using the Kafka Adapter.
     *   This data source is then used in the StreamPipes pipeline definition.

The problem is that with the pipeline created in StreamPipes, the throughput in Flink (number of events consumed per second) is very low (around 500 events/sec), whereas in the manual Setup 1 Flink is able to consume ~10,000 events/sec from Kafka. See image below:

[Image: image003.jpg]

The pipeline in StreamPipes is:
[Image: pipeline diagram]


With StreamPipes, I see that when a new data source (Kafka Adapter) is connected, it replicates the events to a new Kafka topic, which is then used by the created pipeline. Could that be the reason for the performance degradation? Or do you think something is wrong with my setup, or that I am missing something?
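
To put the two reported figures in perspective, here is a small arithmetic sketch in Python. It only uses the two throughput numbers quoted above; the function name `per_event_budget_ms` is made up for illustration, and the result says nothing about where the time is actually spent.

```python
def per_event_budget_ms(events_per_sec: float) -> float:
    """Average wall-clock time available per event at a given throughput."""
    return 1000.0 / events_per_sec


streampipes_rate = 500.0   # events/sec reported for the StreamPipes pipeline
manual_rate = 10_000.0     # events/sec reported for the manual Flink + Kafka setup

slowdown = manual_rate / streampipes_rate
print(f"slowdown factor: {slowdown:.0f}x")                                     # 20x
print(f"StreamPipes: {per_event_budget_ms(streampipes_rate):.1f} ms per event")  # 2.0 ms
print(f"manual:      {per_event_budget_ms(manual_rate):.1f} ms per event")       # 0.1 ms
```

In other words, the StreamPipes pipeline spends on average about 2 ms per event, compared with about 0.1 ms in the manual setup.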


Best regards

i.A. Muhammad Faizan
Student
Corporate R&D
DENSO AUTOMOTIVE Deutschland GmbH

Phone: +49 8165 944 201 / NiceNet: 5033-201 / m.faizan@eu.denso.com / www.denso.com
Located at: Eching Office, Freisinger Str. 21-23, 85386 Eching, Bayern, Germany
Managing Directors: Yuji Ishizuka, Yoshio Nakano, Kazuoki Matsugatani, Taro Tabata
Register Court Munich: HR B 72576, VAT-ID No.: DE 129426275, Tax No. 115/124/30084

This e-mail message is intended only for the use of the named recipient(s). The information contained
therein may be confidential or privileged, and its disclosure or reproduction is strictly prohibited. If you
are not the named recipient, please return it immediately to its sender at the above address and destroy
it. Environmental preservation and harmony with society is one of DENSO's four management principles.
Please consider the environment before printing this e-mail.

Re: Performance Issues

Posted by Xin Wang <da...@gmail.com>.
Hi,

By the way, I'm currently working on upgrading the flink-wrapper to the
latest Flink version (1.13.1).

Thanks,
Xin


Patrick Wiener <wi...@apache.org> wrote on Tue, Jun 22, 2021 at 5:36 PM:

> Hi Muhammad,
>
> Well, I don’t think the difference in Flink versions has such an impact.
> Do you have more fine-grained evaluation figures, e.g. what is the
> throughput of the SP Connect Kafka adapter?
>
> The Connect Kafka adapter is the first entry point into SP and thus
> determines the maximum throughput for everything downstream. The same
> applies to every subsequent processor up to the sink.
> In addition, de/serialization is always quite expensive, especially as we
> do it in every intermediate processor step when we consume/produce events
> from/to Kafka.
>
> Patrick


Re: Performance Issues

Posted by Patrick Wiener <wi...@apache.org>.
Hi Muhammad,

Well, I don’t think the difference in Flink versions has such an impact. Do you have more fine-grained evaluation figures, e.g. what is the throughput of the SP Connect Kafka adapter?

The Connect Kafka adapter is the first entry point into SP and thus determines the maximum throughput for everything downstream. The same applies to every subsequent processor up to the sink.
In addition, de/serialization is always quite expensive, especially as we do it in every intermediate processor step when we consume/produce events from/to Kafka.
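
As a rough illustration of both points, here is a minimal Python sketch (it is not StreamPipes code). It assumes a simple linear pipeline whose sustained rate is bounded by its slowest stage, and it uses the standard `json` module as a stand-in for whatever serializer is actually in use. The stage rates and the sample event are hypothetical, and absolute numbers will differ on a real Flink/Kafka deployment.

```python
import json
import time


def pipeline_throughput(stage_rates):
    """A linear chain of stages sustains at most the rate of its slowest stage."""
    return min(stage_rates)


def roundtrips_per_sec(event, hops, iterations=5_000):
    """Events/sec when each event is JSON-serialized and parsed once per hop."""
    start = time.perf_counter()
    for _ in range(iterations):
        current = event
        for _ in range(hops):
            current = json.loads(json.dumps(current))
    elapsed = time.perf_counter() - start
    return iterations / elapsed


# Hypothetical per-stage rates in events/sec: adapter, processor, sink.
print(pipeline_throughput([500, 10_000, 8_000]))  # 500

# Hypothetical event; the real payload from this thread is not known.
sample_event = {"sensorId": "s1", "timestamp": 1623925105000, "value": 21.5}
for hops in (1, 2, 4):
    rate = roundtrips_per_sec(sample_event, hops)
    print(hops, "hop(s):", round(rate), "events/sec")
```

Measuring each stage separately in this way (adapter, each processor, sink) should show which one caps the overall rate.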

Patrick

> On 21.06.2021 at 11:06, Muhammad Faizan <m....@eu.denso.com> wrote:
> 
> Hi Dominik,
> 
> The events were thingified JSON in both cases, and I was using the default Kafka batch size. The configurations were largely the same; the only difference was the Flink version: StreamPipes works with Flink 1.9.1, and my other setup used Flink 1.11.1.


RE: Performance Issues

Posted by Muhammad Faizan <m....@eu.denso.com>.
Hi Dominik,

The events were thingified JSON in both cases, and I was using the default Kafka batch size. The configurations were largely the same; the only difference was the Flink version: StreamPipes works with Flink 1.9.1, and my other setup used Flink 1.11.1.
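
For reference, the comparison above can be written down as a small Python sketch that diffs the two setups. The dictionary keys and the helper `diff_settings` are made up for illustration; the values are only the ones stated in this message.

```python
def diff_settings(a: dict, b: dict) -> dict:
    """Return {key: (a_value, b_value)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Key names are illustrative; values are taken from the description above.
streampipes_setup = {
    "flink_version": "1.9.1",
    "event_format": "JSON",
    "kafka_batch_size": "default",
}
manual_setup = {
    "flink_version": "1.11.1",
    "event_format": "JSON",
    "kafka_batch_size": "default",
}

differences = diff_settings(streampipes_setup, manual_setup)
print(differences)  # {'flink_version': ('1.9.1', '1.11.1')}
```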


-----Original Message-----
From: Dominik Riemer <ri...@apache.org> 
Sent: Friday, June 18, 2021 9:43 PM
To: dev@streampipes.apache.org
Subject: Re: Performance Issues

Hi Muhammad,

Thanks for reporting this. It looks like quite a degradation, and we should investigate it.
I haven't used the Flink wrapper for a while now, but I will look into this as soon as possible.

Were both tests similar in terms of their configuration? Which Kafka settings did you use in your standalone setup (e.g., batch size), and how did you serialize events there?

Dominik



Re: Performance Issues

Posted by Dominik Riemer <ri...@apache.org>.
Hi Muhammad,

Thanks for reporting this. It looks like quite a degradation, and we should investigate it.
I haven't used the Flink wrapper for a while now, but I will look into this as soon as possible.

Were both tests similar in terms of their configuration? Which Kafka settings did you use in your standalone setup (e.g., batch size), and how did you serialize events there?

Dominik

