Posted to user@beam.apache.org by Jan Lukavský <je...@seznam.cz> on 2021/10/18 09:30:01 UTC

Re: Performance of Apache Beam

Hi Azhar,

-dev <ma...@beam.apache.org> +user <ma...@beam.apache.org>

This kind of question cannot be answered in general. The overhead will 
depend on the job and the SDK you use. Using the Java SDK with the 
(classical) FlinkRunner should give the best performance on Flink, 
although the overhead will not be eliminated completely. The way Beam is 
constructed, with portability being one of the main concerns, 
necessarily brings some overhead compared to a job written and optimized 
for a single runner only (using Flink's native API in this case). I'd 
suggest you evaluate the programming model and the portability 
guarantees that Apache Beam gives you rather than pure performance. On 
the other hand, Apache Beam tries hard to minimize the overhead, so you 
should not expect *vastly* worse performance. I'd say the best way to go 
is to implement a simple Pipeline that roughly represents your use case 
and then measure the performance on this specific instance.
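
To make the "measure it yourself" suggestion a bit more concrete, here is 
a minimal sketch of a timing harness in plain Python (standard library 
only). It is not part of Beam or Flink. It assumes you wrap each variant 
of your job in a function that submits the job and blocks until it 
finishes; those functions are hypothetical and you would have to write 
them yourself. The harness only compares median wall-clock times, which 
is a rough measure at best.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_runs(job: Callable[[], None], repeats: int = 5, warmup: int = 1) -> List[float]:
    """Run `job` several times and return the wall-clock seconds of each measured run."""
    for _ in range(warmup):
        job()  # warm-up runs are executed but not recorded
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        job()
        durations.append(time.perf_counter() - start)
    return durations


def relative_overhead(baseline: List[float], candidate: List[float]) -> float:
    """Return (median candidate - median baseline) / median baseline."""
    base = statistics.median(baseline)
    cand = statistics.median(candidate)
    return (cand - base) / base


def compare(jobs: Dict[str, Callable[[], None]], baseline: str, repeats: int = 5) -> Dict[str, float]:
    """Time every job and report each one's overhead relative to `baseline`."""
    timings = {name: time_runs(job, repeats) for name, job in jobs.items()}
    return {
        name: relative_overhead(timings[baseline], durations)
        for name, durations in timings.items()
        if name != baseline
    }
```

You would call it along the lines of 
compare({"native": run_native_flink_job, "beam": run_beam_on_flink_job}, 
baseline="native"), where both function names are placeholders for your 
own job launchers. A result of 0.2 for "beam" would mean the median Beam 
run took about 20% longer than the median native run on that particular 
job and cluster.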

Regarding fault tolerance and backpressure, the Apache Beam model does 
not handle those (with the exception of bundles being processed as 
atomic units), so these are delegated to the runner. FlinkRunner will 
therefore behave the way Apache Flink defines these concepts.

Hope this helps,

  Jan

On 10/17/21 17:53, azhar mirza wrote:
> Hi Team
> Could you please answer the following questions?
>
> I need to know the performance of Apache Beam vs. Flink if we use Flink 
> as the runner for Beam. What will be the additional overhead of 
> converting Beam to Flink?
>
> How are fault tolerance and resiliency handled in Apache Beam?
> How does Apache Beam handle backpressure?
>
> Thanks
> Azhar

Re: Performance of Apache Beam

Posted by Alexey Romanenko <ar...@gmail.com>.

+ Azhar (just in case)
