You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@beam.apache.org by Jan Lukavský <je...@seznam.cz> on 2021/10/18 09:30:01 UTC
Re: Performance of Apache Beam
Hi Azhar,
-dev <ma...@beam.apache.org> +user <ma...@beam.apache.org>
this kind of question cannot be answered in general. The overhead will
depend on the job and the SDK you use. Using Java SDK with (classical)
FlinkRunner should give the best performance on Flink, although the
overhead will not be completely nullified. The way Beam is constructed -
with portability being one of the main concerns - necessarily brings
some overhead compared to the job being written and optimized for single
runner only (using Flink's native API in this case). I'd suggest you
evaluate the programming model and portability guarantees, that Apache
Beam gives you instead of pure performance. On the other hand Apache
Beam tries hard to minimize the overhead, so you should not expect
*vastly* worse performance. I'd say the best way to go is to implement a
simplistic Pipeline somewhat representing your use-case and then measure
the performance on this specific instance.
Regarding fault-tolerance and backpressure, Apache Beam model does not
handle those (with the exception of bundles being processed as atomic
units), so these are delegated to the runner - FlinkRunner will
therefore behave the way Apache Flink defines these concepts.
Hope this helps,
Jan
On 10/17/21 17:53, azhar mirza wrote:
> Hi Team
> Could you please let me know following below answers .
>
> I need to know performance of apache beam vs flink if we use flink as
> runner for Beam, what will be the additional overhead converting Beam
> to flink
>
> How fault tolerance and resiliency handled in apache beam.
> How apache beam handles backpressure?
>
> Thanks
> Azhar
Re: Performance of Apache Beam
Posted by Alexey Romanenko <ar...@gmail.com>.
+ Azhar (just in case)
> On 18 Oct 2021, at 11:30, Jan Lukavský <je...@seznam.cz> wrote:
>
> Hi Azhar,
>
> -dev <ma...@beam.apache.org> +user <ma...@beam.apache.org>
> this kind of question cannot be answered in general. The overhead will depend on the job and the SDK you use. Using Java SDK with (classical) FlinkRunner should give the best performance on Flink, although the overhead will not be completely nullified. The way Beam is constructed - with portability being one of the main concerns - necessarily brings some overhead compared to the job being written and optimized for single runner only (using Flink's native API in this case). I'd suggest you evaluate the programming model and portability guarantees, that Apache Beam gives you instead of pure performance. On the other hand Apache Beam tries hard to minimize the overhead, so you should not expect *vastly* worse performance. I'd say the best way to go is to implement a simplistic Pipeline somewhat representing your use-case and then measure the performance on this specific instance.
>
> Regarding fault-tolerance and backpressure, Apache Beam model does not handle those (with the exception of bundles being processed as atomic units), so these are delegated to the runner - FlinkRunner will therefore behave the way Apache Flink defines these concepts.
>
> Hope this helps,
>
> Jan
>
> On 10/17/21 17:53, azhar mirza wrote:
>> Hi Team
>> Could you please let me know following below answers .
>>
>> I need to know performance of apache beam vs flink if we use flink as runner for Beam, what will be the additional overhead converting Beam to flink
>>
>> How fault tolerance and resiliency handled in apache beam.
>> How apache beam handles backpressure?
>>
>> Thanks
>> Azhar