Posted to user@beam.apache.org by Tao Li <ta...@zillow.com> on 2021/03/22 17:00:10 UTC

Is there a perf comparison between Beam (on spark) and native Spark?

Hi Beam community,

I am wondering if there is a doc comparing the performance of Beam (on Spark) with native Spark for batch processing, for example using the TPC-DS benchmark.

I did find some relevant links like this <https://archive.fosdem.org/2018/schedule/event/nexmark_benchmarking_suite/attachments/slides/2494/export/events/attachments/nexmark_benchmarking_suite/slides/2494/Nexmark_Suite_for_Apache_Beam_(FOSDEM18).pdf>, but it’s old and mostly covers streaming scenarios.

Thanks!

Re: Is there a perf comparison between Beam (on spark) and native Spark?

Posted by Tao Li <ta...@zillow.com>.
Thanks @Alexey Romanenko for this info. Do we have a rough idea of how Beam (on Spark) compares with native Spark on TPC-DS or any other benchmark? I am just wondering whether running Beam SQL with the Spark runner will have a processing time similar to Spark SQL. Thanks!

From: Alexey Romanenko <ar...@gmail.com>
Reply-To: "user@beam.apache.org" <us...@beam.apache.org>
Date: Tuesday, March 23, 2021 at 12:58 PM
To: "user@beam.apache.org" <us...@beam.apache.org>
Subject: Re: Is there a perf comparison between Beam (on spark) and native Spark?

There is an extension in Beam to support the TPC-DS benchmark [1] that basically runs TPC-DS SQL queries via Beam SQL. However, I’m not sure whether it runs regularly and, IIRC (from when I last looked at it, so I may be mistaken), it requires some adjustments to run on any runner other than Dataflow. Also, when I tried to run it on SparkRunner, many queries failed for various reasons [2].

I believe that if we manage to get it running for most of the queries on any runner, it will be a good addition to the Nexmark benchmark that we have now, since TPC-DS results can also be used to compare with other data processing systems.

[1] https://github.com/apache/beam/tree/master/sdks/java/testing/tpcds
[2] https://issues.apache.org/jira/browse/BEAM-9891


On 22 Mar 2021, at 18:00, Tao Li <ta...@zillow.com> wrote:

Hi Beam community,

I am wondering if there is a doc comparing the performance of Beam (on Spark) with native Spark for batch processing, for example using the TPC-DS benchmark.

I did find some relevant links like this <https://archive.fosdem.org/2018/schedule/event/nexmark_benchmarking_suite/attachments/slides/2494/export/events/attachments/nexmark_benchmarking_suite/slides/2494/Nexmark_Suite_for_Apache_Beam_(FOSDEM18).pdf>, but it’s old and mostly covers streaming scenarios.

Thanks!


Re: Is there a perf comparison between Beam (on spark) and native Spark?

Posted by Alexey Romanenko <ar...@gmail.com>.
There is an extension in Beam to support the TPC-DS benchmark [1] that basically runs TPC-DS SQL queries via Beam SQL. However, I’m not sure whether it runs regularly and, IIRC (from when I last looked at it, so I may be mistaken), it requires some adjustments to run on any runner other than Dataflow. Also, when I tried to run it on SparkRunner, many queries failed for various reasons [2].

I believe that if we manage to get it running for most of the queries on any runner, it will be a good addition to the Nexmark benchmark that we have now, since TPC-DS results can also be used to compare with other data processing systems.

[1] https://github.com/apache/beam/tree/master/sdks/java/testing/tpcds
[2] https://issues.apache.org/jira/browse/BEAM-9891

> On 22 Mar 2021, at 18:00, Tao Li <ta...@zillow.com> wrote:
> 
> Hi Beam community,
>
> I am wondering if there is a doc comparing the performance of Beam (on Spark) with native Spark for batch processing, for example using the TPC-DS benchmark.
>
> I did find some relevant links like this <https://archive.fosdem.org/2018/schedule/event/nexmark_benchmarking_suite/attachments/slides/2494/export/events/attachments/nexmark_benchmarking_suite/slides/2494/Nexmark_Suite_for_Apache_Beam_(FOSDEM18).pdf>, but it’s old and mostly covers streaming scenarios.
>
> Thanks!


Re: Is there a perf comparison between Beam (on spark) and native Spark?

Posted by Boyuan Zhang <bo...@google.com>.
+Kyle Weaver <kc...@google.com>
Kyle, do you happen to have some information here?

On Mon, Mar 22, 2021 at 10:00 AM Tao Li <ta...@zillow.com> wrote:

> Hi Beam community,
>
> I am wondering if there is a doc comparing the performance of Beam (on Spark)
> with native Spark for batch processing, for example using the TPC-DS benchmark.
>
> I did find some relevant links like this
> <https://archive.fosdem.org/2018/schedule/event/nexmark_benchmarking_suite/attachments/slides/2494/export/events/attachments/nexmark_benchmarking_suite/slides/2494/Nexmark_Suite_for_Apache_Beam_(FOSDEM18).pdf>,
> but it’s old and mostly covers streaming scenarios.
>
> Thanks!