Posted to dev@spark.apache.org by "Jia, Ke A" <ke...@intel.com> on 2020/02/12 07:43:23 UTC

Adaptive Query Execution performance results in 3TB TPC-DS

Hi all,
We have completed the Spark 3.0 Adaptive Query Execution (AQE) performance tests on 3TB TPC-DS on a 5-node Cascade Lake cluster. With AQE, 2 queries gain more than 1.5x in performance and 37 queries gain more than 1.1x. No query shows significant performance degradation. The detailed performance results and key configurations are available here <https://docs.google.com/spreadsheets/d/1uija2AFblciMcYzU4jnPiy6I8mU8-M0-HwSNNns5aLU/edit?usp=sharing>. Based on these results, we recommend that users turn on AQE in Spark 3.0. If you encounter any bugs or see room for improvement when AQE is enabled, please file the related JIRAs. Thanks.
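
For readers who want to try the recommendation, here is a minimal sketch of turning AQE on. It assumes the standard Spark 3.0 setting `spark.sql.adaptive.enabled`, which is off by default in that release. The helper `to_submit_args` is a hypothetical name introduced only for illustration; it is not part of Spark and simply renders settings as `spark-submit --conf` flags.

```python
# Spark SQL settings to apply; spark.sql.adaptive.enabled is the AQE switch.
AQE_CONF = {
    "spark.sql.adaptive.enabled": "true",
}


def to_submit_args(conf):
    """Hypothetical helper: render a settings dict as spark-submit --conf flags."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


# Print the flags so they can be pasted into a spark-submit command line.
print(" ".join(to_submit_args(AQE_CONF)))
```

Inside a running session the same setting can usually be applied with `spark.conf.set("spark.sql.adaptive.enabled", "true")`.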

Regards,
Jia Ke


RE: Adaptive Query Execution performance results in 3TB TPC-DS

Posted by "Jia, Ke A" <ke...@intel.com>.
Hi Amogh,
Thanks for your interest in AQE work.

> Were any table stats available for TPC-DS during the runs ?
   We used the default configurations and did not set any special configurations (such as CBO) to collect table stats, both with AQE enabled and with AQE disabled. AQE relies mainly on runtime statistics, not table stats, for further optimization. So the effect of table stats on this benchmark is likely small. Thanks.
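
   For comparison, a run that did collect table stats might look roughly like the sketch below. This was not done in the benchmark above. It assumes the `spark.sql.cbo.enabled` setting and the Spark SQL `ANALYZE TABLE ... COMPUTE STATISTICS` statement; the table list and the helper `analyze_statements` are hypothetical and only illustrate the idea.

```python
# Setting that enables the cost-based optimizer (off by default).
CBO_CONF = {"spark.sql.cbo.enabled": "true"}

# Hypothetical subset of TPC-DS tables, used only for illustration.
TABLES = ["store_sales", "date_dim", "item"]


def analyze_statements(tables):
    """Hypothetical helper: build one table-level ANALYZE statement per table."""
    return [f"ANALYZE TABLE {t} COMPUTE STATISTICS" for t in tables]


# Each statement could be passed to spark.sql(...) before running the queries.
for stmt in analyze_statements(TABLES):
    print(stmt)
```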

Regards,
Jia Ke

From: Amogh Margoor <am...@qubole.com>
Sent: Friday, February 14, 2020 5:02 AM
To: Wenchen Fan <cl...@gmail.com>
Cc: Jia, Ke A <ke...@intel.com>; dev@spark.apache.org
Subject: Re: Adaptive Query Execution performance results in 3TB TPC-DS

Thanks Jia Ke for the numbers and they look promising.
Were any table stats available for TPC-DS during the runs ?

On Thu, Feb 13, 2020 at 4:07 AM Wenchen Fan <cl...@gmail.com> wrote:
Thanks for providing the benchmark numbers! The result is very promising and I'm looking forward to seeing more feedback from real-world workloads.



Re: Adaptive Query Execution performance results in 3TB TPC-DS

Posted by Wenchen Fan <cl...@gmail.com>.
Thanks for providing the benchmark numbers! The result is very promising
and I'm looking forward to seeing more feedback from real-world workloads.
