Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/02/01 09:50:38 UTC

[GitHub] carsonwang commented on issue #20303: [SPARK-23128][SQL] A new approach to do adaptive execution in Spark SQL

URL: https://github.com/apache/spark/pull/20303#issuecomment-459666711
 
 
   @justinuang, in that article only a few queries benefit from optimizing the join type or handling skewed joins at runtime. Most of the queries benefit only from setting the reducer number, which improved performance by about 1-20%. The percentage also depends on how the shuffle partition number is set in non-AE mode and on minNumPostShufflePartitions/maxNumPostShufflePartitions in AE mode. For a small data scale, the default shuffle partition number of 200 is enough, but at the 100 TB scale we set it to 10976 so that all queries can run.
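The tuning described above can be sketched as a spark-submit invocation. This is a hedged example only: the spark.sql.adaptive.* keys follow the AE implementation discussed in this PR and may differ across Spark versions, and the application class/jar names are hypothetical placeholders.

```shell
# Non-AE baseline: a fixed post-shuffle partition count for every query
# (200 by default; raised to 10976 at the 100 TB scale so all queries run).
# With AE enabled, the reducer number is instead chosen at runtime within
# the [min, max] bounds below.
spark-submit \
  --conf spark.sql.shuffle.partitions=10976 \
  --conf spark.sql.adaptive.enabled=true \
  --conf spark.sql.adaptive.minNumPostShufflePartitions=1 \
  --conf spark.sql.adaptive.maxNumPostShufflePartitions=10976 \
  --class com.example.Benchmark \
  benchmark.jar
```

With AE on, a narrow [min, max] range effectively pins the reducer count, so the measured speedup depends heavily on how these bounds compare to the fixed non-AE setting.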

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org