Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/03/04 07:50:46 UTC

[GitHub] [spark] fangshil commented on issue #20303: [SPARK-23128][SQL] A new approach to do adaptive execution in Spark SQL

URL: https://github.com/apache/spark/pull/20303#issuecomment-469151637
 
 
   Excited to see AE making progress upstream :) We have used the new AE framework to add SQL optimization rules, and the results look very promising. We have a few general comments on this patch:
   
   1. The current patch handles shuffle parallelism on the reducer side: it starts with a relatively large number of mapper partitions (500) and merges them into fewer reducer partitions by allowing each reducer to read multiple mappers. At large data scale, setting spark.sql.shuffle.partitions to 10K without AE and setting maxNumPostShufflePartitions to 10K with AE should give the same result, since the number of reducers won't change when the data is large.
   
   I think that with this patch we do not yet get optimal performance, since we only save the overhead of launching a certain number of reduce tasks. A better approach would be to dynamically estimate the initial (mapper) parallelism between 0 and maxNumPostShufflePartitions. AE should make this possible as well, and this patch should be a solid foundation for such future improvements. Hope we can merge it soon!
   
   2. This patch uses the submitMapStage API, which submits each stage as a new job, so AE breaks Spark's vanilla definition of a job. This issue comes from the original AE, not from this new AE.
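
To make point 1 concrete, here is a rough sketch of the two ideas: merging contiguous shuffle partitions on the reducer side, and estimating the initial parallelism from the input size. This is not the code in the patch. The function names, the `target_bytes` parameter, and the greedy merging strategy are assumptions chosen for illustration only.

```python
import math


def coalesce_partitions(partition_bytes, target_bytes):
    """Greedily merge contiguous shuffle partitions (hypothetical helper).

    Returns a list of (start, end) index ranges, end exclusive, where each
    range is the set of partitions a single reducer would read.
    """
    ranges = []
    start = 0
    current = 0
    for i, size in enumerate(partition_bytes):
        # Close the current group if adding this partition would exceed the target.
        if i > start and current + size > target_bytes:
            ranges.append((start, i))
            start = i
            current = 0
        current += size
    if partition_bytes:
        ranges.append((start, len(partition_bytes)))
    return ranges


def estimate_initial_partitions(input_bytes, target_bytes, max_partitions):
    """Pick an initial partition count from the input size (hypothetical helper).

    The result is clamped to [1, max_partitions]; max_partitions plays the
    role of maxNumPostShufflePartitions in this sketch.
    """
    wanted = math.ceil(input_bytes / target_bytes)
    return max(1, min(max_partitions, wanted))
```

With small partitions, `coalesce_partitions` merges neighbours into fewer reducer ranges. Once every partition already exceeds `target_bytes`, nothing is merged and the reducer count equals the initial partition count, which is the large-data case described above. `estimate_initial_partitions` illustrates the alternative of sizing the initial parallelism from the data instead of always starting at the maximum.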
   
