Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/09/09 23:06:04 UTC

[GitHub] [spark] sarutak edited a comment on pull request #29677: [SPARK-32820][SQL] Remove redundant shuffle exchanges inserted by EnsureRequirements

sarutak edited a comment on pull request #29677:
URL: https://github.com/apache/spark/pull/29677#issuecomment-689860782


   @c21 Thanks for the comment.
   > Users can choose to remove these repartitionByRange/orderBy in query by themselves to save the shuffle/sort, as they are not necessary to add.
   
   Yes, users can choose to do so, but that requires them to understand how Spark and Spark SQL work internally, and to have some background in distributed computing.
   Also, if the data processing logic or query is very complex, it can be difficult for users to judge which repartition operations can safely be removed.
   Shouldn't Spark hide that complexity from users?
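   
   To make the scenario concrete, here is a minimal sketch (my own illustration, not a test case from this PR; the local-mode `SparkSession` setup is an assumption so it runs standalone) of the kind of query being discussed: a user-added `repartitionByRange` followed by an `orderBy` on the same key, where both steps ask for range partitioning and `EnsureRequirements` may end up planning overlapping shuffles.
   
   ```scala
   import org.apache.spark.sql.SparkSession
   
   // Local session just so the snippet is self-contained.
   val spark = SparkSession.builder()
     .master("local[*]")
     .appName("redundant-shuffle-example")
     .getOrCreate()
   import spark.implicits._
   
   // An explicit range repartition followed by a global sort on the same key;
   // both steps require range partitioning on `id`.
   val df = spark.range(1, 100)
     .repartitionByRange(10, $"id")
     .orderBy($"id")
   
   // Inspect the physical plan to see which exchanges get planned.
   df.explain()
   ```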
   
   > E.g. we can have more complicated case if user don't do the right thing: spark.range(1, 100).repartitionByRange(10, $"id".desc).repartitionByRange(10, $"id").orderBy($"id"), should we also handle these cases?
   
   Actually, this case is already handled by the `CollapseRepartition` rule.
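   
   For reference, here is a sketch of the quoted query (adapted from your example; the local session setup is again my own assumption). The two adjacent `repartitionByRange` calls should be collapsed by `CollapseRepartition`, which can be checked in the optimized logical plan:
   
   ```scala
   import org.apache.spark.sql.SparkSession
   
   val spark = SparkSession.builder()
     .master("local[*]")
     .appName("collapse-repartition-example")
     .getOrCreate()
   import spark.implicits._
   
   // Two adjacent range repartitions followed by a global sort, as in the quoted case.
   spark.range(1, 100)
     .repartitionByRange(10, $"id".desc)
     .repartitionByRange(10, $"id")
     .orderBy($"id")
     .explain(true) // explain(true) also prints the optimized logical plan
   ```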


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org