Posted to reviews@spark.apache.org by "sunchao (via GitHub)" <gi...@apache.org> on 2023/02/03 17:44:18 UTC

[GitHub] [spark] sunchao commented on pull request #39633: [SPARK-42038][SQL] SPJ: Support partially clustered distribution

sunchao commented on PR #39633:
URL: https://github.com/apache/spark/pull/39633#issuecomment-1416197756

   Yes, converted.
   
   I found it quite difficult to move the logic out of `EnsureRequirements` because, as mentioned above, the optimization also depends on `reorderJoinPredicates`, which needs to run in lock step with `ensureDistributionAndOrdering` as we go up the query plan.
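   
   For context, here is a rough sketch of the lock-step pattern described above. It is a simplified illustration with hypothetical, self-contained types, not Spark's actual `EnsureRequirements` code:
   
   ```scala
   // Illustration only: toy plan nodes standing in for Spark's physical plan.
   sealed trait Plan { def children: Seq[Plan] }
   case class Scan(table: String) extends Plan { def children: Seq[Plan] = Nil }
   case class Join(left: Plan, right: Plan, keys: Seq[String]) extends Plan {
     def children: Seq[Plan] = Seq(left, right)
   }
   
   object LockStepSketch {
     // Hypothetical stand-ins for the two real steps.
     def reorderJoinPredicates(p: Plan): Plan = p          // align join keys with child partitioning
     def ensureDistributionAndOrdering(p: Plan): Plan = p  // insert shuffles/sorts where required
   
     // The two steps must run together at every node as we walk up the plan:
     // the reordering decides what the node requires, and the distribution step
     // immediately satisfies that requirement before the parent is visited.
     def apply(plan: Plan): Plan = {
       val withNewChildren = plan match {
         case Join(l, r, keys) => Join(apply(l), apply(r), keys)
         case leaf             => leaf
       }
       ensureDistributionAndOrdering(reorderJoinPredicates(withNewChildren))
     }
   }
   ```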
   
   In addition, even if I extract `reorderJoinPredicates` into a separate utility method, it is still difficult to make the optimization a standalone rule, because it also depends on part of the logic inside `ensureDistributionAndOrdering`: as we go up the query plan tree, join branches with incompatible `KeyGroupedPartitioning`s are expected to be handled by `ensureDistributionAndOrdering` and converted to hash partitioning, so the optimization should not be applied to those branches again.
   
   As a compromise, I've extracted all the special logic related to `KeyGroupedPartitioning` into a separate method, `checkKeyGroupCompatible`, so the main body of `ensureDistributionAndOrdering` now looks much simpler. If necessary, I can also move this method (and related ones) into a separate file to keep them more isolated.
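   
   For readers following along, here is a rough illustration of what such an extracted compatibility check could look like. The types and signature below are hypothetical and heavily simplified; the actual `checkKeyGroupCompatible` added in this PR differs:
   
   ```scala
   // Illustration only: a minimal stand-in for Spark's KeyGroupedPartitioning.
   case class KeyGroupedPartitioning(expressions: Seq[String], partitionValues: Seq[Seq[Any]])
   
   object KeyGroupSketch {
     // Returns the shared partitioning if both join children are key-grouped on
     // compatible expressions and partition values; otherwise None, in which case
     // the caller falls back to hash partitioning (i.e. shuffles both sides) and
     // the partially clustered optimization is not applied to that branch.
     def checkKeyGroupCompatible(
         left: Option[KeyGroupedPartitioning],
         right: Option[KeyGroupedPartitioning]): Option[KeyGroupedPartitioning] =
       (left, right) match {
         case (Some(l), Some(r))
             if l.expressions == r.expressions && l.partitionValues == r.partitionValues =>
           Some(l)
         case _ => None
       }
   }
   ```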

