Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/10/29 05:44:13 UTC

[GitHub] [spark] huaxingao opened a new pull request, #38434: [SPARK-40946][SQL] Add a new DataSource V2 interface SupportsPushDownClusterKeys

huaxingao opened a new pull request, #38434:
URL: https://github.com/apache/spark/pull/38434

   
   
   ### What changes were proposed in this pull request?
   ```
   /**
    * A mix-in interface for {@link ScanBuilder}. Data sources can implement this interface to
    * have all the join or aggregate keys pushed down to them. A return value of true indicates
    * that the data source will return input partitions (via {@code planInputPartitions}) that
    * follow the clustering keys. A return value of false indicates that the data source makes
    * no such guarantee, even though it may still report a partitioning that may or may not
    * be compatible with the given clustering keys; in that case it is Spark's responsibility
    * to group the input partitions if that is possible.
    *
    * @since 3.4.0
    */
   @Evolving
   public interface SupportsPushDownClusterKeys extends ScanBuilder {
   ```
   
   
   ### Why are the changes needed?
   Pass the join key information down to v2 data sources so that the data sources can decide how to combine the input splits according to the join keys.
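
   To make the intent concrete, here is a minimal sketch of how a source might react to pushed-down clustering keys. It does not use Spark's classes: the `SupportsPushDownClusterKeys` interface below is a stand-in, and the method name `pushClusterKeys`, the `Split` type, and `DemoScanBuilder` are all assumptions made for illustration, since the description above does not show the interface's methods.

   ```java
   import java.util.ArrayList;
   import java.util.Collections;
   import java.util.LinkedHashMap;
   import java.util.List;
   import java.util.Map;
   import java.util.Set;

   final class ClusterKeySketch {
       /** Hypothetical stand-in for the proposed mix-in; method name and signature are assumed. */
       interface SupportsPushDownClusterKeys {
           boolean pushClusterKeys(List<String> clusterKeys);
       }

       /** One input split plus the partition values shared by all of its rows. */
       static final class Split {
           final String file;
           final Map<String, String> partitionValues;

           Split(String file, Map<String, String> partitionValues) {
               this.file = file;
               this.partitionValues = partitionValues;
           }
       }

       /** Hypothetical scan builder over data that is partitioned by some columns. */
       static final class DemoScanBuilder implements SupportsPushDownClusterKeys {
           private final Set<String> partitionColumns;
           private final List<Split> splits;
           private List<String> pushedKeys = Collections.emptyList();

           DemoScanBuilder(Set<String> partitionColumns, List<Split> splits) {
               this.partitionColumns = partitionColumns;
               this.splits = splits;
           }

           @Override
           public boolean pushClusterKeys(List<String> clusterKeys) {
               // This sketch only gives the guarantee when every key is a partition column.
               if (clusterKeys.isEmpty() || !partitionColumns.containsAll(clusterKeys)) {
                   return false;
               }
               pushedKeys = new ArrayList<>(clusterKeys);
               return true;
           }

           /** Groups splits so each input partition holds one combination of key values. */
           List<List<Split>> planInputPartitions() {
               Map<List<String>, List<Split>> groups = new LinkedHashMap<>();
               for (Split split : splits) {
                   List<String> groupKey = new ArrayList<>();
                   if (pushedKeys.isEmpty()) {
                       // No guarantee was given: keep one input partition per split.
                       groupKey.add(split.file);
                   } else {
                       for (String key : pushedKeys) {
                           groupKey.add(split.partitionValues.get(key));
                       }
                   }
                   groups.computeIfAbsent(groupKey, k -> new ArrayList<>()).add(split);
               }
               return new ArrayList<>(groups.values());
           }
       }
   }
   ```

   When the push-down is accepted, splits that share the same key values end up in the same input partition; when it is rejected, the sketch falls back to one partition per split and leaves any grouping to the caller.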
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, it adds a new interface, `SupportsPushDownClusterKeys`.
   
   
   ### How was this patch tested?
   new tests
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] github-actions[bot] commented on pull request #38434: [SPARK-40946][SQL] Add a new DataSource V2 interface SupportsPushDownClusterKeys

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #38434:
URL: https://github.com/apache/spark/pull/38434#issuecomment-1435417608

   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!




[GitHub] [spark] github-actions[bot] closed pull request #38434: [SPARK-40946][SQL] Add a new DataSource V2 interface SupportsPushDownClusterKeys

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed pull request #38434: [SPARK-40946][SQL] Add a new DataSource V2 interface SupportsPushDownClusterKeys
URL: https://github.com/apache/spark/pull/38434




[GitHub] [spark] huaxingao commented on pull request #38434: [SPARK-40946][SQL] Add a new DataSource V2 interface SupportsPushDownClusterKeys

Posted by GitBox <gi...@apache.org>.
huaxingao commented on PR #38434:
URL: https://github.com/apache/spark/pull/38434#issuecomment-1306708557

   @cloud-fan Could you please take a look when you have some time? Thanks!




[GitHub] [spark] cloud-fan commented on pull request #38434: [SPARK-40946][SQL] Add a new DataSource V2 interface SupportsPushDownClusterKeys

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on PR #38434:
URL: https://github.com/apache/spark/pull/38434#issuecomment-1308715889

   I think this needs a bit more design. Partitioning is a physical property, so it's very weird to "push it down" at the logical phase. I think what we really need is to track the requirement during top-down planning. For example, when we plan a sort merge join, we should track the requirement (partitioned and ordered by join keys) when planning the join children. This idea also comes from the Volcano optimizer and is a widely adopted technique.
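
   As a rough illustration of the top-down idea in the comment above (not Spark's planner and not part of this PR), the sketch below passes a physical requirement from a join down to its children while planning them. Every name in it (`Requirement`, `Scan`, `Join`, `plan`, `enforce`) is hypothetical, and physical plans are rendered as strings to keep it short.

   ```java
   import java.util.Collections;
   import java.util.List;

   final class TopDownPlanningSketch {
       /** What a parent demands of a child: clustered and sorted by these keys (empty = nothing). */
       static final class Requirement {
           final List<String> keys;

           Requirement(List<String> keys) {
               this.keys = keys;
           }

           static Requirement none() {
               return new Requirement(Collections.<String>emptyList());
           }
       }

       interface LogicalNode {}

       /** A table scan that already delivers data clustered and sorted by some keys. */
       static final class Scan implements LogicalNode {
           final String table;
           final List<String> clusteredAndSortedBy;

           Scan(String table, List<String> clusteredAndSortedBy) {
               this.table = table;
               this.clusteredAndSortedBy = clusteredAndSortedBy;
           }
       }

       static final class Join implements LogicalNode {
           final LogicalNode left;
           final LogicalNode right;
           final List<String> leftKeys;
           final List<String> rightKeys;

           Join(LogicalNode left, LogicalNode right, List<String> leftKeys, List<String> rightKeys) {
               this.left = left;
               this.right = right;
               this.leftKeys = leftKeys;
               this.rightKeys = rightKeys;
           }
       }

       /** Plans a node while carrying the requirement imposed by its parent. */
       static String plan(LogicalNode node, Requirement required) {
           if (node instanceof Join) {
               Join join = (Join) node;
               // The sort merge join requires each child to be partitioned and ordered by its keys.
               String left = plan(join.left, new Requirement(join.leftKeys));
               String right = plan(join.right, new Requirement(join.rightKeys));
               String joined = "SortMergeJoin(" + left + ", " + right + ")";
               // Simplification: the join's own output is assumed to satisfy no requirement.
               return required.keys.isEmpty() ? joined : enforce(joined, required);
           }
           Scan scan = (Scan) node;
           String scanned = "Scan(" + scan.table + ")";
           // The leaf sees the requirement directly and can satisfy it without extra operators.
           if (required.keys.isEmpty() || scan.clusteredAndSortedBy.equals(required.keys)) {
               return scanned;
           }
           return enforce(scanned, required);
       }

       /** Adds a shuffle and a sort when the child cannot meet the requirement itself. */
       static String enforce(String child, Requirement required) {
           return "Sort" + required.keys + "(Shuffle" + required.keys + "(" + child + "))";
       }
   }
   ```

   In this sketch a scan that already matches the requirement is used as is, while a scan that does not is wrapped in a shuffle and a sort, so the requirement stays a physical-planning concern rather than something pushed down at the logical phase.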

