Posted to reviews@spark.apache.org by "wangyum (via GitHub)" <gi...@apache.org> on 2023/07/22 09:40:19 UTC

[GitHub] [spark] wangyum opened a new pull request, #42112: [SPARK-44493][SQL] Extract pushable predicates from disjunctive predicates

wangyum opened a new pull request, #42112:
URL: https://github.com/apache/spark/pull/42112

   ### What changes were proposed in this pull request?
   
   
   ### Why are the changes needed?
   
   Push down more filters to improve query performance.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Unit test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #42112: [SPARK-44493][SQL] Support for translating catalyst expressions into partial datasource filters

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #42112:
URL: https://github.com/apache/spark/pull/42112#discussion_r1282720880


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala:
##########
@@ -573,8 +573,10 @@ object DataSourceStrategy
    * @return a `Some[Filter]` if the input [[Expression]] is convertible, otherwise a `None`.
    */
   protected[sql] def translateFilter(

Review Comment:
   do we still use it for pure translation?





[GitHub] [spark] melihsozdinler commented on a diff in pull request #42112: [SPARK-44493][SQL] Support for translating catalyst expressions into partial datasource filters

Posted by "melihsozdinler (via GitHub)" <gi...@apache.org>.
melihsozdinler commented on code in PR #42112:
URL: https://github.com/apache/spark/pull/42112#discussion_r1297716611


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala:
##########
@@ -584,46 +586,46 @@ object DataSourceStrategy
    * @param translatedFilterToExpr An optional map from leaf node filter expressions to its
    *                               translated [[Filter]]. The map is used for rebuilding
    *                               [[Expression]] from [[Filter]].
-   * @param nestedPredicatePushdownEnabled Whether nested predicate pushdown is enabled.
+   * @param supportNestedPushDown Whether nested predicate push down is enabled.
+   * @param canPartialPushDown Can it be translated into partial predicate.
    * @return a `Some[Filter]` if the input [[Expression]] is convertible, otherwise a `None`.
    */
   protected[sql] def translateFilterWithMapping(
       predicate: Expression,
       translatedFilterToExpr: Option[mutable.HashMap[sources.Filter, Expression]],
-      nestedPredicatePushdownEnabled: Boolean)
+      supportNestedPushDown: Boolean,
+      canPartialPushDown: Boolean)

Review Comment:
   You could give this parameter a default value of `false` and pass it explicitly only when `true` is required.





[GitHub] [spark] wangyum commented on pull request #42112: [SPARK-44493][SQL] Support for translating catalyst expressions into partial datasource filters

Posted by "wangyum (via GitHub)" <gi...@apache.org>.
wangyum commented on PR #42112:
URL: https://github.com/apache/spark/pull/42112#issuecomment-1654976856

   @MaxGekk @huaxingao @gengliangwang @cloud-fan 
   




Re: [PR] [SPARK-44493][SQL] Support for translating catalyst expressions into partial datasource filters [spark]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #42112:
URL: https://github.com/apache/spark/pull/42112#issuecomment-1953304328

   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!




[GitHub] [spark] cloud-fan commented on a diff in pull request #42112: [SPARK-44493][SQL] Support for translating catalyst expressions into partial datasource filters

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #42112:
URL: https://github.com/apache/spark/pull/42112#discussion_r1282716758


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala:
##########
@@ -573,8 +573,10 @@ object DataSourceStrategy
    * @return a `Some[Filter]` if the input [[Expression]] is convertible, otherwise a `None`.
    */
   protected[sql] def translateFilter(

Review Comment:
   can we rename it? It's not pure translation anymore, but kind of extracting filters to do pushdown
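
The distinction between translating and extracting can be illustrated with a minimal sketch. This is not code from the PR: it assumes a toy expression tree (`Leaf`, `And`, `Or`) and a hypothetical `extractPushable` helper rather than Spark's `Expression` and `sources.Filter` classes. The idea is to return a weaker predicate that the original implies: a conjunction may drop a branch that cannot be pushed, while a disjunction yields a result only if both branches contribute something.

```scala
object PartialPushdownSketch {
  // Toy expression tree; these types are illustrative assumptions, not Spark classes.
  sealed trait Expr
  final case class Leaf(name: String, pushable: Boolean) extends Expr
  final case class And(left: Expr, right: Expr) extends Expr
  final case class Or(left: Expr, right: Expr) extends Expr

  // Returns a predicate implied by `e` that uses only pushable leaves, if one exists.
  // Negation is deliberately not modelled: weakening under NOT would be unsound.
  def extractPushable(e: Expr): Option[Expr] = e match {
    case l @ Leaf(_, pushable) => if (pushable) Some(l) else None
    case And(l, r) =>
      (extractPushable(l), extractPushable(r)) match {
        case (Some(a), Some(b)) => Some(And(a, b))
        // Dropping one conjunct only weakens the predicate, so it stays safe.
        case (a, b) => a.orElse(b)
      }
    case Or(l, r) =>
      // Both branches must contribute, otherwise rows matching the
      // missing branch would be filtered out incorrectly.
      for (a <- extractPushable(l); b <- extractPushable(r)) yield Or(a, b)
  }
}
```

Because the extracted predicate is only implied by the original, a caller would presumably still need to evaluate the full predicate after the scan; the pushed part merely reduces the rows read.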





[GitHub] [spark] wangyum commented on a diff in pull request #42112: [SPARK-44493][SQL] Support for translating catalyst expressions into partial datasource filters

Posted by "wangyum (via GitHub)" <gi...@apache.org>.
wangyum commented on code in PR #42112:
URL: https://github.com/apache/spark/pull/42112#discussion_r1284086381


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala:
##########
@@ -573,8 +573,10 @@ object DataSourceStrategy
    * @return a `Some[Filter]` if the input [[Expression]] is convertible, otherwise a `None`.
    */
   protected[sql] def translateFilter(

Review Comment:
   Are these two pure translations?
   
   https://github.com/apache/spark/blob/071feabbd4325504332679dfa620bc5ee4359370/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L674
   https://github.com/apache/spark/blob/6161bf44f40f8146ea4c115c788fd4eaeb128769/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownUtils.scala#L54-L56





[GitHub] [spark] melihsozdinler commented on a diff in pull request #42112: [SPARK-44493][SQL] Support for translating catalyst expressions into partial datasource filters

Posted by "melihsozdinler (via GitHub)" <gi...@apache.org>.
melihsozdinler commented on code in PR #42112:
URL: https://github.com/apache/spark/pull/42112#discussion_r1297726143


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala:
##########
@@ -573,8 +573,10 @@ object DataSourceStrategy
    * @return a `Some[Filter]` if the input [[Expression]] is convertible, otherwise a `None`.
    */
   protected[sql] def translateFilter(
-      predicate: Expression, supportNestedPredicatePushdown: Boolean): Option[Filter] = {
-    translateFilterWithMapping(predicate, None, supportNestedPredicatePushdown)
+      predicate: Expression,
+      supportNestedPushDown: Boolean,
+      canPartialPushDown: Boolean): Option[Filter] = {

Review Comment:
   `canPartialPushDown` and `supportNestedPushDown` could both default to `false`, so callers that do not need these options would not have to define or pass them before calling the function.
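
A minimal sketch of this suggestion, assuming a simplified stand-in for `translateFilter`: the `String` predicate type and the placeholder body are hypothetical, and only the two parameter names come from the diff. Scala default arguments let existing call sites omit the flags, and named arguments let a caller set just one of them.

```scala
object DefaultFlagsSketch {
  // Hypothetical stand-in for the signature in the diff; the body is a placeholder
  // that only echoes the flags so the effect of the defaults is visible.
  def translateFilter(
      predicate: String,
      supportNestedPushDown: Boolean = false,
      canPartialPushDown: Boolean = false): Option[String] = {
    if (predicate.isEmpty) None
    else Some(s"$predicate [nested=$supportNestedPushDown, partial=$canPartialPushDown]")
  }

  // Callers that do not care about the flags omit them entirely.
  val plain: Option[String] = translateFilter("a > 1")

  // Callers that need one flag pass it by name, leaving the other at its default.
  val partial: Option[String] = translateFilter("a > 1", canPartialPushDown = true)
}
```

One trade-off to weigh: defaults keep call sites short, but they can also hide a flag that a caller should have thought about, which may be why the diff passes both explicitly.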





[GitHub] [spark] wangyum commented on pull request #42112: [SPARK-44493][SQL] Extract pushable predicates from disjunctive predicates

Posted by "wangyum (via GitHub)" <gi...@apache.org>.
wangyum commented on PR #42112:
URL: https://github.com/apache/spark/pull/42112#issuecomment-1647693696

   cc @cloud-fan 




Re: [PR] [SPARK-44493][SQL] Support for translating catalyst expressions into partial datasource filters [spark]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed pull request #42112: [SPARK-44493][SQL] Support for translating catalyst expressions into partial datasource filters
URL: https://github.com/apache/spark/pull/42112

