Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/11/16 12:24:38 UTC

[GitHub] [spark] cloud-fan commented on a diff in pull request #38511: [SPARK-41017][SQL] Support column pruning with multiple nondeterministic Filters

cloud-fan commented on code in PR #38511:
URL: https://github.com/apache/spark/pull/38511#discussion_r1023927228


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala:
##########
@@ -85,15 +72,25 @@ object PhysicalOperation extends AliasHelper with PredicateHelper {
         // projects. We need to meet the following conditions to do so:
         //   1) no Project collected so far or the collected Projects are all deterministic
         //   2) the collected filters and this filter are all deterministic, or this is the
-        //      first collected filter.
+        //      first collected filter. This condition can be relaxed if `canKeepMultipleFilters` is
+        //      true.
         //   3) this filter does not repeat any expensive expressions from the collected
         //      projects.
-        val canIncludeThisFilter = fields.forall(_.forall(_.deterministic)) && {
-          filters.isEmpty || (filters.forall(_.deterministic) && condition.deterministic)
-        } && canCollapseExpressions(Seq(condition), aliases, alwaysInline)
-        if (canIncludeThisFilter) {
-          val replaced = replaceAlias(condition, aliases)
-          (fields, filters ++ splitConjunctivePredicates(replaced), other, aliases)
+        val canPushFilterThroughProject = fields.forall(_.forall(_.deterministic)) &&
+          canCollapseExpressions(Seq(condition), aliases, alwaysInline)
+        if (canPushFilterThroughProject) {
+          val canIncludeThisFilter = filters.isEmpty || {
+            filters.length == 1 && filters.head.forall(_.deterministic) && condition.deterministic
+          }

Review Comment:
   This is the core change of this PR: `PhysicalOperation` returns a single filter condition, which means it must combine the collected filters, so we have to make sure all of them are deterministic. `ScanOperation` returns multiple filter conditions and does not have this restriction.
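
   To see why combining filters is unsafe when one of them is nondeterministic, here is a minimal standalone Scala sketch (not Spark code; all names are hypothetical). A stateful predicate stands in for a nondeterministic expression like `rand()`: its result depends on how many times it has been invoked, so the set of rows it is evaluated against changes the outcome. Applying filters one after another restricts that input set; merging them into one condition (which the optimizer is then free to reorder) does not.

   ```scala
   // Hypothetical sketch: why two Filter nodes cannot be merged into a
   // single condition when one predicate is nondeterministic.
   object FilterOrderDemo {
     private var calls = 0

     // Stand-in for a nondeterministic expression: the result depends on
     // the invocation count, i.e. on which rows reach this predicate.
     private def nonDet(x: Int): Boolean = { calls += 1; calls % 2 == 1 }

     val rows: Seq[Int] = Seq(-1, 1, 2, 3)

     // Two Filter nodes kept separate: nonDet only sees rows with x > 0.
     def separateFilters: Seq[Int] = {
       calls = 0
       rows.filter(_ > 0).filter(nonDet)
     }

     // One merged condition, with the conjuncts reordered (as an optimizer
     // may do): nonDet now sees every row, so the invocation count differs
     // and a different set of rows survives.
     def mergedCondition: Seq[Int] = {
       calls = 0
       rows.filter(x => nonDet(x) && x > 0)
     }

     def main(args: Array[String]): Unit = {
       println(s"separate filters: $separateFilters")
       println(s"merged condition: $mergedCondition")
     }
   }
   ```

   With deterministic predicates the two variants always agree, which is why the old `PhysicalOperation` check insisted on determinism before collecting a second filter; returning the conditions as a list sidesteps the problem entirely.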



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

