You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/09/16 18:17:55 UTC

[GitHub] [spark] tanelk opened a new pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

tanelk opened a new pull request #29092:
URL: https://github.com/apache/spark/pull/29092


   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   Add `And(IsNotNull(e), GreaterThan(Size(e), Literal(0)))` filter before Explode and PosExplode, when `outer = false`.
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   Predicate pushdown will be able to move this new filter down through joins and into data sources for performance improvement.
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   
   No
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->
   
   Unit test
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu edited a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode/inline to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
maropu edited a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-707404702


   Looks okay otherwise.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-707195686






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-707159456


   **[Test build #129695 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129695/testReport)** for PR 29092 at commit [`e3a0205`](https://github.com/apache/spark/commit/e3a020539246fba50d4708ac78ac5ed18da58e3c).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode/inline to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #29092:
URL: https://github.com/apache/spark/pull/29092#discussion_r504079357



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -868,6 +866,41 @@ object TransposeWindow extends Rule[LogicalPlan] {
   }
 }
 
+/**
+ * Infers filters from [[Generate]], such that rows that would have been removed
+ * by this [[Generate]] can be removed earlier - before joins and in data sources.
+ */
+object InferFiltersFromGenerate extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+    // This rule does not infer filters from foldable expressions to avoid constant filters
+    // like 'size([1, 2, 3]) > 0'. These do not show up in child's constraints and
+    // then the idempotence will break.
+    case generate @ Generate(e, _, _, _, _, _)
+      if !e.deterministic || e.children.forall(_.foldable) => generate
+
+    case generate @ Generate(g, _, false, _, _, _) if canInferFilters(g) =>
+      // Exclude child's constraints to guarantee idempotency
+      val inferredFilters = ExpressionSet(
+        Seq(
+          GreaterThan(Size(g.children.head), Literal(0)),

Review comment:
       This can be useful to filter rows before join, but can we pushdown `Size` expression to datasource?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode/inline to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-707531680






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-707292765


   **[Test build #129695 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129695/testReport)** for PR 29092 at commit [`e3a0205`](https://github.com/apache/spark/commit/e3a020539246fba50d4708ac78ac5ed18da58e3c).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode/inline to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-707604863


   **[Test build #129738 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129738/testReport)** for PR 29092 at commit [`857c007`](https://github.com/apache/spark/commit/857c0078f20609e9aca6839034b52708991aa1bf).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode/inline to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29092:
URL: https://github.com/apache/spark/pull/29092#discussion_r504080387



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -868,6 +866,41 @@ object TransposeWindow extends Rule[LogicalPlan] {
   }
 }
 
+/**
+ * Infers filters from [[Generate]], such that rows that would have been removed
+ * by this [[Generate]] can be removed earlier - before joins and in data sources.
+ */
+object InferFiltersFromGenerate extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+    // This rule does not infer filters from foldable expressions to avoid constant filters
+    // like 'size([1, 2, 3]) > 0'. These do not show up in child's constraints and
+    // then the idempotence will break.
+    case generate @ Generate(e, _, _, _, _, _)
+      if !e.deterministic || e.children.forall(_.foldable) => generate
+
+    case generate @ Generate(g, _, false, _, _, _) if canInferFilters(g) =>
+      // Exclude child's constraints to guarantee idempotency
+      val inferredFilters = ExpressionSet(
+        Seq(
+          GreaterThan(Size(g.children.head), Literal(0)),

Review comment:
       I don't think so...




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode/inline to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-707755366


   **[Test build #129738 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129738/testReport)** for PR 29092 at commit [`857c007`](https://github.com/apache/spark/commit/857c0078f20609e9aca6839034b52708991aa1bf).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-704740807


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129487/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-693577467


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29092:
URL: https://github.com/apache/spark/pull/29092#discussion_r489922948



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -1847,3 +1848,25 @@ object OptimizeLimitZero extends Rule[LogicalPlan] {
       empty(ll)
   }
 }
+
+/**
+ * Generates filters for exploded expression, such that rows that would have been removed
+ * by this [[Generate]] can be removed earlier - before joins and in data sources.
+ */
+object InferFiltersFromGenerate extends Rule[LogicalPlan] {

Review comment:
       I thought first we might be able to do so, but it looks okay as it is on second thoughts.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk closed pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
tanelk closed pull request #29092:
URL: https://github.com/apache/spark/pull/29092


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode/inline to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-707460775


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34326/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-700912540


   **[Test build #129249 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129249/testReport)** for PR 29092 at commit [`3fc4b12`](https://github.com/apache/spark/commit/3fc4b12337d0ac8b8877d12371284860212b1e2a).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-705376010


   **[Test build #129538 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129538/testReport)** for PR 29092 at commit [`d4089ca`](https://github.com/apache/spark/commit/d4089caf90687700708b62e882f61ba87de9bede).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode/inline to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-707604863


   **[Test build #129738 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129738/testReport)** for PR 29092 at commit [`857c007`](https://github.com/apache/spark/commit/857c0078f20609e9aca6839034b52708991aa1bf).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on a change in pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
tanelk commented on a change in pull request #29092:
URL: https://github.com/apache/spark/pull/29092#discussion_r500734180



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -122,6 +122,7 @@ abstract class Optimizer(catalogManager: CatalogManager)
       Batch("Operator Optimization before Inferring Filters", fixedPoint,
         rulesWithoutInferFiltersFromConstraints: _*) ::
       Batch("Infer Filters", Once,
+        InferFiltersFromGenerate,

Review comment:
       Actually the `InferFiltersFromConstraints` is allways filtered out from the `operatorOptimizationRuleSet`:
   https://github.com/apache/spark/blob/5ce321dc80a699fa525ca5b69bf2c28e10f8a12a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L120-L123
   
   I think it is very misleading , perhaps we could remove it from there and avoid further confusion?
   
   It was asked in the original PR, put looks like it was forgotten:
   https://github.com/apache/spark/pull/19149/files#r157777935




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29092:
URL: https://github.com/apache/spark/pull/29092#discussion_r500935070



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -1854,3 +1855,37 @@ object OptimizeLimitZero extends Rule[LogicalPlan] {
       empty(ll)
   }
 }
+
+/**
+ * Infers filters from [[Generate]], such that rows that would have been removed
+ * by this [[Generate]] can be removed earlier - before joins and in data sources.
+ */
+object InferFiltersFromGenerate extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+    case generate @ Generate(e: ExplodeBase, _, false, _, _, _)
+      if e.deterministic && !e.child.foldable =>

Review comment:
       Ah, I see. okay, let's just check `deterministic` only.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-704523767






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-693756804


   **[Test build #128783 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128783/testReport)** for PR 29092 at commit [`ffe43d7`](https://github.com/apache/spark/commit/ffe43d7957040e62d8050ae2abb1f03e26c4d044).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-704739977


   **[Test build #129487 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129487/testReport)** for PR 29092 at commit [`6cc7b15`](https://github.com/apache/spark/commit/6cc7b15d722996eeb7831f2c98d1554379d375a4).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-704740797






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode/inline to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
maropu commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-707668514


   Thanks! Merged to master.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode/inline to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-707531680






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-657793610


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-705312444


   **[Test build #129538 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129538/testReport)** for PR 29092 at commit [`d4089ca`](https://github.com/apache/spark/commit/d4089caf90687700708b62e882f61ba87de9bede).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-705030502






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-705376418


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-700912540


   **[Test build #129249 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129249/testReport)** for PR 29092 at commit [`3fc4b12`](https://github.com/apache/spark/commit/3fc4b12337d0ac8b8877d12371284860212b1e2a).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-704410164






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-701049874


   **[Test build #129249 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129249/testReport)** for PR 29092 at commit [`3fc4b12`](https://github.com/apache/spark/commit/3fc4b12337d0ac8b8877d12371284860212b1e2a).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-704344533


   **[Test build #129454 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129454/testReport)** for PR 29092 at commit [`c3f9063`](https://github.com/apache/spark/commit/c3f9063adbd62e932521b3298fefeaed09fd936c).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-700947771






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-704502984


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34070/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-705049066






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-693622669


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29092:
URL: https://github.com/apache/spark/pull/29092#discussion_r500128769



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/InferFiltersFromGenerateSuite.scala
##########
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+import org.apache.spark.sql.types.IntegerType
+
+class InferFiltersFromGenerateSuite extends PlanTest {
+  object Optimize extends RuleExecutor[LogicalPlan] {
+    val batches = Batch("Infer Filters", Once, InferFiltersFromGenerate) :: Nil
+  }
+
+  val testRelation = LocalRelation('a.array(IntegerType))
+
+  Seq(Explode(_), PosExplode(_)).foreach { f =>
+    val explode = f('a)
+    test("Infer filters from " + explode) {
+      val originalQuery = testRelation.generate(explode).analyze
+      val correctAnswer = testRelation
+        .where(IsNotNull('a) && Size('a) > 0)

Review comment:
       can we add one more test case to make sure we don't generate duplicated predicates if the query already has `IsNotNull('a) && Size('a) > 0`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-693757106






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-704655263


   **[Test build #129487 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129487/testReport)** for PR 29092 at commit [`6cc7b15`](https://github.com/apache/spark/commit/6cc7b15d722996eeb7831f2c98d1554379d375a4).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29092:
URL: https://github.com/apache/spark/pull/29092#discussion_r503325965



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -122,6 +122,7 @@ abstract class Optimizer(catalogManager: CatalogManager)
       Batch("Operator Optimization before Inferring Filters", fixedPoint,
         rulesWithoutInferFiltersFromConstraints: _*) ::
       Batch("Infer Filters", Once,
+        InferFiltersFromGenerate,

Review comment:
       I don't quite remember, maybe we can remove it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-705330161






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-693622143


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29092:
URL: https://github.com/apache/spark/pull/29092#discussion_r500736341



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -122,6 +122,7 @@ abstract class Optimizer(catalogManager: CatalogManager)
       Batch("Operator Optimization before Inferring Filters", fixedPoint,
         rulesWithoutInferFiltersFromConstraints: _*) ::
       Batch("Infer Filters", Once,
+        InferFiltersFromGenerate,

Review comment:
       Yea, but I also don't know why. cc: @cloud-fan 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-693577467


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode/inline to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-707655246


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
tanelk commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-700909630


   Pinging @viirya and @cloud-fan for possible second review


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-705376418






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-705330149


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34144/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on a change in pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
tanelk commented on a change in pull request #29092:
URL: https://github.com/apache/spark/pull/29092#discussion_r501167719



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -1854,3 +1855,37 @@ object OptimizeLimitZero extends Rule[LogicalPlan] {
       empty(ll)
   }
 }
+
+/**
+ * Infers filters from [[Generate]], such that rows that would have been removed
+ * by this [[Generate]] can be removed earlier - before joins and in data sources.
+ */
+object InferFiltersFromGenerate extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+    case generate @ Generate(e: ExplodeBase, _, false, _, _, _)
+      if e.deterministic && !e.child.foldable =>

Review comment:
       Seeing the failed results there is another obvious issue with allowing foldable inferred filters - the idempotence is broken. I would very much like to keep this rule idempotent. 
   
   Should I try to include foldable inferred filters while keeping it idempotent or shall I revert the last change?
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-693622143


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-705322955






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-657793610


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-704672759


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34093/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode/inline to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-707655225


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34344/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-705322955


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34144/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
maropu commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-693755084


   ok to test


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-694003992


   **[Test build #128783 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128783/testReport)** for PR 29092 at commit [`ffe43d7`](https://github.com/apache/spark/commit/ffe43d7957040e62d8050ae2abb1f03e26c4d044).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-704615691


   **[Test build #129463 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129463/testReport)** for PR 29092 at commit [`9797856`](https://github.com/apache/spark/commit/9797856b5808b89ff7994d1e121f64a2a3df9fe2).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-694007907






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-704667180


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34093/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29092:
URL: https://github.com/apache/spark/pull/29092#discussion_r503590648



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -117,14 +116,13 @@ abstract class Optimizer(catalogManager: CatalogManager)
         extendedOperatorOptimizationRules
 
     val operatorOptimizationBatch: Seq[Batch] = {
-      val rulesWithoutInferFiltersFromConstraints =
-        operatorOptimizationRuleSet.filterNot(_ == InferFiltersFromConstraints)
       Batch("Operator Optimization before Inferring Filters", fixedPoint,
-        rulesWithoutInferFiltersFromConstraints: _*) ::
+        operatorOptimizationRuleSet: _*) ::
       Batch("Infer Filters", Once,
+        InferFiltersFromGenerate,
         InferFiltersFromConstraints) ::
       Batch("Operator Optimization after Inferring Filters", fixedPoint,
-        rulesWithoutInferFiltersFromConstraints: _*) ::
+        operatorOptimizationRuleSet: _*) ::

Review comment:
       nit: `operatorOptimizationRuleSet` -> `rulesWithoutInferFilters`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode/inline to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-707655255


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34344/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on a change in pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
tanelk commented on a change in pull request #29092:
URL: https://github.com/apache/spark/pull/29092#discussion_r500687909



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -1847,3 +1848,25 @@ object OptimizeLimitZero extends Rule[LogicalPlan] {
       empty(ll)
   }
 }
+
+/**
+ * Generates filters for exploded expression, such that rows that would have been removed
+ * by this [[Generate]] can be removed earlier - before joins and in data sources.
+ */
+object InferFiltersFromGenerate extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+    case g @ Generate(e: ExplodeBase, _, false, _, _, child)

Review comment:
       Sure




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode/inline to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-707641068


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34344/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-705330161






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-657793101


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29092:
URL: https://github.com/apache/spark/pull/29092#discussion_r500730169



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -122,6 +122,7 @@ abstract class Optimizer(catalogManager: CatalogManager)
       Batch("Operator Optimization before Inferring Filters", fixedPoint,
         rulesWithoutInferFiltersFromConstraints: _*) ::
       Batch("Infer Filters", Once,
+        InferFiltersFromGenerate,

Review comment:
       Ah, ok. Looks okay as it is. Since `operatorOptimizationRuleSet` has `InferFiltersFromConstraints`, I thought we'd better to put `InferFiltersFromConstraints`/`InferFiltersFromGenerate` there together.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-701050491






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-704616532






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-707159456


   **[Test build #129695 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129695/testReport)** for PR 29092 at commit [`e3a0205`](https://github.com/apache/spark/commit/e3a020539246fba50d4708ac78ac5ed18da58e3c).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29092:
URL: https://github.com/apache/spark/pull/29092#discussion_r501375355



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -1854,3 +1855,37 @@ object OptimizeLimitZero extends Rule[LogicalPlan] {
       empty(ll)
   }
 }
+
+/**
+ * Infers filters from [[Generate]], such that rows that would have been removed
+ * by this [[Generate]] can be removed earlier - before joins and in data sources.
+ */
+object InferFiltersFromGenerate extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+    case generate @ Generate(e: ExplodeBase, _, false, _, _, _)
+      if e.deterministic && !e.child.foldable =>

Review comment:
       Ah, ok. If so, its okay to revert it. Could you leave some comments about why we need the foldable check there?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-704992172


   **[Test build #129514 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129514/testReport)** for PR 29092 at commit [`a25bd6f`](https://github.com/apache/spark/commit/a25bd6f81e36fe8b35d3ce6ef4369a2eac29ff4f).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-701050491






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode/inline to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-707443255


   **[Test build #129720 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129720/testReport)** for PR 29092 at commit [`b7a7c49`](https://github.com/apache/spark/commit/b7a7c499ace7cfcfdd1dd993f07fb286f5546434).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode/inline to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-707530561


   **[Test build #129720 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129720/testReport)** for PR 29092 at commit [`b7a7c49`](https://github.com/apache/spark/commit/b7a7c499ace7cfcfdd1dd993f07fb286f5546434).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on a change in pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
tanelk commented on a change in pull request #29092:
URL: https://github.com/apache/spark/pull/29092#discussion_r500786803



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -1854,3 +1855,37 @@ object OptimizeLimitZero extends Rule[LogicalPlan] {
       empty(ll)
   }
 }
+
+/**
+ * Infers filters from [[Generate]], such that rows that would have been removed
+ * by this [[Generate]] can be removed earlier - before joins and in data sources.
+ */
+object InferFiltersFromGenerate extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+    case generate @ Generate(e: ExplodeBase, _, false, _, _, _)
+      if e.deterministic && !e.child.foldable =>

Review comment:
       We would get a foldable filter from this. 
   99% of times it would be like `size([1, 2, 3]) > 0 && isNotNull([1, 2, 3])`.
   But now that you mention it, yes it would be handled by `ConstantFolding` and in rare occasions it might be replaced with a false literal. 
   
   I don't have a strong opinion on this.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-704514830






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-705312444


   **[Test build #129538 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129538/testReport)** for PR 29092 at commit [`d4089ca`](https://github.com/apache/spark/commit/d4089caf90687700708b62e882f61ba87de9bede).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on a change in pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
tanelk commented on a change in pull request #29092:
URL: https://github.com/apache/spark/pull/29092#discussion_r489919112



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -1847,3 +1848,25 @@ object OptimizeLimitZero extends Rule[LogicalPlan] {
       empty(ll)
   }
 }
+
+/**
+ * Generates filters for exploded expression, such that rows that would have been removed
+ * by this [[Generate]] can be removed earlier - before joins and in data sources.
+ */
+object InferFiltersFromGenerate extends Rule[LogicalPlan] {

Review comment:
       The new one does not use constraints to infer filters. I could rename the existing one to just `InferFilters` and then combine these two.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on a change in pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
tanelk commented on a change in pull request #29092:
URL: https://github.com/apache/spark/pull/29092#discussion_r500689565



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -122,6 +122,7 @@ abstract class Optimizer(catalogManager: CatalogManager)
       Batch("Operator Optimization before Inferring Filters", fixedPoint,
         rulesWithoutInferFiltersFromConstraints: _*) ::
       Batch("Infer Filters", Once,
+        InferFiltersFromGenerate,

Review comment:
       I'm sorry, I don't get the question.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on a change in pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
tanelk commented on a change in pull request #29092:
URL: https://github.com/apache/spark/pull/29092#discussion_r500382697



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -1847,3 +1848,25 @@ object OptimizeLimitZero extends Rule[LogicalPlan] {
       empty(ll)
   }
 }
+
+/**
+ * Generates filters for exploded expression, such that rows that would have been removed
+ * by this [[Generate]] can be removed earlier - before joins and in data sources.
+ */
+object InferFiltersFromGenerate extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+    case g @ Generate(e: ExplodeBase, _, false, _, _, child)

Review comment:
       Added `Inline` and also checked other `Generate`-s, but it seems that this rule is not applicable to others.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode/inline to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-707460800


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34326/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-693622669


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-707182655


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34301/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29092:
URL: https://github.com/apache/spark/pull/29092#discussion_r489861092



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -1847,3 +1848,25 @@ object OptimizeLimitZero extends Rule[LogicalPlan] {
       empty(ll)
   }
 }
+
+/**
+ * Generates filters for exploded expression, such that rows that would have been removed
+ * by this [[Generate]] can be removed earlier - before joins and in data sources.
+ */
+object InferFiltersFromGenerate extends Rule[LogicalPlan] {

Review comment:
       Why did you create a new rule instead of improving the exisint one, `InferFiltersFromConstraints`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-704344533


   **[Test build #129454 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129454/testReport)** for PR 29092 at commit [`c3f9063`](https://github.com/apache/spark/commit/c3f9063adbd62e932521b3298fefeaed09fd936c).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-693756804


   **[Test build #128783 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128783/testReport)** for PR 29092 at commit [`ffe43d7`](https://github.com/apache/spark/commit/ffe43d7957040e62d8050ae2abb1f03e26c4d044).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-700935206


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33866/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode/inline to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-707756707


   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129738/
   Test PASSed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode/inline to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
maropu commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-707404702


   Looks okay othewise.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-704672765






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-700947744


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33866/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk closed pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
tanelk closed pull request #29092:
URL: https://github.com/apache/spark/pull/29092


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29092:
URL: https://github.com/apache/spark/pull/29092#discussion_r500694215



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -1854,3 +1855,37 @@ object OptimizeLimitZero extends Rule[LogicalPlan] {
       empty(ll)
   }
 }
+
+/**
+ * Infers filters from [[Generate]], such that rows that would have been removed
+ * by this [[Generate]] can be removed earlier - before joins and in data sources.
+ */
+object InferFiltersFromGenerate extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+    case generate @ Generate(e: ExplodeBase, _, false, _, _, _)
+      if e.deterministic && !e.child.foldable =>

Review comment:
       We need this foldable check?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-704475978


   **[Test build #129463 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129463/testReport)** for PR 29092 at commit [`9797856`](https://github.com/apache/spark/commit/9797856b5808b89ff7994d1e121f64a2a3df9fe2).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-705312444


   **[Test build #129538 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129538/testReport)** for PR 29092 at commit [`d4089ca`](https://github.com/apache/spark/commit/d4089caf90687700708b62e882f61ba87de9bede).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-657793101


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29092:
URL: https://github.com/apache/spark/pull/29092#discussion_r500204107



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -1847,3 +1848,25 @@ object OptimizeLimitZero extends Rule[LogicalPlan] {
       empty(ll)
   }
 }
+
+/**
+ * Generates filters for exploded expression, such that rows that would have been removed
+ * by this [[Generate]] can be removed earlier - before joins and in data sources.
+ */
+object InferFiltersFromGenerate extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+    case g @ Generate(e: ExplodeBase, _, false, _, _, child)

Review comment:
       How about the other gerators, e.g., `Inline`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on a change in pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
tanelk commented on a change in pull request #29092:
URL: https://github.com/apache/spark/pull/29092#discussion_r500207664



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -1847,3 +1848,25 @@ object OptimizeLimitZero extends Rule[LogicalPlan] {
       empty(ll)
   }
 }
+
+/**
+ * Generates filters for exploded expression, such that rows that would have been removed
+ * by this [[Generate]] can be removed earlier - before joins and in data sources.
+ */
+object InferFiltersFromGenerate extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+    case g @ Generate(e: ExplodeBase, _, false, _, _, child)

Review comment:
       I'm not familiar with other generators and focused only on the ones that I regularly use. 
   But yes, I'm sure it can be expanded to include other Generators as well, but there can be some exceptions.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-705030479


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34119/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on a change in pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode/inline to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
tanelk commented on a change in pull request #29092:
URL: https://github.com/apache/spark/pull/29092#discussion_r503785679



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/InferFiltersFromGenerateSuite.scala
##########
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
+
+class InferFiltersFromGenerateSuite extends PlanTest {
+  object Optimize extends RuleExecutor[LogicalPlan] {
+    val batches = Batch("Infer Filters", Once, InferFiltersFromGenerate) :: Nil
+  }
+
+  val testRelation = LocalRelation('a.array(StructType(Seq(
+    StructField("x", IntegerType),
+    StructField("y", IntegerType)
+  ))))
+
+  Seq(Explode(_), PosExplode(_), Inline(_)).foreach { f =>
+    val generator = f('a)
+    test("Infer filters from " + generator) {
+      val originalQuery = testRelation.generate(generator).analyze
+      val correctAnswer = testRelation
+        .where(IsNotNull('a) && Size('a) > 0)
+        .generate(generator)
+        .analyze
+      val optimized = Optimize.execute(originalQuery)
+      comparePlans(optimized, correctAnswer)
+    }
+
+    test("Don't infer duplicate filters from " + generator) {
+      val originalQuery = testRelation
+        .where(IsNotNull('a) && Size('a) > 0)
+        .generate(generator)
+        .analyze
+      val correctAnswer = testRelation
+        .where(IsNotNull('a) && Size('a) > 0)
+        .generate(generator)
+        .analyze
+      val optimized = Optimize.execute(originalQuery)
+      comparePlans(optimized, correctAnswer)

Review comment:
       Thanks, don't know how I missed it




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-704522740


   **[Test build #129454 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129454/testReport)** for PR 29092 at commit [`c3f9063`](https://github.com/apache/spark/commit/c3f9063adbd62e932521b3298fefeaed09fd936c).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-704514803


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34070/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29092:
URL: https://github.com/apache/spark/pull/29092#discussion_r500689782



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -1854,3 +1855,37 @@ object OptimizeLimitZero extends Rule[LogicalPlan] {
       empty(ll)
   }
 }
+
+/**
+ * Infers filters from [[Generate]], such that rows that would have been removed
+ * by this [[Generate]] can be removed earlier - before joins and in data sources.
+ */
+object InferFiltersFromGenerate extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+    case generate @ Generate(e: ExplodeBase, _, false, _, _, _)

Review comment:
       nit: how about moving this rule close to `InferFiltersFromConstraints`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-704672765






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-704655263


   **[Test build #129487 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129487/testReport)** for PR 29092 at commit [`6cc7b15`](https://github.com/apache/spark/commit/6cc7b15d722996eeb7831f2c98d1554379d375a4).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-704475978


   **[Test build #129463 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129463/testReport)** for PR 29092 at commit [`9797856`](https://github.com/apache/spark/commit/9797856b5808b89ff7994d1e121f64a2a3df9fe2).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode/inline to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-707460791


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode/inline to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29092:
URL: https://github.com/apache/spark/pull/29092#discussion_r503739898



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/InferFiltersFromGenerateSuite.scala
##########
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
+
+class InferFiltersFromGenerateSuite extends PlanTest {
+  object Optimize extends RuleExecutor[LogicalPlan] {
+    val batches = Batch("Infer Filters", Once, InferFiltersFromGenerate) :: Nil
+  }
+
+  val testRelation = LocalRelation('a.array(StructType(Seq(
+    StructField("x", IntegerType),
+    StructField("y", IntegerType)
+  ))))
+
+  Seq(Explode(_), PosExplode(_), Inline(_)).foreach { f =>
+    val generator = f('a)
+    test("Infer filters from " + generator) {
+      val originalQuery = testRelation.generate(generator).analyze
+      val correctAnswer = testRelation
+        .where(IsNotNull('a) && Size('a) > 0)
+        .generate(generator)
+        .analyze
+      val optimized = Optimize.execute(originalQuery)
+      comparePlans(optimized, correctAnswer)
+    }
+
+    test("Don't infer duplicate filters from " + generator) {
+      val originalQuery = testRelation
+        .where(IsNotNull('a) && Size('a) > 0)
+        .generate(generator)
+        .analyze
+      val correctAnswer = testRelation
+        .where(IsNotNull('a) && Size('a) > 0)
+        .generate(generator)
+        .analyze
+      val optimized = Optimize.execute(originalQuery)
+      comparePlans(optimized, correctAnswer)

Review comment:
       nit: we don't need the duplicated `correctAnswer`, just write `comparePlans(optimized, originalQuery)`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-705330161






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-704992172


   **[Test build #129514 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129514/testReport)** for PR 29092 at commit [`a25bd6f`](https://github.com/apache/spark/commit/a25bd6f81e36fe8b35d3ce6ef4369a2eac29ff4f).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode/inline to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-707443255


   **[Test build #129720 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129720/testReport)** for PR 29092 at commit [`b7a7c49`](https://github.com/apache/spark/commit/b7a7c499ace7cfcfdd1dd993f07fb286f5546434).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode/inline to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-707460791






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode/inline to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-707453840


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34326/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode/inline to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-707655246






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode/inline to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-707756707


   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129738/
   Test PASSed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29092:
URL: https://github.com/apache/spark/pull/29092#discussion_r500687271



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -1847,3 +1848,25 @@ object OptimizeLimitZero extends Rule[LogicalPlan] {
       empty(ll)
   }
 }
+
+/**
+ * Generates filters for exploded expression, such that rows that would have been removed
+ * by this [[Generate]] can be removed earlier - before joins and in data sources.
+ */
+object InferFiltersFromGenerate extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+    case g @ Generate(e: ExplodeBase, _, false, _, _, child)

Review comment:
       Looks okay. Could you update the PR description accordingly?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-704740797


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-694007907






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-704410164






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode/inline to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-707756694


   Merged build finished. Test PASSed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29092:
URL: https://github.com/apache/spark/pull/29092#discussion_r500215630



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -1847,3 +1848,25 @@ object OptimizeLimitZero extends Rule[LogicalPlan] {
       empty(ll)
   }
 }
+
+/**
+ * Generates filters for exploded expression, such that rows that would have been removed
+ * by this [[Generate]] can be removed earlier - before joins and in data sources.
+ */
+object InferFiltersFromGenerate extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+    case g @ Generate(e: ExplodeBase, _, false, _, _, child)

Review comment:
       Yea, its better to conver the otehr cases where pssible, too:
   https://github.com/apache/spark/blob/0812d6c17cc4876bb87a9d1fec35ec8c7b2365f0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala#L432-L434




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu closed pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode/inline to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
maropu closed pull request #29092:
URL: https://github.com/apache/spark/pull/29092


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on a change in pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
tanelk commented on a change in pull request #29092:
URL: https://github.com/apache/spark/pull/29092#discussion_r500784343



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -1854,3 +1855,37 @@ object OptimizeLimitZero extends Rule[LogicalPlan] {
       empty(ll)
   }
 }
+
+/**
+ * Infers filters from [[Generate]], such that rows that would have been removed
+ * by this [[Generate]] can be removed earlier - before joins and in data sources.
+ */
+object InferFiltersFromGenerate extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+    case generate @ Generate(e: ExplodeBase, _, false, _, _, _)

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29092:
URL: https://github.com/apache/spark/pull/29092#discussion_r500694652



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -1854,3 +1855,37 @@ object OptimizeLimitZero extends Rule[LogicalPlan] {
       empty(ll)
   }
 }
+
+/**
+ * Infers filters from [[Generate]], such that rows that would have been removed
+ * by this [[Generate]] can be removed earlier - before joins and in data sources.
+ */
+object InferFiltersFromGenerate extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+    case generate @ Generate(e: ExplodeBase, _, false, _, _, _)
+      if e.deterministic && !e.child.foldable =>
+
+      inferFiltersForExplodeLike(generate, e)
+
+    case generate @ Generate(e: Inline, _, false, _, _, _)
+      if e.deterministic && !e.child.foldable =>
+
+      inferFiltersForExplodeLike(generate, e)
+  }
+
+  private def inferFiltersForExplodeLike(generate: Generate, expr: UnaryExpression): Generate = {

Review comment:
       Just a nit suggestion; how about reformatting it like this?
   ```
     def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
       case generate @ Generate(e, _, _, _, _, _)
         if !e.deterministic || e.children.forall(_.foldable) => generate
   
       case generate @ Generate(g, _, false, _, _, _) if canInferFilters(g) =>
         // Exclude child's constraints to guarantee idempotency
         val inferredFilters = ExpressionSet(
           Seq(
             GreaterThan(Size(g.children.head), Literal(0)),
             IsNotNull(g.children.head)
           )
         ) -- generate.child.constraints
   
         if (inferredFilters.nonEmpty) {
           generate.copy(child = Filter(inferredFilters.reduce(And), generate.child))
         } else {
           generate
         }
     }
   
     private def canInferFilters(g: Generator): Boolean = g match {
       case _: ExplodeBase => true
       case _: Inline => true
       case _ => false
     }
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-705049066


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on a change in pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
tanelk commented on a change in pull request #29092:
URL: https://github.com/apache/spark/pull/29092#discussion_r500381188



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/InferFiltersFromGenerateSuite.scala
##########
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+import org.apache.spark.sql.types.IntegerType
+
+class InferFiltersFromGenerateSuite extends PlanTest {
+  object Optimize extends RuleExecutor[LogicalPlan] {
+    val batches = Batch("Infer Filters", Once, InferFiltersFromGenerate) :: Nil
+  }
+
+  val testRelation = LocalRelation('a.array(IntegerType))
+
+  Seq(Explode(_), PosExplode(_)).foreach { f =>
+    val explode = f('a)
+    test("Infer filters from " + explode) {
+      val originalQuery = testRelation.generate(explode).analyze
+      val correctAnswer = testRelation
+        .where(IsNotNull('a) && Size('a) > 0)

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
tanelk commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-693623971


   cc @maropu @dongjoon-hyun 
   
   Also, I messed up one commit, that made the bot add wrong labels.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-705030502






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-705330161






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-707293693






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-707293693






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-700947771






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-705048735


   **[Test build #129514 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129514/testReport)** for PR 29092 at commit [`a25bd6f`](https://github.com/apache/spark/commit/a25bd6f81e36fe8b35d3ce6ef4369a2eac29ff4f).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29092:
URL: https://github.com/apache/spark/pull/29092#discussion_r500688820



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -122,6 +122,7 @@ abstract class Optimizer(catalogManager: CatalogManager)
       Batch("Operator Optimization before Inferring Filters", fixedPoint,
         rulesWithoutInferFiltersFromConstraints: _*) ::
       Batch("Infer Filters", Once,
+        InferFiltersFromGenerate,

Review comment:
       Why did you put this rule only in this batch? https://github.com/apache/spark/pull/29092/files#diff-a636a87d8843eeccca90140be91d4fafR82




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode/inline to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-707756694


   Merged build finished. Test PASSed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-704514830






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-693757106






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-707195686






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29092:
URL: https://github.com/apache/spark/pull/29092#discussion_r503594571



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -868,6 +866,42 @@ object TransposeWindow extends Rule[LogicalPlan] {
   }
 }
 
+/**
+ * Infers filters from [[Generate]], such that rows that would have been removed
+ * by this [[Generate]] can be removed earlier - before joins and in data sources.
+ */
+object InferFiltersFromGenerate extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+    // Don't infer filters from foldable expressions to avoid
+    // constant filters like 'size([1, 2, 3]) > 0'. These can add
+    // unnecessary overhead and these will not show up in the childs
+    // constraints, breaking this rules idempotency.

Review comment:
       nit: How about rephrasing it like this?
   ```
       // This rule does not infer filters from foldable expressions to avoid constant filters
       // like 'size([1, 2, 3]) > 0'. These do not show up in child's constraints and
       // then the idempotence will break.
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-705376439


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129538/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-705015474


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34119/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on a change in pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
tanelk commented on a change in pull request #29092:
URL: https://github.com/apache/spark/pull/29092#discussion_r503416308



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -122,6 +122,7 @@ abstract class Optimizer(catalogManager: CatalogManager)
       Batch("Operator Optimization before Inferring Filters", fixedPoint,
         rulesWithoutInferFiltersFromConstraints: _*) ::
       Batch("Infer Filters", Once,
+        InferFiltersFromGenerate,

Review comment:
       I went ahead and removed it from the `operatorOptimizationRuleSet`. All the tests are still passing as was expected.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu edited a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode/inline to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
maropu edited a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-707668514


   GA passed. Thanks! Merged to master.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-704396268






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-704523767






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-707195662


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34301/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-704616532






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29092: [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode to benefit from predicate pushdown

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29092:
URL: https://github.com/apache/spark/pull/29092#issuecomment-705049076


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129514/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org