You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/07/05 19:27:53 UTC

[GitHub] [iceberg] huaxingao opened a new pull request, #5204: Not push down complex filter

huaxingao opened a new pull request, #5204:
URL: https://github.com/apache/iceberg/pull/5204

   Currently iceberg throws Exception when using a filter on complex type:
   
   ```
   org.apache.iceberg.spark.sql.TestSelect > testComplexTypeFilter[catalogName = testhadoop, implementation = org.apache.iceberg.spark.SparkCatalog, config = {type=hadoop}] FAILED
       java.lang.IllegalArgumentException: Cannot create expression literal from org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema: [3,v1]
           at org.apache.iceberg.expressions.Literals.from(Literals.java:87)
           at org.apache.iceberg.expressions.UnboundPredicate.<init>(UnboundPredicate.java:40)
           at org.apache.iceberg.expressions.Expressions.equal(Expressions.java:175)
           at org.apache.iceberg.spark.SparkFilters.handleEqual(SparkFilters.java:239)
           at org.apache.iceberg.spark.SparkFilters.convert(SparkFilters.java:152)
           at org.apache.iceberg.spark.source.SparkScanBuilder.pushFilters(SparkScanBuilder.java:101)
           at org.apache.spark.sql.execution.datasources.v2.PushDownUtils$.pushFilters(PushDownUtils.scala:69)
           at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$$anonfun$pushDownFilters$1.applyOrElse(V2ScanRelationPushDown.scala:60)
           at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$$anonfun$pushDownFilters$1.applyOrElse(V2ScanRelationPushDown.scala:47)
           at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
           at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
           at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
           at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
           at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
           at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
           at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
   
   ```
   
   This PR fixes the problem by not pushing down filters for complex type.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho merged pull request #5204: Not push down complex filter

Posted by GitBox <gi...@apache.org>.
szehon-ho merged PR #5204:
URL: https://github.com/apache/iceberg/pull/5204


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on pull request #5204: Not push down complex filter

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on PR #5204:
URL: https://github.com/apache/iceberg/pull/5204#issuecomment-1178354750

   @huaxingao do you want to make a change for other spark branches (ie, 3.2)?  Otherwise I or someone can look at it too.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on pull request #5204: Not push down complex filter

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on PR #5204:
URL: https://github.com/apache/iceberg/pull/5204#issuecomment-1176806791

   Thanks @huaxingao for change, and @singhpk234 for additional review!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] huaxingao commented on pull request #5204: Not push down complex filter

Posted by GitBox <gi...@apache.org>.
huaxingao commented on PR #5204:
URL: https://github.com/apache/iceberg/pull/5204#issuecomment-1178369480

   @szehon-ho I will do this later today or tomorrow. Do we need this in all the spark branches (3.2, 3.1, 3.0, 2.4)?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] singhpk234 commented on a diff in pull request #5204: Not push down complex filter

Posted by GitBox <gi...@apache.org>.
singhpk234 commented on code in PR #5204:
URL: https://github.com/apache/iceberg/pull/5204#discussion_r914817448


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java:
##########
@@ -100,7 +100,13 @@ public Filter[] pushFilters(Filter[] filters) {
     List<Filter> pushed = Lists.newArrayListWithExpectedSize(filters.length);
 
     for (Filter filter : filters) {
-      Expression expr = SparkFilters.convert(filter);
+      Expression expr = null;
+      try {
+        expr = SparkFilters.convert(filter);
+      } catch (IllegalArgumentException e) {
+        // converting to Iceberg Expression failed, so this expression cannot be pushed down

Review Comment:
   nvm, i see we log the pushed filters in spark already [here](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala#L82-L87)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] huaxingao commented on pull request #5204: Not push down complex filter

Posted by GitBox <gi...@apache.org>.
huaxingao commented on PR #5204:
URL: https://github.com/apache/iceberg/pull/5204#issuecomment-1176820884

   Thank you very much! @szehon-ho  Also thank you @singhpk234 for reviewing!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] singhpk234 commented on a diff in pull request #5204: Not push down complex filter

Posted by GitBox <gi...@apache.org>.
singhpk234 commented on code in PR #5204:
URL: https://github.com/apache/iceberg/pull/5204#discussion_r914390773


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java:
##########
@@ -100,7 +100,13 @@ public Filter[] pushFilters(Filter[] filters) {
     List<Filter> pushed = Lists.newArrayListWithExpectedSize(filters.length);
 
     for (Filter filter : filters) {
-      Expression expr = SparkFilters.convert(filter);
+      Expression expr = null;
+      try {
+        expr = SparkFilters.convert(filter);
+      } catch (IllegalArgumentException e) {
+        // converting to Iceberg Expression failed, so this expression cannot be pushed down

Review Comment:
   [minor] should we log what were the filters that spark pushed but were ignored ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] huaxingao commented on pull request #5204: Not push down complex filter

Posted by GitBox <gi...@apache.org>.
huaxingao commented on PR #5204:
URL: https://github.com/apache/iceberg/pull/5204#issuecomment-1178380586

   BTW, I saw the CI failed (https://github.com/apache/iceberg/runs/7223837382?check_suite_focus=true) because of this test
   
   ```
   org.apache.iceberg.parquet.TestBloomRowGroupFilter > testBytesEq FAILED
       java.lang.AssertionError: Should not read: cannot match a new generated binary
           at org.junit.Assert.fail(Assert.java:89)
           at org.junit.Assert.assertTrue(Assert.java:42)
           at org.junit.Assert.assertFalse(Assert.java:65)
           at org.apache.iceberg.parquet.TestBloomRowGroupFilter.testBytesEq(TestBloomRowGroupFilter.java:587)
   ```
   
   The failure is not related to the changes in this PR. I think the failure is because bloom filter can be false positive. I will fix the bloom filter test. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org