You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/07/05 19:27:53 UTC
[GitHub] [iceberg] huaxingao opened a new pull request, #5204: Not push down complex filter
huaxingao opened a new pull request, #5204:
URL: https://github.com/apache/iceberg/pull/5204
Currently iceberg throws Exception when using a filter on complex type:
```
org.apache.iceberg.spark.sql.TestSelect > testComplexTypeFilter[catalogName = testhadoop, implementation = org.apache.iceberg.spark.SparkCatalog, config = {type=hadoop}] FAILED
java.lang.IllegalArgumentException: Cannot create expression literal from org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema: [3,v1]
at org.apache.iceberg.expressions.Literals.from(Literals.java:87)
at org.apache.iceberg.expressions.UnboundPredicate.<init>(UnboundPredicate.java:40)
at org.apache.iceberg.expressions.Expressions.equal(Expressions.java:175)
at org.apache.iceberg.spark.SparkFilters.handleEqual(SparkFilters.java:239)
at org.apache.iceberg.spark.SparkFilters.convert(SparkFilters.java:152)
at org.apache.iceberg.spark.source.SparkScanBuilder.pushFilters(SparkScanBuilder.java:101)
at org.apache.spark.sql.execution.datasources.v2.PushDownUtils$.pushFilters(PushDownUtils.scala:69)
at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$$anonfun$pushDownFilters$1.applyOrElse(V2ScanRelationPushDown.scala:60)
at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$$anonfun$pushDownFilters$1.applyOrElse(V2ScanRelationPushDown.scala:47)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
```
This PR fixes the problem by not pushing down filters for complex type.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] szehon-ho merged pull request #5204: Not push down complex filter
Posted by GitBox <gi...@apache.org>.
szehon-ho merged PR #5204:
URL: https://github.com/apache/iceberg/pull/5204
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] szehon-ho commented on pull request #5204: Not push down complex filter
Posted by GitBox <gi...@apache.org>.
szehon-ho commented on PR #5204:
URL: https://github.com/apache/iceberg/pull/5204#issuecomment-1178354750
@huaxingao do you want to make a change for other spark branches (ie, 3.2)? Otherwise I or someone can look at it too.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] szehon-ho commented on pull request #5204: Not push down complex filter
Posted by GitBox <gi...@apache.org>.
szehon-ho commented on PR #5204:
URL: https://github.com/apache/iceberg/pull/5204#issuecomment-1176806791
Thanks @huaxingao for change, and @singhpk234 for additional review!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] huaxingao commented on pull request #5204: Not push down complex filter
Posted by GitBox <gi...@apache.org>.
huaxingao commented on PR #5204:
URL: https://github.com/apache/iceberg/pull/5204#issuecomment-1178369480
@szehon-ho I will do this later today or tomorrow. Do we need this in all the spark branches (3.2, 3.1, 3.0, 2.4)?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] singhpk234 commented on a diff in pull request #5204: Not push down complex filter
Posted by GitBox <gi...@apache.org>.
singhpk234 commented on code in PR #5204:
URL: https://github.com/apache/iceberg/pull/5204#discussion_r914817448
##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java:
##########
@@ -100,7 +100,13 @@ public Filter[] pushFilters(Filter[] filters) {
List<Filter> pushed = Lists.newArrayListWithExpectedSize(filters.length);
for (Filter filter : filters) {
- Expression expr = SparkFilters.convert(filter);
+ Expression expr = null;
+ try {
+ expr = SparkFilters.convert(filter);
+ } catch (IllegalArgumentException e) {
+ // converting to Iceberg Expression failed, so this expression cannot be pushed down
Review Comment:
nvm, i see we log the pushed filters in spark already [here](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala#L82-L87)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] huaxingao commented on pull request #5204: Not push down complex filter
Posted by GitBox <gi...@apache.org>.
huaxingao commented on PR #5204:
URL: https://github.com/apache/iceberg/pull/5204#issuecomment-1176820884
Thank you very much! @szehon-ho Also thank you @singhpk234 for reviewing!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] singhpk234 commented on a diff in pull request #5204: Not push down complex filter
Posted by GitBox <gi...@apache.org>.
singhpk234 commented on code in PR #5204:
URL: https://github.com/apache/iceberg/pull/5204#discussion_r914390773
##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java:
##########
@@ -100,7 +100,13 @@ public Filter[] pushFilters(Filter[] filters) {
List<Filter> pushed = Lists.newArrayListWithExpectedSize(filters.length);
for (Filter filter : filters) {
- Expression expr = SparkFilters.convert(filter);
+ Expression expr = null;
+ try {
+ expr = SparkFilters.convert(filter);
+ } catch (IllegalArgumentException e) {
+ // converting to Iceberg Expression failed, so this expression cannot be pushed down
Review Comment:
[minor] should we log what were the filters that spark pushed but were ignored ?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] huaxingao commented on pull request #5204: Not push down complex filter
Posted by GitBox <gi...@apache.org>.
huaxingao commented on PR #5204:
URL: https://github.com/apache/iceberg/pull/5204#issuecomment-1178380586
BTW, I saw the CI failed (https://github.com/apache/iceberg/runs/7223837382?check_suite_focus=true) because of this test
```
org.apache.iceberg.parquet.TestBloomRowGroupFilter > testBytesEq FAILED
java.lang.AssertionError: Should not read: cannot match a new generated binary
at org.junit.Assert.fail(Assert.java:89)
at org.junit.Assert.assertTrue(Assert.java:42)
at org.junit.Assert.assertFalse(Assert.java:65)
at org.apache.iceberg.parquet.TestBloomRowGroupFilter.testBytesEq(TestBloomRowGroupFilter.java:587)
```
The failure is not related to the changes in this PR. I think the failure is because bloom filter can be false positive. I will fix the bloom filter test.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org