Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/12/17 11:20:38 UTC

[GitHub] [iceberg] shardulm94 opened a new issue #1952: Failure evaluating expressions for NOT + STARTS_WITH predicate

shardulm94 opened a new issue #1952:
URL: https://github.com/apache/iceberg/issues/1952


   ```
   scala> val df = spark.read.format("iceberg").load("/tmp/iceberg/logs")
   df: org.apache.spark.sql.DataFrame = [level: string]
   
   scala> df.filter(not($"level".startsWith("b"))).show()
   java.lang.IllegalArgumentException: No negation for operation: STARTS_WITH
     at org.apache.iceberg.expressions.Expression$Operation.negate(Expression.java:72)
     at org.apache.iceberg.expressions.UnboundPredicate.negate(UnboundPredicate.java:69)
     at org.apache.iceberg.expressions.RewriteNot.not(RewriteNot.java:44)
     at org.apache.iceberg.expressions.RewriteNot.not(RewriteNot.java:22)
     at org.apache.iceberg.expressions.ExpressionVisitors.visit(ExpressionVisitors.java:293)
     at org.apache.iceberg.expressions.Projections$BaseProjectionEvaluator.project(Projections.java:153)
   ```
   
   There are multiple places in Iceberg that call `RewriteNot` on user-provided filters ([1](https://github.com/apache/iceberg/blob/425a45f8acec0496d77e070c07fb209de92ab2c1/api/src/main/java/org/apache/iceberg/expressions/Projections.java#L153), [2](https://github.com/apache/iceberg/blob/425a45f8acec0496d77e070c07fb209de92ab2c1/api/src/main/java/org/apache/iceberg/expressions/InclusiveMetricsEvaluator.java#L59), [3](https://github.com/apache/iceberg/blob/425a45f8acec0496d77e070c07fb209de92ab2c1/api/src/main/java/org/apache/iceberg/expressions/ManifestEvaluator.java#L67) and many more). However, the `STARTS_WITH` predicate does not support negation, so `RewriteNot` throws an exception. Should we implement a `NOT_STARTS_WITH` operation to support this use case?
   
   Another approach would be to keep the `NOT` expression in `RewriteNot` for predicates that do not support negation. However, some evaluators in Iceberg assume that there are no `NOT`s in the tree, so this may affect their correctness. https://github.com/apache/iceberg/blob/425a45f8acec0496d77e070c07fb209de92ab2c1/api/src/main/java/org/apache/iceberg/expressions/Projections.java#L148
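
To make the first proposal concrete, here is a minimal sketch of how a negation table behaves once a `NOT_STARTS_WITH` operation exists. This is a toy model, not Iceberg's actual `Expression.Operation` code: the class `NegationSketch`, the trimmed-down `Op` enum, and the helper `pushNot` are hypothetical names used only for illustration.

```java
// Toy model of operation negation; NOT Iceberg's real classes.
final class NegationSketch {

    // A reduced, hypothetical set of operations.
    enum Op {
        EQ, NOT_EQ, LT, GT_EQ, STARTS_WITH, NOT_STARTS_WITH;

        // Return the operation that is true exactly when this one is false.
        Op negate() {
            switch (this) {
                case EQ: return NOT_EQ;
                case NOT_EQ: return EQ;
                case LT: return GT_EQ;
                case GT_EQ: return LT;
                // Without the next two cases, STARTS_WITH would fall through
                // to the default branch and throw, as in the reported trace.
                case STARTS_WITH: return NOT_STARTS_WITH;
                case NOT_STARTS_WITH: return STARTS_WITH;
                default:
                    throw new IllegalArgumentException("No negation for operation: " + this);
            }
        }
    }

    // Rewrite not(op) into a single positive operation, so that no NOT node
    // remains in the expression tree.
    static Op pushNot(Op op) {
        return op.negate();
    }

    public static void main(String[] args) {
        System.out.println(pushNot(Op.STARTS_WITH));
    }
}
```

With both cases present, rewriting `not(startsWith(...))` yields one `NOT_STARTS_WITH` predicate and the tree stays free of `NOT` nodes, which is what the evaluators mentioned above rely on.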


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on issue #1952: Failure evaluating expressions for NOT + STARTS_WITH predicate

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #1952:
URL: https://github.com/apache/iceberg/issues/1952#issuecomment-747635739


   Yes, I think we will need a `NotStartsWith` to handle this.
   
   We can't keep the `Not` after rewriting because some evaluators don't support negation. For example, when lower and upper bounds are used to evaluate whether to read a file, the result may be true to indicate that the value may be contained in the file. That result should not be negated because it does not confirm that the value is definitely in the file: if looking for files that "do not contain X", we can't use "not(file might contain X)".
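
The point about bounds can be illustrated with a small sketch. It assumes a data file is summarized only by the lower and upper bound of a string column, and it is not Iceberg's evaluator code: `BoundsPruningSketch`, `mightContainStartsWith`, and `mightContainNotStartsWith` are hypothetical names. Strings are compared with `String.compareTo`, which is an assumption about ordering made for the example.

```java
// Toy model of pruning a data file by column bounds; NOT Iceberg's evaluator.
final class BoundsPruningSketch {

    // Truncate a bound to at most n characters so it can be compared to the prefix.
    private static String truncate(String s, int n) {
        return s.length() <= n ? s : s.substring(0, n);
    }

    // Inclusive check: true means the file MIGHT contain a value that starts
    // with the prefix. It never confirms that such a value is present.
    static boolean mightContainStartsWith(String lower, String upper, String prefix) {
        int n = prefix.length();
        // Every value sorts after all strings that start with the prefix.
        if (truncate(lower, n).compareTo(prefix) > 0) {
            return false;
        }
        // Every value sorts before all strings that start with the prefix.
        if (truncate(upper, n).compareTo(prefix) < 0) {
            return false;
        }
        return true;
    }

    // Inclusive check for the negated predicate. The file can be skipped only
    // when both bounds start with the prefix, because then every value between
    // them starts with the prefix as well.
    static boolean mightContainNotStartsWith(String lower, String upper, String prefix) {
        return !(lower.startsWith(prefix) && upper.startsWith(prefix));
    }

    public static void main(String[] args) {
        String lower = "apple";
        String upper = "banana";
        boolean mightMatch = mightContainStartsWith(lower, upper, "b");
        // Negating the inclusive result would skip this file, although
        // "apple" does not start with "b" and must be returned.
        System.out.println("naive not(mightContain): " + !mightMatch);
        System.out.println("dedicated check: " + mightContainNotStartsWith(lower, upper, "b"));
    }
}
```

For bounds `"apple"` and `"banana"` with prefix `"b"`, the naive negation says the file can be skipped while the dedicated check says it must be read. That gap is why a separate `NotStartsWith` predicate with its own evaluation rule is needed.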




[GitHub] [iceberg] rdblue closed issue #1952: Failure evaluating expressions for NOT + STARTS_WITH predicate

Posted by GitBox <gi...@apache.org>.
rdblue closed issue #1952:
URL: https://github.com/apache/iceberg/issues/1952


   




[GitHub] [iceberg] cccs-eric commented on issue #1952: Failure evaluating expressions for NOT + STARTS_WITH predicate

Posted by GitBox <gi...@apache.org>.
cccs-eric commented on issue #1952:
URL: https://github.com/apache/iceberg/issues/1952#issuecomment-983573754


   I ran into the same issue while trying to run a SQL query using Spark against an Iceberg table.  It crashed with the same error: `IllegalArgumentException: No negation for operation: STARTS_WITH`.  I ran the following query in a Jupyter Notebook:
   ```
   %%sparksql --view bug --cache
   SELECT
       *
   FROM
       namespace.my_table AS tbl
   WHERE
       tbl.aDate > CURRENT_DATE() - INTERVAL 10 DAYS
       AND tbl.activityDisplayName = 'Update user'
       AND rawJSON LIKE '%SourceAnchor%'
       AND displayName NOT LIKE 'Sync%'
   ```
   
   If I create a temporary table in Spark from one or more of the table's data files and issue the same query, it works. So the failure only occurs when the query goes through Iceberg.




[GitHub] [iceberg] kbendick commented on issue #1952: Failure evaluating expressions for NOT + STARTS_WITH predicate

Posted by GitBox <gi...@apache.org>.
kbendick commented on issue #1952:
URL: https://github.com/apache/iceberg/issues/1952#issuecomment-756074477


   I should have a PR open for this by Friday EOD. I've been a little under the weather this week but I believe I have everything working now.




[GitHub] [iceberg] rdblue commented on issue #1952: Failure evaluating expressions for NOT + STARTS_WITH predicate

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #1952:
URL: https://github.com/apache/iceberg/issues/1952#issuecomment-753210813


   I'm not. Go for it!




[GitHub] [iceberg] kbendick commented on issue #1952: Failure evaluating expressions for NOT + STARTS_WITH predicate

Posted by GitBox <gi...@apache.org>.
kbendick commented on issue #1952:
URL: https://github.com/apache/iceberg/issues/1952#issuecomment-752850067


   Is anybody working on this? If not, I've started working on this a bit and would be happy to take this issue.

