Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2019/01/13 19:27:20 UTC

[GitHub] rdblue commented on issue #75: Add startsWith predicate

URL: https://github.com/apache/incubator-iceberg/pull/75#issuecomment-453858053
 
 
   Thanks for working on this @renato2099!
   
   Like the version by @Liorba, this can't be committed until the transforms support inclusive projections of the `startsWith` predicate. Otherwise, the inclusive projection will be null, so the predicate will be converted to `alwaysTrue` when matching partition values, resulting in much larger table scans than necessary.
   
   For background, a projection is the predicate converted to work on partition values. The *inclusive* projection is one that must match any partition that contains a matching value, even if that partition also contains values that don't match. (The *strict* projection is one that matches a partition only if all values in the partition match the original predicate.)
   
   For example, if a table is partitioned by `part0=truncate(str, 4)` then partitions are created by taking the first 4 characters of column `str`. The inclusive projection of `startsWith(str, "aa")` is `startsWith(part0, "aa")` because all values that match the predicate must be in partitions that start with "aa". Similarly, the inclusive projection of `startsWith(str, "aaaaa")` is `startsWith(part0, "aaaa")` or `equal(part0, "aaaa")` because all values that match must be in partition "aaaa". Note that the partition "aaaa" also includes the value "aaaab" that doesn't match the original data predicate, which is why we call it the "inclusive" projection.
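
To make the two cases above concrete, here is a rough Python sketch of the projection rule. It is only an illustration and not Iceberg's implementation: the helper names (`truncate`, `project_starts_with_inclusive`, `partition_matches`) and the `(operation, literal)` tuple representation are made up for this example.

```python
def truncate(value: str, width: int) -> str:
    """Hypothetical stand-in for the truncate transform: keep the first `width` characters."""
    return value[:width]


def project_starts_with_inclusive(prefix: str, width: int) -> tuple[str, str]:
    """Inclusive projection of startsWith(str, prefix) onto part0 = truncate(str, width).

    Returns an (operation, literal) pair; this representation is an assumption
    made for the sketch, not an Iceberg type.
    """
    if len(prefix) < width:
        # The prefix is shorter than the partition value, so every matching
        # row lands in a partition that itself starts with the prefix.
        return ("startsWith", prefix)
    # The prefix is at least as long as the partition value, so every matching
    # row lands in exactly one partition: the truncated prefix.
    return ("equal", prefix[:width])


def partition_matches(projection: tuple[str, str], partition_value: str) -> bool:
    """Evaluate a projected predicate against a single partition value."""
    operation, literal = projection
    if operation == "startsWith":
        return partition_value.startswith(literal)
    return partition_value == literal
```

With `width=4`, projecting the prefix "aa" keeps a `startsWith` on the partition value, while projecting "aaaaa" collapses to an equality check on "aaaa". The partition of "aaaab" passes that check even though "aaaab" does not match the original predicate, which is the extra data an inclusive projection is allowed to let through.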
