You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/01/13 09:30:32 UTC

[GitHub] [iceberg] kbendick commented on pull request #2081: Allow binary truncation length to be zero to handle evaluators that encounter empty string values

kbendick commented on pull request #2081:
URL: https://github.com/apache/iceberg/pull/2081#issuecomment-759323889


   > This looks fine to me, given the tests all passed. But I wonder why empty string was not handled and added for testing from the very beginning, I may lack some context here. I see @aokolnychyi added the check for `length > 0`, was there any particular reason for that? Do you expect `length == 0` to be handled before the pushdown?
   
   Yes I would also be very interested to hear if there's any reason to not allow for the truncation function to alter the precondition. I'm open to there being something I didn't catch.
   
   For reference, I checked and the only place that `truncateBinary` is used is in metrics evaluation - namely for `startsWith` (and soon to be notStartsWith).
   
   To be exact, in `InclusiveMetricsEvaluator`,  `ManifestEvaluator`, and `ParquetMetricsRowGroupFilter` (all of them in the visitor and exclusively with startsWith - which is where the bug occurs presumably due to a row with an empty string). It's also used in updating the lower bounds and upper bounds for min/max parquet metrics for `FIXED` and `BINARY` types (in `ParquetUtils`). So it's only used with metrics and literals and it's overall relatively well tested. I can add more tests as a follow up on the binary and fixed types if we feel it necessary, but overall those are all decently well covered in the tests - especially after I add my tests from the `NOT_STARTS_WITH` PR, which covers it further: https://github.com/apache/iceberg/pull/2062.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org