You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/06/14 22:41:05 UTC

[GitHub] [iceberg] rdblue commented on pull request #2651: Core: add checkShouldKeep for DeleteFilter

rdblue commented on pull request #2651:
URL: https://github.com/apache/iceberg/pull/2651#issuecomment-861040175


   @jackye1995, my primary concern with this is that you're reading the data to produce whether or not a row should be present. I think that would mean basically reading all of the data twice in Trino: once into pages and once for the `CloseableIterable<Boolean>` used to mark whether a row in a page should be kept. That doesn't seem worth it to me, although you could argue that if your projection is narrow it would be okay. You may also already be re-using the pages to produce the underlying `StructLike` iterable. If that's the case, then you can ignore from here on...
   
   I think a better approach would be to adapt the pages in Trino to produce a view of a row that implements `StructLike`, at least in a limited capacity (i.e. for top-level or struct columns only, no array/map nesting). That's what happens in Spark vectorization. There is a class that wraps a batch and provides access to a row using its ordinal. If you had that, you'd be able to read the data into pages and still use Iceberg's delete filters to test each row.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org