Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2023/01/10 17:05:11 UTC

[GitHub] [iceberg] Fokko commented on pull request #6555: Python: Expression to disjunctive normal form

Fokko commented on PR #6555:
URL: https://github.com/apache/iceberg/pull/6555#issuecomment-1377577817

   Thanks @martindurant for the input, appreciate it.
   
   > You need this to get pyarrow to filter within row-groups? When used with dask, I am surprised you would need this, because ice would be in charge of filtering row groups, and dask would pass the filters directly to pyarrow. That's me guessing without looking at the detail of the code.
   
   Iceberg does the partition pruning and the file planning (`plan_files`); after that we still need to filter at the row level (skipping row groups would be nice too, of course).
   
   > I must admit to having come up with the [original simple form](https://github.com/dask/fastparquet/commit/cba69795c0ee2c1ec9a4e276ec728f2b4ae6b2fc#diff-38b333399f04a93e200b35be164c2f66d8c1d99817ca5aff26b3fd01d5079fe9R194) of this, ("col", "==", 0) and the AND form, but not the ANDOR extended form.
   
   Users can define the predicate however they like, so I'm not sure we should limit them to `AND`.
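
   To make the OR-of-ANDs shape concrete, here is a minimal sketch of the rewrite, not the code in this PR. `Pred`, `And`, `Or` and `to_dnf` are hypothetical names used only for illustration, and negation is left out (it would have to be pushed down to the leaves first). The output is the list-of-lists-of-tuples form, where the outer list is OR and each inner list is AND:

```python
from dataclasses import dataclass
from itertools import product
from typing import Any, List, Tuple, Union

# One leaf predicate in the simple tuple form, e.g. ("col", "==", 0).
Literal = Tuple[str, str, Any]


@dataclass(frozen=True)
class Pred:  # hypothetical leaf node: column, operator, value
    col: str
    op: str
    value: Any


@dataclass(frozen=True)
class And:  # hypothetical binary AND node
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:  # hypothetical binary OR node
    left: "Expr"
    right: "Expr"


Expr = Union[Pred, And, Or]


def to_dnf(expr: Expr) -> List[List[Literal]]:
    """Return expr as an OR (outer list) of ANDs (inner lists) of tuples."""
    if isinstance(expr, Pred):
        return [[(expr.col, expr.op, expr.value)]]
    if isinstance(expr, Or):
        # OR just concatenates the conjunctions of both sides.
        return to_dnf(expr.left) + to_dnf(expr.right)
    if isinstance(expr, And):
        # AND distributes over OR: pair every left conjunction with every right one.
        return [l + r for l, r in product(to_dnf(expr.left), to_dnf(expr.right))]
    raise TypeError(f"unsupported expression: {expr!r}")


# (a == 1 OR b == 2) AND c > 3
user_predicate = And(Or(Pred("a", "==", 1), Pred("b", "==", 2)), Pred("c", ">", 3))
filters = to_dnf(user_predicate)
```

   For the predicate above, `filters` comes out as two conjunctions, `a == 1 AND c > 3` and `b == 2 AND c > 3`. Distributing AND over OR can grow the expression exponentially in the worst case, which is the price of letting users write arbitrary predicates.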
   
   > The docstring above is actually incorrect: fastparquet does also support row-wise filtering, but not via dask (and although you save on memory, it can be slower than loading everything and filtering after).
   
   Ah, that's good to know. I went with PyArrow for now because it looks the most complete in terms of filtering; we can add `fastparquet` later on.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
