Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/12 13:28:51 UTC

[GitHub] [arrow] jorisvandenbossche commented on issue #9160: How to filter parquet column with None using Python?

jorisvandenbossche commented on issue #9160:
URL: https://github.com/apache/arrow/issues/9160#issuecomment-758654770


   The problem is that a null is not equal to anything, including itself: comparing with `==` yields null rather than true, so you can't filter nulls with an equality check.
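
To make these semantics concrete, here is a minimal pure-Python model. It is not Arrow's implementation: the helper names `null_equal` and `filter_rows` are hypothetical, and the sketch assumes that a comparison propagates null (modelled as `None`) and that a filter keeps only rows whose mask entry is true.

```python
from typing import Any, List, Optional


def null_equal(left: Any, right: Any) -> Optional[bool]:
    """Equality that propagates null: the result is None if either side is None."""
    if left is None or right is None:
        return None
    return left == right


def filter_rows(values: List[Any], mask: List[Optional[bool]]) -> List[Any]:
    """Keep only rows whose mask entry is exactly True; null entries are dropped."""
    return [value for value, keep in zip(values, mask) if keep is True]


column = ["a", None, "b", None]

# "column == None" evaluates to null for every row, so nothing is selected.
mask = [null_equal(value, None) for value in column]
selected = filter_rows(column, mask)
```

Under this model every row, null or not, produces a null mask entry, so the selection comes back empty. That is why a dedicated validity check is needed instead of `==`.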
   
   For the new dataset API, we are working on more powerful filter expressions, and you can already achieve this by negating `is_valid()`:
   
   ```
   In [20]: import pyarrow.parquet as pq
   
   In [21]: import pyarrow.dataset as ds
   
   In [22]: pq.read_table('data.parquet', filters=~ds.field("column").is_valid()).to_pandas()
   Out[22]: 
     column
   0   None
   1   None
   ``` 
   
   We should probably also add an `is_null()` method to make this case a bit more straightforward.
   
   
   ---
   
   General note: we prefer the user mailing list for such questions, see https://arrow.apache.org/community/

