Posted to user@arrow.apache.org by Gary Clark <gc...@gmail.com> on 2020/09/04 13:31:12 UTC

[Python] filtering tables for null values

Hi,

I'm currently reading my table in like this:

```
import pyarrow.parquet as pq

# Attempt to keep only the rows where 'column' is null.
filters = [
    ('column', '=', 'null')
]

df = pq.read_table('./joins/parquet/', filters=filters)

print(df.shape)
```

This gives me 0 rows, even though I know there are thousands of nulls in my
data. If I read the data without the filter, I can see all the nulls:

```
df = pq.read_table('./joins/parquet/')
print(df.column('column').null_count)
```

Is there something wrong with my filter? Or has this not been implemented?

-- 
Gary Clark
*Data Scientist & Data Engineer*
*B.S. Mechanical Engineering, Howard University '13*
+1 (717) 798-6916
gclarkjr5@gmail.com

Re: [Python] filtering tables for null values

Posted by Neal Richardson <ne...@gmail.com>.
Hi Gary,
I believe there are `is_null` and `is_valid` functions, and I would expect
that those are better to use for filtering on missing values than `==`. Try
those out and let us know.

Neal
