Posted to issues@arrow.apache.org by "Wes McKinney (Jira)" <ji...@apache.org> on 2020/05/04 23:43:00 UTC

[jira] [Resolved] (ARROW-5572) [Python] raise error message when passing invalid filter in parquet reading

     [ https://issues.apache.org/jira/browse/ARROW-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved ARROW-5572.
---------------------------------
    Fix Version/s: 1.0.0
       Resolution: Fixed

Issue resolved by pull request 7052
[https://github.com/apache/arrow/pull/7052]

> [Python] raise error message when passing invalid filter in parquet reading
> ---------------------------------------------------------------------------
>
>                 Key: ARROW-5572
>                 URL: https://issues.apache.org/jira/browse/ARROW-5572
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.13.0
>            Reporter: Joris Van den Bossche
>            Assignee: Joris Van den Bossche
>            Priority: Minor
>              Labels: dataset-parquet-read, parquet, pull-request-available
>             Fix For: 1.0.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> From https://stackoverflow.com/questions/56522977/using-predicates-to-filter-rows-from-pyarrow-parquet-parquetdataset
> For example, when a filter names a regular column rather than a key in the partitioned folder hierarchy, the filter is silently ignored. It would be nice to raise an error in this case.
> Reproducible example:
> {code:python}
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
>
> df = pd.DataFrame({'a': [0, 0, 1, 1], 'b': [0, 1, 0, 1], 'c': [1, 2, 3, 4]})
> table = pa.Table.from_pandas(df)
> pq.write_to_dataset(table, 'test_parquet_row_filters', partition_cols=['a'])
> # filter on 'a' (partition column) -> works
> pq.read_table('test_parquet_row_filters', filters=[('a', '=', 1)]).to_pandas()
> # filter on normal column (in future could do row group filtering) -> silently does nothing
> pq.read_table('test_parquet_row_filters', filters=[('b', '=', 1)]).to_pandas()
> {code}
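
The error requested above could look roughly like the following sketch. It is only an illustration of the idea: `check_filter_columns` and its `partition_keys` argument are hypothetical names, not part of pyarrow, and nothing here is taken from pull request 7052. It assumes filters are given either as a flat list of `(column, op, value)` tuples or as a list of such lists.

```python
def check_filter_columns(filters, partition_keys):
    """Raise ValueError if a filter names a column that is not a partition key.

    Hypothetical helper for illustration only; not a pyarrow API.
    """
    # Accept a flat list of tuples as well as a list of lists of tuples.
    if filters and isinstance(filters[0], tuple):
        filters = [filters]
    allowed = set(partition_keys)
    for conjunction in filters:
        for column, op, value in conjunction:
            if column not in allowed:
                raise ValueError(
                    f"Filter on column {column!r} cannot be applied: "
                    f"it is not a partition key "
                    f"(partition keys: {sorted(allowed)})"
                )


# 'a' is the partition column in the example above, 'b' is a regular column.
check_filter_columns([('a', '=', 1)], partition_keys=['a'])  # no error

try:
    check_filter_columns([('b', '=', 1)], partition_keys=['a'])
except ValueError as exc:
    message = str(exc)  # explains that 'b' is not a partition key
```

With a check like this, the second `read_table` call in the example would fail loudly instead of returning unfiltered data.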


