Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/07/09 13:56:55 UTC
[GitHub] [arrow] jorisvandenbossche commented on pull request #7691: ARROW-8655: [Python][Dataset] Provide helper method to deconstruct a partition expression
jorisvandenbossche commented on pull request #7691:
URL: https://github.com/apache/arrow/pull/7691#issuecomment-656143156
This is not the cleanest solution, but I could do it relatively quickly because it builds on what I did earlier in https://github.com/apache/arrow/pull/7523. I don't think a more proper solution will be possible before 1.0, and this at least gives a way to get the information needed.
A few examples:
```python
In [1]: import pyarrow.dataset as ds
In [2]: dataset = ds.dataset("test_filter_fragments_pandas/", format="parquet", partitioning="hive")
In [4]: expr = list(dataset.get_fragments())[0].partition_expression
# single partition level with a string
In [5]: expr
Out[5]: <pyarrow.dataset.Expression (part == A:string)>
In [6]: ds._unwrap_partition_expression(expr)
Out[6]: [('part', 'A')]
In [7]: dataset = ds.dataset("test_parquet_dask/", format="parquet", partitioning="hive")
In [8]: expr = list(dataset.get_fragments())[0].partition_expression
# two partition levels with integers
In [9]: expr
Out[9]: <pyarrow.dataset.Expression ((year == 2016:int32) and (month == 1:int32))>
In [10]: ds._unwrap_partition_expression(expr)
Out[10]: [('year', 2016), ('month', 1)]
In [11]: dataset = ds.dataset("test.parquet", format="parquet")
In [12]: expr = list(dataset.get_fragments())[0].partition_expression
# dataset without partitioning
In [13]: expr
Out[13]: <pyarrow.dataset.Expression true:bool>
In [14]: ds._unwrap_partition_expression(expr)
Out[14]: []
```
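For illustration, a minimal sketch of how a consumer might use the returned list of `(name, value)` tuples. The helper names `partition_keys_to_dict` and `hive_path` are hypothetical and not part of pyarrow; the input lists are hard-coded copies of the outputs shown above, so no pyarrow call is made here.

```python
# Hypothetical downstream helpers; not part of pyarrow.
# Inputs are hard-coded copies of the outputs from the session above.

def partition_keys_to_dict(pairs):
    """Turn [(name, value), ...] into a dict, rejecting duplicate names."""
    result = {}
    for name, value in pairs:
        if name in result:
            raise ValueError(f"duplicate partition field: {name!r}")
        result[name] = value
    return result


def hive_path(pairs):
    """Format the pairs as a hive-style relative directory path."""
    return "/".join(f"{name}={value}" for name, value in pairs)


single = [("part", "A")]                 # one partition level
nested = [("year", 2016), ("month", 1)]  # two partition levels
unpartitioned = []                       # dataset without partitioning

print(partition_keys_to_dict(nested))
print(hive_path(nested))
```

Because the helper returns an ordered list rather than a dict, the nesting order of the partition levels is preserved, which is what a path reconstruction like `hive_path` relies on.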
cc @rjzamora