Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/07/09 13:56:55 UTC
[GitHub] [arrow] jorisvandenbossche commented on pull request #7691: ARROW-8655: [Python][Dataset] Provide helper method to deconstruct a partition expression

jorisvandenbossche commented on pull request #7691:
URL: https://github.com/apache/arrow/pull/7691#issuecomment-656143156


   This is not the cleanest solution, but I could do it relatively quickly because it builds on what I did earlier in https://github.com/apache/arrow/pull/7523. I don't think a more proper solution will be possible before 1.0, and this at least gives a way to get the information needed.
   
   A few examples:
   
   ```python
   In [1]: import pyarrow.dataset as ds
   
   In [2]: dataset = ds.dataset("test_filter_fragments_pandas/", format="parquet", partitioning="hive")
   In [4]: expr = list(dataset.get_fragments())[0].partition_expression
   
   # single partition level with a string
   In [5]: expr
   Out[5]: <pyarrow.dataset.Expression (part == A:string)>
   
   In [6]: ds._unwrap_partition_expression(expr)
   Out[6]: [('part', 'A')]
   
   
   In [7]: dataset = ds.dataset("test_parquet_dask/", format="parquet", partitioning="hive")
   In [8]: expr = list(dataset.get_fragments())[0].partition_expression
   
   # two partition levels with integers
   In [9]: expr
   Out[9]: <pyarrow.dataset.Expression ((year == 2016:int32) and (month == 1:int32))>
   
   In [10]: ds._unwrap_partition_expression(expr)
   Out[10]: [('year', 2016), ('month', 1)]
   
   
   In [11]: dataset = ds.dataset("test.parquet", format="parquet")
   In [12]: expr = list(dataset.get_fragments())[0].partition_expression
   
   # dataset without partitioning
   In [13]: expr
   Out[13]: <pyarrow.dataset.Expression true:bool>
   
   In [14]: ds._unwrap_partition_expression(expr)
   Out[14]: []
   ```
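
Since the result is a plain list of `(name, value)` tuples, downstream code should be able to consume it without touching the expression object at all. A rough sketch of what that might look like; the two helper names are hypothetical and not part of pyarrow, and the input is just a hard-coded list in the same shape as the outputs above:

```python
# Hypothetical helpers (not pyarrow API). They only assume the
# [(name, value), ...] shape shown in the outputs above.

def partition_keys_to_dict(keys):
    """Turn [(name, value), ...] into {name: value}, preserving order."""
    return dict(keys)


def partition_keys_to_hive_path(keys):
    """Build a hive-style relative directory such as 'year=2016/month=1'."""
    return "/".join(f"{name}={value}" for name, value in keys)


keys = [("year", 2016), ("month", 1)]  # same shape as Out[10] above
print(partition_keys_to_dict(keys))
print(partition_keys_to_hive_path(keys))

# An unpartitioned dataset gives an empty list, so the path is empty too.
print(repr(partition_keys_to_hive_path([])))
```

Keeping the output as a list rather than a dict preserves the order of the partition levels, which matters when rebuilding a directory path.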
   
   cc @rjzamora 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org