Posted to dev@arrow.apache.org by "Dave Hirschfeld (Jira)" <ji...@apache.org> on 2020/06/08 09:24:00 UTC
[jira] [Created] (ARROW-9065) Support parsing date32 in dataset partition folders
Dave Hirschfeld created ARROW-9065:
--------------------------------------
Summary: Support parsing date32 in dataset partition folders
Key: ARROW-9065
URL: https://issues.apache.org/jira/browse/ARROW-9065
Project: Apache Arrow
Issue Type: Improvement
Components: C++, Python
Reporter: Dave Hirschfeld
I have some data that is partitioned by year/month/date. It would be useful if the date segment could be parsed automatically as a date32 value, but this currently fails:
```python
In [17]: schema = pa.schema([("year", pa.int16()), ("month", pa.int8()), ("day", pa.date32())])
In [18]: partition = DirectoryPartitioning(schema)
In [19]: partition.parse("/2020/06/2020-06-08")
---------------------------------------------------------------------------
ArrowNotImplementedError Traceback (most recent call last)
<ipython-input-19-c227c808b401> in <module>
----> 1 partition.parse("/2020/06/2020-06-08")
~\envs\dev\lib\site-packages\pyarrow\_dataset.pyx in pyarrow._dataset.Partitioning.parse()
~\envs\dev\lib\site-packages\pyarrow\error.pxi in pyarrow.lib.pyarrow_internal_check_status()
~\envs\dev\lib\site-packages\pyarrow\error.pxi in pyarrow.lib.check_status()
ArrowNotImplementedError: parsing scalars of type date32[day]
```
Not a big issue, since you can declare the field as a string and convert it afterwards, but it would nevertheless be nice if it Just Worked:
```python
In [22]: schema = pa.schema([("year", pa.int16()), ("month", pa.int8()), ("day", pa.string())])
In [23]: partition = DirectoryPartitioning(schema)
In [24]: partition.parse("/2020/06/2020-06-08")
Out[24]: <pyarrow.dataset.AndExpression (((year == 2020:int16) and (month == 6:int8)) and (day == 2020-06-08:string))>
```
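For illustration, the "use string and convert" step could look roughly like the sketch below. It uses only the Python standard library rather than pyarrow, so `parse_partition_path` and `to_date32` are hypothetical helper names, not part of any Arrow API. The sketch assumes that the date segment is written in ISO `YYYY-MM-DD` form and relies on date32 being stored as the number of days since 1970-01-01.

```python
from datetime import date

# date32 stores a date as the number of days since the UNIX epoch.
EPOCH = date(1970, 1, 1)


def parse_partition_path(path, fields):
    """Split a directory-partitioned path into typed values.

    `fields` is a list of (name, converter) pairs, one per path segment.
    Hypothetical helper for illustration; not a pyarrow function.
    """
    segments = [s for s in path.split("/") if s]
    if len(segments) != len(fields):
        raise ValueError(
            f"expected {len(fields)} path segments, got {len(segments)}"
        )
    return {name: conv(seg) for (name, conv), seg in zip(fields, segments)}


def to_date32(d):
    """Return the days-since-epoch integer that a date32 value would hold."""
    return (d - EPOCH).days


# Same layout as in the report: year / month / ISO-formatted date.
fields = [("year", int), ("month", int), ("day", date.fromisoformat)]
values = parse_partition_path("/2020/06/2020-06-08", fields)
day_as_date32 = to_date32(values["day"])
```

Here `values["day"]` is a `datetime.date`, and `day_as_date32` is the integer day count that a date32 column would store for it.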