Posted to issues@arrow.apache.org by "Wes McKinney (Jira)" <ji...@apache.org> on 2019/09/10 01:24:00 UTC
[jira] [Resolved] (ARROW-3651) [Python] Datetimes from non-DateTimeIndex cannot be deserialized
[ https://issues.apache.org/jira/browse/ARROW-3651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney resolved ARROW-3651.
---------------------------------
Resolution: Fixed
Issue resolved by pull request 5311
[https://github.com/apache/arrow/pull/5311]
> [Python] Datetimes from non-DateTimeIndex cannot be deserialized
> ----------------------------------------------------------------
>
> Key: ARROW-3651
> URL: https://issues.apache.org/jira/browse/ARROW-3651
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.11.1
> Reporter: Armin Berres
> Assignee: Wes McKinney
> Priority: Major
> Labels: parquet, pull-request-available
> Fix For: 0.15.0
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Given a columns index that contains datetimes but is not a DatetimeIndex, writing the Parquet file works but reading it back fails.
> {code:python}
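> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
>
> df = pd.DataFrame(1, index=pd.MultiIndex.from_arrays([[1,2],[3,4]]), columns=[pd.to_datetime("2018/01/01")])
> # after the round trip through reset_index/set_index, the columns index is an
> # object-dtype Index holding a Timestamp, no longer a DatetimeIndex
> df = df.reset_index().set_index(['level_0', 'level_1'])
> table = pa.Table.from_pandas(df)
> pq.write_table(table, 'test.parquet')
> # on pyarrow 0.11.1 this read fails with the KeyError below
> pq.read_pandas('test.parquet').to_pandas()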
> {code}
> results in
> {code}
> KeyError Traceback (most recent call last)
> ~/venv/mpptool/lib/python3.7/site-packages/pyarrow/pandas_compat.py in _pandas_type_to_numpy_type(pandas_type)
> 676 try:
> --> 677 return _pandas_logical_type_map[pandas_type]
> 678 except KeyError:
> KeyError: 'datetime'
> {code}
> The created schema:
> {code}
> 2018-01-01 00:00:00: int64
> level_0: int64
> level_1: int64
> metadata
> --------
> {b'pandas': b'{"index_columns": ["level_0", "level_1"], "column_indexes": [{"n'
> b'ame": null, "field_name": null, "pandas_type": "datetime", "nump'
> b'y_type": "object", "metadata": null}], "columns": [{"name": "201'
> b'8-01-01 00:00:00", "field_name": "2018-01-01 00:00:00", "pandas_'
> b'type": "int64", "numpy_type": "int64", "metadata": null}, {"name'
> b'": "level_0", "field_name": "level_0", "pandas_type": "int64", "'
> b'numpy_type": "int64", "metadata": null}, {"name": "level_1", "fi'
> b'eld_name": "level_1", "pandas_type": "int64", "numpy_type": "int'
> b'64", "metadata": null}], "pandas_version": "0.23.4"}'}
> {code}
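The b'pandas' metadata in the schema is JSON stored as bytes, so it can be decoded with the standard library to see which entry trips the lookup in the traceback. The sketch below pastes that value verbatim and inspects its column_indexes entry, whose pandas_type is "datetime" while its numpy_type is "object". The LOGICAL_TYPE_MAP table and the strict_numpy_type and column_index_numpy_type helpers are hypothetical illustrations only; they are not pyarrow's _pandas_logical_type_map and not the change made in pull request 5311.

{code:python}
import json

# The b'pandas' schema metadata value, copied verbatim from the schema above.
# Python concatenates adjacent bytes literals into a single bytes object.
PANDAS_METADATA = (
    b'{"index_columns": ["level_0", "level_1"], "column_indexes": [{"n'
    b'ame": null, "field_name": null, "pandas_type": "datetime", "nump'
    b'y_type": "object", "metadata": null}], "columns": [{"name": "201'
    b'8-01-01 00:00:00", "field_name": "2018-01-01 00:00:00", "pandas_'
    b'type": "int64", "numpy_type": "int64", "metadata": null}, {"name'
    b'": "level_0", "field_name": "level_0", "pandas_type": "int64", "'
    b'numpy_type": "int64", "metadata": null}, {"name": "level_1", "fi'
    b'eld_name": "level_1", "pandas_type": "int64", "numpy_type": "int'
    b'64", "metadata": null}], "pandas_version": "0.23.4"}'
)

# json.loads accepts bytes directly (Python 3.6+).
meta = json.loads(PANDAS_METADATA)
column_index = meta["column_indexes"][0]

# HYPOTHETICAL lookup table for illustration only: it is NOT pyarrow's
# _pandas_logical_type_map and deliberately has no "datetime" key.
LOGICAL_TYPE_MAP = {"string": "object", "unicode": "object", "bytes": "object"}


def strict_numpy_type(entry):
    """Hypothetical strict lookup: raises KeyError for unmapped pandas types."""
    return LOGICAL_TYPE_MAP[entry["pandas_type"]]


def column_index_numpy_type(entry):
    """Hypothetical tolerant lookup: falls back to the numpy_type recorded at write time."""
    try:
        return strict_numpy_type(entry)
    except KeyError:
        return entry["numpy_type"]


if __name__ == "__main__":
    print(column_index["pandas_type"], column_index["numpy_type"])  # datetime object
    try:
        strict_numpy_type(column_index)
    except KeyError as exc:
        print("strict lookup failed:", exc)  # strict lookup failed: 'datetime'
    print(column_index_numpy_type(column_index))  # object
{code}

Falling back to the recorded numpy_type is only one possible design, shown here to make the failure mode concrete.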
--
This message was sent by Atlassian Jira
(v8.3.2#803003)