Posted to issues@arrow.apache.org by "Jeff Reback (JIRA)" <ji...@apache.org> on 2017/07/27 10:27:00 UTC
[jira] [Created] (ARROW-1286) PYTHON: support Categorical serialization to/from parquet
Jeff Reback created ARROW-1286:
----------------------------------
Summary: PYTHON: support Categorical serialization to/from parquet
Key: ARROW-1286
URL: https://issues.apache.org/jira/browse/ARROW-1286
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Jeff Reback
Related to https://issues.apache.org/jira/browse/ARROW-439.
Writing a pandas Categorical column to Parquet is not implemented; pyarrow raises ArrowNotImplementedError. Minimal example:
pandas 0.20.3 & pyarrow 0.5.0
{code}
In [1]: df = pd.DataFrame({'a': pd.Categorical(list('abc'))})
In [2]: df.dtypes
Out[2]:
a category
dtype: object
In [4]: import pyarrow
In [5]: import pyarrow.parquet
In [6]: table = pyarrow.Table.from_pandas(df, timestamps_to_ms=True)
...: pyarrow.parquet.write_table(
...: table, 'foo.pq')
...:
...:
---------------------------------------------------------------------------
ArrowNotImplementedError Traceback (most recent call last)
<ipython-input-6-4512e9a2e15e> in <module>()
1 table = pyarrow.Table.from_pandas(df, timestamps_to_ms=True)
2 pyarrow.parquet.write_table(
----> 3 table, 'foo.pq')
4
/Users/jreback/miniconda3/envs/pandas/lib/python3.6/site-packages/pyarrow/parquet.py in write_table(table, where, row_group_size, version, use_dictionary, compression, use_deprecated_int96_timestamps, **kwargs)
770 version=version,
771 use_deprecated_int96_timestamps=use_deprecated_int96_timestamps)
--> 772 writer = ParquetWriter(where, table.schema, **options)
773 writer.write_table(table, row_group_size=row_group_size)
774 writer.close()
_parquet.pyx in pyarrow._parquet.ParquetWriter.__cinit__()
error.pxi in pyarrow.lib.check_status()
ArrowNotImplementedError: NotImplemented: unhandled type
{code}