Posted to issues@arrow.apache.org by "Jeff Reback (JIRA)" <ji...@apache.org> on 2017/07/27 10:27:00 UTC

[jira] [Created] (ARROW-1286) PYTHON: support Categorical serialization to/from parquet

Jeff Reback created ARROW-1286:
----------------------------------

             Summary: PYTHON: support Categorical serialization to/from parquet
                 Key: ARROW-1286
                 URL: https://issues.apache.org/jira/browse/ARROW-1286
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
            Reporter: Jeff Reback


related to https://issues.apache.org/jira/browse/ARROW-439

pandas Categorical types are not implemented: writing a Categorical column to parquet raises ArrowNotImplementedError. Minimal example:

pandas 0.20.3 & pyarrow 0.5.0

{code}
In [1]: df = pd.DataFrame({'a': pd.Categorical(list('abc'))})

In [2]: df.dtypes
Out[2]: 
a    category
dtype: object

In [4]: import pyarrow

In [5]: import pyarrow.parquet

In [6]: table = pyarrow.Table.from_pandas(df, timestamps_to_ms=True)
   ...: pyarrow.parquet.write_table(
   ...:             table, 'foo.pq')
   ...:             
   ...: 
---------------------------------------------------------------------------
ArrowNotImplementedError                  Traceback (most recent call last)
<ipython-input-6-4512e9a2e15e> in <module>()
      1 table = pyarrow.Table.from_pandas(df, timestamps_to_ms=True)
      2 pyarrow.parquet.write_table(
----> 3             table, 'foo.pq')
      4 

/Users/jreback/miniconda3/envs/pandas/lib/python3.6/site-packages/pyarrow/parquet.py in write_table(table, where, row_group_size, version, use_dictionary, compression, use_deprecated_int96_timestamps, **kwargs)
    770         version=version,
    771         use_deprecated_int96_timestamps=use_deprecated_int96_timestamps)
--> 772     writer = ParquetWriter(where, table.schema, **options)
    773     writer.write_table(table, row_group_size=row_group_size)
    774     writer.close()

_parquet.pyx in pyarrow._parquet.ParquetWriter.__cinit__()

error.pxi in pyarrow.lib.check_status()

ArrowNotImplementedError: NotImplemented: unhandled type
{code}
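
Until Categorical columns are supported, one possible workaround is to cast them back to plain values before building the table. The following is only a sketch and not part of the original report: `decategorize` is a hypothetical helper name, it uses pandas alone, and it has not been checked against pyarrow 0.5.0. The cast to `object` is an assumption chosen for simplicity; it discards the category information, so a frame read back from the file would hold a plain column rather than a Categorical one.

```python
import pandas as pd


def decategorize(df):
    """Return a copy of df with every categorical column cast to object.

    Hypothetical helper for illustration only; the categories are dropped,
    so this does not round-trip the Categorical dtype.
    """
    out = df.copy()
    for name in out.columns:
        # dtype.name is 'category' for pandas Categorical columns
        if out[name].dtype.name == 'category':
            out[name] = out[name].astype(object)
    return out


df = pd.DataFrame({'a': pd.Categorical(list('abc'))})
plain = decategorize(df)
```

The resulting `plain` frame could then be passed to `pyarrow.Table.from_pandas` and `pyarrow.parquet.write_table` in place of `df` in the session above.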


