Posted to issues@arrow.apache.org by "Jeff Reback (JIRA)" <ji...@apache.org> on 2017/07/27 10:08:00 UTC
[jira] [Created] (ARROW-1285) NotImplemented exception creates empty parquet file
Jeff Reback created ARROW-1285:
----------------------------------
Summary: NotImplemented exception creates empty parquet file
Key: ARROW-1285
URL: https://issues.apache.org/jira/browse/ARROW-1285
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.5.0
Reporter: Jeff Reback
Priority: Minor
Writing the frame below to parquet correctly raises (because categorical is not implemented), but it still leaves an empty file behind.
xref https://github.com/pandas-dev/pandas/pull/15838#pullrequestreview-52576290
{code}
In [2]: df = pd.DataFrame({'a': list('abc'),
...: 'b': list(range(1, 4)),
...: 'c': np.arange(3, 6).astype('u1'),
...: 'd': np.arange(4.0, 7.0, dtype='float64'),
...: 'e': [True, False, True],
...: 'f': pd.Categorical(list('abc')),
...: 'g': pd.date_range('20130101', periods=3),
...: 'h': pd.date_range('20130101', periods=3, tz='US/Eastern'),
...: 'i': pd.date_range('20130101', periods=3, freq='ns')})
...:
In [3]: df.to_parquet('foo.pq')
---------------------------------------------------------------------------
ArrowNotImplementedError Traceback (most recent call last)
<ipython-input-3-8070fb7e3e2c> in <module>()
----> 1 df.to_parquet('foo.pq')
/Users/jreback/pandas/pandas/core/frame.py in to_parquet(self, fname, engine, compression, **kwargs)
1620 from pandas.io.parquet import to_parquet
1621 to_parquet(self, fname, engine,
-> 1622 compression=compression, **kwargs)
1623
1624 @Substitution(header='Write out column names. If a list of string is given, \
/Users/jreback/pandas/pandas/io/parquet.py in to_parquet(df, path, engine, compression, **kwargs)
152 raise ValueError("parquet must have string column names")
153
--> 154 return impl.write(df, path, compression=compression)
155
156
/Users/jreback/pandas/pandas/io/parquet.py in write(self, df, path, compression, **kwargs)
53 table = self.api.Table.from_pandas(df, timestamps_to_ms=True)
54 self.api.parquet.write_table(
---> 55 table, path, compression=compression, **kwargs)
56
57 def read(self, path):
/Users/jreback/miniconda3/envs/pandas/lib/python3.6/site-packages/pyarrow/parquet.py in write_table(table, where, row_group_size, version, use_dictionary, compression, use_deprecated_int96_timestamps, **kwargs)
770 version=version,
771 use_deprecated_int96_timestamps=use_deprecated_int96_timestamps)
--> 772 writer = ParquetWriter(where, table.schema, **options)
773 writer.write_table(table, row_group_size=row_group_size)
774 writer.close()
_parquet.pyx in pyarrow._parquet.ParquetWriter.__cinit__()
error.pxi in pyarrow.lib.check_status()
ArrowNotImplementedError: NotImplemented: unhandled type
In [4]: !ls -ltr foo.pq
-rw-r--r-- 1 jreback staff 0 Jul 27 06:03 foo.pq
{code}
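Until the writer cleans up after itself, one possible caller-side workaround is to write to a temporary file and only move it into place on success. The sketch below is an illustration, not what pandas or pyarrow do: write_via_temp_file and failing_writer are hypothetical names, and failing_writer merely stands in for a writer that creates the output file and then raises.

```python
import os
import tempfile


def write_via_temp_file(path, write_fn):
    """Call write_fn(tmp_path) and move the result to `path` only on success.

    Hypothetical helper: if write_fn raises, the temporary file is removed
    and nothing is left at `path`.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)
    try:
        write_fn(tmp_path)
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the partial (possibly empty) file, then re-raise the original error.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def failing_writer(tmp_path):
    # Stand-in for a writer that opens the output file and then fails.
    with open(tmp_path, "wb"):
        pass
    raise NotImplementedError("unhandled type")
```

With the frame from the report, this could be called as write_via_temp_file('foo.pq', lambda p: df.to_parquet(p)); the exception would still propagate, but no zero-byte foo.pq should remain.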