Posted to issues@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2017/07/28 02:26:00 UTC
[jira] [Resolved] (ARROW-1285) PYTHON: NotImplemented exception creates empty parquet file
[ https://issues.apache.org/jira/browse/ARROW-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney resolved ARROW-1285.
---------------------------------
Resolution: Fixed
Issue resolved by pull request 902
[https://github.com/apache/arrow/pull/902]
> PYTHON: NotImplemented exception creates empty parquet file
> -----------------------------------------------------------
>
> Key: ARROW-1285
> URL: https://issues.apache.org/jira/browse/ARROW-1285
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.5.0
> Reporter: Jeff Reback
> Assignee: Wes McKinney
> Priority: Minor
> Fix For: 0.6.0
>
>
> This correctly raises an error (because writing categorical columns is not implemented), but it still creates an empty (0-byte) file.
> xref https://github.com/pandas-dev/pandas/pull/15838#pullrequestreview-52576290
> {code}
> In [2]: df = pd.DataFrame({'a': list('abc'),
> ...: 'b': list(range(1, 4)),
> ...: 'c': np.arange(3, 6).astype('u1'),
> ...: 'd': np.arange(4.0, 7.0, dtype='float64'),
> ...: 'e': [True, False, True],
> ...: 'f': pd.Categorical(list('abc')),
> ...: 'g': pd.date_range('20130101', periods=3),
> ...: 'h': pd.date_range('20130101', periods=3, tz='US/Eastern'),
> ...: 'i': pd.date_range('20130101', periods=3, freq='ns')})
> ...:
> In [3]: df.to_parquet('foo.pq')
> ---------------------------------------------------------------------------
> ArrowNotImplementedError Traceback (most recent call last)
> <ipython-input-3-8070fb7e3e2c> in <module>()
> ----> 1 df.to_parquet('foo.pq')
> /Users/jreback/pandas/pandas/core/frame.py in to_parquet(self, fname, engine, compression, **kwargs)
> 1620 from pandas.io.parquet import to_parquet
> 1621 to_parquet(self, fname, engine,
> -> 1622 compression=compression, **kwargs)
> 1623
> 1624 @Substitution(header='Write out column names. If a list of string is given, \
> /Users/jreback/pandas/pandas/io/parquet.py in to_parquet(df, path, engine, compression, **kwargs)
> 152 raise ValueError("parquet must have string column names")
> 153
> --> 154 return impl.write(df, path, compression=compression)
> 155
> 156
> /Users/jreback/pandas/pandas/io/parquet.py in write(self, df, path, compression, **kwargs)
> 53 table = self.api.Table.from_pandas(df, timestamps_to_ms=True)
> 54 self.api.parquet.write_table(
> ---> 55 table, path, compression=compression, **kwargs)
> 56
> 57 def read(self, path):
> /Users/jreback/miniconda3/envs/pandas/lib/python3.6/site-packages/pyarrow/parquet.py in write_table(table, where, row_group_size, version, use_dictionary, compression, use_deprecated_int96_timestamps, **kwargs)
> 770 version=version,
> 771 use_deprecated_int96_timestamps=use_deprecated_int96_timestamps)
> --> 772 writer = ParquetWriter(where, table.schema, **options)
> 773 writer.write_table(table, row_group_size=row_group_size)
> 774 writer.close()
> _parquet.pyx in pyarrow._parquet.ParquetWriter.__cinit__()
> error.pxi in pyarrow.lib.check_status()
> ArrowNotImplementedError: NotImplemented: unhandled type
> In [4]: !ls -ltr foo.pq
> -rw-r--r-- 1 jreback staff 0 Jul 27 06:03 foo.pq
> {code}
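The traceback shows that the error is raised inside `ParquetWriter.__cinit__`, and the `ls` output shows that `foo.pq` already exists with size 0 afterwards. In other words, the output file is opened before the unsupported type is detected. Below is a minimal, hypothetical sketch of one general caller-side way to avoid leaving such a file behind. The idea is to write to a temporary file in the same directory and move it onto the target path only if the write succeeds. This is not the change made in pull request 902. `write_atomically` and its `write` callback are illustrative names, not pyarrow or pandas APIs.

```python
import os
import tempfile
from typing import Callable


def write_atomically(path: str, write: Callable[[str], None]) -> None:
    """Run write(tmp_path) and move the result onto `path` only on success.

    `write` is a hypothetical callback that writes a complete file to the
    path it receives (e.g. a wrapper around a Parquet writer). If it raises,
    the temporary file is deleted and `path` is left untouched.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so os.replace is a
    # same-filesystem rename (atomic on POSIX).
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)
    try:
        write(tmp_path)
        os.replace(tmp_path, path)
    except BaseException:
        # Discard the partial/empty output, then re-raise the original error.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


if __name__ == "__main__":
    # Hypothetical writer that mimics the reported behaviour: it creates the
    # file first and only then fails on an unsupported column type.
    def failing_writer(p: str) -> None:
        open(p, "wb").close()
        raise NotImplementedError("unhandled type")

    target = os.path.join(tempfile.mkdtemp(), "foo.pq")
    try:
        write_atomically(target, failing_writer)
    except NotImplementedError as exc:
        print("write failed:", exc)
    print("file left behind:", os.path.exists(target))
```

The design choice is that a failed write leaves no file at the target path. If a file already existed there, it also stays unchanged, because the target is only touched by the final `os.replace`.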
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)