Posted to dev@arrow.apache.org by "Tom Augspurger (JIRA)" <ji...@apache.org> on 2017/09/21 16:38:07 UTC
[jira] [Created] (ARROW-1593) [PYTHON] serialize_pandas should pass through the preserve_index keyword
Tom Augspurger created ARROW-1593:
-------------------------------------
Summary: [PYTHON] serialize_pandas should pass through the preserve_index keyword
Key: ARROW-1593
URL: https://issues.apache.org/jira/browse/ARROW-1593
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Affects Versions: 0.7.0
Reporter: Tom Augspurger
Assignee: Tom Augspurger
Priority: Minor
Fix For: 0.8.0
I'm benchmarking Arrow serialization of DataFrames for dask.distributed.
Overall, things look good compared to the current implementation, which uses pickle. The biggest difference was pickle's ability to take advantage of pandas' RangeIndex: because a RangeIndex is fully described by its start, stop, and step, pickle can avoid serializing the entire Index of values when possible.
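To make the size difference concrete, here is a small sketch using only pandas, NumPy, and the standard pickle module. The index length of one million is an arbitrary choice for illustration; the sketch pickles a RangeIndex and an equivalent materialized int64 Index and compares the byte counts.

```python
import pickle

import numpy as np
import pandas as pd

n = 1_000_000  # arbitrary size chosen for illustration

# A RangeIndex stores only start/stop/step, so its pickle stays tiny.
range_index = pd.RangeIndex(n)
# The same values materialized as an int64 array (8 bytes per value).
materialized = pd.Index(np.arange(n, dtype=np.int64))

range_bytes = len(pickle.dumps(range_index))
array_bytes = len(pickle.dumps(materialized))
print(f"RangeIndex: {range_bytes} bytes, materialized Index: {array_bytes} bytes")
```

Both indexes hold the same values, but only the materialized one pays for them in the serialized payload.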
I suspect a "range type" isn't in scope for Arrow, but in the meantime, applications using Arrow could detect the `RangeIndex` and call {{pyarrow.serialize_pandas(df, preserve_index=False)}}, provided {{serialize_pandas}} passes the {{preserve_index}} keyword through.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)