Posted to dev@arrow.apache.org by "Tom Augspurger (JIRA)" <ji...@apache.org> on 2017/09/21 16:38:07 UTC

[jira] [Created] (ARROW-1593) [PYTHON] serialize_pandas should pass through the preserve_index keyword

Tom Augspurger created ARROW-1593:
-------------------------------------

             Summary: [PYTHON] serialize_pandas should pass through the preserve_index keyword
                 Key: ARROW-1593
                 URL: https://issues.apache.org/jira/browse/ARROW-1593
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
    Affects Versions: 0.7.0
            Reporter: Tom Augspurger
            Assignee: Tom Augspurger
            Priority: Minor
             Fix For: 0.8.0


I'm doing some benchmarking of Arrow serialization for dask.distributed to serialize dataframes.

Overall, things look good compared to the current implementation (using pickle). The biggest difference was that pickle can use pandas' RangeIndex to avoid serializing the entire index of values when possible.

I suspect that a "range type" isn't in scope for Arrow, but in the meantime applications using Arrow could detect a `RangeIndex` and call {{pyarrow.serialize_pandas(df, preserve_index=False)}}. For that to work, serialize_pandas needs to pass the preserve_index keyword through.


