Posted to dev@arrow.apache.org by "Bryan Cutler (JIRA)" <ji...@apache.org> on 2016/11/17 17:59:58 UTC

[jira] [Commented] (ARROW-369) [Python] Add ability to convert multiple record batches at once to pandas

    [ https://issues.apache.org/jira/browse/ARROW-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15674374#comment-15674374 ] 

Bryan Cutler commented on ARROW-369:
------------------------------------

I could work on this if you don't mind.  I was already doing this using concat in some of my local testing, so I'll take a crack at the chunked columns implementation.

> [Python] Add ability to convert multiple record batches at once to pandas
> -------------------------------------------------------------------------
>
>                 Key: ARROW-369
>                 URL: https://issues.apache.org/jira/browse/ARROW-369
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>            Reporter: Wes McKinney
>              Labels: newbie
>
> Instead of only being able to convert single record batches and tables whose columns each consist of a single ColumnChunk, we should also support the construction of Pandas DataFrames from multiple RecordBatches. The simplest approach would convert each batch to a Pandas DataFrame and then concat them all together. A second (and preferred) implementation would extend the C++ function {{ConvertColumnToPandas}} in {{python/src/pyarrow/adapters/pandas.*}} to work on chunked columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)