Posted to jira@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2020/06/25 15:24:00 UTC

[jira] [Comment Edited] (ARROW-8301) [C++][Python][R] Handle ChunkedArray and Table in C data interface

    [ https://issues.apache.org/jira/browse/ARROW-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145016#comment-17145016 ] 

Neal Richardson edited comment on ARROW-8301 at 6/25/20, 3:23 PM:
------------------------------------------------------------------

The use case I'm thinking of is: the Python package I'm using that does things with Arrow, and from which I want to pull data into R, always returns a Table. I can't "just" export its RecordBatches because Tables don't contain RecordBatches, they contain ChunkedArrays. So to export the Table, it would be something like

{code}
table.export_schema()
for col in table.chunked_arrays():
    for a in col.chunks():
        a.export_array()
{code}

and then reassemble the Table on the receiving side. Looking at the R and Python code we have now that handles Array and RecordBatch, I'm not sure how simple that would be to do, and I wonder if there's a better way.
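
To make the export-and-reassemble round trip concrete, here is a minimal sketch in plain Python. It is a toy model only: `ToyTable`, `export_table`, and `import_table` are hypothetical names invented for illustration, not pyarrow or C data interface functions, and the "chunks" are ordinary lists rather than exported `ArrowArray` structs. It assumes the receiver needs only the schema plus each chunk tagged with its column index.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class ToyTable:
    """Toy stand-in for a Table: column names plus one list of chunks per column."""
    schema: List[str]
    columns: List[List[List[Any]]]  # columns[i] holds the chunks of column i


def export_table(table: ToyTable) -> Tuple[List[str], List[Tuple[int, List[Any]]]]:
    """Export the schema once, then every chunk of every column in order."""
    exported_schema = list(table.schema)
    exported_chunks = []
    for col_index, chunked_array in enumerate(table.columns):
        for chunk in chunked_array:
            # Tag each chunk with its column so the receiver can regroup it.
            exported_chunks.append((col_index, list(chunk)))
    return exported_schema, exported_chunks


def import_table(schema: List[str], chunks: List[Tuple[int, List[Any]]]) -> ToyTable:
    """Regroup the exported chunks by column, preserving chunk order."""
    columns: List[List[List[Any]]] = [[] for _ in schema]
    for col_index, chunk in chunks:
        columns[col_index].append(chunk)
    return ToyTable(schema=list(schema), columns=columns)


# Two columns whose chunk boundaries differ, which a Table allows
# but a single RecordBatch does not.
table = ToyTable(
    schema=["x", "y"],
    columns=[[[1, 2], [3]], [["a"], ["b", "c"]]],
)
schema, chunks = export_table(table)
roundtrip = import_table(schema, chunks)
```

The example columns deliberately have different chunk boundaries, which is why exporting RecordBatches directly does not map one-to-one onto a Table's contents.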


was (Author: npr):
The use case I'm thinking of is: the Python package I'm using that does things with Arrow, and from which I want to pull data into R, always returns a Table. I can't "just" export its RecordBatches because Tables don't contain RecordBatches, they contain ChunkedArrays. So to export the Table, it would be something like

{code}
export_schema()
for col in table.chunked_arrays():
    for a in col.chunks():
        export_array()
{code}

and reassemble the Table. Looking at the R and Python code we have now that does the Array and RecordBatch work, I'm not sure how simple that would be to do, and I wonder if there's a better way.

> [C++][Python][R] Handle ChunkedArray and Table in C data interface
> ------------------------------------------------------------------
>
>                 Key: ARROW-8301
>                 URL: https://issues.apache.org/jira/browse/ARROW-8301
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C, C++, Python, R
>            Reporter: Neal Richardson
>            Assignee: Antoine Pitrou
>            Priority: Critical
>             Fix For: 1.0.0
>
>
> Currently the C data interface does Array and RecordBatch, but we're also going to need ChunkedArray and Table. 


