You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2016/10/13 20:06:20 UTC

[jira] [Commented] (ARROW-288) Implement Arrow adapter for Spark Datasets

    [ https://issues.apache.org/jira/browse/ARROW-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15573017#comment-15573017 ] 

Wes McKinney commented on ARROW-288:
------------------------------------

hi [~freiss] and [~jlaskowski] -- we made pretty big progress on the C++ side to be able to be closer to full interoperability with the Arrow Java library. We still need to do some integration testing, but it would be great to start exploring the technical plan for making this happen. I was just talking with [~rxin] about this the other day, so there may be someone on the Spark side who could help with this effort, too. 

The first step is to convert a Spark Dataset into 1 or more Arrow record batches, including metadata conversion, and then converting back. The Java <-> C++ data movement itself is a comparatively minor task because that is just sending a serialized byte buffer through the existing protocol. We can test this out in Python using the Arrow <-> pandas bridge which has already been completed. 

Let me know if anyone will have the bandwidth to work on this and we can coordinate. thanks!

> Implement Arrow adapter for Spark Datasets
> ------------------------------------------
>
>                 Key: ARROW-288
>                 URL: https://issues.apache.org/jira/browse/ARROW-288
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Java - Vectors
>            Reporter: Wes McKinney
>
> It would be valuable for applications that use Arrow to be able to 
> * Convert between Spark DataFrames/Datasets and Java Arrow vectors
> * Send / Receive Arrow record batches / Arrow file format RPCs to / from Spark 
> * Allow PySpark to use Arrow for messaging in UDF evaluation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)