Posted to jira@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2020/09/25 23:17:00 UTC
[jira] [Closed] (ARROW-3205) [R] Minimum working example round-tripping a data frame from R to plasma to pandas
[ https://issues.apache.org/jira/browse/ARROW-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Neal Richardson closed ARROW-3205.
----------------------------------
Resolution: Abandoned
Per mailing list discussions, Plasma is no longer maintained.
> [R] Minimum working example round-tripping a data frame from R to plasma to pandas
> ----------------------------------------------------------------------------------
>
> Key: ARROW-3205
> URL: https://issues.apache.org/jira/browse/ARROW-3205
> Project: Apache Arrow
> Issue Type: New Feature
> Components: R
> Reporter: James Lamb
> Priority: Minor
>
> I see tremendous opportunity for interoperability between Python and R (two popular languages for data scientists) using Arrow as an interchange format.
> To make this concrete and get developers in those languages interested, I think it would be valuable to create a minimum working example of writing an R data frame into plasma and reading it back up into *pandas* in a separate Python process, and vice versa.
> I could, for example, envision reading a CSV up into a *data.table* in R to do some cleaning and feature engineering, writing that object to *plasma*, then kicking off multiple parallel Python processes to search a space of models. This could demonstrate the benefits of replacing "load this dataset from a file 50 times" with "read off this range of memory in plasma".
>
> I believe pretty strongly that a tangible example like this would meaningfully improve the R community's interest in and engagement with the Arrow project.
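
The pattern in the quoted description (publish a dataset into shared memory once, then let many workers read it instead of reloading a file) can be illustrated with a minimal sketch. This is not Plasma and not an Arrow API: it assumes Python's standard-library multiprocessing.shared_memory as a stand-in, and the names publish, read_column, and demo are hypothetical.

```python
import struct
from multiprocessing import shared_memory


def publish(values):
    """Producer side: copy a column of floats into a named shared-memory block once."""
    payload = struct.pack(f"{len(values)}d", *values)
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    shm.buf[:len(payload)] = payload
    # The caller keeps this handle alive and is responsible for close()/unlink().
    return shm


def read_column(name, count):
    """Consumer side: attach to the block by name and decode it without touching disk."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        # Read exactly `count` doubles; the block may be padded beyond the payload.
        return list(struct.unpack_from(f"{count}d", shm.buf, 0))
    finally:
        shm.close()


def demo():
    column = [1.5, 2.5, 4.0]
    owner = publish(column)
    try:
        # For simplicity the three reads run in this process. In the workflow
        # described above, each would run in a separate worker process that
        # receives only the block name and the element count.
        return [read_column(owner.name, len(column)) for _ in range(3)]
    finally:
        owner.close()
        owner.unlink()
```

Each reader needs only the block name and the element count, which is the property the description contrasts with loading the same dataset from a file 50 times. A real R-to-pandas exchange would additionally need a shared column layout, which is the role Arrow's format plays.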
--
This message was sent by Atlassian Jira
(v8.3.4#803005)