Posted to issues@spark.apache.org by "Sun Rui (JIRA)" <ji...@apache.org> on 2016/06/22 02:27:57 UTC

[jira] [Commented] (SPARK-12173) Consider supporting DataSet API in SparkR

    [ https://issues.apache.org/jira/browse/SPARK-12173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343212#comment-15343212 ] 

Sun Rui commented on SPARK-12173:
---------------------------------

[~rxin] Yes, R doesn't need compile-time type safety, but map/reduce-style functions are popular in R; for example, lapply() applies a function to each item of a list or vector. For now, SparkR supports spark.lapply(), which is similar to lapply(). Its implementation currently depends on RDD internally. We could change the implementation to use Dataset without exposing the Dataset API, something like:
   1. Convert the R vector/list to a Dataset
   2. Call Dataset functions on it
   3. Collect the result back as an R vector/list
Not exposing the Dataset API means SparkR does not provide a distributed vector/list abstraction; SparkR users would have to use DataFrame for distributed vector/list operations, which seems inconvenient for R users.
[~shivaram] what do you think?
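The three steps above could be sketched as a Dataset-backed spark.lapply(). This is only a hypothetical sketch of the proposed internals: createDataset() and mapDataset() are illustrative helper names, not existing SparkR functions, and the Dataset would stay hidden from the user:

```r
# Hypothetical internal flow for a Dataset-backed spark.lapply().
# createDataset() and mapDataset() are assumed names for illustration;
# they are not part of the current SparkR API.
spark.lapply <- function(sc, list, func) {
  # 1. Convert the local R vector/list into a (hidden) Dataset
  ds <- createDataset(sc, list)
  # 2. Apply the user's R function to each element via the Dataset API
  mapped <- mapDataset(ds, func)
  # 3. Collect the results back as a plain R list
  collect(mapped)
}
```

From the user's point of view the signature and behavior would match the existing spark.lapply(); only the internal representation would change from RDD to Dataset.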

> Consider supporting DataSet API in SparkR
> -----------------------------------------
>
>                 Key: SPARK-12173
>                 URL: https://issues.apache.org/jira/browse/SPARK-12173
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SparkR
>            Reporter: Felix Cheung
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
