Posted to issues@spark.apache.org by "Felix Cheung (JIRA)" <ji...@apache.org> on 2015/09/14 05:50:45 UTC

[jira] [Commented] (SPARK-9325) Support `collect` on DataFrame columns

    [ https://issues.apache.org/jira/browse/SPARK-9325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14742860#comment-14742860 ] 

Felix Cheung commented on SPARK-9325:
-------------------------------------

This turns out not to be straightforward.
Since a Column captures only the selection and not the data, there is no obvious way to "get the data for this column only". Ideally, this would be implemented as

{code}
  ages <- collect(select(df, df$Age))
{code}

However, `df$Age` returns a Column that holds no reference to the DataFrame, either privately or on the JVM side, so it isn't clear how to turn `df$Age` (which returns a Column) into `select(df, df$Age)` (which returns a DataFrame).
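
To make the gap concrete, here is a rough mock-up in plain R with no Spark involved. Every name in it (`mock_df`, `mock_column`, `mock_select`, `mock_collect`, the `parent` field) is made up for illustration and is not part of SparkR. It only shows that collecting a Column needs some link back to the DataFrame, and what attaching such a link might look like.

```r
# Plain-R mock-up; none of these names are SparkR's actual API or internals.

# A mock DataFrame: a thin wrapper around a local data.frame.
mock_df <- function(data) {
  structure(list(data = data), class = "MockDataFrame")
}

# A mock Column: it records only the selection (a column name).
# `parent` is a hypothetical back-reference to the originating DataFrame.
mock_column <- function(name, parent = NULL) {
  structure(list(name = name, parent = parent), class = "MockColumn")
}

# select() on the mock DataFrame returns another mock DataFrame.
mock_select <- function(df, col) {
  mock_df(df$data[, col$name, drop = FALSE])
}

mock_collect <- function(x) UseMethod("mock_collect")

# Collecting a DataFrame simply returns the local data.
mock_collect.MockDataFrame <- function(x) x$data

# Collecting a Column is only possible if the Column knows its parent.
mock_collect.MockColumn <- function(x) {
  if (is.null(x$parent)) {
    stop("Column '", x$name, "' has no reference to a DataFrame")
  }
  mock_collect(mock_select(x$parent, x))
}

df <- mock_df(data.frame(Name = c("a", "b"), Age = c(30, 25)))

# Situation described above: the Column carries no parent, so collect fails.
unbound <- mock_column("Age")
res_unbound <- tryCatch(mock_collect(unbound),
                        error = function(e) conditionMessage(e))

# Hypothetical change: the Column is created with a reference to its parent.
bound <- mock_column("Age", parent = df)
ages <- mock_collect(bound)
```

In this sketch `mock_collect(bound)` is just shorthand for `mock_collect(mock_select(df, bound))`. Whether a real Column should carry such a reference at all is exactly the open question.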

Any suggestion on how to proceed?


> Support `collect` on DataFrame columns
> --------------------------------------
>
>                 Key: SPARK-9325
>                 URL: https://issues.apache.org/jira/browse/SPARK-9325
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SparkR
>            Reporter: Shivaram Venkataraman
>
> This is to support code of the form 
> ```
> ages <- collect(df$Age)
> ```
> Right now `df$Age` returns a Column, which does not support any such functions.
> Similarly we might consider supporting `head(df$Age)` etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
