Posted to user@spark.apache.org by Naveen Madhire <vm...@umail.iu.edu> on 2015/07/07 19:29:02 UTC

DataFrame question

Hi All,

I am working with DataFrames and have been struggling with the following
problem. Any pointers would be helpful.

I have a JSON file with a schema like this:

 |-- links: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- desc: string (nullable = true)
 |    |    |-- id: string (nullable = true)


I want to fetch desc and id as an RDD of pairs, i.e. RDD[(String, String)].

I am using DataFrames:    df.select("links.desc", "links.id").rdd

The above select is returning an RDD like this:
RDD[(List(String), List(String))]


So the JSON links:[{"one","1"},{"two","2"},{"three","3"}] should return an
RDD[(one,1),(two,2),(three,3)]

Can anyone tell me how the DataFrame select should be modified?

Re: DataFrame question

Posted by Michael Armbrust <mi...@databricks.com>.
You probably want to explode the array to produce one row per element:

import org.apache.spark.sql.functions.explode

df.select(explode(df("links")).alias("link"))
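
To get from the exploded column to the RDD[(String, String)] asked for in the question, one possible follow-up is to select the struct's fields and map each Row to a tuple. The following is only a minimal sketch and is not something stated in this thread: the helper name linkPairs is hypothetical, and it assumes df has the links schema shown in the question (an array of structs with string fields desc and id).

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.explode

// Hypothetical helper (not part of Spark): turns the nested `links` array
// into one (desc, id) pair per array element.
def linkPairs(df: DataFrame): RDD[(String, String)] =
  df.select(explode(df("links")).alias("link")) // one row per array element
    .select("link.desc", "link.id")             // pull the struct fields out as columns
    .rdd                                        // RDD[Row]
    .map(row => (row.getString(0), row.getString(1)))
```

Assuming the three sample links carry desc values one, two, three and id values 1, 2, 3, linkPairs(df).collect() should give the pairs (one,1), (two,2), (three,3). Note that explode emits no rows for a record whose links array is null or empty, so such records simply drop out of the result.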
