Posted to users@zeppelin.apache.org by Mark Mikolajczak <ma...@flayranalytics.co.uk> on 2016/11/28 16:26:23 UTC

RDD to Dataframe Error


Hi All,

Hoping you can help:


I have created an RDD from a NoSQL database and I want to convert the RDD to a DataFrame. I have tried many options, but all result in errors.

    val df = sc.couchbaseQuery(test).map(_.value).collect().foreach(println)


{"accountStatus":"AccountOpen","custId":"140034"}
{"accountStatus":"AccountOpen","custId":"140385"}
{"accountStatus":"AccountClosed","subId":"10795","custId":"139698","subStatus":"Active"}
{"accountStatus":"AccountClosed","subId":"11364","custId":"140925","subStatus":"Paused"}
{"accountStatus":"AccountOpen","subId":"10413","custId":"138842","subStatus":"Active"}
{"accountStatus":"AccountOpen","subId":"10414","custId":"138842","subStatus":"Active"}
{"accountStatus":"AccountClosed","subId":"11314","custId":"140720","subStatus":"Paused"}
{"accountStatus":"AccountOpen","custId":"139166"}
{"accountStatus":"AccountClosed","subId":"10735","custId":"139558","subStatus":"Paused"}
{"accountStatus":"AccountOpen","custId":"139575"}
df: Unit = ()

I have tried adding .toDF() to the end of my code, and also creating a schema and using createDataFrame, but both give errors. What's the best approach to converting the RDD to a DataFrame?

import org.apache.spark.sql.types._

// The schema is encoded in a string
val schemaString = "accountStatus subId custId subStatus"

// Generate the schema based on the string of schema
val fields = schemaString.split(" ")
  .map(fieldName => StructField(fieldName, StringType, nullable = true))
val schema = StructType(fields)
//

val peopleDF = spark.createDataFrame(df,schema)


Re: RDD to Dataframe Error

Posted by Mohit Jaggi <mo...@gmail.com>.
Looks like you have an RDD of JSON strings. Try this: http://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets
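
A minimal sketch of that suggestion, under some assumptions: the two sample strings below stand in for the output of the Couchbase query (something like sc.couchbaseQuery(test).map(_.value.toString), which is a guess at how to get plain JSON strings), and a local SparkSession is built only so the snippet runs on its own. Note that the RDD is kept in its own val; ending the chain with .collect().foreach(println) returns Unit, which is why the original df printed as "df: Unit = ()".

```scala
import org.apache.spark.sql.SparkSession

// In Zeppelin `spark` already exists; getOrCreate() simply reuses it.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("json-rdd-to-df")
  .getOrCreate()

// Stand-in for the Couchbase query result (assumption: one JSON string per record).
val jsonRdd = spark.sparkContext.parallelize(Seq(
  """{"accountStatus":"AccountOpen","custId":"140034"}""",
  """{"accountStatus":"AccountClosed","subId":"10795","custId":"139698","subStatus":"Active"}"""
))

// Let Spark infer the schema from the JSON text; missing keys become null.
val df = spark.read.json(jsonRdd)

df.printSchema()
df.show()
```

On newer Spark versions the RDD[String] overload of json is deprecated in favour of a Dataset[String], so spark.read.json(jsonRdd.toDS()) (after import spark.implicits._) may be the better fit there.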

Mohit Jaggi
Founder,
Data Orchard LLC
www.dataorchardllc.com


Re: RDD to Dataframe Error

Posted by Jun Kim <i2...@gmail.com>.
Hi, Mark!

Which kind of error message do you get?

The simplest way to convert an RDD to a DataFrame is to import the implicits and call toDF:

import spark.implicits._
val df = rdd.toDF

:-)
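
A rough sketch of how toDF behaves on data shaped like the sample in this thread, with assumptions stated up front: the RDD is built from two hard-coded JSON strings rather than from Couchbase, the local SparkSession is only there to make the snippet self-contained, and the column name "json" is an arbitrary choice. On an RDD[String], toDF produces a single string column holding the raw JSON, so to get one column per field the schema from the original post can instead be handed to the JSON reader.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// In Zeppelin `spark` already exists; getOrCreate() simply reuses it.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("rdd-todf-sketch")
  .getOrCreate()
import spark.implicits._

// Stand-in for the query result (assumption: one JSON string per record).
val rdd = spark.sparkContext.parallelize(Seq(
  """{"accountStatus":"AccountOpen","custId":"139166"}""",
  """{"accountStatus":"AccountClosed","subId":"10735","custId":"139558","subStatus":"Paused"}"""
))

// toDF on an RDD[String]: one string column containing the unparsed JSON.
val rawDf = rdd.toDF("json")

// Schema from the original post: four nullable string fields.
val schema = StructType(
  "accountStatus subId custId subStatus".split(" ")
    .map(name => StructField(name, StringType, nullable = true)))

// Parse the JSON with the explicit schema instead of inferring it.
val parsedDf = spark.read.schema(schema).json(rdd)

parsedDf.show()
```

createDataFrame(rdd, schema) expects an RDD[Row], which is one reason passing the JSON strings (or the Unit value from the original snippet) to it fails; going through the JSON reader avoids building the Row objects by hand.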

Taejun Kim

Data Mining Lab.
School of Electrical and Computer Engineering
University of Seoul