Posted to user@spark.apache.org by Kevin Jung <it...@samsung.com> on 2014/06/18 10:36:26 UTC

get schema from SchemaRDD

Can I get schema information from a SchemaRDD?
For example,

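case class Person(name: String, Age: Int, Gender: String, Birth: String)
// The implicit RDD-to-SchemaRDD conversion must be in scope for saveAsParquetFile
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext._
val peopleRDD = sc.textFile("/sample/sample.csv").map(_.split(","))
  .map(p => Person(p(0), p(1).toInt, p(2), p(3)))
peopleRDD.saveAsParquetFile("people.parquet")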

(a few days later...)

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext._
val loadedPeopleRDD = sqlContext.parquetFile("people.parquet")
loadedPeopleRDD.registerAsTable("peopleTable")

Someone who doesn't know the Person class can't tell what columns and types
this table has.
They may want to get the schema information from loadedPeopleRDD.
How can I do this?





--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/get-schema-from-SchemaRDD-tp7830.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: get schema from SchemaRDD

Posted by Michael Armbrust <mi...@databricks.com>.
We just merged a feature into master that lets you print the schema or view
it as a string (printSchema() and schemaTreeString on SchemaRDD).

There is also this JIRA targeting 1.1 for presenting a nice programmatic API
for this information: https://issues.apache.org/jira/browse/SPARK-2179
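
As a rough sketch of how the printSchema() call mentioned above might be used
on the loaded Parquet data: this assumes a Spark build that already includes
the merged feature, and that people.parquet exists in the working directory.
The object name, application name and local master setting are illustrative
and not from this thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical standalone app; names below are placeholders for illustration.
object PrintPeopleSchema {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("print-people-schema").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // The schema is read from the Parquet file itself, so the Person
    // case class does not need to be known here.
    val loadedPeopleRDD = sqlContext.parquetFile("people.parquet")

    // Prints the column names and types to standard output.
    loadedPeopleRDD.printSchema()

    sc.stop()
  }
}
```

In the spark-shell, where sc is already defined, only the parquetFile and
printSchema() lines should be needed.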

