Posted to user@spark.apache.org by "anthonyjschulte@gmail.com" <an...@gmail.com> on 2014/06/27 00:03:30 UTC
SparkSQL- Nested CaseClass Parquet failure
Hello all:
I am attempting to persist a Parquet file composed of a SchemaRDD of nested
case classes.
Creating the SchemaRDD seems to work fine, but an exception is thrown when
I attempt to persist it to a Parquet file.
My code:
case class Trivial(trivial: String = "trivial", lt: LessTrivial)
case class LessTrivial(i: Int = 1)
val conf = new SparkConf()
  .setMaster("local[1]")
  .setAppName("test")
implicit val sc = new SparkContext(conf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext._
val rdd = sqlContext.createSchemaRDD(sc.parallelize(
  Seq(Trivial("s", LessTrivial(1)), Trivial("T", LessTrivial(2))))) // no exceptions
rdd.saveAsParquetFile("trivial.parquet1") //exception:
java.lang.RuntimeException: Unsupported datatype
StructType(List(StructField(i,IntegerType,true)))
Is persisting SchemaRDDs containing nested case classes supported for
Parquet files?
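(For anyone hitting the same error: one workaround is to pull the nested field up into a flat case class before saving, so the Parquet writer only ever sees top-level primitive columns. A minimal sketch follows; FlatTrivial and the flatten/nest helpers are hypothetical names for illustration, not part of the Spark API.)

```scala
// The nested schema from the post above.
case class LessTrivial(i: Int = 1)
case class Trivial(trivial: String = "trivial", lt: LessTrivial)

// Hypothetical flat mirror of Trivial, with LessTrivial's field pulled up a level.
case class FlatTrivial(trivial: String, i: Int)

// Convert between the nested and flat representations.
def flatten(t: Trivial): FlatTrivial = FlatTrivial(t.trivial, t.lt.i)
def nest(f: FlatTrivial): Trivial = Trivial(f.trivial, LessTrivial(f.i))

// With a SchemaRDD the save would then look something like
//   createSchemaRDD(rdd.map(flatten)).saveAsParquetFile("trivial.parquet")
// and reads would map `nest` back over the loaded rows.
val flat = Seq(Trivial("s", LessTrivial(1)), Trivial("T", LessTrivial(2))).map(flatten)
```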
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SparkSQL-Nested-CaseClass-Parquet-failure-tp8377.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: SparkSQL- Nested CaseClass Parquet failure
Posted by "anthonyjschulte@gmail.com" <an...@gmail.com>.
Thanks. That might be a good note to add to the official Programming
Guide...
On Thu, Jun 26, 2014 at 5:05 PM, Michael Armbrust [via Apache Spark User
List] <ml...@n3.nabble.com> wrote:
> Nested parquet is not supported in 1.0, but is part of the upcoming 1.0.1
> release.
--
A N T H O N Y Ⓙ S C H U L T E
Re: SparkSQL- Nested CaseClass Parquet failure
Posted by Michael Armbrust <mi...@databricks.com>.
Nested Parquet is not supported in 1.0, but support is part of the upcoming
1.0.1 release.
On Thu, Jun 26, 2014 at 3:03 PM, anthonyjschulte@gmail.com <
anthonyjschulte@gmail.com> wrote:
> Is persisting SchemaRDDs containing nested case classes supported for
> Parquet files?