Posted to user@spark.apache.org by "anthonyjschulte@gmail.com" <an...@gmail.com> on 2014/06/27 00:03:30 UTC

SparkSQL- Nested CaseClass Parquet failure

Hello all:
I am attempting to persist a Parquet file composed of a SchemaRDD of nested
case classes...

Creating the SchemaRDD seems to work fine, but an exception is thrown when
I attempt to persist it to a Parquet file...

my code:

  import org.apache.spark.{SparkConf, SparkContext}

  // Define LessTrivial first so the forward reference from Trivial
  // also works when pasted line-by-line into the REPL.
  case class LessTrivial(i: Int = 1)
  case class Trivial(trivial: String = "trivial", lt: LessTrivial)

  val conf = new SparkConf()
    .setMaster("local[1]")
    .setAppName("test")

  implicit val sc = new SparkContext(conf)
  val sqlContext = new org.apache.spark.sql.SQLContext(sc)

  import sqlContext._

  // Creating the SchemaRDD succeeds -- no exceptions here.
  val rdd = sqlContext.createSchemaRDD(
    sc.parallelize(Seq(Trivial("s", LessTrivial(1)), Trivial("T", LessTrivial(2)))))

  // This throws: java.lang.RuntimeException: Unsupported datatype
  //   StructType(List(StructField(i,IntegerType,true)))
  rdd.saveAsParquetFile("trivial.parquet1")


Is persisting SchemaRDDs containing nested case classes supported for
Parquet files?



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SparkSQL-Nested-CaseClass-Parquet-failure-tp8377.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: SparkSQL- Nested CaseClass Parquet failure

Posted by "anthonyjschulte@gmail.com" <an...@gmail.com>.
Thanks. That might be a good note to add to the official Programming
Guide...


On Thu, Jun 26, 2014 at 5:05 PM, Michael Armbrust [via Apache Spark User
List] <ml...@n3.nabble.com> wrote:

> Nested parquet is not supported in 1.0, but is part of the upcoming 1.0.1
> release.
> [...]



-- 
A  N  T  H  O  N  Y   Ⓙ   S  C  H  U  L  T  E





Re: SparkSQL- Nested CaseClass Parquet failure

Posted by Michael Armbrust <mi...@databricks.com>.
Nested parquet is not supported in 1.0, but is part of the upcoming 1.0.1
release.
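
Until 1.0.1 lands, one possible workaround on 1.0 is to flatten the nested
structure into a single case class of primitive fields before saving. A minimal
sketch, continuing the snippet from the original message; `FlatTrivial` and its
field names are made up for illustration:

  // Hypothetical flattened schema: one level deep, primitive fields only,
  // which the Spark SQL 1.0 Parquet writer can handle.
  case class FlatTrivial(trivial: String, lt_i: Int)

  val flat = sqlContext.createSchemaRDD(
    sc.parallelize(Seq(Trivial("s", LessTrivial(1)), Trivial("T", LessTrivial(2))))
      .map(t => FlatTrivial(t.trivial, t.lt.i)))

  // Writes successfully because the schema contains no nested StructType.
  flat.saveAsParquetFile("trivial_flat.parquet")

Reading the flat file back and reconstructing the nested case classes with a
`map` should work the same way in reverse.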


On Thu, Jun 26, 2014 at 3:03 PM, anthonyjschulte@gmail.com <
anthonyjschulte@gmail.com> wrote:

> Hello all:
> I am attempting to persist a parquet file comprised of a SchemaRDD of
> nested case classes...
> [...]