Posted to dev@parquet.apache.org by "Cheng Lian (JIRA)" <ji...@apache.org> on 2015/06/08 16:21:00 UTC
[jira] [Commented] (PARQUET-293) ScalaReflectionException when trying to convert an RDD of Scrooge to a DataFrame

    [ https://issues.apache.org/jira/browse/PARQUET-293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577242#comment-14577242 ] 

Cheng Lian commented on PARQUET-293:
------------------------------------

Hey Tim, thanks for the details, and sorry for the late reply. I didn't have time to investigate this issue last week. I tried to reproduce it by modifying the scrooge-maven-demo that comes with Scrooge (I was using the most recent master, Git commit hash: 59eb954f8f1bf091a6ddd6e6fb21219aff8cbcc5). The cause of this exception is that the Scala code Scrooge generates is actually a trait extending {{Product}}:
{code}
trait Junk
  extends ThriftStruct
  with scala.Product2[Long, String]
  with java.io.Serializable
{code}
while Spark expects a case class, something like:
{code}
case class Junk(junkID: Long, junkString: String)
{code}
The key difference is that the case class version has a constructor whose arguments can be mapped to fields of the DataFrame schema. The exception is thrown because Spark can't find such a constructor on trait {{Junk}}.
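
To illustrate the difference, here is a minimal sketch using plain Scala runtime reflection, with no Spark involved. {{JunkTrait}} and {{JunkCase}} are hypothetical stand-ins for the two declarations above, and the lookup by {{nme.CONSTRUCTOR}} is only an assumption about what Spark does internally, inferred from the {{asTerm}} frame in the stack trace. It needs scala-reflect on the classpath; on Scala 2.11 and later {{nme}} is deprecated in favor of {{termNames}}.
{code}
import scala.reflect.runtime.universe._

object ConstructorLookup {
  // Stand-in for the Scrooge-generated trait (the real one also extends ThriftStruct).
  trait JunkTrait extends Product2[Long, String] with java.io.Serializable

  // Stand-in for the case class shape that Spark expects.
  case class JunkCase(junkID: Long, junkString: String)

  // Look up the constructor symbol of T by name; returns NoSymbol if there is none.
  def constructorOf[T: TypeTag]: Symbol = typeOf[T].member(nme.CONSTRUCTOR)

  def main(args: Array[String]): Unit = {
    // The case class has a constructor symbol whose parameters could become schema fields.
    println(constructorOf[JunkCase])
    // The trait is expected to yield NoSymbol, and calling asTerm on NoSymbol is what
    // would fail with "<none> is not a term".
    println(constructorOf[JunkTrait])
  }
}
{code}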

I think we can resolve this issue, since this is expected behavior.
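
If a DataFrame is still needed, one possible workaround (a sketch I have not tested against Spark, not something Scrooge or Spark provides) is to copy each Thrift object into a plain case class before calling {{toDF}}. {{JunkRow}} and {{JunkConversions}} are hypothetical names, and the {{Junk}} trait below is only a stand-in for the generated one so that the snippet compiles on its own:
{code}
// Stand-in for the Scrooge-generated trait; the real one also extends ThriftStruct.
trait Junk extends Product2[Long, String] with java.io.Serializable {
  def junkID: Long
  def junkString: String
  def _1: Long = junkID
  def _2: String = junkString
  def canEqual(that: Any): Boolean = that.isInstanceOf[Junk]
}

// Hypothetical case class mirroring the fields of the Thrift struct.
case class JunkRow(junkID: Long, junkString: String)

object JunkConversions {
  // Copy the two Thrift fields into the case class.
  def toRow(j: Junk): JunkRow = JunkRow(j.junkID, j.junkString)
}
{code}
In the spark-shell session from the description, this would presumably be used as {{junksRDD.map(JunkConversions.toRow).toDF}}, so that the schema is derived from the constructor of {{JunkRow}} rather than from the trait.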

> ScalaReflectionException when trying to convert an RDD of Scrooge to a DataFrame
> --------------------------------------------------------------------------------
>
>                 Key: PARQUET-293
>                 URL: https://issues.apache.org/jira/browse/PARQUET-293
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-format
>    Affects Versions: 1.6.0
>            Reporter: Tim Chan
>
> I get "scala.ScalaReflectionException: <none> is not a term" when I try to convert an RDD of Scrooge to a DataFrame, e.g. myScroogeRDD.toDF
> Has anyone else encountered this problem? 
> I'm using Spark 1.3.1, Scala 2.10.4 and scrooge-sbt-plugin 3.16.3
> Here is my thrift IDL:
> {code}
> namespace scala com.junk
> namespace java com.junk
> struct Junk {
>     10: i64 junkID,
>     20: string junkString
> }
> {code}
> from a spark-shell: 
> {code}
> val junks = List( Junk(123L, "junk1"), Junk(567L, "junk2"), Junk(789L, "junk3") )
> val junksRDD = sc.parallelize(junks)
> junksRDD.toDF
> {code}
> Exception thrown:
> {noformat}
> scala.ScalaReflectionException: <none> is not a term
> 	at scala.reflect.api.Symbols$SymbolApi$class.asTerm(Symbols.scala:259)
> 	at scala.reflect.internal.Symbols$SymbolContextApiImpl.asTerm(Symbols.scala:73)
> 	at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:148)
> 	at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:30)
> 	at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:107)
> 	at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:30)
> 	at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:316)
> 	at org.apache.spark.sql.SQLContext$implicits$.rddToDataFrameHolder(SQLContext.scala:254)
> 	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
> 	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:32)
> 	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:34)
> 	at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
> 	at $iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
> 	at $iwC$$iwC$$iwC.<init>(<console>:40)
> 	at $iwC$$iwC.<init>(<console>:42)
> 	at $iwC.<init>(<console>:44)
> 	at <init>(<console>:46)
> 	at .<init>(<console>:50)
> 	at .<clinit>(<console>)
> 	at .<init>(<console>:7)
> 	at .<clinit>(<console>)
> 	at $print(<console>)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
> 	at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
> 	at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
> 	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
> 	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
> 	at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:856)
> 	at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:901)
> 	at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:813)
> 	at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:656)
> 	at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:664)
> 	at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:669)
> 	at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:996)
> 	at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:944)
> 	at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:944)
> 	at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
> 	at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:944)
> 	at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1058)
> 	at org.apache.spark.repl.Main$.main(Main.scala:31)
> 	at org.apache.spark.repl.Main.main(Main.scala)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
> 	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
> 	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
> 	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
> 	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {noformat}


