Posted to dev@parquet.apache.org by "Cheng Lian (JIRA)" <ji...@apache.org> on 2015/06/08 16:21:00 UTC
[jira] [Commented] (PARQUET-293) ScalaReflectionException when
trying to convert an RDD of Scrooge to a DataFrame
[ https://issues.apache.org/jira/browse/PARQUET-293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577242#comment-14577242 ]
Cheng Lian commented on PARQUET-293:
------------------------------------
Hey Tim, thanks for the details, and sorry for the late reply; I didn't have time to investigate this issue last week. I tried to reproduce it by modifying the scrooge-maven-demo that comes with Scrooge (I was using the most recent master, Git commit hash: 59eb954f8f1bf091a6ddd6e6fb21219aff8cbcc5). The reason for this exception is that the Scala code Scrooge generates is actually a trait extending {{Product}}:
{code}
trait Junk
extends ThriftStruct
with scala.Product2[Long, String]
with java.io.Serializable
{code}
while Spark expects a case class, something like:
{code}
case class Junk(junkID: Long, junkString: String)
{code}
The key difference is that the case class version has a constructor whose arguments can be transformed into fields of the DataFrame schema. The exception is thrown because Spark can't find such a constructor on the trait {{Junk}}.
I think we can resolve this issue, since this is expected behavior.
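To illustrate the difference described above, here is a minimal sketch that needs neither Spark nor Scrooge. The trait below is only a hand-written stand-in for the generated code ({{ThriftStruct}} is left out), and the names {{JunkRow}} and {{JunkConversions}} are hypothetical. It shows one way a Scrooge-style value could be copied into a case class with the constructor shape Spark expects; I have not tested this against Spark 1.3.1.

```scala
// Hand-written stand-in for the Scrooge-generated trait (an assumption for
// illustration; the real trait also extends ThriftStruct).
trait Junk extends Product2[Long, String] with java.io.Serializable {
  def junkID: Long
  def junkString: String
  override def _1: Long = junkID
  override def _2: String = junkString
  override def canEqual(that: Any): Boolean = that.isInstanceOf[Junk]
}

object Junk {
  // Factory returning an anonymous implementation of the trait, so the
  // trait itself has no constructor parameters to reflect on.
  def apply(id: Long, str: String): Junk = new Junk {
    val junkID: Long = id
    val junkString: String = str
  }
}

// Hypothetical case class mirroring the Thrift struct; its constructor
// arguments are what schema inference can turn into fields.
case class JunkRow(junkID: Long, junkString: String)

// Hypothetical helper that copies a Scrooge-style value into the case class.
object JunkConversions {
  def toRow(j: Junk): JunkRow = JunkRow(j.junkID, j.junkString)
}
```

In a spark-shell session like the one in the report, the conversion would presumably be applied before calling {{toDF}}, along the lines of {{junksRDD.map(JunkConversions.toRow).toDF}}. In the REPL, the trait and its companion object need to be entered together (for example with {{:paste}}).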
> ScalaReflectionException when trying to convert an RDD of Scrooge to a DataFrame
> --------------------------------------------------------------------------------
>
> Key: PARQUET-293
> URL: https://issues.apache.org/jira/browse/PARQUET-293
> Project: Parquet
> Issue Type: Bug
> Components: parquet-format
> Affects Versions: 1.6.0
> Reporter: Tim Chan
>
> I get "scala.ScalaReflectionException: <none> is not a term" when I try to convert an RDD of Scrooge to a DataFrame, e.g. myScroogeRDD.toDF
> Has anyone else encountered this problem?
> I'm using Spark 1.3.1, Scala 2.10.4 and scrooge-sbt-plugin 3.16.3
> Here is my thrift IDL:
> {code}
> namespace scala com.junk
> namespace java com.junk
> struct Junk {
> 10: i64 junkID,
> 20: string junkString
> }
> {code}
> from a spark-shell:
> {code}
> val junks = List( Junk(123L, "junk1"), Junk(567L, "junk2"), Junk(789L, "junk3") )
> val junksRDD = sc.parallelize(junks)
> junksRDD.toDF
> {code}
> Exception thrown:
> {noformat}
> scala.ScalaReflectionException: <none> is not a term
> at scala.reflect.api.Symbols$SymbolApi$class.asTerm(Symbols.scala:259)
> at scala.reflect.internal.Symbols$SymbolContextApiImpl.asTerm(Symbols.scala:73)
> at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:148)
> at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:30)
> at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:107)
> at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:30)
> at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:316)
> at org.apache.spark.sql.SQLContext$implicits$.rddToDataFrameHolder(SQLContext.scala:254)
> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:32)
> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:34)
> at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
> at $iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
> at $iwC$$iwC$$iwC.<init>(<console>:40)
> at $iwC$$iwC.<init>(<console>:42)
> at $iwC.<init>(<console>:44)
> at <init>(<console>:46)
> at .<init>(<console>:50)
> at .<clinit>(<console>)
> at .<init>(<console>:7)
> at .<clinit>(<console>)
> at $print(<console>)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
> at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
> at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
> at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:856)
> at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:901)
> at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:813)
> at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:656)
> at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:664)
> at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:669)
> at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:996)
> at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:944)
> at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:944)
> at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
> at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:944)
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1058)
> at org.apache.spark.repl.Main$.main(Main.scala:31)
> at org.apache.spark.repl.Main.main(Main.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {noformat}