Posted to issues@spark.apache.org by "Arjen P. de Vries (JIRA)" <ji...@apache.org> on 2016/03/17 17:00:35 UTC

[jira] [Commented] (SPARK-13456) Cannot create encoders for case classes defined in Spark shell after upgrading to Scala 2.11

    [ https://issues.apache.org/jira/browse/SPARK-13456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15199774#comment-15199774 ] 

Arjen P. de Vries commented on SPARK-13456:
-------------------------------------------

I benefited from this workaround, which is implicit in the discussion above but which I only pieced together afterwards:

{code}
import sqlContext.implicits._
case class T(a: Int, b: Double)
// Register the enclosing REPL wrapper as the outer scope for T, so the
// encoder can construct instances of this inner case class:
org.apache.spark.sql.catalyst.encoders.OuterScopes.addOuterScope(this)
val ds = Seq(1 -> T(1, 1D), 2 -> T(2, 2D)).toDS()
{code}
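
If I understand the mechanism correctly (an assumption on my part, not verified against the source), {{addOuterScope(this)}} registers the REPL wrapper object enclosing the current line, keyed by its class name, so that the encoder can later look up the enclosing instance it needs to construct {{T}}.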

> Cannot create encoders for case classes defined in Spark shell after upgrading to Scala 2.11
> --------------------------------------------------------------------------------------------
>
>                 Key: SPARK-13456
>                 URL: https://issues.apache.org/jira/browse/SPARK-13456
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Cheng Lian
>            Priority: Blocker
>
> Spark 2.0 has used Scala 2.11 by default since [PR #10608|https://github.com/apache/spark/pull/10608].  Unfortunately, after this upgrade, Spark fails to create encoders for case classes defined in the REPL:
> {code}
> import sqlContext.implicits._
> case class T(a: Int, b: Double)
> val ds = Seq(1 -> T(1, 1D), 2 -> T(2, 2D)).toDS()
> {code}
> Exception thrown:
> {noformat}
> org.apache.spark.sql.AnalysisException: Unable to generate an encoder for inner class `T` without access to the scope that this class was defined in.
> Try moving this class out of its parent class.;
>   at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$resolveDeserializer$1.applyOrElse(Analyzer.scala:565)
>   at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$resolveDeserializer$1.applyOrElse(Analyzer.scala:561)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:262)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:262)
>   at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:261)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:267)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:267)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:304)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:742)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
>   at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
>   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:308)
>   at scala.collection.AbstractIterator.to(Iterator.scala:1194)
>   at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:300)
>   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1194)
>   at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:287)
>   at scala.collection.AbstractIterator.toArray(Iterator.scala:1194)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:353)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:267)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:267)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:267)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5$$anonfun$apply$11.apply(TreeNode.scala:333)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:331)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:742)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
>   at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
>   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:308)
>   at scala.collection.AbstractIterator.to(Iterator.scala:1194)
>   at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:300)
>   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1194)
>   at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:287)
>   at scala.collection.AbstractIterator.toArray(Iterator.scala:1194)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:353)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:267)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:251)
>   at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.resolveDeserializer(Analyzer.scala:561)
>   at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.resolve(ExpressionEncoder.scala:315)
>   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:81)
>   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:92)
>   at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:482)
>   at org.apache.spark.sql.SQLImplicits.localSeqToDatasetHolder(SQLImplicits.scala:140)
>   ... 51 elided
> {noformat}
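> The error makes sense once you recall that a case class defined inside another class is an inner class in Scala, and constructing it requires an instance of the enclosing class. A minimal sketch outside the REPL (the {{Outer}} and {{Demo}} names are made up for illustration):
> {code}
> class Outer {
>   case class T(a: Int, b: Double)
> }
>
> object Demo extends App {
>   val outer = new Outer
>   val t = new outer.T(1, 1D)       // OK: enclosing instance supplied explicitly
>   // val bad = new Outer#T(1, 1D)  // does not compile: no enclosing instance
>   println(t)
> }
> {code}
> The deserializer presumably needs to do the equivalent of {{new T(...)}}, which fails when it has no way to obtain that enclosing instance.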
> However, the existing Dataset REPL test case does pass:
> {code}
>   test("SPARK-2576 importing SQLContext.implicits._") {
>     // We need to use local-cluster to test this case.
>     val output = runInterpreter("local-cluster[1,1,1024]",
>       """
>         |val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>         |import sqlContext.implicits._
>         |case class TestCaseClass(value: Int)
>         |sc.parallelize(1 to 10).map(x => TestCaseClass(x)).toDF().collect()
>         |
>         |// Test Dataset Serialization in the REPL
>         |Seq(TestCaseClass(1)).toDS().collect()
>       """.stripMargin)
>     assertDoesNotContain("error:", output)
>     assertDoesNotContain("Exception", output)
>   }
> {code}
> One possible clue is that {{ReplSuite}} calls {{SparkILoop}} directly, while the Spark shell is started by {{o.a.s.repl.Main}}, which also sets the option {{-Yrepl-class-based}}.
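> A simplified sketch of what that flag changes (an approximation; the real REPL-generated wrappers are considerably more elaborate, and the {{LineWrapperA}}/{{LineWrapperB}} names are made up): without {{-Yrepl-class-based}} each line compiles into members of a singleton object, while with the flag each line becomes an instance of a wrapper class, turning any case class defined there into an inner class that needs an outer pointer:
> {code}
> // Approximation of object-based wrapping (the Scala REPL default):
> // T is a member of a singleton, so no enclosing instance is needed.
> object LineWrapperA {
>   case class T(a: Int, b: Double)
> }
>
> // Approximation of class-based wrapping (-Yrepl-class-based): here T is
> // an inner class, and `new T(...)` needs a LineWrapperB instance, which
> // is exactly the "outer scope" the encoder cannot find.
> class LineWrapperB {
>   case class T(a: Int, b: Double)
> }
> {code}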


