Posted to issues@spark.apache.org by "Wenchen Fan (Jira)" <ji...@apache.org> on 2022/02/28 11:35:00 UTC

[jira] [Resolved] (SPARK-38042) Encoder cannot be found when a tuple component is a type alias for an Array

     [ https://issues.apache.org/jira/browse/SPARK-38042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-38042.
---------------------------------
    Fix Version/s: 3.3.0
                   3.2.2
       Resolution: Fixed

Issue resolved by pull request 35370
[https://github.com/apache/spark/pull/35370]

> Encoder cannot be found when a tuple component is a type alias for an Array
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-38042
>                 URL: https://issues.apache.org/jira/browse/SPARK-38042
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.2, 3.2.0
>            Reporter: Johan Nyström-Persson
>            Priority: Major
>             Fix For: 3.3.0, 3.2.2
>
>
> ScalaReflection.dataTypeFor fails when Array[T] has been aliased for some T and the alias is then used as a component of, e.g., a product type such as a tuple or case class.
> Minimal example, tested on Spark 3.1.2:
> {code:java}
> type Data = Array[Long]
> val xs: List[(Data, Int)] = List((Array(1), 1), (Array(2), 2))
> sc.parallelize(xs).toDF("a", "b"){code}
> This gives the following exception:
> {code:java}
> scala.MatchError: Data (of class scala.reflect.internal.Types$AliasNoArgsTypeRef) 
>  at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$dataTypeFor$1(ScalaReflection.scala:104) 
>  at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69) 
>  at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:904) 
>  at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:903) 
>  at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49) 
>  at org.apache.spark.sql.catalyst.ScalaReflection$.dataTypeFor(ScalaReflection.scala:88) 
>  at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$serializerFor$6(ScalaReflection.scala:573) 
>  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) 
>  at scala.collection.immutable.List.foreach(List.scala:392) 
>  at scala.collection.TraversableLike.map(TraversableLike.scala:238) 
>  at scala.collection.TraversableLike.map$(TraversableLike.scala:231) 
>  at scala.collection.immutable.List.map(List.scala:298) 
>  at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$serializerFor$1(ScalaReflection.scala:562) 
>  at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69) 
>  at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:904) 
>  at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:903) 
>  at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49) 
>  at org.apache.spark.sql.catalyst.ScalaReflection$.serializerFor(ScalaReflection.scala:432) 
>  at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$serializerForType$1(ScalaReflection.scala:421) 
>  at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69) 
>  at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:904) 
>  at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:903) 
>  at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49) 
>  at org.apache.spark.sql.catalyst.ScalaReflection$.serializerForType(ScalaReflection.scala:413) 
>  at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:55) 
>  at org.apache.spark.sql.Encoders$.product(Encoders.scala:285) 
>  at org.apache.spark.sql.LowPrioritySQLImplicits.newProductEncoder(SQLImplicits.scala:251) 
>  at org.apache.spark.sql.LowPrioritySQLImplicits.newProductEncoder$(SQLImplicits.scala:251) 
>  at org.apache.spark.sql.SQLImplicits.newProductEncoder(SQLImplicits.scala:32) 
>  ... 48 elided{code}
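> For comparison, the same data declared against the underlying type rather than the alias encodes fine, which isolates the alias as the trigger (a quick check, assuming a spark-shell session where sc and the SparkSession implicits are in scope):
> {code:java}
> // Same values, but the array type is written out instead of going through the alias Data.
> val ys: List[(Array[Long], Int)] = List((Array(1L), 1), (Array(2L), 2))
> sc.parallelize(ys).toDF("a", "b") // succeeds: Array[Long] is matched directly
> {code}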
> At first glance, I think this could be fixed by dealiasing the type before it is inspected in ScalaReflection.dataTypeFor, e.g. changing
> {code:java}
> getClassNameFromType(tpe)
> {code}
> to
> {code:java}
> getClassNameFromType(tpe.dealias)
> {code}
> I will try to test that and submit a pull request shortly.
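> To see why dealiasing helps: Scala's runtime reflection preserves type aliases, so the pattern match in dataTypeFor sees the alias Data rather than Array[Long], and the alias falls through to the MatchError above. A minimal standalone sketch of this behaviour using only scala-reflect (the object name DealiasDemo is mine, not part of Spark):
> {code:java}
> import scala.reflect.runtime.universe._
>
> object DealiasDemo {
>   type Data = Array[Long]
>
>   def main(args: Array[String]): Unit = {
>     val tpe = typeOf[Data]
>     // The alias survives reflection: prints "DealiasDemo.Data",
>     // which a pattern match looking for Array types will not recognize.
>     println(tpe)
>     // dealias expands the alias to its underlying type: prints "Array[Long]".
>     println(tpe.dealias)
>     // After dealiasing the type is equivalent to Array[Long]: prints "true".
>     println(tpe.dealias =:= typeOf[Array[Long]])
>   }
> }
> {code}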



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
