Posted to issues@spark.apache.org by "Loic Descotte (JIRA)" <ji...@apache.org> on 2017/02/02 09:41:51 UTC
[jira] [Updated] (SPARK-19434) Dataframe/Dataset unserialization failing with Map
[ https://issues.apache.org/jira/browse/SPARK-19434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Loic Descotte updated SPARK-19434:
----------------------------------
Description:
If I serialize a type containing a Scala Map into a Dataset or DataFrame, deserializing it fails unless I declare the field as collection.Map in the case class. Otherwise there is a mismatch between scala.collection.Map (the general trait) and scala.collection.immutable.Map (the type that a plain Map refers to by default).
{quote}
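// Assumes a SparkSession named spark and a ScalaTest suite mixing in Matchers
import spark.implicits._
case class Person(name: String, details: Map[String, String])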
val peopleSeq = Seq(
Person("bob", Map("age" -> "30", "address" -> "blabla")),
Person("john", Map("age" -> "25", "address" -> "blabla"))
)
val peopleDS = peopleSeq.toDS
peopleDS.collect should contain theSameElementsAs peopleSeq
// failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java',
// Line 127, Column 40: No applicable constructor/method found for actual parameters "java.lang.String, scala.collection.Map";
// candidates are: "Person(java.lang.String, scala.collection.immutable.Map)"
{quote}
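The compiler error names two Scala types. The plain-Scala sketch below shows how they relate. It involves no Spark, so it only illustrates the types and does not reproduce Spark's generated code.

A constructor declared with `Map` (that is, `scala.collection.immutable.Map`) rejects an argument statically typed as `scala.collection.Map`. An explicit `.toMap` conversion satisfies it. The object name `MapMismatchSketch` and the sample values are illustrative only.

```scala
// Plain Scala sketch (no Spark): illustrates the types named in the
// "No applicable constructor/method found" error, not Spark's codegen.
object MapMismatchSketch {
  // In ordinary Scala source, Map resolves to scala.collection.immutable.Map
  case class Person(name: String, details: Map[String, String])

  // A value statically typed as the general trait scala.collection.Map
  val general: scala.collection.Map[String, String] =
    scala.collection.Map("age" -> "30", "address" -> "blabla")

  // Person("bob", general) would not compile: a scala.collection.Map
  // is not necessarily a scala.collection.immutable.Map.
  // An explicit conversion yields the immutable Map the constructor expects.
  val bob: Person = Person("bob", general.toMap)

  // The opposite direction needs no conversion, because
  // scala.collection.immutable.Map is a subtype of scala.collection.Map.
  val widened: scala.collection.Map[String, String] = bob.details

  def main(args: Array[String]): Unit = {
    println(bob.details("age"))
    println(widened.size)
  }
}
```

This mirrors the candidate listed in the error: `Person(java.lang.String, scala.collection.immutable.Map)` only accepts the narrower type.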
With this workaround, it works:
{quote}
case class PersonWithForcedMapType(name: String, details: collection.Map[String, String])
val peopleSeq = Seq(
PersonWithForcedMapType("bob", Map("age" -> "30", "address" -> "blabla")),
PersonWithForcedMapType("john", Map("age" -> "25", "address" -> "blabla"))
)
val peopleDS = peopleSeq.toDS
peopleDS.collect should contain theSameElementsAs peopleSeq // OK
{quote}
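With the workaround, the collected values carry the general `collection.Map` type. Callers that still need the original `Person` type can convert the field back explicitly.

The sketch below is plain Scala with no Spark dependency, and the sample value stands in for a collected row. `toPerson` is a hypothetical helper, not a Spark API.

```scala
// Plain Scala sketch (no Spark): the workaround's field type and a
// hypothetical conversion back to the immutable-Map case class.
object ForcedMapTypeSketch {
  // Field declared as the general trait, as in the workaround
  case class PersonWithForcedMapType(name: String, details: scala.collection.Map[String, String])
  case class Person(name: String, details: Map[String, String])

  // An immutable Map literal is accepted, since immutable.Map <: collection.Map
  val forced: PersonWithForcedMapType =
    PersonWithForcedMapType("bob", Map("age" -> "30", "address" -> "blabla"))

  // Hypothetical helper (not part of Spark): recover the immutable field type
  def toPerson(p: PersonWithForcedMapType): Person = Person(p.name, p.details.toMap)

  def main(args: Array[String]): Unit = {
    val bob = toPerson(forced)
    println(bob)
    // Map equality compares entries, not the static collection type
    println(forced.details == bob.details)
  }
}
```

Because Scala map equality compares entries rather than the declared collection type, the converted value holds the same data as the forced-type value.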
The same failure occurs when using the DataFrame API instead of the Dataset API:
{quote}
val peopleSeq = Seq(
Person("bob", Map("age" -> "30", "address" -> "blabla")),
Person("john", Map("age" -> "25", "address" -> "blabla"))
)
val peopleDF = peopleSeq.toDF
val peopleResult = peopleDF.map { row =>
val name = row.getAs[String](0)
val details = row.getAs[Map[String, String]](1)
Person(name, details)
}
peopleResult.collect should contain theSameElementsAs peopleSeq
// failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java',
// Line 127, Column 40: No applicable constructor/method found for actual parameters "java.lang.String, scala.collection.Map";
// candidates are: "Person(java.lang.String, scala.collection.immutable.Map)"
{quote}
> Dataframe/Dataset unserialization failing with Map
> --------------------------------------------------
>
> Key: SPARK-19434
> URL: https://issues.apache.org/jira/browse/SPARK-19434
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.2, 2.1.0
> Reporter: Loic Descotte
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)