Posted to issues@spark.apache.org by "Loic Descotte (JIRA)" <ji...@apache.org> on 2017/02/02 09:40:51 UTC

[jira] [Created] (SPARK-19434) Dataframe/Dataset unserialization failing with Map

Loic Descotte created SPARK-19434:
-------------------------------------

             Summary: Dataframe/Dataset unserialization failing with Map
                 Key: SPARK-19434
                 URL: https://issues.apache.org/jira/browse/SPARK-19434
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.1.0, 2.0.2
            Reporter: Loic Descotte


If I serialize a case class containing a Scala Map into a DataFrame, deserializing it fails unless I declare the field as collection.Map in the case class. Otherwise the generated code passes a scala.collection.Map (the trait) to a constructor that expects scala.collection.immutable.Map (the default implementation), and compilation of the generated code fails.

{quote}
// Assumes spark.implicits._ (for toDS) and ScalaTest Matchers are in scope.
case class Person(name: String, details: Map[String, String])

val peopleSeq = Seq(
  Person("bob", Map("age" -> "30", "address" -> "blabla")),
  Person("john", Map("age" -> "25", "address" -> "blabla"))
)

val peopleDS = peopleSeq.toDS

peopleDS.collect should contain theSameElementsAs peopleSeq

// failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java',
// Line 127, Column 40: No applicable constructor/method found for actual parameters "java.lang.String, scala.collection.Map";
// candidates are: "Person(java.lang.String, scala.collection.immutable.Map)"
{quote}
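
For context, here is a minimal plain-Scala sketch (no Spark involved) of the type relationship behind the error message. The names MapHierarchySketch and isImmutable are hypothetical and exist only for this illustration. It shows that a value typed as scala.collection.Map may or may not be an immutable map at runtime, which is why such a value is not accepted where scala.collection.immutable.Map is required.

```scala
import scala.collection.{immutable, mutable}

object MapHierarchySketch {
  // Hypothetical helper: true only when the runtime value is an immutable map.
  def isImmutable(m: scala.collection.Map[String, String]): Boolean =
    m.isInstanceOf[immutable.Map[_, _]]

  def main(args: Array[String]): Unit = {
    // Predef's Map is scala.collection.immutable.Map, a subtype of the general trait.
    val fromPredef: scala.collection.Map[String, String] = Map("age" -> "30")
    // A mutable map is also a scala.collection.Map, but not an immutable one.
    val fromMutable: scala.collection.Map[String, String] = mutable.Map("age" -> "30")

    println(isImmutable(fromPredef))
    println(isImmutable(fromMutable))

    // The next line does not compile, because the static type is the general trait:
    // val details: Map[String, String] = fromPredef
  }
}
```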

Declaring the field as collection.Map works around the problem:


{quote}
case class PersonWithForcedMapType(name: String, details: collection.Map[String, String])

val peopleSeq = Seq(
  PersonWithForcedMapType("bob", Map("age" -> "30", "address" -> "blabla")),
  PersonWithForcedMapType("john", Map("age" -> "25", "address" -> "blabla"))
)

val peopleDS = peopleSeq.toDS

peopleDS.collect should contain theSameElementsAs peopleSeq // OK
{quote}
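
If downstream code still needs the original Person type, the collected values could be converted back on the driver. The following is only a sketch in plain Scala, with no Spark involved. ConvertBackSketch and toPerson are hypothetical names, and the Seq stands in for the result of collect.

```scala
object ConvertBackSketch {
  final case class Person(name: String, details: Map[String, String])
  final case class PersonWithForcedMapType(
      name: String,
      details: scala.collection.Map[String, String])

  // Hypothetical helper: toMap copies the entries into an immutable Map.
  def toPerson(p: PersonWithForcedMapType): Person =
    Person(p.name, p.details.toMap)

  def main(args: Array[String]): Unit = {
    // Stand-in for the array returned by peopleDS.collect.
    val collected = Seq(
      PersonWithForcedMapType("bob", Map("age" -> "30", "address" -> "blabla")),
      PersonWithForcedMapType("john", Map("age" -> "25", "address" -> "blabla"))
    )

    val people: Seq[Person] = collected.map(toPerson)
    people.foreach(println)
  }
}
```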

The same failure occurs if I use the DataFrame API instead of the Dataset API:

{quote}
val peopleSeq = Seq(
  Person("bob", Map("age" -> "30", "address" -> "blabla")),
  Person("john", Map("age" -> "25", "address" -> "blabla"))
)

val peopleDF = peopleSeq.toDF

val peopleResult = peopleDF.map { row =>
  val name = row.getAs[String](0)
  val details = row.getAs[Map[String, String]](1)
  Person(name, details)
}

peopleResult.collect should contain theSameElementsAs peopleSeq

// failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java',
// Line 127, Column 40: No applicable constructor/method found for actual parameters "java.lang.String, scala.collection.Map";
// candidates are: "Person(java.lang.String, scala.collection.immutable.Map)"
{quote}




