Posted to issues@spark.apache.org by "Dongjoon Hyun (JIRA)" <ji...@apache.org> on 2018/10/18 20:44:00 UTC

[jira] [Created] (SPARK-25772) Java encoders - switch fields on collectAsList

Dongjoon Hyun created SPARK-25772:
-------------------------------------

             Summary: Java encoders - switch fields on collectAsList
                 Key: SPARK-25772
                 URL: https://issues.apache.org/jira/browse/SPARK-25772
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.1.1
         Environment: macOS
                      Spark 2.1.1
                      Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_121
            Reporter: Tom


I have the following schema in a dataset -

root
 |-- userId: string (nullable = true)
 |-- data: map (nullable = true)
 |    |-- key: string
 |    |-- value: struct (valueContainsNull = true)
 |    |    |-- startTime: long (nullable = true)
 |    |    |-- endTime: long (nullable = true)
 |-- offset: long (nullable = true)


And I have the following classes (getters and setters omitted for brevity) -


 
{code:java}
public class MyClass {

    private String userId;

    private Map<String, MyDTO> data;

    private Long offset;
 }

public class MyDTO {

    private long startTime;
    private long endTime;

}
{code}
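
For reference, here is a minimal sketch of MyDTO with the omitted accessors filled in, plus a small helper that lists the bean properties in the order java.beans.Introspector reports them. BeanPropertyOrder and propertyNames are hypothetical names used only for this illustration. The order the Introspector returns is an implementation detail of the JVM, and whether Spark's bean encoder relies on it for struct values nested in a map is an assumption on my part. Comparing the printed order with the schema order (startTime, endTime) is only meant as a diagnostic.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.List;

// MyDTO from the report, with the omitted getters and setters written out.
class MyDTO {
    private long startTime;
    private long endTime;

    public long getStartTime() { return startTime; }
    public void setStartTime(long startTime) { this.startTime = startTime; }
    public long getEndTime() { return endTime; }
    public void setEndTime(long endTime) { this.endTime = endTime; }
}

// Hypothetical diagnostic helper; not part of Spark.
class BeanPropertyOrder {

    // Returns the bean property names in the order the Introspector reports them.
    static List<String> propertyNames(Class<?> beanClass) {
        try {
            // Stop at Object so the implicit "class" property is not listed.
            BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
            List<String> names = new ArrayList<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                names.add(pd.getName());
            }
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Could not introspect " + beanClass, e);
        }
    }

    public static void main(String[] args) {
        // Compare this order with the schema order: startTime, endTime.
        System.out.println(propertyNames(MyDTO.class));
    }
}
```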


I collect the result as follows -


{code:java}
        Encoder<MyClass> myClassEncoder = Encoders.bean(MyClass.class);
        Dataset<MyClass> results = raw_df.as(myClassEncoder);
        List<MyClass> lst = results.collectAsList();

{code}
        
I do several calculations to get the result I want, and the values are correct at every step up to the point where I collect them.
For example, this is the output of -


{code:java}
results.select(results.col("data").getField("2017-07-01").getField("startTime")).show(false);

{code}

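+--------------------------+------------------------+
|data[2017-07-01].startTime|data[2017-07-01].endTime|
+--------------------------+------------------------+
|1498854000                |1498870800              |
+--------------------------+------------------------+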


This is what I get after collecting the results -


{code:java}
MyClass userData = results.collectAsList().get(0);
MyDTO userDTO = userData.getData().get("2017-07-01");
System.out.println("userDTO startTime: " + userDTO.getStartTime());
System.out.println("userDTO endTime: " + userDTO.getEndTime());

{code}

--
userDTO startTime: 1498870800
userDTO endTime: 1498854000

The two values are swapped after the collect, so I tend to believe this is a Spark issue. I would appreciate any suggestions on how to work around it.
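
One possible explanation, which I have not confirmed, is that the struct values inside the map are assigned to the bean by position rather than by name, while the bean properties are ordered differently from the schema. The plain-Java sketch below is not Spark code. FieldOrderMismatch and assignByPosition are hypothetical names, and the alphabetical bean order (endTime, startTime) is an assumption. It only shows that a positional assignment under that assumption produces exactly the swap I see.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this does not call into Spark.
class FieldOrderMismatch {

    // Pairs each value with a field name purely by position.
    static Map<String, Long> assignByPosition(List<String> targetOrder, List<Long> values) {
        Map<String, Long> assigned = new LinkedHashMap<>();
        for (int i = 0; i < targetOrder.size(); i++) {
            assigned.put(targetOrder.get(i), values.get(i));
        }
        return assigned;
    }

    public static void main(String[] args) {
        // Values in schema order: startTime first, then endTime.
        List<Long> structValues = Arrays.asList(1498854000L, 1498870800L);

        // Assumed bean property order (alphabetical): endTime first, then startTime.
        List<String> beanOrder = Arrays.asList("endTime", "startTime");

        Map<String, Long> dto = assignByPosition(beanOrder, structValues);
        System.out.println("startTime: " + dto.get("startTime")); // prints "startTime: 1498870800"
        System.out.println("endTime: " + dto.get("endTime"));     // prints "endTime: 1498854000"
    }
}
```

If this is what happens, declaring the struct fields in the same order as the bean properties, or building MyDTO manually from each Row by field name, might avoid the swap. I have not verified either approach.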


