Posted to issues@spark.apache.org by "Tom (JIRA)" <ji...@apache.org> on 2017/07/13 14:39:00 UTC
[jira] [Updated] (SPARK-21402) Java encoders - switch fields on collectAsList
[ https://issues.apache.org/jira/browse/SPARK-21402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tom updated SPARK-21402:
------------------------
Description:
I have the following schema in a dataset -
root
 |-- userId: string (nullable = true)
 |-- data: map (nullable = true)
 |    |-- key: string
 |    |-- value: struct (valueContainsNull = true)
 |    |    |-- startTime: long (nullable = true)
 |    |    |-- endTime: long (nullable = true)
 |-- offset: long (nullable = true)
And I have the following classes (plus setters and getters, which I omitted for simplicity) -
{code:java}
public class MyClass {
    private String userId;
    private Map<String, MyDTO> data;  // keyed by a date string such as "2017-07-01"
    private Long offset;
}

public class MyDTO {
    private long startTime;
    private long endTime;
}
{code}
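For reference, the omitted accessors presumably follow the usual JavaBean pattern that Encoders.bean relies on. A minimal sketch for MyDTO, assuming conventional getter and setter names (getStartTime, setStartTime, and so on), which the report itself does not show:

```java
// Sketch of MyDTO with the accessors the report omits.
// The getter/setter names are assumed from JavaBean conventions.
// Shown package-private so the snippet compiles standalone; in a real
// project it would be a public class in its own file.
class MyDTO {
    private long startTime;
    private long endTime;

    // JavaBeans are expected to have a public no-argument constructor.
    public MyDTO() {}

    public long getStartTime() { return startTime; }
    public void setStartTime(long startTime) { this.startTime = startTime; }

    public long getEndTime() { return endTime; }
    public void setEndTime(long endTime) { this.endTime = endTime; }
}
```

MyClass would follow the same pattern for userId, data and offset.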
I collect the result the following way -
{code:java}
Encoder<MyClass> myClassEncoder = Encoders.bean(MyClass.class);
Dataset<MyClass> results = raw_df.as(myClassEncoder);
List<MyClass> lst = results.collectAsList();
{code}
I do several calculations to get the result I want, and the result is correct all the way through until I collect it.
This is the result for -
{code:java}
results.select(results.col("data").getField("2017-07-01").getField("startTime"),
        results.col("data").getField("2017-07-01").getField("endTime")).show(false);
{code}
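+--------------------------+------------------------+
|data[2017-07-01].startTime|data[2017-07-01].endTime|
+--------------------------+------------------------+
|1498854000                |1498870800              |
+--------------------------+------------------------+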
This is the result after collecting the results -
{code:java}
MyClass userData = results.collectAsList().get(0);
MyDTO userDTO = userData.getData().get("2017-07-01");
System.out.println("userDTO startTime: " + userDTO.getStartTime());
System.out.println("userDTO endTime: " + userDTO.getEndTime());
{code}
--
userDTO startTime: 1498870800
userDTO endTime: 1498854000
The two values come back swapped. I suspect this is a Spark issue, and I would appreciate any suggestions on how to work around it.
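One possible explanation, offered as an untested hypothesis rather than a confirmed diagnosis: if the struct inside the map is bound to the bean by position, and the bean's properties are ordered by name (endTime before startTime) while the schema lists startTime first, the two values would swap exactly as observed. The sketch below is plain Java, does not call Spark, and uses a hypothetical helper, bindByPosition, purely to illustrate that mismatch:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; this does not call Spark and is not
// taken from Spark's source. bindByPosition is a hypothetical helper.
class FieldOrderDemo {
    // Pair the i-th value of a schema-ordered row with the i-th bean
    // property, where bean properties are sorted alphabetically by name.
    static Map<String, Long> bindByPosition(List<String> schemaOrder, List<Long> row) {
        List<String> beanOrder = new ArrayList<>(schemaOrder);
        Collections.sort(beanOrder); // startTime, endTime -> endTime, startTime
        Map<String, Long> bound = new LinkedHashMap<>();
        for (int i = 0; i < beanOrder.size(); i++) {
            bound.put(beanOrder.get(i), row.get(i));
        }
        return bound;
    }

    public static void main(String[] args) {
        List<String> schemaOrder = Arrays.asList("startTime", "endTime");
        List<Long> row = Arrays.asList(1498854000L, 1498870800L);
        Map<String, Long> bound = bindByPosition(schemaOrder, row);
        System.out.println("startTime: " + bound.get("startTime")); // startTime: 1498870800
        System.out.println("endTime: " + bound.get("endTime"));     // endTime: 1498854000
    }
}
```

If that is what happens, projecting the struct fields in alphabetical order before calling as(...) might sidestep the swap; that too is an assumption that would need to be verified against Spark 2.1.1.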
> Java encoders - switch fields on collectAsList
> ----------------------------------------------
>
> Key: SPARK-21402
> URL: https://issues.apache.org/jira/browse/SPARK-21402
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.1
> Environment: mac os
> spark 2.1.1
> Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_121
> Reporter: Tom
> Priority: Minor