You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Kazuaki Ishizaki (JIRA)" <ji...@apache.org> on 2018/01/01 18:22:00 UTC
[jira] [Updated] (SPARK-22935) Dataset with Java Beans for
java.sql.Date produces incorrect result
[ https://issues.apache.org/jira/browse/SPARK-22935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kazuaki Ishizaki updated SPARK-22935:
-------------------------------------
Description:
The following code prints {{c=0}}. The value of {{c}} must be 2. {{ds}} must not be {{empty}}.
This occurs with whole-stage codegen or without whole-stage codegen.
{code}
public void SPARK22935() {
Dataset<CDR> cdr = spark
.read()
.format("csv")
.option("header", "true")
.option("inferSchema", "true")
.option("delimiter", ";")
.csv("CDR_SAMPLE.csv")
.as(Encoders.bean(CDR.class));
Dataset<CDR> ds = cdr.filter((FilterFunction<CDR>) x -> (x.timestamp != null));
long c = ds.count();
cdr.show(2);
ds.show(2);
System.out.println("cnt=" + c);
}
// CDR.java
public class CDR implements java.io.Serializable {
public java.sql.Date timestamp;
}
// CDR_SAMPLE.csv
timestamp
2017-10-29T02:37:07.815Z
2017-10-29T02:38:07.815Z
{code}
result
{code}
+--------------------+
| timestamp|
+--------------------+
|2017-10-29 11:37:...|
|2017-10-29 11:38:...|
+--------------------+
+---------+
|timestamp|
+---------+
+---------+
c=0
{code}
was:
The following code prints {{c=0}}. The value of {{c}} must be 2. {ds} must not be {{empty}}.
{code}
public void SPARK22935() {
Dataset<CDR> cdr = spark
.read()
.format("csv")
.option("header", "true")
.option("inferSchema", "true")
.option("delimiter", ";")
.csv("CDR_SAMPLE.csv")
.as(Encoders.bean(CDR.class));
Dataset<CDR> ds = cdr.filter((FilterFunction<CDR>) x -> (x.timestamp != null));
long c = ds.count();
cdr.show(2);
ds.show(2);
System.out.println("cnt=" + c);
}
// CDR.java
public class CDR implements java.io.Serializable {
public java.sql.Date timestamp;
}
// CDR_SAMPLE.csv
timestamp
2017-10-29T02:37:07.815Z
2017-10-29T02:38:07.815Z
{code}
result
{code}
+--------------------+
| timestamp|
+--------------------+
|2017-10-29 11:37:...|
|2017-10-29 11:38:...|
+--------------------+
+---------+
|timestamp|
+---------+
+---------+
c=0
{code}
> Dataset with Java Beans for java.sql.Date produces incorrect result
> -------------------------------------------------------------------
>
> Key: SPARK-22935
> URL: https://issues.apache.org/jira/browse/SPARK-22935
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.1, 2.3.0
> Reporter: Kazuaki Ishizaki
>
> The following code prints {{c=0}}. The value of {{c}} must be 2. {{ds}} must not be {{empty}}.
> This occurs with whole-stage codegen or without whole-stage codegen.
> {code}
> public void SPARK22935() {
> Dataset<CDR> cdr = spark
> .read()
> .format("csv")
> .option("header", "true")
> .option("inferSchema", "true")
> .option("delimiter", ";")
> .csv("CDR_SAMPLE.csv")
> .as(Encoders.bean(CDR.class));
> Dataset<CDR> ds = cdr.filter((FilterFunction<CDR>) x -> (x.timestamp != null));
> long c = ds.count();
> cdr.show(2);
> ds.show(2);
> System.out.println("cnt=" + c);
> }
> // CDR.java
> public class CDR implements java.io.Serializable {
> public java.sql.Date timestamp;
> }
> // CDR_SAMPLE.csv
> timestamp
> 2017-10-29T02:37:07.815Z
> 2017-10-29T02:38:07.815Z
> {code}
> result
> {code}
> +--------------------+
> | timestamp|
> +--------------------+
> |2017-10-29 11:37:...|
> |2017-10-29 11:38:...|
> +--------------------+
> +---------+
> |timestamp|
> +---------+
> +---------+
> c=0
> {code}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org