Posted to issues@spark.apache.org by "Venkata Ramana G (JIRA)" <ji...@apache.org> on 2014/10/30 14:06:33 UTC
[jira] [Comment Edited] (SPARK-4077) A broken string timestamp value can make Spark SQL return wrong values for valid string timestamp values
[ https://issues.apache.org/jira/browse/SPARK-4077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190020#comment-14190020 ]
Venkata Ramana G edited comment on SPARK-4077 at 10/30/14 1:05 PM:
-------------------------------------------------------------------
In org.apache.hadoop.hive.serde2.io.TimestampWritable.set, if the next entry is null, the current timestamp object is reset.
It is not clear why Hive does it this way; we could also raise a bug against Hive.
However, because of this, HiveInspectors.unwrap cannot return the shared timestamp object directly; it must create a copy.
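The aliasing problem described above can be illustrated with a minimal Scala sketch. ReusedTimestampHolder, UnwrapSketch, readRows and copyOf are hypothetical names. The holder is a simplified stand-in for a reused writable, not Hive's actual TimestampWritable. It assumes, as the comment states, that a null entry resets the shared object, which this model resets to epoch 0.

```scala
import java.sql.Timestamp

// Hypothetical, simplified model of a reused writable. This is NOT Hive's
// TimestampWritable; it only mimics the behavior described in the comment:
// a single Timestamp object is shared across rows, and a null entry resets it.
final class ReusedTimestampHolder {
  private val ts = new Timestamp(0L)

  def set(value: Option[Timestamp]): Unit = value match {
    case Some(t) =>
      ts.setTime(t.getTime)   // overwrite the shared object in place
      ts.setNanos(t.getNanos)
    case None =>
      ts.setTime(0L)          // null entry: reset the shared object to epoch 0
  }

  def get: Timestamp = ts
}

object UnwrapSketch {
  // Defensive copy that also preserves sub-millisecond precision.
  def copyOf(t: Timestamp): Timestamp = {
    val c = new Timestamp(t.getTime)
    c.setNanos(t.getNanos)
    c
  }

  // Reads one timestamp column row by row. With copy = false, every non-null
  // row aliases the holder's single Timestamp, so a later null row rewrites
  // values that were already returned for earlier rows.
  def readRows(fields: Seq[Option[Timestamp]], copy: Boolean): Vector[Timestamp] = {
    val holder = new ReusedTimestampHolder
    fields.toVector.map { f =>
      holder.set(f)
      if (f.isEmpty) null
      else if (copy) copyOf(holder.get)
      else holder.get
    }
  }
}
```

Calling readRows(Seq(Some(Timestamp.valueOf("2014-12-11 00:00:00")), None), copy = false) gives a first row whose time has been reset to 0. With copy = true, the first row keeps its 2014 value. In this model, aliasing is invisible when every row carries the same timestamp, because the shared object always holds that one value.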
was (Author: gvramana):
In org.apache.hadoop.hive.serde2.io.TimestampWritable.init, if the next entry is null, the current timestamp object is reset.
It is not clear why Hive does it this way; we could also raise a bug against Hive.
However, because of this, HiveInspectors.unwrap cannot return the shared timestamp object directly; it must create a copy.
> A broken string timestamp value can make Spark SQL return wrong values for valid string timestamp values
> --------------------------------------------------------------------------------------------------------
>
> Key: SPARK-4077
> URL: https://issues.apache.org/jira/browse/SPARK-4077
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.1.0
> Reporter: Yin Huai
> Assignee: Venkata Ramana G
>
> The following case returns wrong results.
> The text file is
> {code}
> 2014-12-11 00:00:00,1
> 2014-12-11astring00:00:00,2
> {code}
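> The DDL statement and the query are:
> {code}
> // Assumes a spark-shell session (sc in scope) built with Hive support.
> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
> import hiveContext._
> sql("""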
> create external table date_test(my_date timestamp, id int)
> row format delimited
> fields terminated by ','
> lines terminated by '\n'
> LOCATION 'dateTest'
> """)
> sql("select * from date_test").collect.foreach(println)
> {code}
> The result is
> {code}
> [1969-12-31 19:00:00.0,1]
> [null,2]
> {code}
> If I change the data to
> {code}
> 2014-12-11 00:00:00,1
> 2014-12-11 00:00:00,2
> {code}
> The result is fine.
> For the data with the broken string timestamp value, I tried runSqlHive, and the result is correct.
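The wrong first-row value, 1969-12-31 19:00:00.0, is consistent with a timestamp reset to epoch 0 (1970-01-01 00:00:00 UTC) and printed in a UTC-5 zone. The reporter's actual time zone is not stated. The Scala check below assumes a zone such as America/New_York, and EpochZeroSketch and render are hypothetical names.

```scala
import java.text.SimpleDateFormat
import java.util.{Date, TimeZone}

object EpochZeroSketch {
  // Formats epoch milliseconds in an explicit zone. For whole-second values
  // this matches the layout java.sql.Timestamp.toString uses ("...:ss.0").
  def render(millis: Long, zoneId: String): String = {
    val fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.S")
    fmt.setTimeZone(TimeZone.getTimeZone(zoneId))
    fmt.format(new Date(millis))
  }
}
```

For example, render(0L, "America/New_York") yields "1969-12-31 19:00:00.0", the value seen in the first result row, while render(0L, "UTC") yields "1970-01-01 00:00:00.0".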
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)