Posted to issues@spark.apache.org by "Venkata Ramana G (JIRA)" <ji...@apache.org> on 2014/10/30 14:06:33 UTC

[jira] [Comment Edited] (SPARK-4077) A broken string timestamp value can make Spark SQL return wrong values for valid string timestamp values

    [ https://issues.apache.org/jira/browse/SPARK-4077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190020#comment-14190020 ] 

Venkata Ramana G edited comment on SPARK-4077 at 10/30/14 1:05 PM:
-------------------------------------------------------------------

In org.apache.hadoop.hive.serde2.io.TimestampWritable.set, if the next entry is null, the current timestamp object is reset in place.
It is not clear why Hive does this; we could also raise a bug against Hive.

Because of this, HiveInspectors.unwrap cannot reuse the same timestamp object; it has to return a copy.
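To make the aliasing concrete, here is a minimal Scala sketch of the mechanism described above. `ModelTimestampWritable`, `TimestampAliasingDemo`, and `readAll` are hypothetical names. The class is a simplified stand-in that assumes the behavior described for `TimestampWritable.set`: a non-null value is stored by reference, and a null value resets the held object in place. It is not Hive's actual code. Unparseable strings are modelled as `null`. The copy also calls `setNanos`, so sub-millisecond precision survives.

```scala
import java.sql.Timestamp

// Hypothetical, simplified stand-in for the behavior described for Hive's
// TimestampWritable.set (not Hive's actual code): a non-null value is kept
// by reference, and a null value resets the currently held object in place.
final class ModelTimestampWritable {
  private var timestamp: Timestamp = new Timestamp(0L)

  def set(t: Timestamp): Unit =
    if (t == null) timestamp.setTime(0L) // mutates an object a caller may still hold
    else timestamp = t

  def getTimestamp: Timestamp = timestamp
}

object TimestampAliasingDemo {
  // Assumption: an unparseable string field is represented as null.
  private def parse(s: String): Timestamp =
    try Timestamp.valueOf(s)
    catch { case _: IllegalArgumentException => null }

  // Defensive copy that keeps both the millisecond time and the nanos field.
  private def copyOf(t: Timestamp): Timestamp = {
    val c = new Timestamp(t.getTime)
    c.setNanos(t.getNanos)
    c
  }

  // Feeds every field through ONE shared writable, as a row-by-row reader
  // does, and "unwraps" each value either by reference or as a copy.
  def readAll(fields: Seq[String], copy: Boolean): Seq[Timestamp] = {
    val writable = new ModelTimestampWritable
    fields.map { s =>
      val t = parse(s)
      writable.set(t)
      if (t == null) null
      else if (copy) copyOf(writable.getTimestamp)
      else writable.getTimestamp // same object a later set(null) will reset
    }
  }

  def main(args: Array[String]): Unit = {
    val broken = Seq("2014-12-11 00:00:00", "2014-12-11astring00:00:00")
    val expected = Timestamp.valueOf("2014-12-11 00:00:00")

    // By reference: the second (null) entry resets the first row's value.
    println(readAll(broken, copy = false).head.getTime == 0L) // true
    // By copy: the first row keeps its parsed value.
    println(readAll(broken, copy = true).head == expected) // true
  }
}
```

Take the two rows from the report. Unwrapping by reference leaves the first row holding epoch 0, and copying keeps 2014-12-11 00:00:00. With two valid rows, each `set` stores a fresh object and nothing is reset, so both modes return correct values.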


was (Author: gvramana):
In org.apache.hadoop.hive.serde2.io.TimestampWritable.init, if the next entry is null, the current timestamp object is reset in place.
It is not clear why Hive does this; we could also raise a bug against Hive.

Because of this, HiveInspectors.unwrap cannot reuse the same timestamp object; it has to return a copy.

> A broken string timestamp value can make Spark SQL return wrong values for valid string timestamp values
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-4077
>                 URL: https://issues.apache.org/jira/browse/SPARK-4077
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.1.0
>            Reporter: Yin Huai
>            Assignee: Venkata Ramana G
>
> The following case returns wrong results.
> The text file is:
> {code}
> 2014-12-11 00:00:00,1
> 2014-12-11astring00:00:00,2
> {code}
> The DDL statement and the query are shown below:
> {code}
> sql("""
> create external table date_test(my_date timestamp, id int)
> row format delimited
> fields terminated by ','
> lines terminated by '\n'
> LOCATION 'dateTest'
> """)
> sql("select * from date_test").collect.foreach(println)
> {code}
> The result is:
> {code}
> [1969-12-31 19:00:00.0,1]
> [null,2]
> {code}
> If I change the data to:
> {code}
> 2014-12-11 00:00:00,1
> 2014-12-11 00:00:00,2
> {code}
> The result is fine.
> For the data with the broken string timestamp value, I tried runSqlHive, and the result is fine.
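The value 1969-12-31 19:00:00.0 in the first result row is consistent with a timestamp reset to epoch 0 (1970-01-01 00:00:00 UTC) and printed in a UTC-5 zone, such as US Eastern in winter. A quick Scala check using java.time follows. `EpochZeroDemo` and `wallClock` are illustrative names. America/New_York is an assumed zone, not one stated in the report.

```scala
import java.time.{Instant, ZoneId}

object EpochZeroDemo {
  // Local wall-clock time of an epoch-millisecond value in the given zone.
  def wallClock(epochMillis: Long, zone: String): String =
    Instant.ofEpochMilli(epochMillis).atZone(ZoneId.of(zone)).toLocalDateTime.toString

  def main(args: Array[String]): Unit = {
    // A timestamp reset with setTime(0) holds epoch 0.
    println(wallClock(0L, "UTC"))              // 1970-01-01T00:00
    // Assumed zone: US Eastern, which is UTC-5 at that instant.
    println(wallClock(0L, "America/New_York")) // 1969-12-31T19:00
  }
}
```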



