Posted to issues@spark.apache.org by "Oksana Romankova (JIRA)" <ji...@apache.org> on 2016/10/13 19:42:21 UTC

[jira] [Comment Edited] (SPARK-17914) Spark SQL casting to TimestampType with nanosecond results in incorrect timestamp

    [ https://issues.apache.org/jira/browse/SPARK-17914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15572957#comment-15572957 ] 

Oksana Romankova edited comment on SPARK-17914 at 10/13/16 7:42 PM:
--------------------------------------------------------------------

Sean, I can't find any evidence that ISO 8601 disallows nanoseconds. All it says is that a fraction of a second may be supplied following a comma or a dot. Different parsing libraries that support ISO 8601 have different precision limitations. For instance, in Python, datetime.strptime() only supports precision down to microseconds and throws an exception if nanoseconds are supplied in the input string. While that may not be ideal for those who need to retain nanosecond precision after parsing, it is acceptable behavior: callers who do not need nanosecond precision can catch the exception or preemptively truncate the input string. Spark SQL's DateTimeUtils.stringToTimestamp() neither throws nor truncates properly, which results in an incorrect timestamp. In the example above, the acceptable truncation would be (a Python sketch follows the examples):

"2016-05-14T15:12:14.0034567Z" -> "2016-05-14 15:12:14.003456"
"2016-05-14T15:12:14.000345678Z" -> "2016-05-14 15:12:14.000345"




> Spark SQL casting to TimestampType with nanosecond results in incorrect timestamp
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-17914
>                 URL: https://issues.apache.org/jira/browse/SPARK-17914
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1
>            Reporter: Oksana Romankova
>
> In some cases, timestamps containing nanosecond fractions will be parsed incorrectly. 
> Examples: 
> "2016-05-14T15:12:14.0034567Z" -> "2016-05-14 15:12:14.034567"
> "2016-05-14T15:12:14.000345678Z" -> "2016-05-14 15:12:14.345678"
> The issue seems to be happening in DateTimeUtils.stringToTimestamp(). It assumes that at most a 6-digit fraction of a second will be passed.
> With this being the case, I would suggest either discarding the nanosecond digits automatically, or throwing an exception that prompts the user to pre-format timestamps to microsecond precision before casting to TimestampType.
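
As a stopgap for the suggestion above, timestamps could be pre-truncated to microsecond precision before the cast. A minimal sketch, assuming the Spark 2.x DataFrame API and a hypothetical column ts_str holding the raw strings; the regular expression is an illustrative workaround, not the project's actual fix:

```
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, regexp_replace

spark = SparkSession.builder.appName("ts-truncate").getOrCreate()

# Hypothetical input: raw ISO 8601 strings with nanosecond fractions.
df = spark.createDataFrame(
    [("2016-05-14T15:12:14.0034567Z",),
     ("2016-05-14T15:12:14.000345678Z",)],
    ["ts_str"])

# Keep at most six fractional digits, dropping any nanosecond tail,
# so the cast only ever sees microsecond-precision strings.
truncated = regexp_replace(col("ts_str"), r"(\.\d{6})\d+", "$1")
df.select(truncated.cast("timestamp").alias("ts")).show(truncate=False)
```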


