You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Brett Randall (JIRA)" <ji...@apache.org> on 2016/06/05 04:49:59 UTC

[jira] [Comment Edited] (SPARK-15723) SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ name

    [ https://issues.apache.org/jira/browse/SPARK-15723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15315725#comment-15315725 ] 

Brett Randall edited comment on SPARK-15723 at 6/5/16 4:49 AM:
---------------------------------------------------------------

Unfortunately I don't think that will help in this case, or any other case where there is ambiguity for a client-supplied short-code TZ name as-interpreted by a certain server-side local default timezone, or as shown in this case, in a test.

{{SimpleDateFormat}} parsing has some unusual side-effects on its underlying {{Calendar}} instance, which are triggered by a {{parse()}}.  There is a warning of this behaviour in the JavaDoc for {{java.text.DateFormat.setTimeZone(TimeZone)}}:

{quote}
The {{TimeZone}} set by this method may be overwritten as a result of a call to the parse method.
{quote}

If the SDF's current {{Calendar}} has a {{TimeZone}} (which is either {{TimeZone.getDefault()}} or as later set as is done in the second attempt in {{SimpleDateParam}}) which does not match a timezone that is decoded from a short-code TZ in the parsed string, SDF makes the assumption that the best "local" interpretation of the abbreviation is the correct one, and updates its local {{Calendar}} instance to the new TZ.  So in my test-case, if my {{TimeZone.getDefault()}} is {{Australia/Sydney}}, which has a short-name alias of {{EST}}, when {{EST}} is encountered in the parsed date string, SDF says "they must mean Eastern (Australian) Standard Time, I'll interpret as that and update my calendar", whereas that code running with a different default TZ will pick {{America/New_York}} or {{-0500}}.  It does this during the parsing and before the computation of the datetime, so the result is in the guessed timezone, not any timezone you pre-seed with {{setTimeZone()}} or even {{setCalendar()}} - they will always be overwritten if the TZ is parsed.  You could block the update in an SDF sub-class, but you are still left with the same problem - how should {{EST}}, which is necessarily ambiguous according to the old list of short codes, be interpreted?

The JDK code is in method {{java.text.SimpleDateFormat.subParseZoneString\(String, int, CalendarBuilder\)}}, where it can be seen to call {{setTimeZone\(...\)}} on itself if it parses a TZ that differs from the existing SDF one.

So any pre-setting of a {{TimeZone}} or {{Calendar}} is moot if you allow the short-TZ code to be parsed - it is ambiguous for many values, but it is the caller's fault for using a deprecated short TZ form.  The result completely depends on {{TimeZone.getDefault()}}, which you don't want to change or rely-on.  It is unfortunate that SDF does this, but these short-forms are completely deprecated anyway.  Clients should not be using them - they should be using an RFC822 form, which avoids ambiguity.

Setting a fixed TZ and removing the opportunity to provide an ambiguous code would necessitate an API change, and you would no-longer be able to specify a TZ.  That might be OK, but being an API change you might want deprecation, I don't know.


was (Author: brett_s_r):
Unfortunately I don't think that will help in this case, or any other case where there is ambiguity for a client-supplied short-code TZ name as-interpreted by a certain server-side local default timezone, or as shown in this case, in a test.

{{SimpleDateFormat}} parsing has some unusual side-effects on its underlying {{Calendar}} instance, which are triggered by a {{parse()}}.  There is a warning of this behaviour in the JavaDoc for {{java.text.DateFormat.setTimeZone(TimeZone)}}:

{quote}
The {{TimeZone}} set by this method may be overwritten as a result of a call to the parse method.
{quote}

If the SDF's current {{Calendar}} has a {{TimeZone}} (which is either {{TimeZone.getDefault()}} or as later set as is done in the second attempt in {{SimpleDateParam}}) which does not match a timezone that is decoded from a short-code TZ in the parsed string, SDF makes the assumption that the best "local" interpretation of the abbreviation is the correct one, and updates its local {{Calendar}} instance to the new TZ.  So in my test-case, if my {{TimeZone.getDefault()}} is {{Australia/Sydney}}, which has a short-name alias of {{EST}}, when {{EST}} is encountered in the parsed date string, SDF says "they must mean Eastern (Australian) Standard Time, I'll interpret as that and update my calendar", whereas that code running with a different default TZ will pick {{America/New_York}} or {{-0500}}.  It does this during the parsing and before the computation of the datetime, so the result is in the guessed timezone, not any timezone you pre-seed with {{setTimeZone()}} or even {{setCalendar()}} - they will always be overwritten if the TZ is parsed.  You could block the update in an SDF sub-class, but you are still left with the same problem - how should {{EST}}, which is necessarily ambiguous according to the old list of short codes, be interpreted?

The JDK code is in method {{java.text.SimpleDateFormat.subParseZoneString\(String, int, CalendarBuilder\)}}, where it can be seen to call {{setTimeZone\(...\)}} on itself if it parses a TZ that differs from the existing SDF one.

So any pre-setting of a {{TimeZone}} or {{Calendar}} is moot if you allow the short-TZ code to be parsed - it is ambiguous for many values, but it is the caller's fault for using a deprecated short TZ form.  The result completely depends on {{TimeZone.getDefault()}}, which you don't want to change or rely-on.  It is unfortunate that SDF does this, but these short-forms are completely deprecated anyway.  Clients should not be using them - they should be using an RFC822 form, which avoids ambiguity.


> SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ name
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-15723
>                 URL: https://issues.apache.org/jira/browse/SPARK-15723
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.1
>            Reporter: Brett Randall
>            Priority: Minor
>              Labels: test
>
> {{org.apache.spark.status.api.v1.SimpleDateParamSuite}} has this assertion:
> {code}
>     new SimpleDateParam("2015-02-20T17:21:17.190EST").timestamp should be (1424470877190L)
> {code}
> This test is fragile and fails when executing in an environment where the local default timezone causes {{EST}} to be interpreted as something other than US Eastern Standard Time.  If your local timezone is {{Australia/Sydney}}, then {{EST}} equates to {{GMT+10}} and you will get:
> {noformat}
> date parsing *** FAILED ***
> 1424413277190 was not equal to 1424470877190 (SimpleDateParamSuite.scala:29)
> {noformat}
> In short, {{SimpleDateFormat}} is sensitive to the local default {{TimeZone}} when interpreting short zone names.  According to the {{TimeZone}} javadoc, they ought not be used:
> {quote}
> Three-letter time zone IDs
> For compatibility with JDK 1.1.x, some other three-letter time zone IDs (such as "PST", "CTT", "AST") are also supported. However, their use is deprecated because the same abbreviation is often used for multiple time zones (for example, "CST" could be U.S. "Central Standard Time" and "China Standard Time"), and the Java platform can then only recognize one of them.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org