
Posted to issues@hive.apache.org by "Stamatis Zampetakis (Jira)" <ji...@apache.org> on 2022/10/21 07:21:01 UTC

[jira] [Updated] (HIVE-25292) to_unix_timestamp & unix_timestamp should support ENGLISH format by default

     [ https://issues.apache.org/jira/browse/HIVE-25292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stamatis Zampetakis updated HIVE-25292:
---------------------------------------
    Fix Version/s:     (was: 3.2.0)

I cleared the fixVersion field since this ticket is still open. Please review this ticket and, if the fix is already committed to a specific version, set the version accordingly and mark the ticket as RESOLVED.

According to the [JIRA guidelines|https://cwiki.apache.org/confluence/display/Hive/HowToContribute] the fixVersion should be set only when the issue is resolved/closed.

> to_unix_timestamp & unix_timestamp should support ENGLISH format by default
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-25292
>                 URL: https://issues.apache.org/jira/browse/HIVE-25292
>             Project: Hive
>          Issue Type: Improvement
>          Components: Clients
>            Reporter: shezm
>            Assignee: shezm
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Hi,
> The to_unix_timestamp function is implemented by GenericUDFToUnixTimeStamp, which uses SimpleDateFormat to parse string-typed time values.
> However, the SimpleDateFormat is created without a Locale argument, so the JVM's default locale is used. As a result, on machines whose default locale is not English, queries like the following return NULL:
>  
> {code:java}
> hive> select to_unix_timestamp('16/Mar/2017:12:25:01', 'dd/MMM/yyy:HH:mm:ss');
> OK
> NULL
> hive> select unix_timestamp('16/Mar/2017:12:25:01', 'dd/MMM/yyy:HH:mm:ss');
> OK
> NULL
> {code}
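The locale dependence can be reproduced outside Hive with plain java.text.SimpleDateFormat. This is a sketch, not Hive's code: the class and method names are hypothetical, the locale is passed explicitly instead of relying on the JVM default, the time zone is pinned to UTC so the result does not depend on the machine, and a ParseException is mapped to null to mirror the NULL printed by Hive.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class, not part of Hive.
public class LocaleDependentParse {

    // Parses value with pattern under an explicit locale and returns epoch seconds,
    // or null when parsing fails (mirroring the NULL printed by Hive).
    public static Long parseEpochSeconds(String value, String pattern, Locale locale) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, locale);
        // Pin the zone so the result does not depend on the machine's default time zone.
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return fmt.parse(value).getTime() / 1000L;
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String value = "16/Mar/2017:12:25:01";
        String pattern = "dd/MMM/yyy:HH:mm:ss";
        System.out.println(parseEpochSeconds(value, pattern, Locale.ENGLISH)); // prints 1489667101
        System.out.println(parseEpochSeconds(value, pattern, Locale.CHINESE));
    }
}
```

With Locale.ENGLISH, the value parses to 1489667101 (12:25:01 UTC). With Locale.CHINESE, the abbreviation "Mar" matches none of the Chinese month names. On a JDK that ships the usual locale data, the second line should therefore print null. The 1489638301 in the Hive output is exactly 8 hours (28800 s) smaller than the UTC value, which is consistent with a session running in a UTC+8 time zone.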
>  
> I also found that Spark's to_unix_timestamp and unix_timestamp use SimpleDateFormat too, but Spark uses Locale.US by default, which makes it impossible to parse locale-specific month names. For example, with a Chinese locale, Hive parses the following correctly (三月 is Chinese for "March"):
>  
> {code:java}
> hive> select to_unix_timestamp('16/三月/2017:12:25:01', 'dd/MMMM/yyy:HH:mm:ss');
> OK
> 1489638301
> Time taken: 0.147 seconds, Fetched: 1 row(s)
> {code}
> Spark, however, returns NULL for the same query.
> Because English month names are the most common, I think two SimpleDateFormat instances are needed: the existing one, which uses the default locale, and a new one initialized with Locale.ENGLISH.
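Below is a minimal sketch of the two-formatter proposal. It assumes the English formatter is tried only when the default-locale formatter fails. The ticket asks only for a second, English formatter, so this ordering is an assumption. The class and method names are hypothetical. The "default" locale is passed in as a stand-in for Locale.getDefault(), so the example does not depend on the machine. The time zone is pinned to UTC.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch of the "two SimpleDateFormats" idea; not Hive's implementation.
public class TwoLocaleParser {
    private final SimpleDateFormat defaultFormat; // current behaviour: default locale
    private final SimpleDateFormat englishFormat; // proposed addition: Locale.ENGLISH

    // defaultLocale stands in for Locale.getDefault() so the example is reproducible.
    public TwoLocaleParser(String pattern, Locale defaultLocale, TimeZone zone) {
        defaultFormat = new SimpleDateFormat(pattern, defaultLocale);
        englishFormat = new SimpleDateFormat(pattern, Locale.ENGLISH);
        defaultFormat.setTimeZone(zone);
        englishFormat.setTimeZone(zone);
    }

    // Tries the default-locale formatter first, then the English one.
    // Returns epoch seconds, or null (Hive's NULL) if neither can parse the value.
    public Long toUnixTimestamp(String value) {
        for (SimpleDateFormat fmt : new SimpleDateFormat[] {defaultFormat, englishFormat}) {
            try {
                return fmt.parse(value).getTime() / 1000L;
            } catch (ParseException e) {
                // Fall through to the next formatter.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TwoLocaleParser parser = new TwoLocaleParser(
                "dd/MMMM/yyy:HH:mm:ss", Locale.CHINESE, TimeZone.getTimeZone("UTC"));
        // Chinese name for March (U+4E09 U+6708), escaped to keep the source ASCII-only.
        System.out.println(parser.toUnixTimestamp("16/\u4e09\u6708/2017:12:25:01"));
        System.out.println(parser.toUnixTimestamp("16/Mar/2017:12:25:01")); // prints 1489667101
    }
}
```

Trying the default locale first keeps today's behaviour for localized month names such as 三月, which should parse on a JDK with Chinese locale data. English abbreviations like Mar are then accepted regardless of the JVM locale. (With MMMM, SimpleDateFormat also accepts the short month form when parsing.) SimpleDateFormat is not thread-safe, so a real implementation would need to avoid sharing these formatter instances across threads.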
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)