Posted to issues@hive.apache.org by "Sumit Verma (Jira)" <ji...@apache.org> on 2021/10/26 12:24:00 UTC

[jira] [Assigned] (HIVE-25449) datediff() gives wrong output when run in a tez task with some non-UTC timezone

     [ https://issues.apache.org/jira/browse/HIVE-25449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sumit Verma reassigned HIVE-25449:
----------------------------------

    Assignee: Sumit Verma  (was: Shubham Chaurasia)

> datediff() gives wrong output when run in a tez task with some non-UTC timezone
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-25449
>                 URL: https://issues.apache.org/jira/browse/HIVE-25449
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF
>            Reporter: Shubham Chaurasia
>            Assignee: Sumit Verma
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Repro (thanks Qiaosong Dong) - 
> Add -Duser.timezone=GMT+8 to {{tez.task.launch.cmd-opts}}
> {code}
> create external table test_dt(id string, dt date);
> insert into test_dt values('11', '2021-07-06'), ('22', '2021-07-07');
> select datediff(dt1.dt, '2021-07-01') from test_dt dt1 left join test_dt dt on dt1.id = dt.id;
> +------+
> | _c0  |
> +------+
> | 6    |
> | 7    |
> +------+
> {code}
> Expected output - 
> {code}
> +------+
> | _c0  |
> +------+
> | 5    |
> | 6    |
> +------+
> {code}
> *Cause*
> This happens because the {{VectorUDFDateDiffColScalar}} class handles its two arguments differently:
> 1. For the second argument (the scalar), we use {{java.text.SimpleDateFormat}} to parse the date string, which interprets it in the local timezone.
> 2. For the first argument (the column), we get a column vector that represents each date as an epoch day. This is always in UTC.
> *Solution*
> We need to check other variants of datediff UDFs as well and change the parsing mechanism to always interpret date strings in UTC. 
>  
> I made a quick change in {{VectorUDFDateDiffColScalar}} which fixes the issue.
> {code}
> -          date.setTime(formatter.parse(new String(bytesValue, "UTF-8")).getTime());
> -          baseDate = DateWritableV2.dateToDays(date);
> +          org.apache.hadoop.hive.common.type.Date hiveDate
> +              = org.apache.hadoop.hive.common.type.Date.valueOf(new String(bytesValue, "UTF-8"));
> +          date.setTime(hiveDate.toEpochMilli());
> +          baseDate = hiveDate.toEpochDay();
> {code}
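
The mismatch described under Cause can be reproduced outside Hive with a small standalone sketch. It rests on several assumptions: the class and method names are hypothetical, {{java.time.LocalDate}} stands in for {{org.apache.hadoop.hive.common.type.Date}}, the timezone is set explicitly on the formatter rather than through -Duser.timezone, and the millis-to-days conversion is modelled as a plain floor division by the length of a UTC day. It is not the Hive implementation.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.util.TimeZone;

// Hypothetical demo class; none of these names exist in Hive.
class DateDiffTimezoneDemo {
    private static final long MILLIS_PER_DAY = 86_400_000L;

    // Models the old scalar path: parse the string as midnight in the given
    // zone, then convert the resulting millis to days as if they were UTC.
    static long epochDayViaSimpleDateFormat(String text, String zoneId) {
        SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd");
        formatter.setTimeZone(TimeZone.getTimeZone(zoneId));
        try {
            long millis = formatter.parse(text).getTime();
            return Math.floorDiv(millis, MILLIS_PER_DAY);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a yyyy-MM-dd date: " + text, e);
        }
    }

    // Models the column path and the patched scalar path: a zone-independent
    // epoch day (LocalDate is an assumed stand-in for Hive's Date type).
    static long epochDayUtc(String text) {
        return LocalDate.parse(text).toEpochDay();
    }

    public static void main(String[] args) {
        long column = epochDayUtc("2021-07-06");
        long localScalar = epochDayViaSimpleDateFormat("2021-07-01", "GMT+8");
        long utcScalar = epochDayUtc("2021-07-01");

        System.out.println("scalar parsed in GMT+8: " + (column - localScalar)); // 6
        System.out.println("scalar parsed as UTC:   " + (column - utcScalar));   // 5
    }
}
```

Midnight on 2021-07-01 in GMT+8 is 16:00 UTC on 2021-06-30, so the scalar lands one epoch day early and the difference comes out one too large, matching the 6 and 7 in the repro.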



--
This message was sent by Atlassian Jira
(v8.3.4#803005)