You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Sankar Hariappan <Sa...@microsoft.com.INVALID> on 2021/09/30 13:29:50 UTC

RE: [EXTERNAL] Raise exception instead of silent change for new DateTimeformatter

Hi @Stamatis Zampetakis<ma...@gmail.com>, @David<ma...@gmail.com>,

Our current implementation using DateTimeFormatter is not backward compatible and it leads to migration issues.
One of our customer who have this use-case where we don't have a better options to migrate.

Hive 1.2/Spark 2.4 (Shared metastore):
Set VM time zone to Asia/Bangkok.
INSERT values ("1400-01-01 00:00:00") into parquet_table; // Here, parquet writer converts the data into UTC (- 07:00:00) and stored it.

Migrate to Hive 3.x/Spark 3.x (Shared metastore)::
Set VM time zone to Asia/Bangkok.
SELECT ts from parquet_table; // Hive returns different value whereas Spark (spark.sql.legacy.timeParserPolicy=LEGACY) returns 1400-01-01 00:00:00

It is not easy to change thousands of Hive scripts to handle this difference and it adds to migration cost.
I think, it is necessary to enable backward compatibility for smooth migration. Pls share your thoughts.

Thanks,
Sankar

From: Ashish Sharma <as...@gmail.com>
Sent: 29 September 2021 19:11
To: dev@hive.apache.org; user@hive.apache.org
Cc: sankarh@apache.org
Subject: [EXTERNAL] Raise exception instead of silent change for new DateTimeformatter


History

Hive 1.2 -

VM time zone set to Asia/Bangkok

Query - SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('1800-01-01 00:00:00 UTC','yyyy-MM-dd HH:mm:ss z'));

Result - 1800-01-01 07:00:00

Implementation details -

SimpleDateFormat formatter = new SimpleDateFormat(pattern);
Long unixtime = formatter.parse(textval).getTime() / 1000;
Date date = new Date(unixtime * 1000L);

https://docs.oracle.com/javase/8/docs/api/java/util/Date.html<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs.oracle.com%2Fjavase%2F8%2Fdocs%2Fapi%2Fjava%2Futil%2FDate.html&data=04%7C01%7CSankar.Hariappan%40microsoft.com%7C013a8535c2af4647fb1308d9834ede18%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637685197136779324%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=xxOBj5zDm29DTpPYC6rlgz639Dhn7vpHxALYHdn9VO0%3D&reserved=0> . In official documentation they have mentioned that "Unfortunately, the API for these functions was not amenable to internationalization and The corresponding methods in Date are deprecated" . Due to that this is producing wrong result

latest hive -

set hive.local.time.zone=Asia/Bangkok;

Query - SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('1800-01-01 00:00:00 UTC','yyyy-MM-dd HH:mm:ss z'));

Result - 1800-01-01 06:42:04

Implementation details -

DateTimeFormatter dtformatter = new DateTimeFormatterBuilder()
.parseCaseInsensitive()
.appendPattern(pattern)
.toFormatter();

ZonedDateTime zonedDateTime = ZonedDateTime.parse(textval,dtformatter).withZoneSameInstant(ZoneId.of(timezone));
Long dttime = zonedDateTime.toInstant().getEpochSecond();



Problem-

Now SimpleDateFormat has been replaced with DateTimeFormatter which is not backward compatible. Causing issues at times for migration to the new version. Because the older data written using Hive 1.x or 2.x is not compatible with DateTimeFormatter.



Solution -

Introduce an config "hive.legacy.timeParserPolicy" with following values -
1. EXCEPTION - compare value of both SimpleDateFormat & DateTimeFormatter raise exception if doesn't match
2. LEGACY - use SimpleDateFormat
3. CORRECTED - use DateTimeFormatter

This will help hive user in the following manner -
1. Migrate to new version using LEGACY
2. Find values which are not compatible with the new version - EXCEPTION
3. Use latest date apis - CORRECTED

Note: apache spark also face the same issue https://issues.apache.org/jira/browse/SPARK-30668<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FSPARK-30668&data=04%7C01%7CSankar.Hariappan%40microsoft.com%7C013a8535c2af4647fb1308d9834ede18%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637685197136779324%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=yYKfiVbSW%2FbfD5V1leqB8cH349Qb6FzYtSn5ClcZrqc%3D&reserved=0>



Hive jira - https://issues.apache.org/jira/browse/HIVE-25576<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FHIVE-25576&data=04%7C01%7CSankar.Hariappan%40microsoft.com%7C013a8535c2af4647fb1308d9834ede18%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637685197136789283%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=W9ZZoPgtBeA69eF%2FonPtdXdp15PG4%2F1M6rc99G%2BErcc%3D&reserved=0>



Thanks

Ashish Sharma

Re: [EXTERNAL] Raise exception instead of silent change for new DateTimeformatter

Posted by Stamatis Zampetakis <za...@gmail.com>.
Hi Ashish, Sankar,

I am not sure if you both refer to the same problem.

As far as it concerns reading and writing to Parquet/Avro files the
compatibility issues should be resolved as part of HIVE-25104 [1], and
HIVE-25219 [2].
If I recall correctly we added some config properties to ease migration.

Regarding the UNIX_TIMESTAMP function indeed I remember seeing many JIRA
cases reporting problems. Let's find the relation with HIVE-25576 [3] and
try to address them.
We could opt for a new property but let's continue the discussion in the
respective JIRA case. People who have an opinion about the topic can jump
in there.

Best,
Stamatis

[1] https://issues.apache.org/jira/browse/HIVE-25104
[2] https://issues.apache.org/jira/browse/HIVE-25219
[3] https://issues.apache.org/jira/browse/HIVE-25576


On Thu, Sep 30, 2021 at 3:29 PM Sankar Hariappan <
Sankar.Hariappan@microsoft.com> wrote:

> Hi @Stamatis Zampetakis <za...@gmail.com>, @David <da...@gmail.com>,
>
>
>
> Our current implementation using DateTimeFormatter is not backward
> compatible and it leads to migration issues.
>
> One of our customer who have this use-case where we don’t have a better
> options to migrate.
>
>
>
> *Hive 1.2/Spark 2.4 (Shared metastore):*
>
> Set VM time zone to Asia/Bangkok.
>
> INSERT values (“1400-01-01 00:00:00”) into parquet_table; // Here, parquet
> writer converts the data into UTC (- 07:00:00) and stored it.
>
>
>
> *Migrate to Hive 3.x/Spark 3.x (Shared metastore)::*
>
> Set VM time zone to Asia/Bangkok.
>
> SELECT ts from parquet_table; // Hive returns different value whereas
> Spark (spark.sql.legacy.timeParserPolicy=LEGACY) returns 1400-01-01 00:00:00
>
>
>
> It is not easy to change thousands of Hive scripts to handle this
> difference and it adds to migration cost.
>
> I think, it is necessary to enable backward compatibility for smooth
> migration. Pls share your thoughts.
>
>
>
> Thanks,
>
> Sankar
>
>
>
> *From:* Ashish Sharma <as...@gmail.com>
> *Sent:* 29 September 2021 19:11
> *To:* dev@hive.apache.org; user@hive.apache.org
> *Cc:* sankarh@apache.org
> *Subject:* [EXTERNAL] Raise exception instead of silent change for new
> DateTimeformatter
>
>
>
> *History*
>
> *Hive 1.2* -
>
> VM time zone set to Asia/Bangkok
>
> *Query* - SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('1800-01-01 00:00:00
> UTC','yyyy-MM-dd HH:mm:ss z'));
>
> *Result* - 1800-01-01 07:00:00
>
> *Implementation details* -
>
> SimpleDateFormat formatter = new SimpleDateFormat(pattern);
> Long unixtime = formatter.parse(textval).getTime() / 1000;
> Date date = new Date(unixtime * 1000L);
>
> https://docs.oracle.com/javase/8/docs/api/java/util/Date.html
> <https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs.oracle.com%2Fjavase%2F8%2Fdocs%2Fapi%2Fjava%2Futil%2FDate.html&data=04%7C01%7CSankar.Hariappan%40microsoft.com%7C013a8535c2af4647fb1308d9834ede18%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637685197136779324%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=xxOBj5zDm29DTpPYC6rlgz639Dhn7vpHxALYHdn9VO0%3D&reserved=0> .
> In official documentation they have mentioned that "Unfortunately, the API
> for these functions was not amenable to internationalization and The
> corresponding methods in Date are deprecated" . Due to that this is
> producing wrong result
>
> *latest hive* -
>
> set hive.local.time.zone=Asia/Bangkok;
>
> *Query* - SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('1800-01-01 00:00:00
> UTC','yyyy-MM-dd HH:mm:ss z'));
>
> *Result* - 1800-01-01 06:42:04
>
> *Implementation details* -
>
> DateTimeFormatter dtformatter = new DateTimeFormatterBuilder()
> .parseCaseInsensitive()
> .appendPattern(pattern)
> .toFormatter();
>
> ZonedDateTime zonedDateTime =
> ZonedDateTime.parse(textval,dtformatter).withZoneSameInstant(ZoneId.of(timezone));
> Long dttime = zonedDateTime.toInstant().getEpochSecond();
>
>
>
> *Problem*-
>
> Now *SimpleDateFormat* has been replaced with *DateTimeFormatter* which
> is not backward compatible. Causing issues at times for migration to the
> new version. Because the older data written using Hive 1.x or 2.x is not
> compatible with *DateTimeFormatter*.
>
>
>
> *Solution -*
>
> Introduce an config "hive.legacy.timeParserPolicy" with following values -
> *1. EXCEPTION* - compare value of
> both SimpleDateFormat & DateTimeFormatter raise exception if doesn't match
> *2. LEGACY *- use SimpleDateFormat
> *3. CORRECTED *- use DateTimeFormatter
>
> This will help hive user in the following manner -
> 1. Migrate to new version using *LEGACY*
> 2. Find values which are not compatible with the new version - *EXCEPTION*
> 3. Use latest date apis - *CORRECTED*
>
> Note: apache spark also face the same issue
> https://issues.apache.org/jira/browse/SPARK-30668
> <https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FSPARK-30668&data=04%7C01%7CSankar.Hariappan%40microsoft.com%7C013a8535c2af4647fb1308d9834ede18%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637685197136779324%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=yYKfiVbSW%2FbfD5V1leqB8cH349Qb6FzYtSn5ClcZrqc%3D&reserved=0>
>
>
>
> Hive jira - https://issues.apache.org/jira/browse/HIVE-25576
> <https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FHIVE-25576&data=04%7C01%7CSankar.Hariappan%40microsoft.com%7C013a8535c2af4647fb1308d9834ede18%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637685197136789283%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=W9ZZoPgtBeA69eF%2FonPtdXdp15PG4%2F1M6rc99G%2BErcc%3D&reserved=0>
>
>
>
>
> Thanks
>
> Ashish Sharma
>