Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2023/12/14 14:02:00 UTC

[jira] [Commented] (IMPALA-12322) return wrong timestamp when scan kudu timestamp with timezone

    [ https://issues.apache.org/jira/browse/IMPALA-12322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796761#comment-17796761 ] 

ASF subversion and git services commented on IMPALA-12322:
----------------------------------------------------------

Commit 3af193022916e42c33d6eafafb6f9560a0789895 in impala's branch refs/heads/master from Eyizoha
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=3af193022 ]

IMPALA-12322: Support converting UTC timestamps read from Kudu to local time

This patch adds a query option 'convert_kudu_utc_timestamps' similar to
'convert_legacy_hive_parquet_utc_timestamps'. When enabled, it converts
UTC timestamps read from Kudu to local timestamps.
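
As an illustration of the conversion this option describes (not Impala's
implementation), the sketch below turns a UTC value stored as microseconds
since the Unix epoch into a local wall-clock timestamp. The helper name
utc_micros_to_local, the sample value, and the fixed UTC+8 offset standing
in for Asia/Shanghai are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+8 offset stands in for Asia/Shanghai.
SHANGHAI = timezone(timedelta(hours=8))
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def utc_micros_to_local(micros, tz):
    """Convert microseconds since the Unix epoch (UTC) to naive local time.

    Hypothetical helper for illustration only.
    """
    utc_value = EPOCH + timedelta(microseconds=micros)
    # Shift to the target zone, then drop tzinfo to get a wall-clock value.
    return utc_value.astimezone(tz).replace(tzinfo=None)


stored = 1650729600000000  # sample value: 2022-04-23 16:00:00 UTC

# Without conversion the value is shown as UTC wall-clock time.
unconverted = utc_micros_to_local(stored, timezone.utc)
# With conversion the same instant is shown 8 hours later, in local time.
converted = utc_micros_to_local(stored, SHANGHAI)

print(unconverted)
print(converted)
```

The two printed values differ by exactly the zone offset, which matches the
8-hour discrepancy described in the issue below.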

The change also covers predicate pushdown and runtime filters. Because
daylight saving time changes make some local timestamps ambiguous, the
conversion is difficult to handle correctly in the bloom filter. This
patch therefore introduces a second query option,
'disable_kudu_local_timestamp_bloom_filter', which by default disables
the Kudu timestamp bloom filter once time zone conversion is enabled, to
avoid erroneously filtering out data. In regions that do not observe
daylight saving time, it can be set to false to re-enable the Kudu
local timestamp bloom filter.
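
The ambiguity can be seen in a small sketch (an illustration, not code from
the patch). It assumes US Eastern time around the fall-back transition on
2023-11-05 and models the two sides of the transition with fixed offsets;
the zone, the dates, and the helper name to_local_wall_clock are choices
made for this example only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets model US Eastern time on either side of the
# 2023-11-05 fall-back transition (UTC-4 before 06:00 UTC, UTC-5 after).
EDT = timezone(timedelta(hours=-4))
EST = timezone(timedelta(hours=-5))


def to_local_wall_clock(utc_dt, tz):
    """Return the naive local wall-clock time for a UTC instant.

    Hypothetical helper for illustration only.
    """
    return utc_dt.astimezone(tz).replace(tzinfo=None)


# Two distinct UTC instants, one hour apart.
before = datetime(2023, 11, 5, 5, 30, tzinfo=timezone.utc)
after = datetime(2023, 11, 5, 6, 30, tzinfo=timezone.utc)

local_before = to_local_wall_clock(before, EDT)
local_after = to_local_wall_clock(after, EST)

# Both instants map to the same local wall-clock time.
print(local_before, local_after, local_before == local_after)
```

Since one local value corresponds to two UTC values here, a filter built
from local timestamps cannot be mapped back to exactly one stored UTC value.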

Testing:
- Add TestKuduTimestampConvert in query_test/test_kudu.py.
  It performs end-to-end testing in a custom cluster: basic Kudu UTC
  timestamp conversion, plus checks that the related predicate pushdown
  and runtime filters work correctly (including for timestamps around
  daylight saving time transitions).

Change-Id: I9a1e7a13e617cc18deef14289cf9b958588397d3
Reviewed-on: http://gerrit.cloudera.org:8080/20681
Reviewed-by: Csaba Ringhofer <cs...@cloudera.com>
Tested-by: Csaba Ringhofer <cs...@cloudera.com>


> return wrong timestamp when scan kudu timestamp with timezone
> -------------------------------------------------------------
>
>                 Key: IMPALA-12322
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12322
>             Project: IMPALA
>          Issue Type: Bug
>    Affects Versions: Impala 4.1.1
>         Environment: impala 4.1.1
>            Reporter: daicheng
>            Assignee: Ye Zihao
>            Priority: Major
>         Attachments: image-2022-04-24-00-01-05-746-1.png, image-2022-04-24-00-01-05-746.png, image-2022-04-24-00-01-37-520.png, image-2022-04-24-00-03-14-467-1.png, image-2022-04-24-00-03-14-467.png, image-2022-04-24-00-04-16-240-1.png, image-2022-04-24-00-04-16-240.png, image-2022-04-24-00-04-52-860-1.png, image-2022-04-24-00-04-52-860.png, image-2022-04-24-00-05-52-086-1.png, image-2022-04-24-00-05-52-086.png, image-2022-04-24-00-07-09-776-1.png, image-2022-04-24-00-07-09-776.png, image-2023-07-28-20-31-09-457.png, image-2023-07-28-22-27-38-521.png, image-2023-07-28-22-29-40-083.png, image-2023-07-28-22-36-17-460.png, image-2023-07-28-22-36-37-884.png, image-2023-07-28-22-38-19-728.png
>
>
> Impala version is 3.1.0-cdh6.1.
> I have set the system timezone to Asia/Shanghai:
> !image-2022-04-24-00-01-37-520.png!
> !image-2022-04-24-00-01-05-746.png!
> Here is the bug:
> *step 1*
> I have a Parquet file with two columns, shown below, and read it with impala-shell and Spark (timezone=Shanghai):
> !image-2022-04-24-00-03-14-467.png|width=1016,height=154!
> !image-2022-04-24-00-04-16-240.png|width=944,height=367!
> Both results are exactly right.
> *step 2*
> Create a Kudu table with impala-shell:
> CREATE TABLE default.test__test__test_time2 (id BIGINT, t TIMESTAMP, PRIMARY KEY (id)) STORED AS KUDU;
> Note: the Kudu version is 1.8.
> Then insert 2 rows into the table with Spark:
> !image-2022-04-24-00-04-52-860.png|width=914,height=279!
> *step 3*
> Read it with Spark (timezone=Shanghai); Spark reads the Kudu table through the kudu-client API. Here is the result:
> !image-2022-04-24-00-05-52-086.png|width=914,height=301!
> The result is still exactly right.
> But reading it with impala-shell gives:
> !image-2022-04-24-00-07-09-776.png|width=915,height=154!
> The result is off by 8 hours.
> *conclusion*
> It seems the Impala timezone setting has no effect when the Kudu column type is TIMESTAMP, although it works fine for Parquet files. I don't know why.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
