You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/09/01 18:05:00 UTC

[jira] [Commented] (IMPALA-9773) Kudu temporal (AS OF) queries

    [ https://issues.apache.org/jira/browse/IMPALA-9773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17408309#comment-17408309 ] 

ASF subversion and git services commented on IMPALA-9773:
---------------------------------------------------------

Commit 4f9f8c33cade430cec07974e00dc1d9031e9609f in impala's branch refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=4f9f8c3 ]

IMPALA-10840: Add support for "FOR SYSTEM_TIME AS OF" and "FOR SYSTEM_VERSION AS OF" for Iceberg tables

This patch adds support "FOR SYSTEM_TIME AS OF" and
"FOR SYSTEM_VERSION AS OF" clauses for Iceberg tables. The new
clauses are part of the table ref. FOR SYSTEM_TIME AS OF conforms to the
SQL2011 standard:
https://cs.ulb.ac.be/public/_media/teaching/infoh415/tempfeaturessql2011.pdf

With FOR SYSTEM_TIME AS OF we can query a table at a specific time
point, e.g. we can retrieve what was the table content 1 day ago.

The timestamp given to "FOR SYSTEM_TIME AS OF" is interpreted in the
local timezone. The local timezone can be set via the query option
TIMEZONE. By default the timezone being used is the coordinator node's
local timezone. The timestamp is translated to UTC because table
snapshots are tagged with a UTC timestamps.

"FOR SYSTEM_VERSION AS OF" is a non-standard extension. It works
similarly to FOR SYSTEM_TIME AS OF, but with this clause we can query
a table via a snapshot ID instead of a timestamp.

HIVE-25344 also added support for these clauses to Hive.

Table snapshot IDs and timestamp information can be queried with the
help of the DESCRIBE HISTORY command.

Sample queries:

 SELECT * FROM t FOR SYSTEM_TIME AS OF now();
 SELECT * FROM t FOR SYSTEM_TIME AS OF '2021-08-10 11:02:34';
 SELECT * FROM t FOR SYSTEM_TIME AS OF now() - interval 10 days + interval 3 hours;

 SELECT * FROM t FOR SYSTEM_VERSION AS OF 7080861547601448759;

 SELECT * FROM t FOR SYSTEM_TIME AS OF now()
 MINUS
 SELECT * FROM t FOR SYSTEM_TIME AS OF now() - interval 1 days;

This patch uses some parts of the in-progress
IMPALA-9773 (https://gerrit.cloudera.org/#/c/13342/) developed by
Todd Lipcon and Grant Henke. This patch also resolves some TODOs of
IMPALA-9773, i.e. after this patch it'll be easier to add
time travel for Kudu tables as well.

Testing:
 * added parser tests (ParserTest.java)
 * added analyzer tests (AnalyzeStmtsTest.java)
 * added e2e tests (test_iceberg.py)

Change-Id: Ib523c5e47b8d9c377bea39a82fe20249177cf824
Reviewed-on: http://gerrit.cloudera.org:8080/17765
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Kudu temporal (AS OF) queries
> -----------------------------
>
>                 Key: IMPALA-9773
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9773
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Frontend
>            Reporter: Grant Henke
>            Priority: Major
>              Labels: kudu
>
> Kudu has the concept of snapshot scans which can be used to query the state of a table at some point in the past (up to 7 days by default). To date it is used for read consistency and backup and restore, but it can also be naturally leveraged to "time travel" and query table at some time in the past.
> A WIP patch adding a the `AS OF` syntax to Kudu queries can be found here:
> [https://gerrit.cloudera.org/#/c/13342/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org