Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2023/01/19 17:15:20 UTC

[GitHub] [iceberg] amogh-jahagirdar opened a new pull request, #6627: Docs: Update spark SQL examples for time travel to branches and tags

amogh-jahagirdar opened a new pull request, #6627:
URL: https://github.com/apache/iceberg/pull/6627

   Follow-up to https://github.com/apache/iceberg/pull/6575/files. This change updates the docs with examples of VERSION AS OF time travel for branches and tags, along with some important notes.
   
   Cc: @ajantha-bhat @nastra @jackye1995 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6627: Docs: Update spark SQL examples for time travel to branches and tags

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on code in PR #6627:
URL: https://github.com/apache/iceberg/pull/6627#discussion_r1081834635


##########
docs/spark-queries.md:
##########
@@ -95,21 +95,39 @@ The above list is in order of priority. For example: a matching catalog will tak
 
 #### SQL
 
-Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS OF` or `VERSION AS OF` clauses
+Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS OF` or `VERSION AS OF` clauses.
+The `VERSION AS OF` clause can be used for time traveling to a specific snapshot via the following options:
+
+1. Snapshot ID
+2. Head of a branch
+3. Tagged snapshot
+
+Note: If the name of a branch or tag is the same as a snapshot ID, then the snapshot which is selected for time travel is the snapshot

Review Comment:
   nit: use `{{< hint info >}}` for notes.





[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6627: Docs: Update spark SQL examples for time travel to branches and tags

Posted by "amogh-jahagirdar (via GitHub)" <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #6627:
URL: https://github.com/apache/iceberg/pull/6627#discussion_r1082918805


##########
docs/spark-queries.md:
##########
@@ -95,21 +95,37 @@ The above list is in order of priority. For example: a matching catalog will tak
 
 #### SQL
 
-Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS OF` or `VERSION AS OF` clauses
+Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS OF` or `VERSION AS OF` clauses.
+The `VERSION AS OF` clause can contain a long snapshot ID or a string branch or tag name.
+
+{{< hint info >}}
+Note: If the name of a branch or tag is the same as a snapshot ID, then the snapshot which is selected for time travel is the snapshot
+with the given snapshot ID. For example, consider the case where there is a tag named '1' and it references snapshot with ID 2. 
+If the version travel clause is `VERSION AS OF '1'`, time travel will be done to the snapshot with ID 1. 
+If this is not desired, rename the tag or branch with a well-defined prefix such as 'snapshot-1'.
+{{< /hint >}}
 
 ```sql 
 -- time travel to October 26, 1986 at 01:21:00
 SELECT * FROM prod.db.table TIMESTAMP AS OF '1986-10-26 01:21:00';
 
 -- time travel to snapshot with id 10963874102873L
 SELECT * FROM prod.db.table VERSION AS OF 10963874102873;
+
+-- time travel to the head snapshot of audit-branch
+SELECT * FROM prod.db.table VERSION AS OF 'audit-branch';
+
+-- time travel to the snapshot referenced by the tag historical-snapshot
+SELECT * FROM prod.db.table VERSION AS OF 'historical-snapshot';
 ```
 
 In addition, `FOR SYSTEM_TIME AS OF` and `FOR SYSTEM_VERSION AS OF` clauses are also supported:
 
 ```sql
 SELECT * FROM prod.db.table FOR SYSTEM_TIME AS OF '1986-10-26 01:21:00';
 SELECT * FROM prod.db.table FOR SYSTEM_VERSION AS OF 10963874102873;
+SELECT * FROM prod.db.table FOR SYSTEM_VERSION AS OF 'audit-branch';
+SELECT * FROM prod.db.table FOR SYSTEM_VERSION AS OF 'historical-snapshot';

Review Comment:
   Tags can't be combined with as-of-timestamp because a tag can only reference a single snapshot. Sure, I updated the docs to make this clearer, thanks!





[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6627: Docs: Update spark SQL examples for time travel to branches and tags

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on code in PR #6627:
URL: https://github.com/apache/iceberg/pull/6627#discussion_r1081982200


##########
docs/spark-queries.md:
##########
@@ -95,21 +95,37 @@ The above list is in order of priority. For example: a matching catalog will tak
 
 #### SQL
 
-Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS OF` or `VERSION AS OF` clauses
+Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS OF` or `VERSION AS OF` clauses.
+The `VERSION AS OF` clause can contain a long snapshot ID or a string branch or tag name.
+
+{{< hint info >}}
+Note: If the name of a branch or tag is the same as a snapshot ID, then the snapshot which is selected for time travel is the snapshot
+with the given snapshot ID. For example, consider the case where there is a tag named '1' and it references snapshot 2. 

Review Comment:
   nit: and it references snapshot with ID 2.





[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6627: Docs: Update spark SQL examples for time travel to branches and tags

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on code in PR #6627:
URL: https://github.com/apache/iceberg/pull/6627#discussion_r1081835641


##########
docs/spark-queries.md:
##########
@@ -95,21 +95,39 @@ The above list is in order of priority. For example: a matching catalog will tak
 
 #### SQL
 
-Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS OF` or `VERSION AS OF` clauses
+Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS OF` or `VERSION AS OF` clauses.
+The `VERSION AS OF` clause can be used for time traveling to a specific snapshot via the following options:

Review Comment:
   The wording seems overly complex. What if we describe `VERSION AS OF` in terms of the data type? If the value is a long, it is a snapshot ID. If it is a string, it follows what is described below.





[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6627: Docs: Update spark SQL examples for time travel to branches and tags

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #6627:
URL: https://github.com/apache/iceberg/pull/6627#discussion_r1081978533


##########
docs/spark-queries.md:
##########
@@ -95,21 +95,39 @@ The above list is in order of priority. For example: a matching catalog will tak
 
 #### SQL
 
-Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS OF` or `VERSION AS OF` clauses
+Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS OF` or `VERSION AS OF` clauses.
+The `VERSION AS OF` clause can be used for time traveling to a specific snapshot via the following options:

Review Comment:
   Sure, this seems reasonable and should be clear for readers.





[GitHub] [iceberg] yyanyy commented on a diff in pull request #6627: Docs: Update spark SQL examples for time travel to branches and tags

Posted by "yyanyy (via GitHub)" <gi...@apache.org>.
yyanyy commented on code in PR #6627:
URL: https://github.com/apache/iceberg/pull/6627#discussion_r1082937202


##########
docs/spark-queries.md:
##########
@@ -95,21 +95,37 @@ The above list is in order of priority. For example: a matching catalog will tak
 
 #### SQL
 
-Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS OF` or `VERSION AS OF` clauses
+Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS OF` or `VERSION AS OF` clauses.
+The `VERSION AS OF` clause can contain a long snapshot ID or a string branch or tag name.
+
+{{< hint info >}}
+Note: If the name of a branch or tag is the same as a snapshot ID, then the snapshot which is selected for time travel is the snapshot
+with the given snapshot ID. For example, consider the case where there is a tag named '1' and it references snapshot with ID 2. 
+If the version travel clause is `VERSION AS OF '1'`, time travel will be done to the snapshot with ID 1. 
+If this is not desired, rename the tag or branch with a well-defined prefix such as 'snapshot-1'.

Review Comment:
   nit: do we have a doc for renaming, or can we link to the instructions?
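
The precedence described in the hint above can be modelled with a small sketch. This is a toy model in Python, not Iceberg's or Spark's implementation: `resolve_version`, `snapshot_ids`, and `refs` are hypothetical names introduced only for illustration. Falling back to a branch or tag lookup when a numeric string matches no snapshot is an assumption of this sketch, not behaviour confirmed in this thread.

```python
from typing import Dict, Set, Union


def resolve_version(version: Union[int, str],
                    snapshot_ids: Set[int],
                    refs: Dict[str, int]) -> int:
    """Return the snapshot ID a VERSION AS OF value selects in this toy model."""
    # A long (integer) value is always treated as a snapshot ID.
    if isinstance(version, int):
        if version in snapshot_ids:
            return version
        raise ValueError(f"no snapshot with ID {version}")

    # A string that reads as an existing snapshot ID wins over a
    # branch or tag that happens to carry the same name.
    if version.isascii() and version.isdigit() and int(version) in snapshot_ids:
        return int(version)

    # Otherwise the string is looked up as a branch or tag name
    # (assumed fallback, see the lead-in).
    if version in refs:
        return refs[version]
    raise ValueError(f"no snapshot, branch, or tag matches {version!r}")


# Hypothetical table state: snapshots 1 and 2, a tag named '1' that
# references snapshot 2, and the same snapshot under a prefixed tag name.
snapshot_ids = {1, 2}
refs = {"1": 2, "snapshot-1": 2, "audit-branch": 1}

print(resolve_version("1", snapshot_ids, refs))           # snapshot ID 1 wins over tag '1'
print(resolve_version("snapshot-1", snapshot_ids, refs))  # prefixed tag resolves to snapshot 2
```

In this model the tag named '1' is unreachable through `VERSION AS OF '1'`, which is why the hint suggests a prefixed name such as 'snapshot-1'.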





[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6627: Docs: Update spark SQL examples for time travel to branches and tags

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on code in PR #6627:
URL: https://github.com/apache/iceberg/pull/6627#discussion_r1081982454


##########
docs/spark-queries.md:
##########
@@ -95,21 +95,37 @@ The above list is in order of priority. For example: a matching catalog will tak
 
 #### SQL
 
-Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS OF` or `VERSION AS OF` clauses
+Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS OF` or `VERSION AS OF` clauses.
+The `VERSION AS OF` clause can contain a long snapshot ID or a string branch or tag name.
+
+{{< hint info >}}
+Note: If the name of a branch or tag is the same as a snapshot ID, then the snapshot which is selected for time travel is the snapshot
+with the given snapshot ID. For example, consider the case where there is a tag named '1' and it references snapshot 2. 
+If the time travel clause is `VERSION AS OF '1'` time travel will be done to the snapshot with id 1. 

Review Comment:
   nit: version travel clause





[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6627: Docs: Update spark SQL examples for time travel to branches and tags

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on code in PR #6627:
URL: https://github.com/apache/iceberg/pull/6627#discussion_r1082026758


##########
docs/spark-queries.md:
##########
@@ -95,21 +95,37 @@ The above list is in order of priority. For example: a matching catalog will tak
 
 #### SQL
 
-Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS OF` or `VERSION AS OF` clauses
+Spark 3.3 and later supports time travel in SQL queries using `TIMESTAMP AS OF` or `VERSION AS OF` clauses.
+The `VERSION AS OF` clause can contain a long snapshot ID or a string branch or tag name.
+
+{{< hint info >}}
+Note: If the name of a branch or tag is the same as a snapshot ID, then the snapshot which is selected for time travel is the snapshot
+with the given snapshot ID. For example, consider the case where there is a tag named '1' and it references snapshot with ID 2. 
+If the version travel clause is `VERSION AS OF '1'`, time travel will be done to the snapshot with ID 1. 
+If this is not desired, rename the tag or branch with a well-defined prefix such as 'snapshot-1'.
+{{< /hint >}}
 
 ```sql 
 -- time travel to October 26, 1986 at 01:21:00
 SELECT * FROM prod.db.table TIMESTAMP AS OF '1986-10-26 01:21:00';
 
 -- time travel to snapshot with id 10963874102873L
 SELECT * FROM prod.db.table VERSION AS OF 10963874102873;
+
+-- time travel to the head snapshot of audit-branch
+SELECT * FROM prod.db.table VERSION AS OF 'audit-branch';
+
+-- time travel to the snapshot referenced by the tag historical-snapshot
+SELECT * FROM prod.db.table VERSION AS OF 'historical-snapshot';
 ```
 
 In addition, `FOR SYSTEM_TIME AS OF` and `FOR SYSTEM_VERSION AS OF` clauses are also supported:
 
 ```sql
 SELECT * FROM prod.db.table FOR SYSTEM_TIME AS OF '1986-10-26 01:21:00';
 SELECT * FROM prod.db.table FOR SYSTEM_VERSION AS OF 10963874102873;
+SELECT * FROM prod.db.table FOR SYSTEM_VERSION AS OF 'audit-branch';
+SELECT * FROM prod.db.table FOR SYSTEM_VERSION AS OF 'historical-snapshot';

Review Comment:
   Lines 145 and 146 are still ambiguous to me.
   
   > * `branch` selects the head snapshot of the specified branch. Note that currently branch cannot be combined with as-of-timestamp.
   > * `tag` selects the snapshot associated with the specified tag
   
   Does that mean tags support `as-of-timestamp`? No, right? Maybe we need to update it for tags too.





[GitHub] [iceberg] jackye1995 commented on pull request #6627: Docs: Update spark SQL examples for time travel to branches and tags

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on PR #6627:
URL: https://github.com/apache/iceberg/pull/6627#issuecomment-1398780784

   Thanks for the update and reviews!




[GitHub] [iceberg] jackye1995 merged pull request #6627: Docs: Update spark SQL examples for time travel to branches and tags

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 merged PR #6627:
URL: https://github.com/apache/iceberg/pull/6627

