You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2023/01/12 22:13:53 UTC

[GitHub] [iceberg] jackye1995 opened a new pull request, #6575: Spark 3.3: support version travel by reference name

jackye1995 opened a new pull request, #6575:
URL: https://github.com/apache/iceberg/pull/6575

   Similar to the Trino PR I am trying to push https://github.com/trinodb/trino/pull/15646, support using reference name for the `VERSION AS OF` clause.
   
   We have a related PR https://github.com/apache/iceberg/pull/5294 by @hililiwei that tries to directly add the reference info in table name, while we are consolidating that experience, I think we can first have this feature added in parallel.
   
   @amogh-jahagirdar 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6575: Spark 3.3: support version travel by reference name

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #6575:
URL: https://github.com/apache/iceberg/pull/6575#discussion_r1073797662


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##########
@@ -159,7 +160,15 @@ public Table loadTable(Identifier ident, String version) throws NoSuchTableExcep
           sparkTable.snapshotId() == null,
           "Cannot do time-travel based on both table identifier and AS OF");
 
-      return sparkTable.copyWithSnapshotId(Long.parseLong(version));
+      try {

Review Comment:
   Currently there's no restrictions on what references can be named. For the lookup code, I think we should always be able to differentiate between snapshot ID and ref since for refs it will be in a quoted identifier, and should always fail the Long.parseLong() with a NumberParseException. So the current implementation seems good to me.
   
   But that's just me reading the code :), I think it's worth having a unit test just for this case to give us that confidence that it works as expected in this scenario. cc @jackye1995 let me know your thoughts



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 merged pull request #6575: Spark 3.3: support version travel by reference name

Posted by GitBox <gi...@apache.org>.
jackye1995 merged PR #6575:
URL: https://github.com/apache/iceberg/pull/6575


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6575: Spark 3.3: support version travel by reference name

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #6575:
URL: https://github.com/apache/iceberg/pull/6575#discussion_r1073797220


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##########
@@ -159,7 +160,15 @@ public Table loadTable(Identifier ident, String version) throws NoSuchTableExcep
           sparkTable.snapshotId() == null,
           "Cannot do time-travel based on both table identifier and AS OF");
 
-      return sparkTable.copyWithSnapshotId(Long.parseLong(version));
+      try {

Review Comment:
   Currently there's no restrictions on what references can be named. For the lookup code, since the ref is in  I think we should always be able to differentiate between snapshot ID and ref since for refs it will be in a quoted identifier, and should always fail the Long.parseLong(). 
   
   But that's just me reading the code, I think it's worth having a unit test just for this case to give us that confidence that it works as expected in this scenario. cc @jackye1995 let me know your thoughts



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6575: Spark 3.3: support version travel by reference name

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #6575:
URL: https://github.com/apache/iceberg/pull/6575#discussion_r1073797662


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##########
@@ -159,7 +160,15 @@ public Table loadTable(Identifier ident, String version) throws NoSuchTableExcep
           sparkTable.snapshotId() == null,
           "Cannot do time-travel based on both table identifier and AS OF");
 
-      return sparkTable.copyWithSnapshotId(Long.parseLong(version));
+      try {

Review Comment:
   Currently there's no restrictions on what references can be named. For the lookup code, I think we should always be able to differentiate between snapshot ID and ref since for refs it will be in a quoted identifier, and should always fail the Long.parseLong(). 
   
   But that's just me reading the code :), I think it's worth having a unit test just for this case to give us that confidence that it works as expected in this scenario. cc @jackye1995 let me know your thoughts



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6575: Spark 3.3: support version travel by reference name

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #6575:
URL: https://github.com/apache/iceberg/pull/6575#discussion_r1073797662


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##########
@@ -159,7 +160,15 @@ public Table loadTable(Identifier ident, String version) throws NoSuchTableExcep
           sparkTable.snapshotId() == null,
           "Cannot do time-travel based on both table identifier and AS OF");
 
-      return sparkTable.copyWithSnapshotId(Long.parseLong(version));
+      try {

Review Comment:
   Currently there's no restrictions on what references can be named. For the lookup code, I think we should always be able to differentiate between snapshot ID and ref since for refs it will be in a quoted identifier, and should always fail the Long.parseLong() with a NumberParseException. 
   
   But that's just me reading the code :), I think it's worth having a unit test just for this case to give us that confidence that it works as expected in this scenario. cc @jackye1995 let me know your thoughts



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on pull request #6575: Spark 3.3: support version travel by reference name

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on PR #6575:
URL: https://github.com/apache/iceberg/pull/6575#issuecomment-1382632711

   @aokolnychyi @RussellSpitzer @rdblue any opinions about this support?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on pull request #6575: Spark 3.3: support version travel by reference name

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on PR #6575:
URL: https://github.com/apache/iceberg/pull/6575#issuecomment-1387520958

   > we also need to update this SQL syntax in https://github.com/apache/iceberg/blob/master/docs/spark-queries.md#sql
   
   I think @amogh-jahagirdar is working on a more detailed doc for branching and tagging, I will leave this part to him 😝 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #6575: Spark 3.3: support version travel by reference name

Posted by GitBox <gi...@apache.org>.
nastra commented on code in PR #6575:
URL: https://github.com/apache/iceberg/pull/6575#discussion_r1069451597


##########
spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/sql/TestSelect.java:
##########
@@ -231,6 +233,68 @@ public void testVersionAsOf() {
     assertEquals("Snapshot at specific ID " + snapshotId, expected, fromDF);
   }
 
+  @Test
+  public void testTagReferenceAsOf() {
+    Table table = validationCatalog.loadTable(tableIdent);
+    long snapshotId = table.currentSnapshot().snapshotId();
+    table.manageSnapshots().createTag("test_tag", snapshotId).commit();
+
+    // create a second snapshot, read the table at the snapshot
+    List<Object[]> expected = sql("SELECT * FROM %s", tableName);
+    sql("INSERT INTO %s VALUES (4, 'd', 4.0), (5, 'e', 5.0)", tableName);
+    List<Object[]> actual1 = sql("SELECT * FROM %s VERSION AS OF 'test_tag'", tableName);
+    assertEquals("Snapshot at specific tag reference name", expected, actual1);
+
+    // read the table at the snapshot
+    // HIVE time travel syntax
+    List<Object[]> actual2 = sql("SELECT * FROM %s FOR SYSTEM_VERSION AS OF 'test_tag'", tableName);
+    assertEquals("Snapshot at specific tag reference name", expected, actual2);
+
+    // read the table using DataFrameReader option: branch
+    Dataset<Row> df =
+        spark.read().format("iceberg").option(SparkReadOptions.TAG, "test_tag").load(tableName);
+    List<Object[]> fromDF = rowsToJava(df.collectAsList());
+    assertEquals("Snapshot at specific tag reference name", expected, fromDF);
+  }
+
+  @Test
+  public void testBranchReferenceAsOf() {
+    Table table = validationCatalog.loadTable(tableIdent);
+    long snapshotId = table.currentSnapshot().snapshotId();
+    table.manageSnapshots().createBranch("test_branch", snapshotId).commit();
+
+    // create a second snapshot, read the table at the snapshot
+    List<Object[]> expected = sql("SELECT * FROM %s", tableName);
+    sql("INSERT INTO %s VALUES (4, 'd', 4.0), (5, 'e', 5.0)", tableName);
+    List<Object[]> actual1 = sql("SELECT * FROM %s VERSION AS OF 'test_branch'", tableName);
+    assertEquals("Snapshot at specific branch reference name", expected, actual1);
+
+    // read the table at the snapshot
+    // HIVE time travel syntax
+    List<Object[]> actual2 =
+        sql("SELECT * FROM %s FOR SYSTEM_VERSION AS OF 'test_branch'", tableName);
+    assertEquals("Snapshot at specific branch reference name", expected, actual2);
+
+    // read the table using DataFrameReader option: branch
+    Dataset<Row> df =
+        spark
+            .read()
+            .format("iceberg")
+            .option(SparkReadOptions.BRANCH, "test_branch")
+            .load(tableName);
+    List<Object[]> fromDF = rowsToJava(df.collectAsList());
+    assertEquals("Snapshot at specific branch reference name", expected, fromDF);
+  }
+
+  @Test
+  public void testUnknownReferenceAsOf() {
+    AssertHelpers.assertThrows(
+        "Cannot find matching snapshot ID or reference name for version",

Review Comment:
   nit: would be great to use `Assertions.assertThatThrownBy(...)` here. Then you wouldn't need to have to specify the failure reason here because it would be obvious from when the assertion would fail



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #6575: Spark 3.3: support version travel by reference name

Posted by GitBox <gi...@apache.org>.
nastra commented on code in PR #6575:
URL: https://github.com/apache/iceberg/pull/6575#discussion_r1070237113


##########
spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/sql/TestSelect.java:
##########
@@ -231,6 +234,67 @@ public void testVersionAsOf() {
     assertEquals("Snapshot at specific ID " + snapshotId, expected, fromDF);
   }
 
+  @Test
+  public void testTagReferenceAsOf() {
+    Table table = validationCatalog.loadTable(tableIdent);
+    long snapshotId = table.currentSnapshot().snapshotId();
+    table.manageSnapshots().createTag("test_tag", snapshotId).commit();
+
+    // create a second snapshot, read the table at the snapshot
+    List<Object[]> expected = sql("SELECT * FROM %s", tableName);
+    sql("INSERT INTO %s VALUES (4, 'd', 4.0), (5, 'e', 5.0)", tableName);
+    List<Object[]> actual1 = sql("SELECT * FROM %s VERSION AS OF 'test_tag'", tableName);
+    assertEquals("Snapshot at specific tag reference name", expected, actual1);
+
+    // read the table at the snapshot
+    // HIVE time travel syntax
+    List<Object[]> actual2 = sql("SELECT * FROM %s FOR SYSTEM_VERSION AS OF 'test_tag'", tableName);
+    assertEquals("Snapshot at specific tag reference name", expected, actual2);
+
+    // read the table using DataFrameReader option: branch
+    Dataset<Row> df =
+        spark.read().format("iceberg").option(SparkReadOptions.TAG, "test_tag").load(tableName);
+    List<Object[]> fromDF = rowsToJava(df.collectAsList());
+    assertEquals("Snapshot at specific tag reference name", expected, fromDF);
+  }
+
+  @Test
+  public void testBranchReferenceAsOf() {
+    Table table = validationCatalog.loadTable(tableIdent);
+    long snapshotId = table.currentSnapshot().snapshotId();
+    table.manageSnapshots().createBranch("test_branch", snapshotId).commit();
+
+    // create a second snapshot, read the table at the snapshot
+    List<Object[]> expected = sql("SELECT * FROM %s", tableName);
+    sql("INSERT INTO %s VALUES (4, 'd', 4.0), (5, 'e', 5.0)", tableName);
+    List<Object[]> actual1 = sql("SELECT * FROM %s VERSION AS OF 'test_branch'", tableName);
+    assertEquals("Snapshot at specific branch reference name", expected, actual1);
+
+    // read the table at the snapshot
+    // HIVE time travel syntax
+    List<Object[]> actual2 =
+        sql("SELECT * FROM %s FOR SYSTEM_VERSION AS OF 'test_branch'", tableName);
+    assertEquals("Snapshot at specific branch reference name", expected, actual2);
+
+    // read the table using DataFrameReader option: branch
+    Dataset<Row> df =
+        spark
+            .read()
+            .format("iceberg")
+            .option(SparkReadOptions.BRANCH, "test_branch")
+            .load(tableName);
+    List<Object[]> fromDF = rowsToJava(df.collectAsList());
+    assertEquals("Snapshot at specific branch reference name", expected, fromDF);
+  }
+
+  @Test
+  public void testUnknownReferenceAsOf() {
+    Assertions.assertThatThrownBy(
+            () -> sql("SELECT * FROM %s VERSION AS OF 'test_unknown'", tableName))
+        .as("Cannot find matching snapshot ID or reference name for version")

Review Comment:
   I think you rather want to have `hasMessage(..)` or `hasMessageContaining(...)` here instead of `as(..)`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6575: Spark 3.3: support version travel by reference name

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on code in PR #6575:
URL: https://github.com/apache/iceberg/pull/6575#discussion_r1069798452


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##########
@@ -159,7 +160,15 @@ public Table loadTable(Identifier ident, String version) throws NoSuchTableExcep
           sparkTable.snapshotId() == null,
           "Cannot do time-travel based on both table identifier and AS OF");
 
-      return sparkTable.copyWithSnapshotId(Long.parseLong(version));
+      try {
+        return sparkTable.copyWithSnapshotId(Long.parseLong(version));
+      } catch (NumberFormatException e) {
+        SnapshotRef ref = sparkTable.table().refs().get(version);
+        ValidationException.check(
+            ref != null,
+            "Cannot find matching snapshot ID or reference name for version " + version);
+        return sparkTable.copyWithSnapshotId(ref.snapshotId());

Review Comment:
   I think one thing that might be useful is to time travel in a branch, something like `FOR SYSTEM_VERSION AS OF branchName@123456789012`. But that feels very hacky, I' rather have some syntax as we have been suggesting like `SELECT * FROM table BRANCH branch FOR SYSTEM_TIME AS OF xxx`. So I am leaving that part out of the implementation for now. At least I think most people can agree that a tag/branch head can be viewed as a version to travel to.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6575: Spark 3.3: support version travel by reference name

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on code in PR #6575:
URL: https://github.com/apache/iceberg/pull/6575#discussion_r1069789840


##########
spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/sql/TestSelect.java:
##########
@@ -231,6 +233,68 @@ public void testVersionAsOf() {
     assertEquals("Snapshot at specific ID " + snapshotId, expected, fromDF);
   }
 
+  @Test
+  public void testTagReferenceAsOf() {
+    Table table = validationCatalog.loadTable(tableIdent);
+    long snapshotId = table.currentSnapshot().snapshotId();
+    table.manageSnapshots().createTag("test_tag", snapshotId).commit();
+
+    // create a second snapshot, read the table at the snapshot
+    List<Object[]> expected = sql("SELECT * FROM %s", tableName);
+    sql("INSERT INTO %s VALUES (4, 'd', 4.0), (5, 'e', 5.0)", tableName);
+    List<Object[]> actual1 = sql("SELECT * FROM %s VERSION AS OF 'test_tag'", tableName);
+    assertEquals("Snapshot at specific tag reference name", expected, actual1);
+
+    // read the table at the snapshot
+    // HIVE time travel syntax
+    List<Object[]> actual2 = sql("SELECT * FROM %s FOR SYSTEM_VERSION AS OF 'test_tag'", tableName);
+    assertEquals("Snapshot at specific tag reference name", expected, actual2);
+
+    // read the table using DataFrameReader option: branch
+    Dataset<Row> df =
+        spark.read().format("iceberg").option(SparkReadOptions.TAG, "test_tag").load(tableName);
+    List<Object[]> fromDF = rowsToJava(df.collectAsList());
+    assertEquals("Snapshot at specific tag reference name", expected, fromDF);
+  }
+
+  @Test
+  public void testBranchReferenceAsOf() {
+    Table table = validationCatalog.loadTable(tableIdent);
+    long snapshotId = table.currentSnapshot().snapshotId();
+    table.manageSnapshots().createBranch("test_branch", snapshotId).commit();
+
+    // create a second snapshot, read the table at the snapshot
+    List<Object[]> expected = sql("SELECT * FROM %s", tableName);
+    sql("INSERT INTO %s VALUES (4, 'd', 4.0), (5, 'e', 5.0)", tableName);
+    List<Object[]> actual1 = sql("SELECT * FROM %s VERSION AS OF 'test_branch'", tableName);
+    assertEquals("Snapshot at specific branch reference name", expected, actual1);
+
+    // read the table at the snapshot
+    // HIVE time travel syntax
+    List<Object[]> actual2 =
+        sql("SELECT * FROM %s FOR SYSTEM_VERSION AS OF 'test_branch'", tableName);
+    assertEquals("Snapshot at specific branch reference name", expected, actual2);
+
+    // read the table using DataFrameReader option: branch
+    Dataset<Row> df =
+        spark
+            .read()
+            .format("iceberg")
+            .option(SparkReadOptions.BRANCH, "test_branch")
+            .load(tableName);
+    List<Object[]> fromDF = rowsToJava(df.collectAsList());
+    assertEquals("Snapshot at specific branch reference name", expected, fromDF);
+  }
+
+  @Test
+  public void testUnknownReferenceAsOf() {
+    AssertHelpers.assertThrows(
+        "Cannot find matching snapshot ID or reference name for version",

Review Comment:
   sure, I can update to use that



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6575: Spark 3.3: support version travel by reference name

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #6575:
URL: https://github.com/apache/iceberg/pull/6575#discussion_r1073797662


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##########
@@ -159,7 +160,15 @@ public Table loadTable(Identifier ident, String version) throws NoSuchTableExcep
           sparkTable.snapshotId() == null,
           "Cannot do time-travel based on both table identifier and AS OF");
 
-      return sparkTable.copyWithSnapshotId(Long.parseLong(version));
+      try {

Review Comment:
   Currently there's no restrictions on what references can be named. For the lookup code, I think we should always be able to differentiate between snapshot ID and ref since for refs it will be in a quoted identifier, and should always fail the Long.parseLong(). 
   
   But that's just me reading the code, I think it's worth having a unit test just for this case to give us that confidence that it works as expected in this scenario. cc @jackye1995 let me know your thoughts



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on pull request #6575: Spark 3.3: support version travel by reference name

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on PR #6575:
URL: https://github.com/apache/iceberg/pull/6575#issuecomment-1396243591

   Thanks for the review everyone, I think there are enough votes and all concerns are addressed, I will go ahead merging this PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6575: Spark 3.3: support version travel by reference name

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on code in PR #6575:
URL: https://github.com/apache/iceberg/pull/6575#discussion_r1069026706


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##########
@@ -159,7 +160,15 @@ public Table loadTable(Identifier ident, String version) throws NoSuchTableExcep
           sparkTable.snapshotId() == null,
           "Cannot do time-travel based on both table identifier and AS OF");
 
-      return sparkTable.copyWithSnapshotId(Long.parseLong(version));
+      try {
+        return sparkTable.copyWithSnapshotId(Long.parseLong(version));
+      } catch (NumberFormatException e) {
+        SnapshotRef ref = sparkTable.table().refs().get(version);
+        ValidationException.check(
+            ref != null,
+            "Cannot find matching snapshot ID or reference name for version " + version);
+        return sparkTable.copyWithSnapshotId(ref.snapshotId());

Review Comment:
   Never mind. 
   After thinking a bit more about it and reviewing #6573, As the snapshot log contains all the snapshots from all the branches/tags. If we want to use any particular snapshot, we can directly use snapshot-id without specifying branch/tag information. So, no need of `snapshotId@refName` syntax



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6575: Spark 3.3: support version travel by reference name

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on code in PR #6575:
URL: https://github.com/apache/iceberg/pull/6575#discussion_r1069013177


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##########
@@ -159,7 +160,15 @@ public Table loadTable(Identifier ident, String version) throws NoSuchTableExcep
           sparkTable.snapshotId() == null,
           "Cannot do time-travel based on both table identifier and AS OF");
 
-      return sparkTable.copyWithSnapshotId(Long.parseLong(version));
+      try {
+        return sparkTable.copyWithSnapshotId(Long.parseLong(version));
+      } catch (NumberFormatException e) {
+        SnapshotRef ref = sparkTable.table().refs().get(version);
+        ValidationException.check(
+            ref != null,
+            "Cannot find matching snapshot ID or reference name for version " + version);
+        return sparkTable.copyWithSnapshotId(ref.snapshotId());

Review Comment:
   It always use the latest commit from the reference. 
   I think we also need to provide a way to time travel to a snapshot within a branch/tag? 
   
   So along with existing of `FOR SYSTEM_VERSION AS OF snapshotId`
   we should have `FOR SYSTEM_VERSION AS OF snapshotId@refName`
   
   But whether to use '@' or some other syntax is an open point for a long time which @rdblue wanted to conclude.  
   
   Nessie SQL syntax for reference: 
   https://projectnessie.org/tools/sql/#grammar



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on pull request #6575: Spark 3.3: support version travel by reference name

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on PR #6575:
URL: https://github.com/apache/iceberg/pull/6575#issuecomment-1383390015

   we also need to update this SQL syntax in 
   https://github.com/apache/iceberg/blob/master/docs/spark-queries.md#sql


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6575: Spark 3.3: support version travel by reference name

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #6575:
URL: https://github.com/apache/iceberg/pull/6575#discussion_r1073797662


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##########
@@ -159,7 +160,15 @@ public Table loadTable(Identifier ident, String version) throws NoSuchTableExcep
           sparkTable.snapshotId() == null,
           "Cannot do time-travel based on both table identifier and AS OF");
 
-      return sparkTable.copyWithSnapshotId(Long.parseLong(version));
+      try {

Review Comment:
   Currently there's no restrictions on what references can be named. For the lookup code, since the ref is in  I think we should always be able to differentiate between snapshot ID and ref since for refs it will be in a quoted identifier, and should always fail the Long.parseLong(). 
   
   But that's just me reading the code, I think it's worth having a unit test just for this case to give us that confidence that it works as expected in this scenario. cc @jackye1995 let me know your thoughts



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6575: Spark 3.3: support version travel by reference name

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on code in PR #6575:
URL: https://github.com/apache/iceberg/pull/6575#discussion_r1073920699


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##########
@@ -159,7 +160,15 @@ public Table loadTable(Identifier ident, String version) throws NoSuchTableExcep
           sparkTable.snapshotId() == null,
           "Cannot do time-travel based on both table identifier and AS OF");
 
-      return sparkTable.copyWithSnapshotId(Long.parseLong(version));
+      try {

Review Comment:
   I added a test case specifically for this. Unlike Trino, Spark directly ignores the type of the `VERSION AS OF`, so if a tag name matches exactly the snapshot ID, then snapshot ID is always chosen.
   
   I think this is a okay limitation, because people can work around it by adding some text like `snapshot-123456890` as the tag name. But we should make it very clear in documentation.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6575: Spark 3.3: support version travel by reference name

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on code in PR #6575:
URL: https://github.com/apache/iceberg/pull/6575#discussion_r1073926889


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##########
@@ -159,7 +160,15 @@ public Table loadTable(Identifier ident, String version) throws NoSuchTableExcep
           sparkTable.snapshotId() == null,
           "Cannot do time-travel based on both table identifier and AS OF");
 
-      return sparkTable.copyWithSnapshotId(Long.parseLong(version));
+      try {

Review Comment:
   Yeah I don't want this to be a blocker, just something to take note of.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6575: Spark 3.3: support version travel by reference name

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on code in PR #6575:
URL: https://github.com/apache/iceberg/pull/6575#discussion_r1070263376


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##########
@@ -159,7 +160,15 @@ public Table loadTable(Identifier ident, String version) throws NoSuchTableExcep
           sparkTable.snapshotId() == null,
           "Cannot do time-travel based on both table identifier and AS OF");
 
-      return sparkTable.copyWithSnapshotId(Long.parseLong(version));
+      try {

Review Comment:
   I'm not sure this can come up, but do we allow version tags to be SnapshotIds? 
   
   Like can I tag snapshot 2 to be known as 1?
   
   Weird edge case so I don't think we really need to handle it, just thinking if this is a potential issue with the lookup code here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org