You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2023/01/13 13:26:39 UTC

[GitHub] [iceberg] nastra commented on a diff in pull request #6575: Spark 3.3: support version travel by reference name

nastra commented on code in PR #6575:
URL: https://github.com/apache/iceberg/pull/6575#discussion_r1069451597


##########
spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/sql/TestSelect.java:
##########
@@ -231,6 +233,68 @@ public void testVersionAsOf() {
     assertEquals("Snapshot at specific ID " + snapshotId, expected, fromDF);
   }
 
+  @Test
+  public void testTagReferenceAsOf() {
+    Table table = validationCatalog.loadTable(tableIdent);
+    long snapshotId = table.currentSnapshot().snapshotId();
+    table.manageSnapshots().createTag("test_tag", snapshotId).commit();
+
+    // create a second snapshot, read the table at the snapshot
+    List<Object[]> expected = sql("SELECT * FROM %s", tableName);
+    sql("INSERT INTO %s VALUES (4, 'd', 4.0), (5, 'e', 5.0)", tableName);
+    List<Object[]> actual1 = sql("SELECT * FROM %s VERSION AS OF 'test_tag'", tableName);
+    assertEquals("Snapshot at specific tag reference name", expected, actual1);
+
+    // read the table at the snapshot
+    // HIVE time travel syntax
+    List<Object[]> actual2 = sql("SELECT * FROM %s FOR SYSTEM_VERSION AS OF 'test_tag'", tableName);
+    assertEquals("Snapshot at specific tag reference name", expected, actual2);
+
+    // read the table using DataFrameReader option: branch
+    Dataset<Row> df =
+        spark.read().format("iceberg").option(SparkReadOptions.TAG, "test_tag").load(tableName);
+    List<Object[]> fromDF = rowsToJava(df.collectAsList());
+    assertEquals("Snapshot at specific tag reference name", expected, fromDF);
+  }
+
+  @Test
+  public void testBranchReferenceAsOf() {
+    Table table = validationCatalog.loadTable(tableIdent);
+    long snapshotId = table.currentSnapshot().snapshotId();
+    table.manageSnapshots().createBranch("test_branch", snapshotId).commit();
+
+    // create a second snapshot, read the table at the snapshot
+    List<Object[]> expected = sql("SELECT * FROM %s", tableName);
+    sql("INSERT INTO %s VALUES (4, 'd', 4.0), (5, 'e', 5.0)", tableName);
+    List<Object[]> actual1 = sql("SELECT * FROM %s VERSION AS OF 'test_branch'", tableName);
+    assertEquals("Snapshot at specific branch reference name", expected, actual1);
+
+    // read the table at the snapshot
+    // HIVE time travel syntax
+    List<Object[]> actual2 =
+        sql("SELECT * FROM %s FOR SYSTEM_VERSION AS OF 'test_branch'", tableName);
+    assertEquals("Snapshot at specific branch reference name", expected, actual2);
+
+    // read the table using DataFrameReader option: branch
+    Dataset<Row> df =
+        spark
+            .read()
+            .format("iceberg")
+            .option(SparkReadOptions.BRANCH, "test_branch")
+            .load(tableName);
+    List<Object[]> fromDF = rowsToJava(df.collectAsList());
+    assertEquals("Snapshot at specific branch reference name", expected, fromDF);
+  }
+
+  @Test
+  public void testUnknownReferenceAsOf() {
+    AssertHelpers.assertThrows(
+        "Cannot find matching snapshot ID or reference name for version",

Review Comment:
   nit: would be great to use `Assertions.assertThatThrownBy(...)` here. Then you wouldn't need to have to specify the failure reason here because it would be obvious from when the assertion would fail



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org