You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by "szehon-ho (via GitHub)" <gi...@apache.org> on 2023/04/06 23:58:11 UTC

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7120: Spark - Accept an `output-spec-id` that allows writing to a desired partition spec

szehon-ho commented on code in PR #7120:
URL: https://github.com/apache/iceberg/pull/7120#discussion_r1160342252


##########
spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/sql/PartitionedWritesTestBase.java:
##########
@@ -183,4 +184,91 @@ public void testViewsReturnRecentResults() {
         ImmutableList.of(row(1L, "a"), row(1L, "a")),
         sql("SELECT * FROM tmp"));
   }
+
+  @Test
+  public void testWriteWithOutputSpec() throws NoSuchTableException {
+    Table table = validationCatalog.loadTable(tableIdent);
+    // Drop all records in table to have a fresh start.
+    sql("DELETE FROM %s", tableName);
+    table.refresh();
+
+    final int originalSpecId = table.spec().specId();
+    table.updateSpec().addField("data").commit();
+
+    table.refresh();
+    sql("REFRESH TABLE %s", tableName);
+
+    // By default, we write to the current spec.
+    sql("INSERT INTO TABLE %s VALUES (10, 'a')", tableName);
+
+    List<Object[]> expected = ImmutableList.of(row(10L, "a", table.spec().specId()));
+    assertEquals(
+        "Rows must match",
+        expected,
+        sql("SELECT id, data, _spec_id FROM %s WHERE id >= 10 ORDER BY id", tableName));
+
+    // Output spec ID should be respected when present.
+    List<SimpleRecord> data =
+        ImmutableList.of(new SimpleRecord(11, "b"), new SimpleRecord(12, "c"));
+    spark
+        .createDataFrame(data, SimpleRecord.class)
+        .toDF()
+        .writeTo(tableName)
+        .option("output-spec-id", Integer.toString(originalSpecId))
+        .append();
+
+    expected =
+        ImmutableList.of(
+            row(10L, "a", table.spec().specId()),
+            row(11L, "b", originalSpecId),
+            row(12L, "c", originalSpecId));
+    assertEquals(
+        "Rows must match",
+        expected,
+        sql("SELECT id, data, _spec_id FROM %s WHERE id >= 10 ORDER BY id", tableName));
+
+    // Verify that the actual partitions are written with the correct spec ID.
+    // Two of the partitions should have the original spec ID and one should have the new one.
+
+    if (!this.getClass().equals(TestPartitionedWritesToWapBranch.class)) {

Review Comment:
   Is there any issue to track this?  (can you file if not).  And also, is there an easier way to do this?  (Maybe override the test method there and don't run?)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org