You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/07/10 18:53:06 UTC

[GitHub] [iceberg] bryanck opened a new pull request, #5235: REST: Create commit catalog handler fix

bryanck opened a new pull request, #5235:
URL: https://github.com/apache/iceberg/pull/5235

   Currently in the REST catalog handler, during a CTAS, the files will be written and associated with the partition spec ID from the initial create request. When the commit occurs, an empty table metadata object is created and updates are applied to that. However, applying the updates will increment the partition spec ID, so the files will be associated with the empty spec ID, and thus partition pruning will not be available for these files.
   
   This PR adds the ability to build on top of an empty TableMetadata so the initial IDs will match the initial files. It also moves `TableMetadata.newTableMetadata` to use the builder to instantiate a new instance so it can reuse functionality from the builder and to be consistent with `buildReplacement()`.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] bryanck commented on a diff in pull request #5235: REST: Create commit catalog handler fix

Posted by GitBox <gi...@apache.org>.
bryanck commented on code in PR #5235:
URL: https://github.com/apache/iceberg/pull/5235#discussion_r917454647


##########
core/src/test/java/org/apache/iceberg/catalog/CatalogTests.java:
##########
@@ -1213,6 +1213,67 @@ public void testCompleteCreateTransaction() {
       Assert.assertEquals("Table location should match requested", "file:/tmp/ns/table", table.location());
     }
     assertFiles(table, FILE_A);
+    assertFilesPartitionSpec(table);
+    assertPreviousMetadataFileCount(table, 0);
+  }
+
+  @Test
+  public void testCompleteCreateTransactionMultipleSchemas() {
+    C catalog = catalog();
+
+    Map<String, String> properties = ImmutableMap.of("user", "someone", "created-at", "2022-02-25T00:38:19");
+    Transaction create = catalog.buildTable(TABLE, SCHEMA)
+        .withLocation("file:/tmp/ns/table")
+        .withPartitionSpec(SPEC)
+        .withSortOrder(WRITE_ORDER)
+        .withProperties(properties)
+        .createTransaction();
+
+    Assert.assertFalse("Table should not exist after createTransaction", catalog.tableExists(TABLE));
+
+    create.newFastAppend().appendFile(FILE_A).commit();
+
+    UpdateSchema updateSchema = create.updateSchema().addColumn("new_col", Types.LongType.get());
+    Schema newSchema = updateSchema.apply();
+    updateSchema.commit();
+
+    UpdatePartitionSpec updateSpec = create.updateSpec().addField("new_col");
+    PartitionSpec newSpec = updateSpec.apply();
+    updateSpec.commit();
+
+    ReplaceSortOrder replaceSortOrder = create.replaceSortOrder().asc("new_col");
+    SortOrder newSortOrder = replaceSortOrder.apply();
+    replaceSortOrder.commit();
+
+    DataFile anotherFile = DataFiles.builder(newSpec)
+        .withPath("/path/to/data-b.parquet")
+        .withFileSizeInBytes(10)
+        .withPartitionPath("id_bucket=0/new_col=0") // easy way to set partition data for now
+        .withRecordCount(2) // needs at least one record or else metrics will filter it out
+        .build();
+
+    create.newFastAppend().appendFile(anotherFile).commit();
+
+    Assert.assertFalse("Table should not exist after append commit", catalog.tableExists(TABLE));
+
+    create.commitTransaction();
+
+    Assert.assertTrue("Table should exist after append commit", catalog.tableExists(TABLE));
+    Table table = catalog.loadTable(TABLE);
+
+    Assert.assertEquals("Table schema should match the new schema",
+        newSchema.asStruct(), table.schema().asStruct());
+    Assert.assertEquals("Table should have create partition spec", newSpec.fields(), table.spec().fields());
+    Assert.assertEquals("Table should have create sort order", newSortOrder.fields(), table.sortOrder().fields());
+    Assert.assertEquals("Table properties should be a superset of the requested properties",
+        properties.entrySet(),
+        Sets.intersection(properties.entrySet(), table.properties().entrySet()));
+    if (!overridesRequestedLocation()) {
+      Assert.assertEquals("Table location should match requested", "file:/tmp/ns/table", table.location());
+    }
+    assertFiles(table, FILE_A, anotherFile);
+    assertFilePartitionSpec(table, FILE_A, 0);
+    assertFilePartitionSpec(table, anotherFile, 1);

Review Comment:
   I created constants for the expected IDs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5235: REST: Create commit catalog handler fix

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5235:
URL: https://github.com/apache/iceberg/pull/5235#discussion_r917446360


##########
core/src/test/java/org/apache/iceberg/catalog/CatalogTests.java:
##########
@@ -1213,6 +1213,67 @@ public void testCompleteCreateTransaction() {
       Assert.assertEquals("Table location should match requested", "file:/tmp/ns/table", table.location());
     }
     assertFiles(table, FILE_A);
+    assertFilesPartitionSpec(table);
+    assertPreviousMetadataFileCount(table, 0);
+  }
+
+  @Test
+  public void testCompleteCreateTransactionMultipleSchemas() {
+    C catalog = catalog();
+
+    Map<String, String> properties = ImmutableMap.of("user", "someone", "created-at", "2022-02-25T00:38:19");
+    Transaction create = catalog.buildTable(TABLE, SCHEMA)
+        .withLocation("file:/tmp/ns/table")
+        .withPartitionSpec(SPEC)
+        .withSortOrder(WRITE_ORDER)
+        .withProperties(properties)
+        .createTransaction();
+
+    Assert.assertFalse("Table should not exist after createTransaction", catalog.tableExists(TABLE));
+
+    create.newFastAppend().appendFile(FILE_A).commit();
+
+    UpdateSchema updateSchema = create.updateSchema().addColumn("new_col", Types.LongType.get());
+    Schema newSchema = updateSchema.apply();
+    updateSchema.commit();
+
+    UpdatePartitionSpec updateSpec = create.updateSpec().addField("new_col");
+    PartitionSpec newSpec = updateSpec.apply();
+    updateSpec.commit();
+
+    ReplaceSortOrder replaceSortOrder = create.replaceSortOrder().asc("new_col");
+    SortOrder newSortOrder = replaceSortOrder.apply();
+    replaceSortOrder.commit();
+
+    DataFile anotherFile = DataFiles.builder(newSpec)
+        .withPath("/path/to/data-b.parquet")
+        .withFileSizeInBytes(10)
+        .withPartitionPath("id_bucket=0/new_col=0") // easy way to set partition data for now
+        .withRecordCount(2) // needs at least one record or else metrics will filter it out
+        .build();
+
+    create.newFastAppend().appendFile(anotherFile).commit();
+
+    Assert.assertFalse("Table should not exist after append commit", catalog.tableExists(TABLE));
+
+    create.commitTransaction();
+
+    Assert.assertTrue("Table should exist after append commit", catalog.tableExists(TABLE));
+    Table table = catalog.loadTable(TABLE);
+
+    Assert.assertEquals("Table schema should match the new schema",
+        newSchema.asStruct(), table.schema().asStruct());
+    Assert.assertEquals("Table should have create partition spec", newSpec.fields(), table.spec().fields());
+    Assert.assertEquals("Table should have create sort order", newSortOrder.fields(), table.sortOrder().fields());

Review Comment:
   Minor: I think it would make more sense to describe the new schema, spec, and sort order as "updated" rather than "create" because these are actually different than the created ones.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #5235: REST: Create commit catalog handler fix

Posted by GitBox <gi...@apache.org>.
rdblue merged PR #5235:
URL: https://github.com/apache/iceberg/pull/5235


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] bryanck commented on a diff in pull request #5235: REST: Create commit catalog handler fix

Posted by GitBox <gi...@apache.org>.
bryanck commented on code in PR #5235:
URL: https://github.com/apache/iceberg/pull/5235#discussion_r917440086


##########
core/src/main/java/org/apache/iceberg/TableMetadata.java:
##########
@@ -985,10 +1012,12 @@ public Builder addSnapshot(Snapshot snapshot) {
         return this;
       }
 
+      ValidationException.check(!schemas.isEmpty(), "Attempting to add a snapshot before a schema is added");
+      ValidationException.check(!specs.isEmpty(), "Attempting to add a snapshot before a partition spec is added");
+      ValidationException.check(!sortOrders.isEmpty(), "Attempting to add a snapshot before a sort order is added");

Review Comment:
   When building metadata from empty, the table specs aren't initialized. These assertions ensure they are set before adding files, so files added during a transaction line up with the table specs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] bryanck commented on a diff in pull request #5235: REST: Create commit catalog handler fix

Posted by GitBox <gi...@apache.org>.
bryanck commented on code in PR #5235:
URL: https://github.com/apache/iceberg/pull/5235#discussion_r917452464


##########
core/src/main/java/org/apache/iceberg/TableMetadata.java:
##########
@@ -1212,7 +1246,7 @@ public TableMetadata build() {
           ImmutableList.copyOf(newSnapshotLog),
           ImmutableList.copyOf(metadataHistory),
           ImmutableMap.copyOf(refs),
-          discardChanges ? ImmutableList.of() : ImmutableList.copyOf(changes)
+          discardChanges || base == null ? ImmutableList.of() : ImmutableList.copyOf(changes)

Review Comment:
   I made this change



##########
core/src/main/java/org/apache/iceberg/TableMetadata.java:
##########
@@ -985,10 +1012,12 @@ public Builder addSnapshot(Snapshot snapshot) {
         return this;
       }
 
+      ValidationException.check(!schemas.isEmpty(), "Attempting to add a snapshot before a schema is added");
+      ValidationException.check(!specs.isEmpty(), "Attempting to add a snapshot before a partition spec is added");
+      ValidationException.check(!sortOrders.isEmpty(), "Attempting to add a snapshot before a sort order is added");
       ValidationException.check(!snapshotsById.containsKey(snapshot.snapshotId()),
           "Snapshot already exists for id: %s",
           snapshot.snapshotId());
-

Review Comment:
   This is done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] bryanck commented on a diff in pull request #5235: REST: Create commit catalog handler fix

Posted by GitBox <gi...@apache.org>.
bryanck commented on code in PR #5235:
URL: https://github.com/apache/iceberg/pull/5235#discussion_r917452075


##########
core/src/main/java/org/apache/iceberg/TableMetadata.java:
##########
@@ -985,10 +1012,12 @@ public Builder addSnapshot(Snapshot snapshot) {
         return this;
       }
 
+      ValidationException.check(!schemas.isEmpty(), "Attempting to add a snapshot before a schema is added");
+      ValidationException.check(!specs.isEmpty(), "Attempting to add a snapshot before a partition spec is added");
+      ValidationException.check(!sortOrders.isEmpty(), "Attempting to add a snapshot before a sort order is added");
       ValidationException.check(!snapshotsById.containsKey(snapshot.snapshotId()),
           "Snapshot already exists for id: %s",
           snapshot.snapshotId());
-

Review Comment:
   OK I'll put that back, was making spacing consistent with the validation checks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] bryanck commented on a diff in pull request #5235: REST: Create commit catalog handler fix

Posted by GitBox <gi...@apache.org>.
bryanck commented on code in PR #5235:
URL: https://github.com/apache/iceberg/pull/5235#discussion_r917453339


##########
core/src/test/java/org/apache/iceberg/catalog/CatalogTests.java:
##########
@@ -1213,6 +1213,67 @@ public void testCompleteCreateTransaction() {
       Assert.assertEquals("Table location should match requested", "file:/tmp/ns/table", table.location());
     }
     assertFiles(table, FILE_A);
+    assertFilesPartitionSpec(table);
+    assertPreviousMetadataFileCount(table, 0);
+  }
+
+  @Test
+  public void testCompleteCreateTransactionMultipleSchemas() {
+    C catalog = catalog();
+
+    Map<String, String> properties = ImmutableMap.of("user", "someone", "created-at", "2022-02-25T00:38:19");
+    Transaction create = catalog.buildTable(TABLE, SCHEMA)
+        .withLocation("file:/tmp/ns/table")
+        .withPartitionSpec(SPEC)
+        .withSortOrder(WRITE_ORDER)
+        .withProperties(properties)
+        .createTransaction();
+
+    Assert.assertFalse("Table should not exist after createTransaction", catalog.tableExists(TABLE));
+
+    create.newFastAppend().appendFile(FILE_A).commit();
+
+    UpdateSchema updateSchema = create.updateSchema().addColumn("new_col", Types.LongType.get());
+    Schema newSchema = updateSchema.apply();
+    updateSchema.commit();
+
+    UpdatePartitionSpec updateSpec = create.updateSpec().addField("new_col");
+    PartitionSpec newSpec = updateSpec.apply();
+    updateSpec.commit();
+
+    ReplaceSortOrder replaceSortOrder = create.replaceSortOrder().asc("new_col");
+    SortOrder newSortOrder = replaceSortOrder.apply();
+    replaceSortOrder.commit();
+
+    DataFile anotherFile = DataFiles.builder(newSpec)
+        .withPath("/path/to/data-b.parquet")
+        .withFileSizeInBytes(10)
+        .withPartitionPath("id_bucket=0/new_col=0") // easy way to set partition data for now
+        .withRecordCount(2) // needs at least one record or else metrics will filter it out
+        .build();
+
+    create.newFastAppend().appendFile(anotherFile).commit();
+
+    Assert.assertFalse("Table should not exist after append commit", catalog.tableExists(TABLE));
+
+    create.commitTransaction();
+
+    Assert.assertTrue("Table should exist after append commit", catalog.tableExists(TABLE));
+    Table table = catalog.loadTable(TABLE);
+
+    Assert.assertEquals("Table schema should match the new schema",
+        newSchema.asStruct(), table.schema().asStruct());
+    Assert.assertEquals("Table should have create partition spec", newSpec.fields(), table.spec().fields());
+    Assert.assertEquals("Table should have create sort order", newSortOrder.fields(), table.sortOrder().fields());
+    Assert.assertEquals("Table properties should be a superset of the requested properties",
+        properties.entrySet(),
+        Sets.intersection(properties.entrySet(), table.properties().entrySet()));
+    if (!overridesRequestedLocation()) {
+      Assert.assertEquals("Table location should match requested", "file:/tmp/ns/table", table.location());
+    }
+    assertFiles(table, FILE_A, anotherFile);
+    assertFilePartitionSpec(table, FILE_A, 0);
+    assertFilePartitionSpec(table, anotherFile, 1);

Review Comment:
   The spec ID isn't incremented at this point, so `newSpec.specId()` is zero



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5235: REST: Create commit catalog handler fix

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5235:
URL: https://github.com/apache/iceberg/pull/5235#discussion_r917446409


##########
core/src/test/java/org/apache/iceberg/catalog/CatalogTests.java:
##########
@@ -1213,6 +1213,67 @@ public void testCompleteCreateTransaction() {
       Assert.assertEquals("Table location should match requested", "file:/tmp/ns/table", table.location());
     }
     assertFiles(table, FILE_A);
+    assertFilesPartitionSpec(table);
+    assertPreviousMetadataFileCount(table, 0);
+  }
+
+  @Test
+  public void testCompleteCreateTransactionMultipleSchemas() {
+    C catalog = catalog();
+
+    Map<String, String> properties = ImmutableMap.of("user", "someone", "created-at", "2022-02-25T00:38:19");
+    Transaction create = catalog.buildTable(TABLE, SCHEMA)
+        .withLocation("file:/tmp/ns/table")
+        .withPartitionSpec(SPEC)
+        .withSortOrder(WRITE_ORDER)
+        .withProperties(properties)
+        .createTransaction();
+
+    Assert.assertFalse("Table should not exist after createTransaction", catalog.tableExists(TABLE));
+
+    create.newFastAppend().appendFile(FILE_A).commit();
+
+    UpdateSchema updateSchema = create.updateSchema().addColumn("new_col", Types.LongType.get());
+    Schema newSchema = updateSchema.apply();
+    updateSchema.commit();
+
+    UpdatePartitionSpec updateSpec = create.updateSpec().addField("new_col");
+    PartitionSpec newSpec = updateSpec.apply();
+    updateSpec.commit();
+
+    ReplaceSortOrder replaceSortOrder = create.replaceSortOrder().asc("new_col");
+    SortOrder newSortOrder = replaceSortOrder.apply();
+    replaceSortOrder.commit();
+
+    DataFile anotherFile = DataFiles.builder(newSpec)
+        .withPath("/path/to/data-b.parquet")
+        .withFileSizeInBytes(10)
+        .withPartitionPath("id_bucket=0/new_col=0") // easy way to set partition data for now
+        .withRecordCount(2) // needs at least one record or else metrics will filter it out
+        .build();
+
+    create.newFastAppend().appendFile(anotherFile).commit();
+
+    Assert.assertFalse("Table should not exist after append commit", catalog.tableExists(TABLE));
+
+    create.commitTransaction();
+
+    Assert.assertTrue("Table should exist after append commit", catalog.tableExists(TABLE));
+    Table table = catalog.loadTable(TABLE);
+
+    Assert.assertEquals("Table schema should match the new schema",
+        newSchema.asStruct(), table.schema().asStruct());
+    Assert.assertEquals("Table should have create partition spec", newSpec.fields(), table.spec().fields());
+    Assert.assertEquals("Table should have create sort order", newSortOrder.fields(), table.sortOrder().fields());

Review Comment:
   Also, do we need to validate the IDs of each of these?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] bryanck commented on a diff in pull request #5235: REST: Create commit catalog handler fix

Posted by GitBox <gi...@apache.org>.
bryanck commented on code in PR #5235:
URL: https://github.com/apache/iceberg/pull/5235#discussion_r917452008


##########
core/src/main/java/org/apache/iceberg/TableMetadata.java:
##########
@@ -1212,7 +1246,7 @@ public TableMetadata build() {
           ImmutableList.copyOf(newSnapshotLog),
           ImmutableList.copyOf(metadataHistory),
           ImmutableMap.copyOf(refs),
-          discardChanges ? ImmutableList.of() : ImmutableList.copyOf(changes)
+          discardChanges || base == null ? ImmutableList.of() : ImmutableList.copyOf(changes)

Review Comment:
   `TestLoadTableResponse.testHasOnlyKnownFields()` was failing, so my intent was to make it behave like `newTableMetadata` which was setting changes to empty, though probably my misunderstanding. I'll remove this and fix what's going on with that test.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5235: REST: Create commit catalog handler fix

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5235:
URL: https://github.com/apache/iceberg/pull/5235#discussion_r917446467


##########
core/src/test/java/org/apache/iceberg/catalog/CatalogTests.java:
##########
@@ -1213,6 +1213,67 @@ public void testCompleteCreateTransaction() {
       Assert.assertEquals("Table location should match requested", "file:/tmp/ns/table", table.location());
     }
     assertFiles(table, FILE_A);
+    assertFilesPartitionSpec(table);
+    assertPreviousMetadataFileCount(table, 0);
+  }
+
+  @Test
+  public void testCompleteCreateTransactionMultipleSchemas() {
+    C catalog = catalog();
+
+    Map<String, String> properties = ImmutableMap.of("user", "someone", "created-at", "2022-02-25T00:38:19");
+    Transaction create = catalog.buildTable(TABLE, SCHEMA)
+        .withLocation("file:/tmp/ns/table")
+        .withPartitionSpec(SPEC)
+        .withSortOrder(WRITE_ORDER)
+        .withProperties(properties)
+        .createTransaction();
+
+    Assert.assertFalse("Table should not exist after createTransaction", catalog.tableExists(TABLE));
+
+    create.newFastAppend().appendFile(FILE_A).commit();
+
+    UpdateSchema updateSchema = create.updateSchema().addColumn("new_col", Types.LongType.get());
+    Schema newSchema = updateSchema.apply();
+    updateSchema.commit();
+
+    UpdatePartitionSpec updateSpec = create.updateSpec().addField("new_col");
+    PartitionSpec newSpec = updateSpec.apply();
+    updateSpec.commit();
+
+    ReplaceSortOrder replaceSortOrder = create.replaceSortOrder().asc("new_col");
+    SortOrder newSortOrder = replaceSortOrder.apply();
+    replaceSortOrder.commit();
+
+    DataFile anotherFile = DataFiles.builder(newSpec)
+        .withPath("/path/to/data-b.parquet")
+        .withFileSizeInBytes(10)
+        .withPartitionPath("id_bucket=0/new_col=0") // easy way to set partition data for now
+        .withRecordCount(2) // needs at least one record or else metrics will filter it out
+        .build();
+
+    create.newFastAppend().appendFile(anotherFile).commit();
+
+    Assert.assertFalse("Table should not exist after append commit", catalog.tableExists(TABLE));
+
+    create.commitTransaction();
+
+    Assert.assertTrue("Table should exist after append commit", catalog.tableExists(TABLE));
+    Table table = catalog.loadTable(TABLE);
+
+    Assert.assertEquals("Table schema should match the new schema",
+        newSchema.asStruct(), table.schema().asStruct());
+    Assert.assertEquals("Table should have create partition spec", newSpec.fields(), table.spec().fields());
+    Assert.assertEquals("Table should have create sort order", newSortOrder.fields(), table.sortOrder().fields());
+    Assert.assertEquals("Table properties should be a superset of the requested properties",
+        properties.entrySet(),
+        Sets.intersection(properties.entrySet(), table.properties().entrySet()));
+    if (!overridesRequestedLocation()) {
+      Assert.assertEquals("Table location should match requested", "file:/tmp/ns/table", table.location());
+    }
+    assertFiles(table, FILE_A, anotherFile);
+    assertFilePartitionSpec(table, FILE_A, 0);
+    assertFilePartitionSpec(table, anotherFile, 1);

Review Comment:
   I think it would be better to use `TABLE_SPEC.specId()` and `newSpec.specId()` rather than hard-coding the IDs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5235: REST: Create commit catalog handler fix

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5235:
URL: https://github.com/apache/iceberg/pull/5235#discussion_r917445448


##########
core/src/main/java/org/apache/iceberg/TableMetadata.java:
##########
@@ -985,10 +1012,12 @@ public Builder addSnapshot(Snapshot snapshot) {
         return this;
       }
 
+      ValidationException.check(!schemas.isEmpty(), "Attempting to add a snapshot before a schema is added");
+      ValidationException.check(!specs.isEmpty(), "Attempting to add a snapshot before a partition spec is added");
+      ValidationException.check(!sortOrders.isEmpty(), "Attempting to add a snapshot before a sort order is added");
       ValidationException.check(!snapshotsById.containsKey(snapshot.snapshotId()),
           "Snapshot already exists for id: %s",
           snapshot.snapshotId());
-

Review Comment:
   Nit: unnecessary change that could cause commit conflicts.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] bryanck commented on a diff in pull request #5235: REST: Create commit catalog handler fix

Posted by GitBox <gi...@apache.org>.
bryanck commented on code in PR #5235:
URL: https://github.com/apache/iceberg/pull/5235#discussion_r917454525


##########
core/src/test/java/org/apache/iceberg/catalog/CatalogTests.java:
##########
@@ -1213,6 +1213,67 @@ public void testCompleteCreateTransaction() {
       Assert.assertEquals("Table location should match requested", "file:/tmp/ns/table", table.location());
     }
     assertFiles(table, FILE_A);
+    assertFilesPartitionSpec(table);
+    assertPreviousMetadataFileCount(table, 0);
+  }
+
+  @Test
+  public void testCompleteCreateTransactionMultipleSchemas() {
+    C catalog = catalog();
+
+    Map<String, String> properties = ImmutableMap.of("user", "someone", "created-at", "2022-02-25T00:38:19");
+    Transaction create = catalog.buildTable(TABLE, SCHEMA)
+        .withLocation("file:/tmp/ns/table")
+        .withPartitionSpec(SPEC)
+        .withSortOrder(WRITE_ORDER)
+        .withProperties(properties)
+        .createTransaction();
+
+    Assert.assertFalse("Table should not exist after createTransaction", catalog.tableExists(TABLE));
+
+    create.newFastAppend().appendFile(FILE_A).commit();
+
+    UpdateSchema updateSchema = create.updateSchema().addColumn("new_col", Types.LongType.get());
+    Schema newSchema = updateSchema.apply();
+    updateSchema.commit();
+
+    UpdatePartitionSpec updateSpec = create.updateSpec().addField("new_col");
+    PartitionSpec newSpec = updateSpec.apply();
+    updateSpec.commit();
+
+    ReplaceSortOrder replaceSortOrder = create.replaceSortOrder().asc("new_col");
+    SortOrder newSortOrder = replaceSortOrder.apply();
+    replaceSortOrder.commit();
+
+    DataFile anotherFile = DataFiles.builder(newSpec)
+        .withPath("/path/to/data-b.parquet")
+        .withFileSizeInBytes(10)
+        .withPartitionPath("id_bucket=0/new_col=0") // easy way to set partition data for now
+        .withRecordCount(2) // needs at least one record or else metrics will filter it out
+        .build();
+
+    create.newFastAppend().appendFile(anotherFile).commit();
+
+    Assert.assertFalse("Table should not exist after append commit", catalog.tableExists(TABLE));
+
+    create.commitTransaction();
+
+    Assert.assertTrue("Table should exist after append commit", catalog.tableExists(TABLE));
+    Table table = catalog.loadTable(TABLE);
+
+    Assert.assertEquals("Table schema should match the new schema",
+        newSchema.asStruct(), table.schema().asStruct());
+    Assert.assertEquals("Table should have create partition spec", newSpec.fields(), table.spec().fields());
+    Assert.assertEquals("Table should have create sort order", newSortOrder.fields(), table.sortOrder().fields());

Review Comment:
   I updated the message to `updated`. and added ID validation. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #5235: REST: Create commit catalog handler fix

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #5235:
URL: https://github.com/apache/iceberg/pull/5235#issuecomment-1179826290

   Thanks, @bryanck! Looks great. And thanks for finding a better solution!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5235: REST: Create commit catalog handler fix

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5235:
URL: https://github.com/apache/iceberg/pull/5235#discussion_r917446187


##########
core/src/main/java/org/apache/iceberg/TableMetadata.java:
##########
@@ -1212,7 +1246,7 @@ public TableMetadata build() {
           ImmutableList.copyOf(newSnapshotLog),
           ImmutableList.copyOf(metadataHistory),
           ImmutableMap.copyOf(refs),
-          discardChanges ? ImmutableList.of() : ImmutableList.copyOf(changes)
+          discardChanges || base == null ? ImmutableList.of() : ImmutableList.copyOf(changes)

Review Comment:
   I'm not sure about this. If base is `null`, aren't all of the changes still valid?
   
   Tests pass without this change, so I would probably not include it. What was the reason for adding this?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org