You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/08/02 12:18:22 UTC

[GitHub] [iceberg] ajantha-bhat opened a new pull request, #5419: [WIP] Handle table evolution using schemaId mapping

ajantha-bhat opened a new pull request, #5419:
URL: https://github.com/apache/iceberg/pull/5419

   `CREATE TABLE %s (id bigint, data string, category string) USING iceberg PARTITIONED BY (category) TBLPROPERTIES('format-version' = '2')`
   `ALTER TABLE %s DROP PARTITION FIELD category`
   `ALTER TABLE %s DROP COLUMN category` -- throws NPE as mentioned in #5399 
   
   - Fields in a partition spec can belong to an older schema. So, keep a schema id mapping in partition spec field.
   - TODO: test and handle backward compatibility
   - TODO: Same problem can exist for sort orders also. Need to modify spec there as well.
   
   Fixes #5399 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat closed pull request #5419: [WIP] Handle table evolution using schemaId mapping

Posted by GitBox <gi...@apache.org>.
ajantha-bhat closed pull request #5419: [WIP] Handle table evolution using schemaId mapping
URL: https://github.com/apache/iceberg/pull/5419


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #5419: [WIP] Handle table evolution using schemaId mapping

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on code in PR #5419:
URL: https://github.com/apache/iceberg/pull/5419#discussion_r973568346


##########
.palantir/revapi.yml:
##########
@@ -3,6 +3,14 @@ versionOverrides:
 acceptedBreaks:
   apache-iceberg-0.14.0:
     org.apache.iceberg:iceberg-api:
+    - code: "java.class.defaultSerializationChanged"
+      old: "class org.apache.iceberg.PartitionField"
+      new: "class org.apache.iceberg.PartitionField"
+      justification: "{Introduce schema id mapping for partition spec field}"
+    - code: "java.class.defaultSerializationChanged"
+      old: "class org.apache.iceberg.PartitionKey"
+      new: "class org.apache.iceberg.PartitionKey"

Review Comment:
   auto moved by rev-api or spotless.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on pull request #5419: [WIP] Handle table evolution using schemaId mapping

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on PR #5419:
URL: https://github.com/apache/iceberg/pull/5419#issuecomment-1205219983

   Somehow this problem disappeared. 
   Iceberg 0.14.0 spark3.2 and spark3.3 doesn't have this issue anymore.
   
   Only spark 3.1 has an issue. 
   
   I still need to figure out which PR fixed this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on pull request #5419: [WIP] Handle table evolution using schemaId mapping

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on PR #5419:
URL: https://github.com/apache/iceberg/pull/5419#issuecomment-1203012423

   @rdblue, @RussellSpitzer : This PR needs some more work (mentioned TODO in the description).
   But I would like to have early feedback about this Iceberg spec modification.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #5419: [WIP] Handle table evolution using schemaId mapping

Posted by GitBox <gi...@apache.org>.
nastra commented on code in PR #5419:
URL: https://github.com/apache/iceberg/pull/5419#discussion_r976359747


##########
.palantir/revapi.yml:
##########
@@ -3,6 +3,14 @@ versionOverrides:
 acceptedBreaks:
   apache-iceberg-0.14.0:
     org.apache.iceberg:iceberg-api:
+    - code: "java.class.defaultSerializationChanged"
+      old: "class org.apache.iceberg.PartitionField"
+      new: "class org.apache.iceberg.PartitionField"
+      justification: "{Introduce schema id mapping for partition spec field}"
+    - code: "java.class.defaultSerializationChanged"
+      old: "class org.apache.iceberg.PartitionKey"
+      new: "class org.apache.iceberg.PartitionKey"

Review Comment:
   it's auto-moved by revapi if you execute the gradle cmd it proposes, as spotless doesn't touch this file



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #5419: [WIP] Handle table evolution using schemaId mapping

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #5419:
URL: https://github.com/apache/iceberg/pull/5419#discussion_r979749088


##########
api/src/main/java/org/apache/iceberg/PartitionField.java:
##########
@@ -28,12 +28,14 @@ public class PartitionField implements Serializable {
   private final int fieldId;
   private final String name;
   private final Transform<?, ?> transform;
+  private final int schemaId;
 
-  PartitionField(int sourceId, int fieldId, String name, Transform<?, ?> transform) {

Review Comment:
   I think we want to keep this constructor, and add another one to avoid breaking the API.



##########
api/src/main/java/org/apache/iceberg/PartitionSpec.java:
##########
@@ -536,14 +576,49 @@ Builder add(int sourceId, String name, Transform<?, ?> transform) {
 
     Builder add(int sourceId, int fieldId, String name, Transform<?, ?> transform) {
       checkAndAddPartitionName(name, sourceId);
-      fields.add(new PartitionField(sourceId, fieldId, name, transform));
+      fields.add(new PartitionField(sourceId, fieldId, name, transform, schema.schemaId()));
+      lastAssignedFieldId.getAndAccumulate(fieldId, Math::max);
+      return this;
+    }
+
+    Builder addExistingWithNewTransform(PartitionField field, Transform<?, ?> transform) {
+      return addFrom(field.sourceId(), field.fieldId(), field.name(), transform, field.schemaId());
+    }
+
+    // to add an existing UnboundPartitionField
+    Builder addExisting(UnboundPartitionSpec.UnboundPartitionField field) {
+      if (field.partitionId() != null) {
+        return addFrom(
+            field.sourceId(),
+            field.partitionId(),
+            field.name(),
+            field.transform(),
+            field.schemaId());
+      }
+      return addFrom(
+          field.sourceId(), nextFieldId(), field.name(), field.transform(), field.schemaId());
+    }
+
+    // to add an existing PartitionField
+    Builder addExisting(PartitionField field) {
+      return addFrom(
+          field.sourceId(), field.fieldId(), field.name(), field.transform(), field.schemaId());
+    }
+
+    // to add an existing field with minor or no modifications
+    Builder addFrom(
+        int sourceId, int fieldId, String name, Transform<?, ?> transform, int schemaId) {
+      checkAndAddPartitionName(name, sourceId);
+      fields.add(new PartitionField(sourceId, fieldId, name, transform, schemaId));
       lastAssignedFieldId.getAndAccumulate(fieldId, Math::max);
       return this;
     }
 
     public PartitionSpec build() {
       PartitionSpec spec = buildUnchecked();
-      checkCompatibility(spec, schema);
+      // TODO: fields in a spec can be mapped to more than one schema. Hence, we cannot check
+      // against single schema.
+      //      checkCompatibility(spec, schema);

Review Comment:
   Without this we can't really tell if it works



##########
spark/v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestAlterTablePartitionFields.java:
##########
@@ -436,4 +441,20 @@ private SparkTable sparkTable() throws Exception {
     Identifier identifier = Identifier.of(tableIdent.namespace().levels(), tableIdent.name());
     return (SparkTable) catalog.loadTable(identifier);
   }
+
+  @Test
+  public void testDropColumnAfterDropPartition() {
+    Assume.assumeTrue(catalogName.equals(SparkCatalogConfig.HADOOP.catalogName()));
+    sql(
+        "CREATE TABLE %s (id bigint, data string, category string) USING iceberg PARTITIONED BY (category) TBLPROPERTIES('format-version' = '2')",
+        tableName);
+    sql("ALTER TABLE %s DROP PARTITION FIELD category", tableName);
+    sql("ALTER TABLE %s DROP COLUMN category", tableName);
+    Set<String> fieldNames =
+        validationCatalog.loadTable(tableIdent).schema().asStruct().fields().stream()
+            .map(Types.NestedField::name)
+            .collect(Collectors.toSet());
+    Assert.assertEquals("Should only have two columns", fieldNames.size(), 2);
+    Assert.assertFalse(fieldNames.contains("category"));
+  }

Review Comment:
   I think we want to test selecting rows here as well.



##########
api/src/main/java/org/apache/iceberg/UnboundPartitionSpec.java:
##########
@@ -53,11 +53,7 @@ private PartitionSpec.Builder copyToBuilder(Schema schema) {
     PartitionSpec.Builder builder = PartitionSpec.builderFor(schema).withSpecId(specId);
 
     for (UnboundPartitionField field : fields) {
-      if (field.partitionId != null) {

Review Comment:
   I think this is there for backward compatibility with V1



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org