You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/01/28 22:02:15 UTC

[GitHub] [incubator-iceberg] rdsr opened a new pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema

rdsr opened a new pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema
URL: https://github.com/apache/incubator-iceberg/pull/759
 
 
   First step to support #540
   
   Currently only additions and updates are supported. No renames and deletes. Currently unclear if there's a good way to support renames and deletes

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema
URL: https://github.com/apache/incubator-iceberg/pull/759#discussion_r381033741
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/SchemaUpdate.java
 ##########
 @@ -260,6 +261,64 @@ public UpdateSchema updateColumnDoc(String name, String doc) {
     return this;
   }
 
+  @Override
+  public UpdateSchema updateSchema(Schema newSchema) {
+    TypeUtil.visit(newSchema, new ApplyUpdates(schema, this));
+    return this;
+  }
+
+  private static final Joiner DOT = Joiner.on(".");
+
+  private static class ApplyUpdates extends TypeUtil.SchemaVisitor<Void> {
+    private final Schema schema;
+    private final UpdateSchema updateSchema;
+    private final Map<String, Integer> indexByName;
+
+    private ApplyUpdates(Schema schema, UpdateSchema updateSchema) {
+      this.schema = schema;
+      this.updateSchema = updateSchema;
+      this.indexByName = TypeUtil.indexByName(schema.asStruct());
+    }
+
+    @Override
+    public Void struct(Types.StructType struct, List<Void> fieldResults) {
 
 Review comment:
   Is this not intended to handle renames or deletes?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdblue commented on issue #759: Update schema for field additions, updates [e.g type widening] using an input schema

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #759: Update schema for field additions, updates [e.g type widening] using an input schema
URL: https://github.com/apache/incubator-iceberg/pull/759#issuecomment-587991569
 
 
   Thanks, @rdsr! Looks like a good start.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] fbocse commented on pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema

Posted by GitBox <gi...@apache.org>.
fbocse commented on pull request #759:
URL: https://github.com/apache/iceberg/pull/759#issuecomment-655091779


   @rdsr I've opened a PR against apache:master branch with this feature that you've started working on, I hope this is ok w/ you. Let me know if you think I should go about this a different way, thank you.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] fbocse edited a comment on pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema

Posted by GitBox <gi...@apache.org>.
fbocse edited a comment on pull request #759:
URL: https://github.com/apache/iceberg/pull/759#issuecomment-644304984


   @rdsr pls don't ruin your weekend with this PR give me a bump when you get a chance to glance over it :)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr commented on pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema

Posted by GitBox <gi...@apache.org>.
rdsr commented on pull request #759:
URL: https://github.com/apache/iceberg/pull/759#issuecomment-636240552


   Thanks @fbocse . I'll have a look at this next week. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] fbocse edited a comment on pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema

Posted by GitBox <gi...@apache.org>.
fbocse edited a comment on pull request #759:
URL: https://github.com/apache/iceberg/pull/759#issuecomment-655091779


   @rdsr I've opened a PR https://github.com/apache/iceberg/pull/1177 against apache:master branch with this feature that you've started working on, I hope this is ok w/ you. Let me know if you think I should go about this a different way, thank you.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdsr commented on a change in pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema

Posted by GitBox <gi...@apache.org>.
rdsr commented on a change in pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema
URL: https://github.com/apache/incubator-iceberg/pull/759#discussion_r372537009
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/SchemaUpdate.java
 ##########
 @@ -260,6 +261,64 @@ public UpdateSchema updateColumnDoc(String name, String doc) {
     return this;
   }
 
+  @Override
+  public UpdateSchema updateSchema(Schema newSchema) {
+    TypeUtil.visit(newSchema, new ApplyUpdates(schema, this));
+    return this;
+  }
+
+  private static final Joiner DOT = Joiner.on(".");
+
+  private static class ApplyUpdates extends TypeUtil.SchemaVisitor<Void> {
 
 Review comment:
   If we cannot reliably distinguish between renames  and deletes + add. Maybe we should throw an error if we encounter any deletes when using this API?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdsr commented on a change in pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema

Posted by GitBox <gi...@apache.org>.
rdsr commented on a change in pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema
URL: https://github.com/apache/incubator-iceberg/pull/759#discussion_r382866331
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/SchemaUpdate.java
 ##########
 @@ -260,6 +261,64 @@ public UpdateSchema updateColumnDoc(String name, String doc) {
     return this;
   }
 
+  @Override
+  public UpdateSchema updateSchema(Schema newSchema) {
+    TypeUtil.visit(newSchema, new ApplyUpdates(schema, this));
+    return this;
+  }
+
+  private static final Joiner DOT = Joiner.on(".");
+
+  private static class ApplyUpdates extends TypeUtil.SchemaVisitor<Void> {
+    private final Schema schema;
+    private final UpdateSchema updateSchema;
+    private final Map<String, Integer> indexByName;
+
+    private ApplyUpdates(Schema schema, UpdateSchema updateSchema) {
+      this.schema = schema;
+      this.updateSchema = updateSchema;
+      this.indexByName = TypeUtil.indexByName(schema.asStruct());
+    }
+
+    @Override
+    public Void struct(Types.StructType struct, List<Void> fieldResults) {
 
 Review comment:
   @rdblue .  I considered this API for supporting runtime [automatic] schema evolution. I think, in that scenario, we cannot distinguish between delete + add vs rename  reliably as the updated schema is coming from a compute engine, e.g Spark.
   
   I think I may still be able to support field additions, type widening using IDs as you mention here - https://github.com/apache/incubator-iceberg/pull/759#discussion_r381033678.
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema
URL: https://github.com/apache/incubator-iceberg/pull/759#discussion_r381033678
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/SchemaUpdate.java
 ##########
 @@ -260,6 +261,64 @@ public UpdateSchema updateColumnDoc(String name, String doc) {
     return this;
   }
 
+  @Override
+  public UpdateSchema updateSchema(Schema newSchema) {
+    TypeUtil.visit(newSchema, new ApplyUpdates(schema, this));
+    return this;
+  }
+
+  private static final Joiner DOT = Joiner.on(".");
+
+  private static class ApplyUpdates extends TypeUtil.SchemaVisitor<Void> {
+    private final Schema schema;
+    private final UpdateSchema updateSchema;
+    private final Map<String, Integer> indexByName;
+
+    private ApplyUpdates(Schema schema, UpdateSchema updateSchema) {
+      this.schema = schema;
+      this.updateSchema = updateSchema;
+      this.indexByName = TypeUtil.indexByName(schema.asStruct());
+    }
+
+    @Override
+    public Void struct(Types.StructType struct, List<Void> fieldResults) {
+      for (Types.NestedField newField : struct.fields()) {
+        Types.NestedField field = schema.findField(newField.fieldId());
+        String parents = DOT.join(fieldNames());
+        if (field == null) {
+          // found new field
+          if (parents.isEmpty()) {
+            // top level field
+            updateSchema.addColumn(null, newField.name(), newField.type());
+          } else if (indexByName.containsKey(parents)) {
+            // parent struct present
+            updateSchema.addColumn(parents, newField.name(), newField.type());
+          }
+          // else parent struct not present, so we will
+          // backtrack until we find a parent or reach top level
+        } else {
+          // updates
+          String name = join(parents, field.name());
 
 Review comment:
   I think it would be good to get rid of parents and configure `UpdateSchema` using field IDs instead of names. That would make some of these changes easier to handle. Is it possible to create limited-access variants of the configuration methods that accept field IDs instead of names?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] fbocse edited a comment on pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema

Posted by GitBox <gi...@apache.org>.
fbocse edited a comment on pull request #759:
URL: https://github.com/apache/iceberg/pull/759#issuecomment-634688680


   @rdsr I've picked up your changes and created a new PR on top of a refreshed master at https://github.com/fbocse/incubator-iceberg/pull/1 - if you think that this looks promising I would be glad to create a PR to get that code into this PR.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] fbocse commented on pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema

Posted by GitBox <gi...@apache.org>.
fbocse commented on pull request #759:
URL: https://github.com/apache/iceberg/pull/759#issuecomment-644304984


   @rdsr pls don't ruin weekend with this PR give me a bump when you get a chance to glance over it :)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema
URL: https://github.com/apache/incubator-iceberg/pull/759#discussion_r382871804
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/SchemaUpdate.java
 ##########
 @@ -260,6 +261,64 @@ public UpdateSchema updateColumnDoc(String name, String doc) {
     return this;
   }
 
+  @Override
+  public UpdateSchema updateSchema(Schema newSchema) {
+    TypeUtil.visit(newSchema, new ApplyUpdates(schema, this));
+    return this;
+  }
+
+  private static final Joiner DOT = Joiner.on(".");
+
+  private static class ApplyUpdates extends TypeUtil.SchemaVisitor<Void> {
+    private final Schema schema;
+    private final UpdateSchema updateSchema;
+    private final Map<String, Integer> indexByName;
+
+    private ApplyUpdates(Schema schema, UpdateSchema updateSchema) {
+      this.schema = schema;
+      this.updateSchema = updateSchema;
+      this.indexByName = TypeUtil.indexByName(schema.asStruct());
+    }
+
+    @Override
+    public Void struct(Types.StructType struct, List<Void> fieldResults) {
 
 Review comment:
   I should note that in these cases, we tend to really try to get schema changes expressed as the intended changes instead of the new schema, to avoid this problem.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema
URL: https://github.com/apache/incubator-iceberg/pull/759#discussion_r381031878
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/SchemaUpdate.java
 ##########
 @@ -260,6 +261,64 @@ public UpdateSchema updateColumnDoc(String name, String doc) {
     return this;
   }
 
+  @Override
+  public UpdateSchema updateSchema(Schema newSchema) {
+    TypeUtil.visit(newSchema, new ApplyUpdates(schema, this));
+    return this;
+  }
+
+  private static final Joiner DOT = Joiner.on(".");
+
+  private static class ApplyUpdates extends TypeUtil.SchemaVisitor<Void> {
+    private final Schema schema;
+    private final UpdateSchema updateSchema;
+    private final Map<String, Integer> indexByName;
+
+    private ApplyUpdates(Schema schema, UpdateSchema updateSchema) {
 
 Review comment:
   It would be nice to update the names to signal a bit better what's happening here. From the usage, it looks like the schema passed in here is the current table schema and the schema that is traversed is the updated schema. Maybe some Javadoc for this visitor class would help.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] fbocse commented on pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema

Posted by GitBox <gi...@apache.org>.
fbocse commented on pull request #759:
URL: https://github.com/apache/iceberg/pull/759#issuecomment-642207516


   Hi @rdsr - bump :) if you happen to find some spare time pls glance over this PR here https://github.com/fbocse/incubator-iceberg/pull/1 - if this looks like the same idea you had about schema reconciliation I'd rather merge this into your PR so we can both collaborate on the API


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema
URL: https://github.com/apache/incubator-iceberg/pull/759#discussion_r381033984
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/SchemaUpdate.java
 ##########
 @@ -260,6 +261,64 @@ public UpdateSchema updateColumnDoc(String name, String doc) {
     return this;
   }
 
+  @Override
+  public UpdateSchema updateSchema(Schema newSchema) {
+    TypeUtil.visit(newSchema, new ApplyUpdates(schema, this));
+    return this;
+  }
+
+  private static final Joiner DOT = Joiner.on(".");
+
+  private static class ApplyUpdates extends TypeUtil.SchemaVisitor<Void> {
 
 Review comment:
   Since this uses field IDs, we can distinguish between a rename and a delete+add. We wouldn't be able to do that if we no longer have reliable field IDs though.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #759:
URL: https://github.com/apache/iceberg/pull/759#issuecomment-689022825


   I'm closing this because #1177 was merged. Thanks @fbocse and @rdsr!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr commented on pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema

Posted by GitBox <gi...@apache.org>.
rdsr commented on pull request #759:
URL: https://github.com/apache/iceberg/pull/759#issuecomment-643381355


   @fbocse . Sorry I got a little busy at work. Plan to look this over over this weekend.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] fbocse edited a comment on pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema

Posted by GitBox <gi...@apache.org>.
fbocse edited a comment on pull request #759:
URL: https://github.com/apache/iceberg/pull/759#issuecomment-644304984


   @rdsr pls don't ruin your weekend with this PR, it can wait, work takes precedence so give me a bump when you get a chance to glance over it :) thank you  


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue closed pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema

Posted by GitBox <gi...@apache.org>.
rdblue closed pull request #759:
URL: https://github.com/apache/iceberg/pull/759


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdsr commented on a change in pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema

Posted by GitBox <gi...@apache.org>.
rdsr commented on a change in pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema
URL: https://github.com/apache/incubator-iceberg/pull/759#discussion_r382866771
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/SchemaUpdate.java
 ##########
 @@ -260,6 +261,64 @@ public UpdateSchema updateColumnDoc(String name, String doc) {
     return this;
   }
 
+  @Override
+  public UpdateSchema updateSchema(Schema newSchema) {
+    TypeUtil.visit(newSchema, new ApplyUpdates(schema, this));
+    return this;
+  }
+
+  private static final Joiner DOT = Joiner.on(".");
+
+  private static class ApplyUpdates extends TypeUtil.SchemaVisitor<Void> {
+    private final Schema schema;
+    private final UpdateSchema updateSchema;
+    private final Map<String, Integer> indexByName;
+
+    private ApplyUpdates(Schema schema, UpdateSchema updateSchema) {
+      this.schema = schema;
+      this.updateSchema = updateSchema;
+      this.indexByName = TypeUtil.indexByName(schema.asStruct());
+    }
+
+    @Override
+    public Void struct(Types.StructType struct, List<Void> fieldResults) {
 
 Review comment:
   So do you think it is ok, If I only support field additions, updates like type widening as in this PR and not address deletes and renames ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] fbocse edited a comment on pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema

Posted by GitBox <gi...@apache.org>.
fbocse edited a comment on pull request #759:
URL: https://github.com/apache/iceberg/pull/759#issuecomment-644304984


   @rdsr pls don't ruin your weekend with this PRm this can wait, work takes precedence so give me a bump when you get a chance to glance over it :) thank you  


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema
URL: https://github.com/apache/incubator-iceberg/pull/759#discussion_r382871744
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/SchemaUpdate.java
 ##########
 @@ -260,6 +261,64 @@ public UpdateSchema updateColumnDoc(String name, String doc) {
     return this;
   }
 
+  @Override
+  public UpdateSchema updateSchema(Schema newSchema) {
+    TypeUtil.visit(newSchema, new ApplyUpdates(schema, this));
+    return this;
+  }
+
+  private static final Joiner DOT = Joiner.on(".");
+
+  private static class ApplyUpdates extends TypeUtil.SchemaVisitor<Void> {
+    private final Schema schema;
+    private final UpdateSchema updateSchema;
+    private final Map<String, Integer> indexByName;
+
+    private ApplyUpdates(Schema schema, UpdateSchema updateSchema) {
+      this.schema = schema;
+      this.updateSchema = updateSchema;
+      this.indexByName = TypeUtil.indexByName(schema.asStruct());
+    }
+
+    @Override
+    public Void struct(Types.StructType struct, List<Void> fieldResults) {
 
 Review comment:
   There are two cases, I think. If you have field IDs for both schemas, then you can handle renames by detecting the ID matches but names do not. If you don't have field IDs then I think the most important thing is to be consistent, rather than trying to guess. I'd say go with drop + add in that case.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] fbocse commented on a change in pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema

Posted by GitBox <gi...@apache.org>.
fbocse commented on a change in pull request #759:
URL: https://github.com/apache/incubator-iceberg/pull/759#discussion_r426275852



##########
File path: core/src/main/java/org/apache/iceberg/SchemaUpdate.java
##########
@@ -260,6 +261,64 @@ public UpdateSchema updateColumnDoc(String name, String doc) {
     return this;
   }
 
+  @Override
+  public UpdateSchema updateSchema(Schema newSchema) {
+    TypeUtil.visit(newSchema, new ApplyUpdates(schema, this));
+    return this;
+  }
+
+  private static final Joiner DOT = Joiner.on(".");
+
+  private static class ApplyUpdates extends TypeUtil.SchemaVisitor<Void> {
+    private final Schema schema;
+    private final UpdateSchema updateSchema;
+    private final Map<String, Integer> indexByName;
+
+    private ApplyUpdates(Schema schema, UpdateSchema updateSchema) {
+      this.schema = schema;
+      this.updateSchema = updateSchema;
+      this.indexByName = TypeUtil.indexByName(schema.asStruct());
+    }
+
+    @Override
+    public Void struct(Types.StructType struct, List<Void> fieldResults) {
+      for (Types.NestedField newField : struct.fields()) {
+        Types.NestedField field = schema.findField(newField.fieldId());
+        String parents = DOT.join(fieldNames());
+        if (field == null) {
+          // found new field
+          if (parents.isEmpty()) {
+            // top level field
+            updateSchema.addColumn(null, newField.name(), newField.type());
+          } else if (indexByName.containsKey(parents)) {
+            // parent struct present
+            updateSchema.addColumn(parents, newField.name(), newField.type());
+          }
+          // else parent struct not present, so we will
+          // backtrack until we find a parent or reach top level
+        } else {
+          // updates
+          String name = join(parents, field.name());
+          if (field.type().isPrimitiveType() && !field.type().equals(newField.type())) {
+            Preconditions.checkState(newField.type().isPrimitiveType(), "%s is not a primitive type", field);
+            updateSchema.updateColumn(name, newField.type().asPrimitiveType());
+          }
+          if (newField.doc() != null && !newField.doc().equals(field.doc())) {
+            updateSchema.updateColumnDoc(name, newField.doc());
+          }
+        }
+      }
+      return null;
+    }
+  }
+
+  private static String join(String parent, String name) {
+    if (parent.isEmpty()) {
+      return name;
+    }
+    return DOT.join(parent, name);

Review comment:
       Does this look simpler?
   
   ```suggestion
   return (parent.isEmpty()) ? name : DOT.join(parent, name);
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] fbocse edited a comment on pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema

Posted by GitBox <gi...@apache.org>.
fbocse edited a comment on pull request #759:
URL: https://github.com/apache/incubator-iceberg/pull/759#issuecomment-631683741


   @rdsr wanted to check in w/ if this PR is still relevant - I tried to continue working the bugs and complete implementation for the PR you had started here - I'll share a branch from my fork and maybe you tell me how we can collaborate on seeing this through. wtyd?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] fbocse commented on pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema

Posted by GitBox <gi...@apache.org>.
fbocse commented on pull request #759:
URL: https://github.com/apache/incubator-iceberg/pull/759#issuecomment-631683741


   @rdsr wanted to check in w/ if this PR is still relevant - I tried to continue working the bugs and complete implementation for this - I'll share a branch from my fork and maybe you tell me how we can collaborate on seeing this through. wtyd?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] fbocse commented on a change in pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema

Posted by GitBox <gi...@apache.org>.
fbocse commented on a change in pull request #759:
URL: https://github.com/apache/incubator-iceberg/pull/759#discussion_r426275786



##########
File path: core/src/main/java/org/apache/iceberg/SchemaUpdate.java
##########
@@ -260,6 +261,64 @@ public UpdateSchema updateColumnDoc(String name, String doc) {
     return this;
   }
 
+  @Override
+  public UpdateSchema updateSchema(Schema newSchema) {
+    TypeUtil.visit(newSchema, new ApplyUpdates(schema, this));
+    return this;
+  }
+
+  private static final Joiner DOT = Joiner.on(".");
+
+  private static class ApplyUpdates extends TypeUtil.SchemaVisitor<Void> {
+    private final Schema schema;
+    private final UpdateSchema updateSchema;
+    private final Map<String, Integer> indexByName;
+
+    private ApplyUpdates(Schema schema, UpdateSchema updateSchema) {
+      this.schema = schema;
+      this.updateSchema = updateSchema;
+      this.indexByName = TypeUtil.indexByName(schema.asStruct());
+    }
+
+    @Override
+    public Void struct(Types.StructType struct, List<Void> fieldResults) {
+      for (Types.NestedField newField : struct.fields()) {
+        Types.NestedField field = schema.findField(newField.fieldId());
+        String parents = DOT.join(fieldNames());
+        if (field == null) {
+          // found new field
+          if (parents.isEmpty()) {
+            // top level field
+            updateSchema.addColumn(null, newField.name(), newField.type());
+          } else if (indexByName.containsKey(parents)) {
+            // parent struct present
+            updateSchema.addColumn(parents, newField.name(), newField.type());
+          }
+          // else parent struct not present, so we will
+          // backtrack until we find a parent or reach top level
+        } else {
+          // updates
+          String name = join(parents, field.name());
+          if (field.type().isPrimitiveType() && !field.type().equals(newField.type())) {
+            Preconditions.checkState(newField.type().isPrimitiveType(), "%s is not a primitive type", field);
+            updateSchema.updateColumn(name, newField.type().asPrimitiveType());
+          }
+          if (newField.doc() != null && !newField.doc().equals(field.doc())) {
+            updateSchema.updateColumnDoc(name, newField.doc());
+          }
+        }
+      }
+      return null;
+    }
+  }
+
+  private static String join(String parent, String name) {
+    if (parent.isEmpty()) {
+      return name;
+    }
+    return DOT.join(parent, name);

Review comment:
       ```suggestion
       return (parent.isEmpty()) ? name : DOT.join(parent, name);
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] fbocse commented on a change in pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema

Posted by GitBox <gi...@apache.org>.
fbocse commented on a change in pull request #759:
URL: https://github.com/apache/incubator-iceberg/pull/759#discussion_r426275852



##########
File path: core/src/main/java/org/apache/iceberg/SchemaUpdate.java
##########
@@ -260,6 +261,64 @@ public UpdateSchema updateColumnDoc(String name, String doc) {
     return this;
   }
 
+  @Override
+  public UpdateSchema updateSchema(Schema newSchema) {
+    TypeUtil.visit(newSchema, new ApplyUpdates(schema, this));
+    return this;
+  }
+
+  private static final Joiner DOT = Joiner.on(".");
+
+  private static class ApplyUpdates extends TypeUtil.SchemaVisitor<Void> {
+    private final Schema schema;
+    private final UpdateSchema updateSchema;
+    private final Map<String, Integer> indexByName;
+
+    private ApplyUpdates(Schema schema, UpdateSchema updateSchema) {
+      this.schema = schema;
+      this.updateSchema = updateSchema;
+      this.indexByName = TypeUtil.indexByName(schema.asStruct());
+    }
+
+    @Override
+    public Void struct(Types.StructType struct, List<Void> fieldResults) {
+      for (Types.NestedField newField : struct.fields()) {
+        Types.NestedField field = schema.findField(newField.fieldId());
+        String parents = DOT.join(fieldNames());
+        if (field == null) {
+          // found new field
+          if (parents.isEmpty()) {
+            // top level field
+            updateSchema.addColumn(null, newField.name(), newField.type());
+          } else if (indexByName.containsKey(parents)) {
+            // parent struct present
+            updateSchema.addColumn(parents, newField.name(), newField.type());
+          }
+          // else parent struct not present, so we will
+          // backtrack until we find a parent or reach top level
+        } else {
+          // updates
+          String name = join(parents, field.name());
+          if (field.type().isPrimitiveType() && !field.type().equals(newField.type())) {
+            Preconditions.checkState(newField.type().isPrimitiveType(), "%s is not a primitive type", field);
+            updateSchema.updateColumn(name, newField.type().asPrimitiveType());
+          }
+          if (newField.doc() != null && !newField.doc().equals(field.doc())) {
+            updateSchema.updateColumnDoc(name, newField.doc());
+          }
+        }
+      }
+      return null;
+    }
+  }
+
+  private static String join(String parent, String name) {
+    if (parent.isEmpty()) {
+      return name;
+    }
+    return DOT.join(parent, name);

Review comment:
       ```suggestion
   return (parent.isEmpty()) ? name : DOT.join(parent, name);
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] fbocse edited a comment on pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema

Posted by GitBox <gi...@apache.org>.
fbocse edited a comment on pull request #759:
URL: https://github.com/apache/incubator-iceberg/pull/759#issuecomment-631683741


   @rdsr wanted to check in w/ you on whether this PR is still relevant or not - I tried to continue to  complete the implementation you've started on this PR - I'll share a branch from my fork and maybe you tell me how we can collaborate on seeing this through. wtyd?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] rdsr commented on pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema

Posted by GitBox <gi...@apache.org>.
rdsr commented on pull request #759:
URL: https://github.com/apache/incubator-iceberg/pull/759#issuecomment-631890464


   @fbocse . This PR is still relevant. I just didn't have the time to get back to it. Happy to collaborate.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] fbocse commented on pull request #759: Update schema for field additions, updates [e.g type widening] using an input schema

Posted by GitBox <gi...@apache.org>.
fbocse commented on pull request #759:
URL: https://github.com/apache/incubator-iceberg/pull/759#issuecomment-634688680


   @rdsr picked up your changes and created a new PR on top of a refreshed master at https://github.com/fbocse/incubator-iceberg/pull/1 - if you think that this looks promising I would be glad to create a PR to get that code into this PR.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org