Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/01/13 08:37:31 UTC

[GitHub] [iceberg] natsukawa-kanou opened a new pull request #3888: AWS: show old fields in Glue table

natsukawa-kanou opened a new pull request #3888:
URL: https://github.com/apache/iceberg/pull/3888


   based on #3887 
   
   My organization wants Glue to show old fields for Iceberg tables, so that people can see which column names were already used in the past and avoid adding a new column with the same name.
   
   @jackye1995 @yyanyy 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on pull request #3888: AWS: show old fields in Glue table

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on pull request #3888:
URL: https://github.com/apache/iceberg/pull/3888#issuecomment-1020513093


   @natsukawa-kanou the unit tests failed; I think you need to fix the old tests for this change.




[GitHub] [iceberg] jackye1995 commented on pull request #3888: AWS: show old fields in Glue table

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on pull request #3888:
URL: https://github.com/apache/iceberg/pull/3888#issuecomment-1012459164


   Similar to the other PR, could you add integration tests for it?




[GitHub] [iceberg] jackye1995 commented on a change in pull request #3888: AWS: show old fields in Glue table

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #3888:
URL: https://github.com/apache/iceberg/pull/3888#discussion_r784279724



##########
File path: aws/src/main/java/org/apache/iceberg/aws/glue/IcebergToGlueConverter.java
##########
@@ -252,59 +256,118 @@ private static String toTypeString(Type type) {
 
   private static List<Column> toColumns(TableMetadata metadata) {
     List<Column> columns = Lists.newArrayList();
-    Set<NestedField> rootColumnSet = Sets.newHashSet();
+    Set<String> addedNames = Sets.newHashSet();
     // Add schema-column fields
     for (NestedField field : metadata.schema().columns()) {
-      rootColumnSet.add(field);
-      columns.add(Column.builder()
-          .name(field.name())
-          .type(toTypeString(field.type()))
-          .comment(field.doc())
-          .parameters(convertToParameters(SCHEMA_COLUMN, field))
-          .build());
+      addColumnWithDedupe(columns, addedNames, field, field.name(), SCHEMA_COLUMN, true);
     }
     // Add schema-subfield
-    for (NestedField field : TypeUtil.indexById(metadata.schema().asStruct()).values()) {
-      if (!rootColumnSet.contains(field)) {
-        columns.add(Column.builder()
-            .name(field.name())
-            .type(toTypeString(field.type()))
-            .comment(field.doc())
-            .parameters(convertToParameters(SCHEMA_SUBFIELD, field))
-            .build());
+    for (String fieldName : TypeUtil.indexNameById(metadata.schema().asStruct()).values()) {
+      NestedField field = metadata.schema().findField(fieldName);
+      if (field != null) {
+        addColumnWithDedupe(columns, addedNames, field, fieldName, SCHEMA_SUBFIELD, true);
+      }
+    }
+    // Add old schema fields
+    for (Schema schema : metadata.schemas()) {
+      if (schema.schemaId() != metadata.currentSchemaId()) {
+        // Add old schema-column fields
+        for (NestedField field : schema.columns()) {
+          addColumnWithDedupe(columns, addedNames, field, field.name(), SCHEMA_COLUMN, false);
+        }
+        // Add old schema-subfield
+        for (String fieldName : TypeUtil.indexNameById(schema.asStruct()).values()) {
+          NestedField field = schema.findField(fieldName);
+          if (field != null) {
+            addColumnWithDedupe(columns, addedNames, field, fieldName, SCHEMA_SUBFIELD, false);
+          }
+        }
       }
     }
     // Add partition-field
     for (PartitionField partitionField : metadata.spec().fields()) {
+      addPartitionColumnWithDedupe(columns, addedNames, partitionField, metadata.spec().schema(), true);
+    }
+    // Add old partition-field
+    for (PartitionSpec spec : metadata.specs()) {

Review comment:
       Similar comment for the partition spec: we can have a helper method.






[GitHub] [iceberg] jackye1995 commented on pull request #3888: AWS: show old fields in Glue table

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on pull request #3888:
URL: https://github.com/apache/iceberg/pull/3888#issuecomment-1020516390


   thanks for the quick fix! overall looks good to me, running AWS integ test now.




[GitHub] [iceberg] jackye1995 commented on a change in pull request #3888: AWS: show old fields in Glue table

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #3888:
URL: https://github.com/apache/iceberg/pull/3888#discussion_r791119060



##########
File path: aws/src/main/java/org/apache/iceberg/aws/glue/IcebergToGlueConverter.java
##########
@@ -244,21 +246,31 @@ private static String toTypeString(Type type) {
     Set<String> addedNames = Sets.newHashSet();
 
     for (NestedField field : metadata.schema().columns()) {
-      addColumnWithDedupe(columns, addedNames, field);
+      addColumnWithDedupe(columns, addedNames, field, true);
+    }
+
+    for (Schema schema : metadata.schemas()) {
+      if (schema.schemaId() != metadata.currentSchemaId()) {
+        for (NestedField field : schema.columns()) {
+          addColumnWithDedupe(columns, addedNames, field, false);

Review comment:
       nit: same as above, inline comment






[GitHub] [iceberg] jackye1995 commented on pull request #3888: AWS: show old fields in Glue table

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on pull request #3888:
URL: https://github.com/apache/iceberg/pull/3888#issuecomment-1020566477


   AWS integ test passes and CI passes, approving




[GitHub] [iceberg] jackye1995 commented on a change in pull request #3888: AWS: show old fields in Glue table

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #3888:
URL: https://github.com/apache/iceberg/pull/3888#discussion_r791118861



##########
File path: aws/src/main/java/org/apache/iceberg/aws/glue/IcebergToGlueConverter.java
##########
@@ -244,21 +246,31 @@ private static String toTypeString(Type type) {
     Set<String> addedNames = Sets.newHashSet();
 
     for (NestedField field : metadata.schema().columns()) {
-      addColumnWithDedupe(columns, addedNames, field);
+      addColumnWithDedupe(columns, addedNames, field, true);

Review comment:
       nit: prefer inline comment for boolean argument `true /* is current */`
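
A small illustration of the convention suggested in this nit. The class and method names below are hypothetical and are not taken from the PR; the point is only that a bare `true` or `false` at a call site says nothing about its meaning, while an inline comment naming the parameter does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; not part of IcebergToGlueConverter.
final class InlineFlagCommentSketch {
  private InlineFlagCommentSketch() {}

  // Records a column name together with a marker for current vs. old fields.
  static void addColumn(List<String> columns, String name, boolean isCurrent) {
    columns.add(name + (isCurrent ? " (current)" : " (old)"));
  }

  static List<String> describe() {
    List<String> columns = new ArrayList<>();
    // The inline comment names the boolean parameter at the call site.
    addColumn(columns, "id", true /* is current */);
    addColumn(columns, "legacy_id", false /* is current */);
    return columns;
  }
}
```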






[GitHub] [iceberg] jackye1995 commented on a change in pull request #3888: AWS: show old fields in Glue table

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #3888:
URL: https://github.com/apache/iceberg/pull/3888#discussion_r784279573



##########
File path: aws/src/main/java/org/apache/iceberg/aws/glue/IcebergToGlueConverter.java
##########
@@ -252,59 +256,118 @@ private static String toTypeString(Type type) {
 
   private static List<Column> toColumns(TableMetadata metadata) {
     List<Column> columns = Lists.newArrayList();
-    Set<NestedField> rootColumnSet = Sets.newHashSet();
+    Set<String> addedNames = Sets.newHashSet();
     // Add schema-column fields
     for (NestedField field : metadata.schema().columns()) {
-      rootColumnSet.add(field);
-      columns.add(Column.builder()
-          .name(field.name())
-          .type(toTypeString(field.type()))
-          .comment(field.doc())
-          .parameters(convertToParameters(SCHEMA_COLUMN, field))
-          .build());
+      addColumnWithDedupe(columns, addedNames, field, field.name(), SCHEMA_COLUMN, true);
     }
     // Add schema-subfield
-    for (NestedField field : TypeUtil.indexById(metadata.schema().asStruct()).values()) {
-      if (!rootColumnSet.contains(field)) {
-        columns.add(Column.builder()
-            .name(field.name())
-            .type(toTypeString(field.type()))
-            .comment(field.doc())
-            .parameters(convertToParameters(SCHEMA_SUBFIELD, field))
-            .build());
+    for (String fieldName : TypeUtil.indexNameById(metadata.schema().asStruct()).values()) {
+      NestedField field = metadata.schema().findField(fieldName);
+      if (field != null) {
+        addColumnWithDedupe(columns, addedNames, field, fieldName, SCHEMA_SUBFIELD, true);
+      }
+    }
+    // Add old schema fields
+    for (Schema schema : metadata.schemas()) {

Review comment:
       I think we can have some sort of helper method for adding columns for a schema, so the logic does not have to be repeated.
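
A minimal sketch of what such a helper could look like, under stated assumptions. It uses simplified stand-in classes (`SketchField`, `SketchSchema`, `SketchColumn`) rather than the real Iceberg `Schema`/`NestedField` or Glue `Column` types, handles top-level fields only, and the method name `addSchemaColumns` is hypothetical. It is not the code that was merged; it only shows the shape of the refactoring: one helper that adds the columns of a single schema with name-based dedupe, called once for the current schema and once for each older schema.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified stand-ins for illustration only; NOT the Iceberg or Glue classes.
final class SketchField {
  final String name;
  final String type;

  SketchField(String name, String type) {
    this.name = name;
    this.type = type;
  }
}

final class SketchSchema {
  final int id;
  final List<SketchField> fields;

  SketchSchema(int id, List<SketchField> fields) {
    this.id = id;
    this.fields = fields;
  }
}

final class SketchColumn {
  final String name;
  final String type;
  final boolean current;

  SketchColumn(String name, String type, boolean current) {
    this.name = name;
    this.type = type;
    this.current = current;
  }
}

final class GlueColumnSketch {
  private GlueColumnSketch() {}

  // Hypothetical helper: adds every field of one schema, skipping names already added.
  static void addSchemaColumns(
      List<SketchColumn> columns, Set<String> addedNames, SketchSchema schema, boolean isCurrent) {
    for (SketchField field : schema.fields) {
      // Set.add returns false when the name is already present, so duplicates are skipped.
      if (addedNames.add(field.name)) {
        columns.add(new SketchColumn(field.name, field.type, isCurrent));
      }
    }
  }

  static List<SketchColumn> toColumns(List<SketchSchema> schemas, int currentSchemaId) {
    List<SketchColumn> columns = new ArrayList<>();
    Set<String> addedNames = new HashSet<>();
    // Current schema first, so its definition wins over an older field with the same name.
    for (SketchSchema schema : schemas) {
      if (schema.id == currentSchemaId) {
        addSchemaColumns(columns, addedNames, schema, true /* is current */);
      }
    }
    // Then every older schema, reusing the same helper.
    for (SketchSchema schema : schemas) {
      if (schema.id != currentSchemaId) {
        addSchemaColumns(columns, addedNames, schema, false /* is current */);
      }
    }
    return columns;
  }
}
```

Processing the current schema before the older ones means that a renamed or retyped column is listed with its current definition, and only names that no longer exist in the current schema show up as old fields.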






[GitHub] [iceberg] jackye1995 merged pull request #3888: AWS: show old fields in Glue table

Posted by GitBox <gi...@apache.org>.
jackye1995 merged pull request #3888:
URL: https://github.com/apache/iceberg/pull/3888


   



