Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/01/13 08:37:31 UTC
[GitHub] [iceberg] natsukawa-kanou opened a new pull request #3888: AWS: show old fields in Glue table
natsukawa-kanou opened a new pull request #3888:
URL: https://github.com/apache/iceberg/pull/3888
Based on #3887.
My organization wants Glue to show old fields for Iceberg tables, so that people know which column names were already used in the past and avoid adding a new column with the same name.
@jackye1995 @yyanyy
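As a rough illustration of the use case, the sketch below reduces each schema version to a plain list of top-level column names. This is a hypothetical simplification, not Iceberg's `Schema` API or the Glue API. It checks whether a proposed name was ever used, which is the check people could do by eye once old fields are listed in Glue.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: each inner list holds the top-level column names of one schema
// version, old or current. Not part of Iceberg or the AWS Glue API.
final class ColumnHistory {
  private ColumnHistory() {}

  // Collects every column name that appears in any schema version, in first-seen order.
  static Set<String> everUsedNames(List<List<String>> schemaVersions) {
    Set<String> names = new LinkedHashSet<>();
    for (List<String> columns : schemaVersions) {
      names.addAll(columns);
    }
    return names;
  }

  // True if the proposed name matches a column from the current or any old schema.
  static boolean wasUsedBefore(List<List<String>> schemaVersions, String proposedName) {
    return everUsedNames(schemaVersions).contains(proposedName);
  }

  public static void main(String[] args) {
    List<List<String>> versions = List.of(
        List.of("id", "name", "legacy_score"),  // old schema
        List.of("id", "name", "score"));        // current schema, legacy_score dropped
    System.out.println(everUsedNames(versions));                 // [id, name, legacy_score, score]
    System.out.println(wasUsedBefore(versions, "legacy_score")); // true
  }
}
```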
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] jackye1995 commented on pull request #3888: AWS: show old fields in Glue table
Posted by GitBox <gi...@apache.org>.
jackye1995 commented on pull request #3888:
URL: https://github.com/apache/iceberg/pull/3888#issuecomment-1020513093
@natsukawa-kanou the unit tests failed; I think you need to fix the old tests for this change.
[GitHub] [iceberg] jackye1995 commented on pull request #3888: AWS: show old fields in Glue table
jackye1995 commented on pull request #3888:
URL: https://github.com/apache/iceberg/pull/3888#issuecomment-1012459164
Similar to the other PR, could you add integration tests for it?
[GitHub] [iceberg] jackye1995 commented on a change in pull request #3888: AWS: show old fields in Glue table
jackye1995 commented on a change in pull request #3888:
URL: https://github.com/apache/iceberg/pull/3888#discussion_r784279724
##########
File path: aws/src/main/java/org/apache/iceberg/aws/glue/IcebergToGlueConverter.java
##########
@@ -252,59 +256,118 @@ private static String toTypeString(Type type) {
private static List<Column> toColumns(TableMetadata metadata) {
List<Column> columns = Lists.newArrayList();
- Set<NestedField> rootColumnSet = Sets.newHashSet();
+ Set<String> addedNames = Sets.newHashSet();
// Add schema-column fields
for (NestedField field : metadata.schema().columns()) {
- rootColumnSet.add(field);
- columns.add(Column.builder()
- .name(field.name())
- .type(toTypeString(field.type()))
- .comment(field.doc())
- .parameters(convertToParameters(SCHEMA_COLUMN, field))
- .build());
+ addColumnWithDedupe(columns, addedNames, field, field.name(), SCHEMA_COLUMN, true);
}
// Add schema-subfield
- for (NestedField field : TypeUtil.indexById(metadata.schema().asStruct()).values()) {
- if (!rootColumnSet.contains(field)) {
- columns.add(Column.builder()
- .name(field.name())
- .type(toTypeString(field.type()))
- .comment(field.doc())
- .parameters(convertToParameters(SCHEMA_SUBFIELD, field))
- .build());
+ for (String fieldName : TypeUtil.indexNameById(metadata.schema().asStruct()).values()) {
+ NestedField field = metadata.schema().findField(fieldName);
+ if (field != null) {
+ addColumnWithDedupe(columns, addedNames, field, fieldName, SCHEMA_SUBFIELD, true);
+ }
+ }
+ // Add old schema fields
+ for (Schema schema : metadata.schemas()) {
+ if (schema.schemaId() != metadata.currentSchemaId()) {
+ // Add old schema-column fields
+ for (NestedField field : schema.columns()) {
+ addColumnWithDedupe(columns, addedNames, field, field.name(), SCHEMA_COLUMN, false);
+ }
+ // Add old schema-subfield
+ for (String fieldName : TypeUtil.indexNameById(schema.asStruct()).values()) {
+ NestedField field = schema.findField(fieldName);
+ if (field != null) {
+ addColumnWithDedupe(columns, addedNames, field, fieldName, SCHEMA_SUBFIELD, false);
+ }
+ }
}
}
// Add partition-field
for (PartitionField partitionField : metadata.spec().fields()) {
+ addPartitionColumnWithDedupe(columns, addedNames, partitionField, metadata.spec().schema(), true);
+ }
+ // Add old partition-field
+ for (PartitionSpec spec : metadata.specs()) {
Review comment:
Similar comment for the partition spec: we can have a helper method.
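One possible shape for that helper, as a minimal sketch: a partition spec is modeled as a hypothetical `Spec` holding an id and field names (not Iceberg's `PartitionSpec`), and `addPartitionColumns` / `addAllPartitionColumns` are illustrative names. As in the diff, the same `addedNames` set already filled by the schema columns is reused, and the current spec is processed before the old ones, so a name that already exists is skipped.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-in for an Iceberg partition spec: an id plus the
// names of its partition fields. Not Iceberg's actual PartitionSpec API.
final class PartitionColumnsSketch {
  static final class Spec {
    final int specId;
    final List<String> fieldNames;

    Spec(int specId, List<String> fieldNames) {
      this.specId = specId;
      this.fieldNames = fieldNames;
    }
  }

  // Single helper used for the current spec and every old spec. addedNames is shared with
  // the schema columns, so a partition field whose name matches an existing column is skipped.
  static void addPartitionColumns(List<String> columns, Set<String> addedNames, Spec spec, boolean isCurrent) {
    for (String name : spec.fieldNames) {
      if (addedNames.add(name)) {  // Set.add returns false if the name is already present
        columns.add(isCurrent ? name : name + " (old)");
      }
    }
  }

  // Current spec first, then all other specs.
  static void addAllPartitionColumns(List<String> columns, Set<String> addedNames, List<Spec> specs, int currentSpecId) {
    for (Spec spec : specs) {
      if (spec.specId == currentSpecId) {
        addPartitionColumns(columns, addedNames, spec, true /* is current */);
      }
    }
    for (Spec spec : specs) {
      if (spec.specId != currentSpecId) {
        addPartitionColumns(columns, addedNames, spec, false /* is current */);
      }
    }
  }

  public static void main(String[] args) {
    List<String> columns = new ArrayList<>(List.of("id", "ts"));  // pretend schema columns
    Set<String> addedNames = new HashSet<>(columns);
    List<Spec> specs = List.of(
        new Spec(0, List.of("ts_day")),           // old spec
        new Spec(1, List.of("ts_hour", "id")));   // current spec; "id" collides with a column
    addAllPartitionColumns(columns, addedNames, specs, 1);
    System.out.println(columns);  // [id, ts, ts_hour, ts_day (old)]
  }
}
```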
[GitHub] [iceberg] jackye1995 commented on pull request #3888: AWS: show old fields in Glue table
jackye1995 commented on pull request #3888:
URL: https://github.com/apache/iceberg/pull/3888#issuecomment-1020516390
Thanks for the quick fix! Overall looks good to me; running the AWS integration tests now.
[GitHub] [iceberg] jackye1995 commented on a change in pull request #3888: AWS: show old fields in Glue table
jackye1995 commented on a change in pull request #3888:
URL: https://github.com/apache/iceberg/pull/3888#discussion_r791119060
##########
File path: aws/src/main/java/org/apache/iceberg/aws/glue/IcebergToGlueConverter.java
##########
@@ -244,21 +246,31 @@ private static String toTypeString(Type type) {
Set<String> addedNames = Sets.newHashSet();
for (NestedField field : metadata.schema().columns()) {
- addColumnWithDedupe(columns, addedNames, field);
+ addColumnWithDedupe(columns, addedNames, field, true);
+ }
+
+ for (Schema schema : metadata.schemas()) {
+ if (schema.schemaId() != metadata.currentSchemaId()) {
+ for (NestedField field : schema.columns()) {
+ addColumnWithDedupe(columns, addedNames, field, false);
Review comment:
nit: same as above, inline comment
[GitHub] [iceberg] jackye1995 commented on pull request #3888: AWS: show old fields in Glue table
jackye1995 commented on pull request #3888:
URL: https://github.com/apache/iceberg/pull/3888#issuecomment-1020566477
AWS integration tests and CI pass, approving.
[GitHub] [iceberg] jackye1995 commented on a change in pull request #3888: AWS: show old fields in Glue table
jackye1995 commented on a change in pull request #3888:
URL: https://github.com/apache/iceberg/pull/3888#discussion_r791118861
##########
File path: aws/src/main/java/org/apache/iceberg/aws/glue/IcebergToGlueConverter.java
##########
@@ -244,21 +246,31 @@ private static String toTypeString(Type type) {
Set<String> addedNames = Sets.newHashSet();
for (NestedField field : metadata.schema().columns()) {
- addColumnWithDedupe(columns, addedNames, field);
+ addColumnWithDedupe(columns, addedNames, field, true);
Review comment:
nit: prefer an inline comment for the boolean argument, e.g. `true /* is current */`
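For illustration, the suggested style applied to a hypothetical two-argument method (`label` is made up for this example; only the call-site comment matters):

```java
// Hypothetical method; only the call-site style matters here.
final class InlineCommentStyle {
  static String label(String columnName, boolean isCurrent) {
    return isCurrent ? columnName : columnName + " (old)";
  }

  public static void main(String[] args) {
    // Without a comment, a reader of the call site cannot tell what `true` means:
    //   label("id", true);
    // With the inline comment, the boolean argument documents itself:
    System.out.println(label("id", true /* is current */));            // id
    System.out.println(label("legacy_score", false /* is current */)); // legacy_score (old)
  }
}
```

The comment leaves the method signature unchanged; an enum or two separately named methods would also make the flag self-describing, at the cost of a larger change.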
[GitHub] [iceberg] jackye1995 commented on a change in pull request #3888: AWS: show old fields in Glue table
jackye1995 commented on a change in pull request #3888:
URL: https://github.com/apache/iceberg/pull/3888#discussion_r784279573
##########
File path: aws/src/main/java/org/apache/iceberg/aws/glue/IcebergToGlueConverter.java
##########
@@ -252,59 +256,118 @@ private static String toTypeString(Type type) {
private static List<Column> toColumns(TableMetadata metadata) {
List<Column> columns = Lists.newArrayList();
- Set<NestedField> rootColumnSet = Sets.newHashSet();
+ Set<String> addedNames = Sets.newHashSet();
// Add schema-column fields
for (NestedField field : metadata.schema().columns()) {
- rootColumnSet.add(field);
- columns.add(Column.builder()
- .name(field.name())
- .type(toTypeString(field.type()))
- .comment(field.doc())
- .parameters(convertToParameters(SCHEMA_COLUMN, field))
- .build());
+ addColumnWithDedupe(columns, addedNames, field, field.name(), SCHEMA_COLUMN, true);
}
// Add schema-subfield
- for (NestedField field : TypeUtil.indexById(metadata.schema().asStruct()).values()) {
- if (!rootColumnSet.contains(field)) {
- columns.add(Column.builder()
- .name(field.name())
- .type(toTypeString(field.type()))
- .comment(field.doc())
- .parameters(convertToParameters(SCHEMA_SUBFIELD, field))
- .build());
+ for (String fieldName : TypeUtil.indexNameById(metadata.schema().asStruct()).values()) {
+ NestedField field = metadata.schema().findField(fieldName);
+ if (field != null) {
+ addColumnWithDedupe(columns, addedNames, field, fieldName, SCHEMA_SUBFIELD, true);
+ }
+ }
+ // Add old schema fields
+ for (Schema schema : metadata.schemas()) {
Review comment:
I think we can have some sort of helper method for adding the columns of a schema, so the logic does not have to be repeated.
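A minimal sketch of such a helper, under simplifying assumptions: `Field`, `Schema`, and `Column` are hypothetical stand-ins for Iceberg's `NestedField`/`Schema` and Glue's `Column`, nested subfields are assumed to be named by dot-separated paths, and `addSchemaColumns` is an illustrative name rather than the method that was merged. The current schema is added first so that, when a name appears in both the current and an old schema, the current definition is kept.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-ins for Iceberg's Schema/NestedField and Glue's Column.
final class SchemaColumnsSketch {
  static final class Field {
    final String name;
    final List<Field> children;
    Field(String name, List<Field> children) { this.name = name; this.children = children; }
    static Field leaf(String name) { return new Field(name, List.of()); }
  }

  static final class Schema {
    final int schemaId;
    final List<Field> columns;
    Schema(int schemaId, List<Field> columns) { this.schemaId = schemaId; this.columns = columns; }
  }

  static final class Column {
    final String name;
    final boolean isCurrent;
    Column(String name, boolean isCurrent) { this.name = name; this.isCurrent = isCurrent; }
    @Override public String toString() { return isCurrent ? name : name + " (old)"; }
  }

  // One helper for any schema: top-level columns first, then nested subfields.
  static void addSchemaColumns(List<Column> columns, Set<String> addedNames, Schema schema, boolean isCurrent) {
    for (Field field : schema.columns) {
      addWithDedupe(columns, addedNames, field.name, isCurrent);
    }
    for (Field field : schema.columns) {
      addSubfields(columns, addedNames, field.name, field.children, isCurrent);
    }
  }

  // Walks nested fields depth-first, naming each by its dotted path.
  private static void addSubfields(List<Column> columns, Set<String> addedNames, String prefix,
                                   List<Field> children, boolean isCurrent) {
    for (Field child : children) {
      String fullName = prefix + "." + child.name;
      addWithDedupe(columns, addedNames, fullName, isCurrent);
      addSubfields(columns, addedNames, fullName, child.children, isCurrent);
    }
  }

  private static void addWithDedupe(List<Column> columns, Set<String> addedNames, String name, boolean isCurrent) {
    if (addedNames.add(name)) {  // Set.add returns false if the name was already added
      columns.add(new Column(name, isCurrent));
    }
  }

  static List<Column> toColumns(List<Schema> schemas, int currentSchemaId) {
    List<Column> columns = new ArrayList<>();
    Set<String> addedNames = new HashSet<>();
    for (Schema schema : schemas) {  // current schema first so its definitions win
      if (schema.schemaId == currentSchemaId) {
        addSchemaColumns(columns, addedNames, schema, true /* is current */);
      }
    }
    for (Schema schema : schemas) {
      if (schema.schemaId != currentSchemaId) {
        addSchemaColumns(columns, addedNames, schema, false /* is current */);
      }
    }
    return columns;
  }

  public static void main(String[] args) {
    Schema old = new Schema(0, List.of(Field.leaf("id"),
        new Field("point", List.of(Field.leaf("x"), Field.leaf("y")))));
    Schema current = new Schema(1, List.of(Field.leaf("id"),
        new Field("point", List.of(Field.leaf("x")))));
    System.out.println(toColumns(List.of(old, current), 1));  // [id, point, point.x, point.y (old)]
  }
}
```

Calling the helper once per schema with the flag replaces the duplicated current/old loops in `toColumns`.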
[GitHub] [iceberg] jackye1995 merged pull request #3888: AWS: show old fields in Glue table
jackye1995 merged pull request #3888:
URL: https://github.com/apache/iceberg/pull/3888