You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by "wmoustafa (via GitHub)" <gi...@apache.org> on 2023/04/21 01:08:30 UTC

[GitHub] [iceberg] wmoustafa opened a new pull request, #7392: Avro: Add Avro-assisted name mapping

wmoustafa opened a new pull request, #7392:
URL: https://github.com/apache/iceberg/pull/7392

   This PR adds a new name mapping mechanism that leverages mapping keys outside the Iceberg schema, e.g., from a source Avro schema. The application of such name mapping is mapping complex union type fields to their respective field IDs in the Iceberg schema. Some more context is in #5704.
   
   The logical sequence for reading union types is that there is an input Avro schema using which the union type data was originally written. Such schema is used to generate the table Iceberg schema. A complex union type in the Avro schema is converted to an Iceberg struct with fields `(int tag, Type0 field0, Type1 field1, ..)`, where each `Type_i` corresponds to a union branch from the input Avro schema, in the same order they appear with in the Avro schema. At read stage the Iceberg schema is being used to read the Avro files with union types, but there is no guarantee that the fields in the file correspond to the Iceberg schema struct in the same order; hence name mapping is required to guide how to connect the union type branches from the file to the respective fields in the Iceberg struct. Hence name mapping is used to relate schema from the file to the Iceberg IDs.
   
   The current name mapping mechanism builds the entire name mapping from the Iceberg schema as the sole source of truth. However, this is not extensible to cases where we want to map Avro union types to Iceberg structs because:
   * Avro union type options do not have a field name to begin with.
   * Other sources of identification (e.g., union branch data type or branch record type in case of Avro named types) do not make it to the Iceberg schema.
   Due to the above reasons, a supporting Avro schema is required to derive the name mapping, unlike the case with using field names where only the Iceberg schema is adequate.
   
   To map union types using both the source Avro schema and the derived Iceberg schema, the two schemas are traversed simultaneously, and the ID is extracted from the Iceberg schema and other identifying information is extracted from the Avro schema:
   * In case of union type branch that is a named type (e.g., `RECORD`, `ENUM`, `FIXED`), the record name is used as an identifier.
   * Else the type `toString()` value is used.
   
   The above is adequate identifying information according to Avro Spec since Avro unions could not contain more than one map or array type, or two of the same primitive types. Further, named type names are unique within the same union types.
   
   In non-union type branch cases (e.g., regular nested fields), standard field name is used as the name mapping key.
   
   Since the Avro schema in this scenario is the main schema from which the Iceberg schema is derived, reusing `AvroSchemaWithTypeVisitor` is not possible as the abstract visitor class since it assumes the Avro schema is already annotated with the IDs. Hence, this PR introduces a new abstract visitor class `AvroSchemaWithDerivedTypeVisitor` which assumes that the Avro schema is the source schema and the Iceberg schema is the derived schema. The name mapping logic is defined in `NameMappingWithAvroSchema`. Specific union-type-specific implementation that reflects the identification method above is in `NameMappingWithAvroSchema#union`.
   
   Testing:
   Added unit test.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Avro: Add Avro-assisted name mapping [iceberg]

Posted by "wmoustafa (via GitHub)" <gi...@apache.org>.
wmoustafa commented on PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#issuecomment-1756153921

   > I think this is ready. Just a few minor updates needed; mostly https://github.com/apache/iceberg/pull/7392/files#r1224853756.
   
   Addressed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Avro: Add Avro-assisted name mapping [iceberg]

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#issuecomment-1868128440

   Thanks, @wmoustafa! Good to have the ability to generate these mappings. Can you also update `ApplyNameMapping` to use them? It was added in https://github.com/apache/iceberg/pull/9347.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1224847740


##########
core/src/main/java/org/apache/iceberg/avro/SchemaToType.java:
##########
@@ -104,13 +104,23 @@ public Type record(Schema record, List<String> names, List<Type> fieldTypes) {
 
   @Override
   public Type union(Schema union, List<Type> options) {
-    Preconditions.checkArgument(
-        AvroSchemaUtil.isOptionSchema(union), "Unsupported type: non-option union: %s", union);
-    // records, arrays, and maps will check nullability later
-    if (options.get(0) == null) {
-      return options.get(1);
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      if (options.get(0) == null) {
+        return options.get(1);
+      } else {
+        return options.get(0);
+      }
     } else {
-      return options.get(0);
+      // Create list of Iceberg schema fields
+      List<Types.NestedField> fields = Lists.newArrayListWithExpectedSize(options.size());
+      int tagIndex = 0;
+      fields.add(Types.NestedField.required(allocateId(), "tag", Types.IntegerType.get()));
+      for (Type option : options) {
+        if (option != null) {

Review Comment:
   I think I'm okay with this to match the behavior of other conversions. We do need to ensure that the tag matches the field name. That's going to be annoying in the read path.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wmoustafa commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "wmoustafa (via GitHub)" <gi...@apache.org>.
wmoustafa commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1193228601


##########
core/src/main/java/org/apache/iceberg/avro/NameMappingWithAvroSchema.java:
##########
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.avro;
+
+import java.util.List;
+import org.apache.avro.Schema;
+import org.apache.iceberg.mapping.MappedField;
+import org.apache.iceberg.mapping.MappedFields;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.Types;
+
+public class NameMappingWithAvroSchema extends AvroSchemaWithDerivedTypeVisitor<MappedFields> {
+
+  @Override
+  public MappedFields record(
+      Types.StructType iStruct,
+      Schema record,
+      List<String> names,
+      List<MappedFields> fieldResults) {
+    List<MappedField> fields = Lists.newArrayListWithExpectedSize(fieldResults.size());
+
+    for (int i = 0; i < fieldResults.size(); i += 1) {
+      Types.NestedField field = iStruct.fields().get(i);
+      MappedFields result = fieldResults.get(i);
+      fields.add(MappedField.of(field.fieldId(), field.name(), result));
+    }
+
+    return MappedFields.of(fields);
+  }
+
+  @Override
+  public MappedFields union(Type iType, Schema union, List<MappedFields> optionResults) {
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      for (int i = 0; i < optionResults.size(); i += 1) {
+        if (union.getTypes().get(i).getType() != Schema.Type.NULL) {
+          return optionResults.get(i);
+        }
+      }
+    } else { // Complex union
+      Preconditions.checkArgument(
+          iType instanceof Types.StructType,
+          "Cannot visit invalid Iceberg type: %s for Avro complex union type: %s",
+          iType,
+          union);
+      Types.StructType iStruct = (Types.StructType) iType;
+      List<MappedField> fields = Lists.newArrayListWithExpectedSize(optionResults.size());
+      int index = 0;
+      for (Schema option : union.getTypes()) {
+        if (option.getType() != Schema.Type.NULL) {
+          // Check if current option is a named type, i.e., a RECORD, ENUM, or FIXED type. If so,
+          // use the full record name of the Avro type as the field name. Otherwise, use the Avro
+          // type toString().
+          if (option.getType() == Schema.Type.RECORD
+              || option.getType() == Schema.Type.ENUM
+              || option.getType() == Schema.Type.FIXED) {
+            fields.add(
+                MappedField.of(
+                    iStruct.fields().get(index).fieldId(),
+                    option.getFullName(),

Review Comment:
   Why cannot we avoid deduplication by using the full name?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wmoustafa commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "wmoustafa (via GitHub)" <gi...@apache.org>.
wmoustafa commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1179922145


##########
core/src/main/java/org/apache/iceberg/avro/AvroSchemaWithDerivedTypeVisitor.java:
##########
@@ -0,0 +1,169 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.avro;
+
+import java.util.Deque;
+import java.util.List;
+import org.apache.avro.Schema;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.Types;
+
+/**
+ * Avro schema with Type visitor, where Type is guaranteed to be derived from Avro schema, and Avro
+ * schema does not necessarily have field IDs (unlike {@link AvroSchemaWithTypeVisitor}). Avro
+ * schema and Iceberg schema fields are matched by position.
+ */
+public abstract class AvroSchemaWithDerivedTypeVisitor<T> {

Review Comment:
   So I tried `AvroWithPartnerByStructureVisitor `, but could not modify `visitUnion` to call `fieldNameAndType` since the former is static and the latter is not. `fieldNameAndType` is required to access a union option's respective Iceberg type.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1174456058


##########
core/src/main/java/org/apache/iceberg/avro/SchemaToType.java:
##########
@@ -104,13 +104,23 @@ public Type record(Schema record, List<String> names, List<Type> fieldTypes) {
 
   @Override
   public Type union(Schema union, List<Type> options) {
-    Preconditions.checkArgument(
-        AvroSchemaUtil.isOptionSchema(union), "Unsupported type: non-option union: %s", union);
-    // records, arrays, and maps will check nullability later
-    if (options.get(0) == null) {
-      return options.get(1);
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      if (options.get(0) == null) {
+        return options.get(1);
+      } else {
+        return options.get(0);
+      }
     } else {
-      return options.get(0);
+      // Create list of Iceberg schema fields
+      List<Types.NestedField> fields = Lists.newArrayListWithExpectedSize(options.size());
+      int tagIndex = 0;
+      fields.add(Types.NestedField.required(allocateId(), "tag", Types.IntegerType.get()));
+      for (Type option : options) {
+        if (option != null) {
+          fields.add(Types.NestedField.optional(allocateId(), "field" + tagIndex++, option));

Review Comment:
   Iceberg does not use the return value of the `++` operator because it is confusing to readers. Can you move this to a separate line?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1224533549


##########
core/src/main/java/org/apache/iceberg/avro/AvroSchemaWithDerivedTypeVisitor.java:
##########
@@ -0,0 +1,169 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.avro;
+
+import java.util.Deque;
+import java.util.List;
+import org.apache.avro.Schema;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.Types;
+
+/**
+ * Avro schema with Type visitor, where Type is guaranteed to be derived from Avro schema, and Avro
+ * schema does not necessarily have field IDs (unlike {@link AvroSchemaWithTypeVisitor}). Avro
+ * schema and Iceberg schema fields are matched by position.
+ */
+public abstract class AvroSchemaWithDerivedTypeVisitor<T> {

Review Comment:
   Is this visitor needed? The visitor is now based on `AvroWithPartnerByStructureVisitor` so I don't think we need to add this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1224850778


##########
core/src/main/java/org/apache/iceberg/avro/NameMappingWithAvroSchema.java:
##########
@@ -0,0 +1,123 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.avro;
+
+import java.util.List;
+import org.apache.avro.Schema;
+import org.apache.iceberg.mapping.MappedField;
+import org.apache.iceberg.mapping.MappedFields;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.Types;
+
+public class NameMappingWithAvroSchema extends AvroWithTypeByStructureVisitor<MappedFields> {
+  @Override
+  public MappedFields record(
+      Type struct, Schema record, List<String> names, List<MappedFields> fieldResults) {
+    List<MappedField> fields = Lists.newArrayListWithExpectedSize(fieldResults.size());
+
+    for (int i = 0; i < fieldResults.size(); i += 1) {
+      Types.NestedField field = struct.asStructType().fields().get(i);
+      MappedFields result = fieldResults.get(i);
+      fields.add(MappedField.of(field.fieldId(), field.name(), result));
+    }
+
+    return MappedFields.of(fields);
+  }
+
+  @Override
+  public MappedFields union(Type type, Schema union, List<MappedFields> optionResults) {
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      for (int i = 0; i < optionResults.size(); i += 1) {
+        if (union.getTypes().get(i).getType() != Schema.Type.NULL) {
+          return optionResults.get(i);
+        }
+      }
+    } else { // Complex union
+      Preconditions.checkArgument(
+          type instanceof Types.StructType,
+          "Cannot visit invalid Iceberg type: %s for Avro complex union type: %s",
+          type,
+          union);
+      Types.StructType struct = (Types.StructType) type;
+      List<MappedField> fields = Lists.newArrayListWithExpectedSize(optionResults.size());
+      int index = 0;
+      // Avro spec for union types states that unions may not contain more than one schema with the
+      // same type, except for the named types record, fixed and enum. For example, unions
+      // containing two array types or two map types are not permitted, but two types with different
+      // names are permitted.
+      // Therefore, for non-named types, use the Avro type toString() as the field mapping key. For
+      // named types, use the record name of the Avro type as the field mapping key.
+      for (Schema option : union.getTypes()) {
+        if (option.getType() != Schema.Type.NULL) {
+          // Check if current option is a named type, i.e., a RECORD, ENUM, or FIXED type. If so,
+          // use the record name of the Avro type as the field name. Otherwise, use the Avro
+          // type toString().
+          if (option.getType() == Schema.Type.RECORD
+              || option.getType() == Schema.Type.ENUM
+              || option.getType() == Schema.Type.FIXED) {
+            fields.add(
+                MappedField.of(
+                    struct.fields().get(index).fieldId(),
+                    option.getName(),

Review Comment:
   I still think this should add the short name as well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1174457125


##########
core/src/main/java/org/apache/iceberg/avro/NameMappingWithAvroSchema.java:
##########
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.avro;
+
+import java.util.List;
+import org.apache.avro.Schema;
+import org.apache.iceberg.mapping.MappedField;
+import org.apache.iceberg.mapping.MappedFields;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.Types;
+
+public class NameMappingWithAvroSchema extends AvroSchemaWithDerivedTypeVisitor<MappedFields> {
+
+  @Override
+  public MappedFields record(
+      Types.StructType iStruct,
+      Schema record,
+      List<String> names,
+      List<MappedFields> fieldResults) {
+    List<MappedField> fields = Lists.newArrayListWithExpectedSize(fieldResults.size());
+
+    for (int i = 0; i < fieldResults.size(); i += 1) {
+      Types.NestedField field = iStruct.fields().get(i);
+      MappedFields result = fieldResults.get(i);
+      fields.add(MappedField.of(field.fieldId(), field.name(), result));
+    }
+
+    return MappedFields.of(fields);
+  }
+
+  @Override
+  public MappedFields union(Type iType, Schema union, List<MappedFields> optionResults) {
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      for (int i = 0; i < optionResults.size(); i += 1) {
+        if (union.getTypes().get(i).getType() != Schema.Type.NULL) {
+          return optionResults.get(i);
+        }
+      }
+    } else { // Complex union
+      Preconditions.checkArgument(
+          iType instanceof Types.StructType,
+          "Cannot visit invalid Iceberg type: %s for Avro complex union type: %s",
+          iType,
+          union);
+      Types.StructType iStruct = (Types.StructType) iType;
+      List<MappedField> fields = Lists.newArrayListWithExpectedSize(optionResults.size());
+      int index = 0;
+      for (Schema option : union.getTypes()) {
+        if (option.getType() != Schema.Type.NULL) {
+          // Check if current option is a named type, i.e., a RECORD, ENUM, or FIXED type. If so,
+          // use the full record name of the Avro type as the field name. Otherwise, use the Avro
+          // type toString().
+          if (option.getType() == Schema.Type.RECORD
+              || option.getType() == Schema.Type.ENUM
+              || option.getType() == Schema.Type.FIXED) {
+            fields.add(
+                MappedField.of(
+                    iStruct.fields().get(index).fieldId(),
+                    option.getFullName(),
+                    optionResults.get(index)));
+          } else {
+            fields.add(
+                MappedField.of(
+                    iStruct.fields().get(index).fieldId(),
+                    option.toString(),
+                    optionResults.get(index)));
+          }
+          // Both iStruct and optionResults do not contain an entry for the NULL type, so we need to
+          // increment i only
+          // when we encounter a non-NULL type.
+          index++;
+        }
+      }
+      return MappedFields.of(fields);
+    }
+    return null;
+  }
+
+  @Override
+  public MappedFields array(Types.ListType iList, Schema array, MappedFields elementResult) {
+    return MappedFields.of(MappedField.of(iList.elementId(), "element", elementResult));
+  }
+
+  @Override
+  public MappedFields map(
+      Types.MapType iMap, Schema map, MappedFields keyResult, MappedFields valueResult) {
+    return MappedFields.of(
+        MappedField.of(iMap.keyId(), "key", keyResult),
+        MappedField.of(iMap.valueId(), "value", valueResult));
+  }
+
+  @Override
+  public MappedFields map(Types.MapType iMap, Schema map, MappedFields valueResult) {
+    return MappedFields.of(MappedField.of(iMap.valueId(), "value", valueResult));

Review Comment:
   This still needs to create a mapping for the key field, right? The Iceberg map will have a key field with a String type. There won't be a nested mapping because the key is a String, but there should be a mapping from `iMap.keyId()` to `"key"`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1174456675


##########
core/src/main/java/org/apache/iceberg/avro/SchemaToType.java:
##########
@@ -104,13 +104,23 @@ public Type record(Schema record, List<String> names, List<Type> fieldTypes) {
 
   @Override
   public Type union(Schema union, List<Type> options) {
-    Preconditions.checkArgument(
-        AvroSchemaUtil.isOptionSchema(union), "Unsupported type: non-option union: %s", union);
-    // records, arrays, and maps will check nullability later
-    if (options.get(0) == null) {
-      return options.get(1);
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      if (options.get(0) == null) {
+        return options.get(1);
+      } else {
+        return options.get(0);
+      }
     } else {
-      return options.get(0);
+      // Create list of Iceberg schema fields
+      List<Types.NestedField> fields = Lists.newArrayListWithExpectedSize(options.size());
+      int tagIndex = 0;
+      fields.add(Types.NestedField.required(allocateId(), "tag", Types.IntegerType.get()));
+      for (Type option : options) {
+        if (option != null) {

Review Comment:
   I'm thinking through the case of a non-option union with null, `["int", "null", "float"]`. That would produce: `struct<1: tag required int, 2: field0 optional int, 3: field1 optional float>`.
   
   When the union's value is null, the would be `row(tag=1, field0=null, field1=null)` and when the union's value is a float it would be `row(tag=2, field0=null, field1=3.1)`. The mismatch between field names and the `tag` value is odd to me. I think the tag index should be updated every time through the loop, regardless of whether the option type is `null`.
   
   Also, since these are called options, what about using "option" rather than "field"?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1224539725


##########
core/src/main/java/org/apache/iceberg/avro/AvroWithPartnerByStructureVisitor.java:
##########
@@ -93,14 +94,23 @@ private static <P, T> T visitRecord(
   private static <P, T> T visitUnion(
       P type, Schema union, AvroWithPartnerByStructureVisitor<P, T> visitor) {
     List<Schema> types = union.getTypes();
-    Preconditions.checkArgument(
-        AvroSchemaUtil.isOptionSchema(union), "Cannot visit non-option union: %s", union);
     List<T> options = Lists.newArrayListWithExpectedSize(types.size());
-    for (Schema branch : types) {
-      if (branch.getType() == Schema.Type.NULL) {
-        options.add(visit(visitor.nullType(), branch, visitor));
-      } else {
-        options.add(visit(type, branch, visitor));
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      for (Schema branch : types) {
+        if (branch.getType() == Schema.Type.NULL) {
+          options.add(visit(visitor.nullType(), branch, visitor));
+        } else {
+          options.add(visit(type, branch, visitor));
+        }
+      }
+    } else {
+      List<Schema> nonNullTypes =
+          types.stream().filter(t -> t.getType() != Schema.Type.NULL).collect(Collectors.toList());
+      for (int i = 0; i < nonNullTypes.size(); i++) {

Review Comment:
   Shouldn't this block cast `type` to `StructType` and visit each field in that struct?
   
   It should also throw an exception if `type` is not a struct.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1224854011


##########
core/src/main/java/org/apache/iceberg/avro/NameMappingWithAvroSchema.java:
##########
@@ -0,0 +1,123 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.avro;
+
+import java.util.List;
+import org.apache.avro.Schema;
+import org.apache.iceberg.mapping.MappedField;
+import org.apache.iceberg.mapping.MappedFields;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.Types;
+
+public class NameMappingWithAvroSchema extends AvroWithTypeByStructureVisitor<MappedFields> {
+  @Override
+  public MappedFields record(
+      Type struct, Schema record, List<String> names, List<MappedFields> fieldResults) {
+    List<MappedField> fields = Lists.newArrayListWithExpectedSize(fieldResults.size());
+
+    for (int i = 0; i < fieldResults.size(); i += 1) {
+      Types.NestedField field = struct.asStructType().fields().get(i);
+      MappedFields result = fieldResults.get(i);
+      fields.add(MappedField.of(field.fieldId(), field.name(), result));
+    }
+
+    return MappedFields.of(fields);
+  }
+
+  @Override
+  public MappedFields union(Type type, Schema union, List<MappedFields> optionResults) {
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      for (int i = 0; i < optionResults.size(); i += 1) {
+        if (union.getTypes().get(i).getType() != Schema.Type.NULL) {
+          return optionResults.get(i);
+        }
+      }
+    } else { // Complex union
+      Preconditions.checkArgument(
+          type instanceof Types.StructType,
+          "Cannot visit invalid Iceberg type: %s for Avro complex union type: %s",
+          type,
+          union);
+      Types.StructType struct = (Types.StructType) type;
+      List<MappedField> fields = Lists.newArrayListWithExpectedSize(optionResults.size());
+      int index = 0;
+      // Avro spec for union types states that unions may not contain more than one schema with the
+      // same type, except for the named types record, fixed and enum. For example, unions
+      // containing two array types or two map types are not permitted, but two types with different
+      // names are permitted.
+      // Therefore, for non-named types, use the Avro type toString() as the field mapping key. For
+      // named types, use the record name of the Avro type as the field mapping key.
+      for (Schema option : union.getTypes()) {
+        if (option.getType() != Schema.Type.NULL) {
+          // Check if current option is a named type, i.e., a RECORD, ENUM, or FIXED type. If so,
+          // use the record name of the Avro type as the field name. Otherwise, use the Avro
+          // type toString().
+          if (option.getType() == Schema.Type.RECORD
+              || option.getType() == Schema.Type.ENUM
+              || option.getType() == Schema.Type.FIXED) {
+            fields.add(
+                MappedField.of(
+                    struct.fields().get(index).fieldId(),
+                    option.getName(),
+                    optionResults.get(index)));
+          } else {
+            fields.add(
+                MappedField.of(
+                    struct.fields().get(index).fieldId(),
+                    option.toString(),
+                    optionResults.get(index)));
+          }
+
+          // Both iStruct and optionResults do not contain an entry for the NULL type, so we need to
+          // increment i only
+          // when we encounter a non-NULL type.

Review Comment:
   Style: Line wrapping is incorrect.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wmoustafa commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "wmoustafa (via GitHub)" <gi...@apache.org>.
wmoustafa commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1193229935


##########
core/src/test/java/org/apache/iceberg/avro/TestNameMappingWithAvroSchema.java:
##########
@@ -0,0 +1,169 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.avro;
+
+import org.apache.avro.Schema;
+import org.apache.avro.SchemaBuilder;
+import org.apache.iceberg.mapping.MappedField;
+import org.apache.iceberg.mapping.MappedFields;
+import org.junit.Assert;
+import org.junit.Test;
+
+public class TestNameMappingWithAvroSchema {
+  @Test
+  public void testNameMappingWithAvroSchema() {
+    NameMappingWithAvroSchema nameMappingWithAvroSchema = new NameMappingWithAvroSchema();
+    // Create example Avro schema
+    Schema schema =
+        SchemaBuilder.record("test")

Review Comment:
   Done.



##########
core/src/main/java/org/apache/iceberg/avro/NameMappingWithAvroSchema.java:
##########
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.avro;
+
+import java.util.List;
+import org.apache.avro.Schema;
+import org.apache.iceberg.mapping.MappedField;
+import org.apache.iceberg.mapping.MappedFields;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.Types;
+
+public class NameMappingWithAvroSchema extends AvroSchemaWithDerivedTypeVisitor<MappedFields> {
+
+  @Override
+  public MappedFields record(
+      Types.StructType iStruct,
+      Schema record,
+      List<String> names,
+      List<MappedFields> fieldResults) {
+    List<MappedField> fields = Lists.newArrayListWithExpectedSize(fieldResults.size());
+
+    for (int i = 0; i < fieldResults.size(); i += 1) {
+      Types.NestedField field = iStruct.fields().get(i);
+      MappedFields result = fieldResults.get(i);
+      fields.add(MappedField.of(field.fieldId(), field.name(), result));
+    }
+
+    return MappedFields.of(fields);
+  }
+
+  @Override
+  public MappedFields union(Type iType, Schema union, List<MappedFields> optionResults) {
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      for (int i = 0; i < optionResults.size(); i += 1) {
+        if (union.getTypes().get(i).getType() != Schema.Type.NULL) {
+          return optionResults.get(i);
+        }
+      }
+    } else { // Complex union
+      Preconditions.checkArgument(
+          iType instanceof Types.StructType,
+          "Cannot visit invalid Iceberg type: %s for Avro complex union type: %s",
+          iType,
+          union);
+      Types.StructType iStruct = (Types.StructType) iType;
+      List<MappedField> fields = Lists.newArrayListWithExpectedSize(optionResults.size());
+      int index = 0;
+      for (Schema option : union.getTypes()) {
+        if (option.getType() != Schema.Type.NULL) {
+          // Check if current option is a named type, i.e., a RECORD, ENUM, or FIXED type. If so,
+          // use the full record name of the Avro type as the field name. Otherwise, use the Avro
+          // type toString().
+          if (option.getType() == Schema.Type.RECORD
+              || option.getType() == Schema.Type.ENUM
+              || option.getType() == Schema.Type.FIXED) {
+            fields.add(
+                MappedField.of(
+                    iStruct.fields().get(index).fieldId(),
+                    option.getFullName(),
+                    optionResults.get(index)));
+          } else {
+            fields.add(
+                MappedField.of(
+                    iStruct.fields().get(index).fieldId(),
+                    option.toString(),
+                    optionResults.get(index)));
+          }
+          // Both iStruct and optionResults do not contain an entry for the NULL type, so we need to

Review Comment:
   Added.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wmoustafa commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "wmoustafa (via GitHub)" <gi...@apache.org>.
wmoustafa commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1193229973


##########
core/src/main/java/org/apache/iceberg/avro/NameMappingWithAvroSchema.java:
##########
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.avro;
+
+import java.util.List;
+import org.apache.avro.Schema;
+import org.apache.iceberg.mapping.MappedField;
+import org.apache.iceberg.mapping.MappedFields;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.Types;
+
+public class NameMappingWithAvroSchema extends AvroSchemaWithDerivedTypeVisitor<MappedFields> {
+
+  @Override
+  public MappedFields record(
+      Types.StructType iStruct,
+      Schema record,
+      List<String> names,
+      List<MappedFields> fieldResults) {
+    List<MappedField> fields = Lists.newArrayListWithExpectedSize(fieldResults.size());
+
+    for (int i = 0; i < fieldResults.size(); i += 1) {
+      Types.NestedField field = iStruct.fields().get(i);
+      MappedFields result = fieldResults.get(i);
+      fields.add(MappedField.of(field.fieldId(), field.name(), result));
+    }
+
+    return MappedFields.of(fields);
+  }
+
+  @Override
+  public MappedFields union(Type iType, Schema union, List<MappedFields> optionResults) {
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      for (int i = 0; i < optionResults.size(); i += 1) {
+        if (union.getTypes().get(i).getType() != Schema.Type.NULL) {
+          return optionResults.get(i);
+        }
+      }
+    } else { // Complex union
+      Preconditions.checkArgument(
+          iType instanceof Types.StructType,
+          "Cannot visit invalid Iceberg type: %s for Avro complex union type: %s",
+          iType,
+          union);
+      Types.StructType iStruct = (Types.StructType) iType;
+      List<MappedField> fields = Lists.newArrayListWithExpectedSize(optionResults.size());
+      int index = 0;
+      for (Schema option : union.getTypes()) {

Review Comment:
   Added comment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wmoustafa commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "wmoustafa (via GitHub)" <gi...@apache.org>.
wmoustafa commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1193229848


##########
core/src/main/java/org/apache/iceberg/avro/SchemaToType.java:
##########
@@ -104,13 +104,23 @@ public Type record(Schema record, List<String> names, List<Type> fieldTypes) {
 
   @Override
   public Type union(Schema union, List<Type> options) {
-    Preconditions.checkArgument(
-        AvroSchemaUtil.isOptionSchema(union), "Unsupported type: non-option union: %s", union);
-    // records, arrays, and maps will check nullability later
-    if (options.get(0) == null) {
-      return options.get(1);
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      if (options.get(0) == null) {
+        return options.get(1);
+      } else {
+        return options.get(0);
+      }
     } else {
-      return options.get(0);
+      // Create list of Iceberg schema fields
+      List<Types.NestedField> fields = Lists.newArrayListWithExpectedSize(options.size());
+      int tagIndex = 0;
+      fields.add(Types.NestedField.required(allocateId(), "tag", Types.IntegerType.get()));
+      for (Type option : options) {
+        if (option != null) {
+          fields.add(Types.NestedField.optional(allocateId(), "field" + tagIndex++, option));

Review Comment:
   Fixed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1224537373


##########
core/src/main/java/org/apache/iceberg/avro/AvroWithPartnerByStructureVisitor.java:
##########
@@ -93,14 +94,23 @@ private static <P, T> T visitRecord(
   private static <P, T> T visitUnion(
       P type, Schema union, AvroWithPartnerByStructureVisitor<P, T> visitor) {
     List<Schema> types = union.getTypes();
-    Preconditions.checkArgument(
-        AvroSchemaUtil.isOptionSchema(union), "Cannot visit non-option union: %s", union);
     List<T> options = Lists.newArrayListWithExpectedSize(types.size());
-    for (Schema branch : types) {
-      if (branch.getType() == Schema.Type.NULL) {
-        options.add(visit(visitor.nullType(), branch, visitor));
-      } else {
-        options.add(visit(type, branch, visitor));
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      for (Schema branch : types) {
+        if (branch.getType() == Schema.Type.NULL) {
+          options.add(visit(visitor.nullType(), branch, visitor));
+        } else {
+          options.add(visit(type, branch, visitor));
+        }
+      }
+    } else {
+      List<Schema> nonNullTypes =
+          types.stream().filter(t -> t.getType() != Schema.Type.NULL).collect(Collectors.toList());

Review Comment:
   This should visit every type, including null. The option schema does this above.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Avro: Add Avro-assisted name mapping [iceberg]

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue merged PR #7392:
URL: https://github.com/apache/iceberg/pull/7392


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1174457469


##########
core/src/main/java/org/apache/iceberg/avro/NameMappingWithAvroSchema.java:
##########
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.avro;
+
+import java.util.List;
+import org.apache.avro.Schema;
+import org.apache.iceberg.mapping.MappedField;
+import org.apache.iceberg.mapping.MappedFields;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.Types;
+
+public class NameMappingWithAvroSchema extends AvroSchemaWithDerivedTypeVisitor<MappedFields> {
+
+  @Override
+  public MappedFields record(
+      Types.StructType iStruct,
+      Schema record,
+      List<String> names,
+      List<MappedFields> fieldResults) {
+    List<MappedField> fields = Lists.newArrayListWithExpectedSize(fieldResults.size());
+
+    for (int i = 0; i < fieldResults.size(); i += 1) {
+      Types.NestedField field = iStruct.fields().get(i);
+      MappedFields result = fieldResults.get(i);
+      fields.add(MappedField.of(field.fieldId(), field.name(), result));
+    }
+
+    return MappedFields.of(fields);
+  }
+
+  @Override
+  public MappedFields union(Type iType, Schema union, List<MappedFields> optionResults) {
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      for (int i = 0; i < optionResults.size(); i += 1) {
+        if (union.getTypes().get(i).getType() != Schema.Type.NULL) {
+          return optionResults.get(i);
+        }
+      }
+    } else { // Complex union
+      Preconditions.checkArgument(
+          iType instanceof Types.StructType,
+          "Cannot visit invalid Iceberg type: %s for Avro complex union type: %s",
+          iType,
+          union);
+      Types.StructType iStruct = (Types.StructType) iType;
+      List<MappedField> fields = Lists.newArrayListWithExpectedSize(optionResults.size());
+      int index = 0;
+      for (Schema option : union.getTypes()) {
+        if (option.getType() != Schema.Type.NULL) {
+          // Check if current option is a named type, i.e., a RECORD, ENUM, or FIXED type. If so,
+          // use the full record name of the Avro type as the field name. Otherwise, use the Avro
+          // type toString().
+          if (option.getType() == Schema.Type.RECORD
+              || option.getType() == Schema.Type.ENUM
+              || option.getType() == Schema.Type.FIXED) {
+            fields.add(
+                MappedField.of(
+                    iStruct.fields().get(index).fieldId(),
+                    option.getFullName(),

Review Comment:
   I would add the short name without the namespace as well, since we can map multiple names. It's really unlikely that we would have conflicting class names, like `com.example.Pair` and `org.apache.iceberg.Pair`.
   
   If you did this, you'd need to look for duplicates and only add the first, but that's a pretty reasonable addition.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1315120319


##########
core/src/main/java/org/apache/iceberg/avro/AvroWithPartnerByStructureVisitor.java:
##########
@@ -93,14 +94,23 @@ private static <P, T> T visitRecord(
   private static <P, T> T visitUnion(
       P type, Schema union, AvroWithPartnerByStructureVisitor<P, T> visitor) {
     List<Schema> types = union.getTypes();
-    Preconditions.checkArgument(
-        AvroSchemaUtil.isOptionSchema(union), "Cannot visit non-option union: %s", union);
     List<T> options = Lists.newArrayListWithExpectedSize(types.size());
-    for (Schema branch : types) {
-      if (branch.getType() == Schema.Type.NULL) {
-        options.add(visit(visitor.nullType(), branch, visitor));
-      } else {
-        options.add(visit(type, branch, visitor));
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      for (Schema branch : types) {
+        if (branch.getType() == Schema.Type.NULL) {
+          options.add(visit(visitor.nullType(), branch, visitor));
+        } else {
+          options.add(visit(type, branch, visitor));
+        }
+      }
+    } else {
+      List<Schema> nonNullTypes =
+          types.stream().filter(t -> t.getType() != Schema.Type.NULL).collect(Collectors.toList());
+      for (int i = 0; i < nonNullTypes.size(); i++) {
+        // In the case of complex union, the corresponding "type" is a struct. Non-null type i in
+        // the union maps to struct filed i + 1 because the first struct field is the "tag".
+        options.add(
+            visit(visitor.fieldNameAndType(type, i + 1).second(), nonNullTypes.get(i), visitor));

Review Comment:
   Yes. The visitor needs to be generic and should not assume that you're handling null in a particular way.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wmoustafa commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "wmoustafa (via GitHub)" <gi...@apache.org>.
wmoustafa commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1210480417


##########
core/src/main/java/org/apache/iceberg/avro/NameMappingWithAvroSchema.java:
##########
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.avro;
+
+import java.util.List;
+import org.apache.avro.Schema;
+import org.apache.iceberg.mapping.MappedField;
+import org.apache.iceberg.mapping.MappedFields;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.Types;
+
+public class NameMappingWithAvroSchema extends AvroSchemaWithDerivedTypeVisitor<MappedFields> {
+
+  @Override
+  public MappedFields record(
+      Types.StructType iStruct,
+      Schema record,
+      List<String> names,
+      List<MappedFields> fieldResults) {
+    List<MappedField> fields = Lists.newArrayListWithExpectedSize(fieldResults.size());
+
+    for (int i = 0; i < fieldResults.size(); i += 1) {
+      Types.NestedField field = iStruct.fields().get(i);
+      MappedFields result = fieldResults.get(i);
+      fields.add(MappedField.of(field.fieldId(), field.name(), result));
+    }
+
+    return MappedFields.of(fields);
+  }
+
+  @Override
+  public MappedFields union(Type iType, Schema union, List<MappedFields> optionResults) {
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      for (int i = 0; i < optionResults.size(); i += 1) {
+        if (union.getTypes().get(i).getType() != Schema.Type.NULL) {
+          return optionResults.get(i);
+        }
+      }
+    } else { // Complex union
+      Preconditions.checkArgument(
+          iType instanceof Types.StructType,
+          "Cannot visit invalid Iceberg type: %s for Avro complex union type: %s",
+          iType,
+          union);
+      Types.StructType iStruct = (Types.StructType) iType;
+      List<MappedField> fields = Lists.newArrayListWithExpectedSize(optionResults.size());
+      int index = 0;
+      for (Schema option : union.getTypes()) {
+        if (option.getType() != Schema.Type.NULL) {
+          // Check if current option is a named type, i.e., a RECORD, ENUM, or FIXED type. If so,
+          // use the full record name of the Avro type as the field name. Otherwise, use the Avro
+          // type toString().
+          if (option.getType() == Schema.Type.RECORD
+              || option.getType() == Schema.Type.ENUM
+              || option.getType() == Schema.Type.FIXED) {
+            fields.add(
+                MappedField.of(
+                    iStruct.fields().get(index).fieldId(),
+                    option.getFullName(),

Review Comment:
   Updated to short-name.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wmoustafa commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "wmoustafa (via GitHub)" <gi...@apache.org>.
wmoustafa commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1210479941


##########
core/src/main/java/org/apache/iceberg/avro/AvroWithPartnerByStructureVisitor.java:
##########
@@ -93,19 +94,41 @@ private static <P, T> T visitRecord(
   private static <P, T> T visitUnion(
       P type, Schema union, AvroWithPartnerByStructureVisitor<P, T> visitor) {
     List<Schema> types = union.getTypes();
-    Preconditions.checkArgument(
-        AvroSchemaUtil.isOptionSchema(union), "Cannot visit non-option union: %s", union);
     List<T> options = Lists.newArrayListWithExpectedSize(types.size());
-    for (Schema branch : types) {
-      if (branch.getType() == Schema.Type.NULL) {
-        options.add(visit(visitor.nullType(), branch, visitor));
-      } else {
-        options.add(visit(type, branch, visitor));
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      for (Schema branch : types) {
+        if (branch.getType() == Schema.Type.NULL) {
+          options.add(visit(visitor.nullType(), branch, visitor));
+        } else {
+          options.add(visit(type, branch, visitor));
+        }
+      }
+    } else {
+      Types.StructType struct = (Types.StructType) type;
+      int index = 0;
+      for (Schema branch : types) {
+        if (branch.getType() != Schema.Type.NULL) {
+          options.add(visit(fieldTypeByName(type, "field" + index, visitor), branch, visitor));

Review Comment:
   Reason for going by the name was that the name also reflects a position (which in some cases is more authoritative), but it could work either way. Removed the name.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wmoustafa commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "wmoustafa (via GitHub)" <gi...@apache.org>.
wmoustafa commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1193225755


##########
core/src/main/java/org/apache/iceberg/avro/SchemaToType.java:
##########
@@ -104,13 +104,23 @@ public Type record(Schema record, List<String> names, List<Type> fieldTypes) {
 
   @Override
   public Type union(Schema union, List<Type> options) {
-    Preconditions.checkArgument(
-        AvroSchemaUtil.isOptionSchema(union), "Unsupported type: non-option union: %s", union);
-    // records, arrays, and maps will check nullability later
-    if (options.get(0) == null) {
-      return options.get(1);
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      if (options.get(0) == null) {
+        return options.get(1);
+      } else {
+        return options.get(0);
+      }
     } else {
-      return options.get(0);
+      // Create list of Iceberg schema fields
+      List<Types.NestedField> fields = Lists.newArrayListWithExpectedSize(options.size());
+      int tagIndex = 0;
+      fields.add(Types.NestedField.required(allocateId(), "tag", Types.IntegerType.get()));
+      for (Type option : options) {
+        if (option != null) {

Review Comment:
   So this is skipping the optional branch in the union. In this step, its value is Java's null (see the counterpart simple union if branch  -a few lines before this- which also uses Java's null).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1199221178


##########
core/src/main/java/org/apache/iceberg/avro/NameMappingWithAvroSchema.java:
##########
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.avro;
+
+import java.util.List;
+import org.apache.avro.Schema;
+import org.apache.iceberg.mapping.MappedField;
+import org.apache.iceberg.mapping.MappedFields;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.Types;
+
+public class NameMappingWithAvroSchema extends AvroSchemaWithDerivedTypeVisitor<MappedFields> {
+
+  @Override
+  public MappedFields record(
+      Types.StructType iStruct,
+      Schema record,
+      List<String> names,
+      List<MappedFields> fieldResults) {
+    List<MappedField> fields = Lists.newArrayListWithExpectedSize(fieldResults.size());
+
+    for (int i = 0; i < fieldResults.size(); i += 1) {
+      Types.NestedField field = iStruct.fields().get(i);
+      MappedFields result = fieldResults.get(i);
+      fields.add(MappedField.of(field.fieldId(), field.name(), result));
+    }
+
+    return MappedFields.of(fields);
+  }
+
+  @Override
+  public MappedFields union(Type iType, Schema union, List<MappedFields> optionResults) {
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      for (int i = 0; i < optionResults.size(); i += 1) {
+        if (union.getTypes().get(i).getType() != Schema.Type.NULL) {
+          return optionResults.get(i);
+        }
+      }
+    } else { // Complex union
+      Preconditions.checkArgument(
+          iType instanceof Types.StructType,
+          "Cannot visit invalid Iceberg type: %s for Avro complex union type: %s",
+          iType,
+          union);
+      Types.StructType iStruct = (Types.StructType) iType;
+      List<MappedField> fields = Lists.newArrayListWithExpectedSize(optionResults.size());
+      int index = 0;
+      for (Schema option : union.getTypes()) {
+        if (option.getType() != Schema.Type.NULL) {
+          // Check if current option is a named type, i.e., a RECORD, ENUM, or FIXED type. If so,
+          // use the full record name of the Avro type as the field name. Otherwise, use the Avro
+          // type toString().
+          if (option.getType() == Schema.Type.RECORD
+              || option.getType() == Schema.Type.ENUM
+              || option.getType() == Schema.Type.FIXED) {
+            fields.add(
+                MappedField.of(
+                    iStruct.fields().get(index).fieldId(),
+                    option.getFullName(),

Review Comment:
   Conflicting short names are extremely unlikely, so we shouldn't worry about it. But using a short name is really useful for unrelated code modifications. For example, if you've moved a class in your project between packages, you don't want to have to go update the mapping by hand.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wmoustafa commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "wmoustafa (via GitHub)" <gi...@apache.org>.
wmoustafa commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1193229991


##########
core/src/main/java/org/apache/iceberg/avro/NameMappingWithAvroSchema.java:
##########
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.avro;
+
+import java.util.List;
+import org.apache.avro.Schema;
+import org.apache.iceberg.mapping.MappedField;
+import org.apache.iceberg.mapping.MappedFields;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.Types;
+
+public class NameMappingWithAvroSchema extends AvroSchemaWithDerivedTypeVisitor<MappedFields> {
+
+  @Override
+  public MappedFields record(
+      Types.StructType iStruct,
+      Schema record,
+      List<String> names,
+      List<MappedFields> fieldResults) {
+    List<MappedField> fields = Lists.newArrayListWithExpectedSize(fieldResults.size());
+
+    for (int i = 0; i < fieldResults.size(); i += 1) {
+      Types.NestedField field = iStruct.fields().get(i);
+      MappedFields result = fieldResults.get(i);
+      fields.add(MappedField.of(field.fieldId(), field.name(), result));
+    }
+
+    return MappedFields.of(fields);
+  }
+
+  @Override
+  public MappedFields union(Type iType, Schema union, List<MappedFields> optionResults) {
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      for (int i = 0; i < optionResults.size(); i += 1) {
+        if (union.getTypes().get(i).getType() != Schema.Type.NULL) {
+          return optionResults.get(i);
+        }
+      }
+    } else { // Complex union
+      Preconditions.checkArgument(
+          iType instanceof Types.StructType,
+          "Cannot visit invalid Iceberg type: %s for Avro complex union type: %s",
+          iType,
+          union);
+      Types.StructType iStruct = (Types.StructType) iType;
+      List<MappedField> fields = Lists.newArrayListWithExpectedSize(optionResults.size());
+      int index = 0;
+      for (Schema option : union.getTypes()) {
+        if (option.getType() != Schema.Type.NULL) {
+          // Check if current option is a named type, i.e., a RECORD, ENUM, or FIXED type. If so,
+          // use the full record name of the Avro type as the field name. Otherwise, use the Avro
+          // type toString().
+          if (option.getType() == Schema.Type.RECORD
+              || option.getType() == Schema.Type.ENUM
+              || option.getType() == Schema.Type.FIXED) {
+            fields.add(
+                MappedField.of(
+                    iStruct.fields().get(index).fieldId(),
+                    option.getFullName(),
+                    optionResults.get(index)));
+          } else {
+            fields.add(
+                MappedField.of(
+                    iStruct.fields().get(index).fieldId(),
+                    option.toString(),
+                    optionResults.get(index)));
+          }
+          // Both iStruct and optionResults do not contain an entry for the NULL type, so we need to
+          // increment i only
+          // when we encounter a non-NULL type.
+          index++;
+        }
+      }
+      return MappedFields.of(fields);
+    }
+    return null;
+  }
+
+  @Override
+  public MappedFields array(Types.ListType iList, Schema array, MappedFields elementResult) {
+    return MappedFields.of(MappedField.of(iList.elementId(), "element", elementResult));
+  }
+
+  @Override
+  public MappedFields map(
+      Types.MapType iMap, Schema map, MappedFields keyResult, MappedFields valueResult) {
+    return MappedFields.of(
+        MappedField.of(iMap.keyId(), "key", keyResult),
+        MappedField.of(iMap.valueId(), "value", valueResult));
+  }
+
+  @Override
+  public MappedFields map(Types.MapType iMap, Schema map, MappedFields valueResult) {
+    return MappedFields.of(MappedField.of(iMap.valueId(), "value", valueResult));

Review Comment:
   Added map key mapping.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1174457662


##########
core/src/main/java/org/apache/iceberg/avro/NameMappingWithAvroSchema.java:
##########
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.avro;
+
+import java.util.List;
+import org.apache.avro.Schema;
+import org.apache.iceberg.mapping.MappedField;
+import org.apache.iceberg.mapping.MappedFields;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.Types;
+
+public class NameMappingWithAvroSchema extends AvroSchemaWithDerivedTypeVisitor<MappedFields> {
+
+  @Override
+  public MappedFields record(
+      Types.StructType iStruct,
+      Schema record,
+      List<String> names,
+      List<MappedFields> fieldResults) {
+    List<MappedField> fields = Lists.newArrayListWithExpectedSize(fieldResults.size());
+
+    for (int i = 0; i < fieldResults.size(); i += 1) {
+      Types.NestedField field = iStruct.fields().get(i);
+      MappedFields result = fieldResults.get(i);
+      fields.add(MappedField.of(field.fieldId(), field.name(), result));
+    }
+
+    return MappedFields.of(fields);
+  }
+
+  @Override
+  public MappedFields union(Type iType, Schema union, List<MappedFields> optionResults) {
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      for (int i = 0; i < optionResults.size(); i += 1) {
+        if (union.getTypes().get(i).getType() != Schema.Type.NULL) {
+          return optionResults.get(i);
+        }
+      }
+    } else { // Complex union
+      Preconditions.checkArgument(
+          iType instanceof Types.StructType,
+          "Cannot visit invalid Iceberg type: %s for Avro complex union type: %s",
+          iType,
+          union);
+      Types.StructType iStruct = (Types.StructType) iType;
+      List<MappedField> fields = Lists.newArrayListWithExpectedSize(optionResults.size());
+      int index = 0;
+      for (Schema option : union.getTypes()) {

Review Comment:
   It would be nice to add a comment about why this mapping works, like the Avro rule that disallows the same type appearing in the union more than once.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1186920701


##########
core/src/main/java/org/apache/iceberg/avro/AvroSchemaWithDerivedTypeVisitor.java:
##########
@@ -0,0 +1,169 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.avro;
+
+import java.util.Deque;
+import java.util.List;
+import org.apache.avro.Schema;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.Types;
+
+/**
+ * Avro schema with Type visitor, where Type is guaranteed to be derived from Avro schema, and Avro
+ * schema does not necessarily have field IDs (unlike {@link AvroSchemaWithTypeVisitor}). Avro
+ * schema and Iceberg schema fields are matched by position.
+ */
+public abstract class AvroSchemaWithDerivedTypeVisitor<T> {

Review Comment:
   `fieldNameAndType` is a method on the visitor. In `visitUnion` you should be able to call `visitor.fieldNameAndType`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1186921392


##########
core/src/main/java/org/apache/iceberg/avro/SchemaToType.java:
##########
@@ -104,13 +104,23 @@ public Type record(Schema record, List<String> names, List<Type> fieldTypes) {
 
   @Override
   public Type union(Schema union, List<Type> options) {
-    Preconditions.checkArgument(
-        AvroSchemaUtil.isOptionSchema(union), "Unsupported type: non-option union: %s", union);
-    // records, arrays, and maps will check nullability later
-    if (options.get(0) == null) {
-      return options.get(1);
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      if (options.get(0) == null) {
+        return options.get(1);
+      } else {
+        return options.get(0);
+      }
     } else {
-      return options.get(0);
+      // Create list of Iceberg schema fields
+      List<Types.NestedField> fields = Lists.newArrayListWithExpectedSize(options.size());
+      int tagIndex = 0;
+      fields.add(Types.NestedField.required(allocateId(), "tag", Types.IntegerType.get()));
+      for (Type option : options) {
+        if (option != null) {

Review Comment:
   My understanding was that the intent here was to skip any NULL entry in the union. That would mean that the example above has tags that don't match the `fieldN` naming convention after any null value.
   
   However, it looks like this is checking for Java's `null` rather than `Schema.create(Schema.Type.NULL)`.
   
   If you're not skipping the NULL schema in ID assignment, then I'm happy with it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wmoustafa commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "wmoustafa (via GitHub)" <gi...@apache.org>.
wmoustafa commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1193228601


##########
core/src/main/java/org/apache/iceberg/avro/NameMappingWithAvroSchema.java:
##########
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.avro;
+
+import java.util.List;
+import org.apache.avro.Schema;
+import org.apache.iceberg.mapping.MappedField;
+import org.apache.iceberg.mapping.MappedFields;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.Types;
+
+public class NameMappingWithAvroSchema extends AvroSchemaWithDerivedTypeVisitor<MappedFields> {
+
+  @Override
+  public MappedFields record(
+      Types.StructType iStruct,
+      Schema record,
+      List<String> names,
+      List<MappedFields> fieldResults) {
+    List<MappedField> fields = Lists.newArrayListWithExpectedSize(fieldResults.size());
+
+    for (int i = 0; i < fieldResults.size(); i += 1) {
+      Types.NestedField field = iStruct.fields().get(i);
+      MappedFields result = fieldResults.get(i);
+      fields.add(MappedField.of(field.fieldId(), field.name(), result));
+    }
+
+    return MappedFields.of(fields);
+  }
+
+  @Override
+  public MappedFields union(Type iType, Schema union, List<MappedFields> optionResults) {
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      for (int i = 0; i < optionResults.size(); i += 1) {
+        if (union.getTypes().get(i).getType() != Schema.Type.NULL) {
+          return optionResults.get(i);
+        }
+      }
+    } else { // Complex union
+      Preconditions.checkArgument(
+          iType instanceof Types.StructType,
+          "Cannot visit invalid Iceberg type: %s for Avro complex union type: %s",
+          iType,
+          union);
+      Types.StructType iStruct = (Types.StructType) iType;
+      List<MappedField> fields = Lists.newArrayListWithExpectedSize(optionResults.size());
+      int index = 0;
+      for (Schema option : union.getTypes()) {
+        if (option.getType() != Schema.Type.NULL) {
+          // Check if current option is a named type, i.e., a RECORD, ENUM, or FIXED type. If so,
+          // use the full record name of the Avro type as the field name. Otherwise, use the Avro
+          // type toString().
+          if (option.getType() == Schema.Type.RECORD
+              || option.getType() == Schema.Type.ENUM
+              || option.getType() == Schema.Type.FIXED) {
+            fields.add(
+                MappedField.of(
+                    iStruct.fields().get(index).fieldId(),
+                    option.getFullName(),

Review Comment:
   Why is it better to use short name and deduplicate compared to using the full name and avoid deduplication?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wmoustafa commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "wmoustafa (via GitHub)" <gi...@apache.org>.
wmoustafa commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1201676629


##########
core/src/main/java/org/apache/iceberg/avro/SchemaToType.java:
##########
@@ -104,13 +104,23 @@ public Type record(Schema record, List<String> names, List<Type> fieldTypes) {
 
   @Override
   public Type union(Schema union, List<Type> options) {
-    Preconditions.checkArgument(
-        AvroSchemaUtil.isOptionSchema(union), "Unsupported type: non-option union: %s", union);
-    // records, arrays, and maps will check nullability later
-    if (options.get(0) == null) {
-      return options.get(1);
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      if (options.get(0) == null) {
+        return options.get(1);
+      } else {
+        return options.get(0);
+      }
     } else {
-      return options.get(0);
+      // Create list of Iceberg schema fields
+      List<Types.NestedField> fields = Lists.newArrayListWithExpectedSize(options.size());
+      int tagIndex = 0;
+      fields.add(Types.NestedField.required(allocateId(), "tag", Types.IntegerType.get()));
+      for (Type option : options) {
+        if (option != null) {

Review Comment:
   The union null branch is to denote that the complex union as optional. The corresponding field will always be null, and even when the union is null, the whole corresponding struct is null (so it will not be the case that all fields in the struct are null). Typically that is also how other engines go about union to struct conversion. Here are some examples in [Hive](ttps://github.com/apache/hive/blob/4f1dba31b25fb2c3b071492ff06b54f76a55c99a/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFExtractUnion.java#L193) and [Trino](https://github.com/trinodb/trino/blob/1621cb6d92b3e96b8204ba71ceca06e8090f0a58/plugin/trino-hive/src/main/java/io/trino/plugin/hive/util/HiveTypeTranslator.java#L217) (they convert from Hive schema but it is always nullable). Also Spark `SchemaConverters` [explicitly drops the null before mapping to struct](https://github.com/apache/spark/blob/a5c53384def22b01b8ef28bee6f2d10648bce1a1/connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConver
 ters.scala#L129).
   
   We would want to be compatible with other other conventions because:
   * It actually makes sense to view the union type as optional + the main types, so the indexes map to the main types.
   * Continue the convention adopted elsewhere and not create multiple conventions/standards.
   * Avoid adding data conversion adapters (between two schema conventions) during migrations from other formats to Iceberg, especially in the Trino case, where the schemas exactly match (`(tag, field_0, field_1,...)` with no dedicated null field). Trino set the precedent that converts union to struct at the table level (unlike Hive which uses explicit UDF, and Spark that does it at the file level). In addition to avoiding migrations, we could establish the schema field names/convention adopted by both Iceberg and Trino as the standard for other engines to follow.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1224850251


##########
core/src/main/java/org/apache/iceberg/avro/NameMappingWithAvroSchema.java:
##########
@@ -0,0 +1,123 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.avro;
+
+import java.util.List;
+import org.apache.avro.Schema;
+import org.apache.iceberg.mapping.MappedField;
+import org.apache.iceberg.mapping.MappedFields;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.Types;
+
+public class NameMappingWithAvroSchema extends AvroWithTypeByStructureVisitor<MappedFields> {
+  @Override
+  public MappedFields record(
+      Type struct, Schema record, List<String> names, List<MappedFields> fieldResults) {
+    List<MappedField> fields = Lists.newArrayListWithExpectedSize(fieldResults.size());
+
+    for (int i = 0; i < fieldResults.size(); i += 1) {
+      Types.NestedField field = struct.asStructType().fields().get(i);
+      MappedFields result = fieldResults.get(i);
+      fields.add(MappedField.of(field.fieldId(), field.name(), result));
+    }
+
+    return MappedFields.of(fields);
+  }
+
+  @Override
+  public MappedFields union(Type type, Schema union, List<MappedFields> optionResults) {
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      for (int i = 0; i < optionResults.size(); i += 1) {
+        if (union.getTypes().get(i).getType() != Schema.Type.NULL) {
+          return optionResults.get(i);
+        }
+      }
+    } else { // Complex union
+      Preconditions.checkArgument(
+          type instanceof Types.StructType,
+          "Cannot visit invalid Iceberg type: %s for Avro complex union type: %s",
+          type,
+          union);
+      Types.StructType struct = (Types.StructType) type;
+      List<MappedField> fields = Lists.newArrayListWithExpectedSize(optionResults.size());
+      int index = 0;
+      // Avro spec for union types states that unions may not contain more than one schema with the
+      // same type, except for the named types record, fixed and enum. For example, unions
+      // containing two array types or two map types are not permitted, but two types with different
+      // names are permitted.
+      // Therefore, for non-named types, use the Avro type toString() as the field mapping key. For
+      // named types, use the record name of the Avro type as the field mapping key.
+      for (Schema option : union.getTypes()) {
+        if (option.getType() != Schema.Type.NULL) {

Review Comment:
   I think this should just skip the `null` in `optionResults` right? That seems easier.
   
   I don't see why this is looping over `union.getTypes` when we have predictable values coming in `optionResults` and `struct.fields()`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1224541725


##########
core/src/main/java/org/apache/iceberg/avro/AvroWithPartnerByStructureVisitor.java:
##########
@@ -93,14 +94,23 @@ private static <P, T> T visitRecord(
   private static <P, T> T visitUnion(
       P type, Schema union, AvroWithPartnerByStructureVisitor<P, T> visitor) {
     List<Schema> types = union.getTypes();
-    Preconditions.checkArgument(
-        AvroSchemaUtil.isOptionSchema(union), "Cannot visit non-option union: %s", union);
     List<T> options = Lists.newArrayListWithExpectedSize(types.size());
-    for (Schema branch : types) {
-      if (branch.getType() == Schema.Type.NULL) {
-        options.add(visit(visitor.nullType(), branch, visitor));
-      } else {
-        options.add(visit(type, branch, visitor));
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      for (Schema branch : types) {
+        if (branch.getType() == Schema.Type.NULL) {
+          options.add(visit(visitor.nullType(), branch, visitor));
+        } else {
+          options.add(visit(type, branch, visitor));
+        }
+      }
+    } else {
+      List<Schema> nonNullTypes =
+          types.stream().filter(t -> t.getType() != Schema.Type.NULL).collect(Collectors.toList());
+      for (int i = 0; i < nonNullTypes.size(); i++) {
+        // In the case of complex union, the corresponding "type" is a struct. Non-null type i in
+        // the union maps to struct filed i + 1 because the first struct field is the "tag".
+        options.add(
+            visit(visitor.fieldNameAndType(type, i + 1).second(), nonNullTypes.get(i), visitor));

Review Comment:
   This looks fine, except for the way null is handled.
   
   If the visitor implementation chooses to ignore the null type, that's fine. But it I think it should still be visited. Since there is no matching struct field, you can pass `null` for the field.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wmoustafa commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "wmoustafa (via GitHub)" <gi...@apache.org>.
wmoustafa commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1280023106


##########
core/src/main/java/org/apache/iceberg/avro/AvroWithPartnerByStructureVisitor.java:
##########
@@ -93,14 +94,23 @@ private static <P, T> T visitRecord(
   private static <P, T> T visitUnion(
       P type, Schema union, AvroWithPartnerByStructureVisitor<P, T> visitor) {
     List<Schema> types = union.getTypes();
-    Preconditions.checkArgument(
-        AvroSchemaUtil.isOptionSchema(union), "Cannot visit non-option union: %s", union);
     List<T> options = Lists.newArrayListWithExpectedSize(types.size());
-    for (Schema branch : types) {
-      if (branch.getType() == Schema.Type.NULL) {
-        options.add(visit(visitor.nullType(), branch, visitor));
-      } else {
-        options.add(visit(type, branch, visitor));
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      for (Schema branch : types) {
+        if (branch.getType() == Schema.Type.NULL) {
+          options.add(visit(visitor.nullType(), branch, visitor));
+        } else {
+          options.add(visit(type, branch, visitor));
+        }
+      }
+    } else {
+      List<Schema> nonNullTypes =
+          types.stream().filter(t -> t.getType() != Schema.Type.NULL).collect(Collectors.toList());
+      for (int i = 0; i < nonNullTypes.size(); i++) {
+        // In the case of complex union, the corresponding "type" is a struct. Non-null type i in
+        // the union maps to struct filed i + 1 because the first struct field is the "tag".
+        options.add(
+            visit(visitor.fieldNameAndType(type, i + 1).second(), nonNullTypes.get(i), visitor));

Review Comment:
   I am trying to avoid that to keep the code clean. If we visit NULL we will have to manipulate the `i` index differently before and after we visit the NULL. We can also keep the above code and add one line for the NULL if it exists, but it will not look very clean either. Do you think it is worth it?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wmoustafa commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "wmoustafa (via GitHub)" <gi...@apache.org>.
wmoustafa commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1279718265


##########
core/src/main/java/org/apache/iceberg/avro/NameMappingWithAvroSchema.java:
##########
@@ -0,0 +1,123 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.avro;
+
+import java.util.List;
+import org.apache.avro.Schema;
+import org.apache.iceberg.mapping.MappedField;
+import org.apache.iceberg.mapping.MappedFields;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.Types;
+
+public class NameMappingWithAvroSchema extends AvroWithTypeByStructureVisitor<MappedFields> {
+  @Override
+  public MappedFields record(
+      Type struct, Schema record, List<String> names, List<MappedFields> fieldResults) {
+    List<MappedField> fields = Lists.newArrayListWithExpectedSize(fieldResults.size());
+
+    for (int i = 0; i < fieldResults.size(); i += 1) {
+      Types.NestedField field = struct.asStructType().fields().get(i);
+      MappedFields result = fieldResults.get(i);
+      fields.add(MappedField.of(field.fieldId(), field.name(), result));
+    }
+
+    return MappedFields.of(fields);
+  }
+
+  @Override
+  public MappedFields union(Type type, Schema union, List<MappedFields> optionResults) {
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      for (int i = 0; i < optionResults.size(); i += 1) {
+        if (union.getTypes().get(i).getType() != Schema.Type.NULL) {
+          return optionResults.get(i);
+        }
+      }
+    } else { // Complex union
+      Preconditions.checkArgument(
+          type instanceof Types.StructType,
+          "Cannot visit invalid Iceberg type: %s for Avro complex union type: %s",
+          type,
+          union);
+      Types.StructType struct = (Types.StructType) type;
+      List<MappedField> fields = Lists.newArrayListWithExpectedSize(optionResults.size());
+      int index = 0;
+      // Avro spec for union types states that unions may not contain more than one schema with the
+      // same type, except for the named types record, fixed and enum. For example, unions
+      // containing two array types or two map types are not permitted, but two types with different
+      // names are permitted.
+      // Therefore, for non-named types, use the Avro type toString() as the field mapping key. For
+      // named types, use the record name of the Avro type as the field mapping key.
+      for (Schema option : union.getTypes()) {
+        if (option.getType() != Schema.Type.NULL) {
+          // Check if current option is a named type, i.e., a RECORD, ENUM, or FIXED type. If so,
+          // use the record name of the Avro type as the field name. Otherwise, use the Avro
+          // type toString().
+          if (option.getType() == Schema.Type.RECORD
+              || option.getType() == Schema.Type.ENUM
+              || option.getType() == Schema.Type.FIXED) {
+            fields.add(
+                MappedField.of(
+                    struct.fields().get(index).fieldId(),
+                    option.getName(),

Review Comment:
   `getName()` returns the short name.



##########
core/src/main/java/org/apache/iceberg/avro/AvroSchemaWithDerivedTypeVisitor.java:
##########
@@ -0,0 +1,169 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.avro;
+
+import java.util.Deque;
+import java.util.List;
+import org.apache.avro.Schema;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.Types;
+
+/**
+ * Avro schema with Type visitor, where Type is guaranteed to be derived from Avro schema, and Avro
+ * schema does not necessarily have field IDs (unlike {@link AvroSchemaWithTypeVisitor}). Avro
+ * schema and Iceberg schema fields are matched by position.
+ */
+public abstract class AvroSchemaWithDerivedTypeVisitor<T> {

Review Comment:
   Good catch. Should have been deleted after the refactor. It is gone now.



##########
core/src/main/java/org/apache/iceberg/avro/NameMappingWithAvroSchema.java:
##########
@@ -0,0 +1,123 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.avro;
+
+import java.util.List;
+import org.apache.avro.Schema;
+import org.apache.iceberg.mapping.MappedField;
+import org.apache.iceberg.mapping.MappedFields;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.Types;
+
+public class NameMappingWithAvroSchema extends AvroWithTypeByStructureVisitor<MappedFields> {
+  @Override
+  public MappedFields record(
+      Type struct, Schema record, List<String> names, List<MappedFields> fieldResults) {
+    List<MappedField> fields = Lists.newArrayListWithExpectedSize(fieldResults.size());
+
+    for (int i = 0; i < fieldResults.size(); i += 1) {
+      Types.NestedField field = struct.asStructType().fields().get(i);
+      MappedFields result = fieldResults.get(i);
+      fields.add(MappedField.of(field.fieldId(), field.name(), result));
+    }
+
+    return MappedFields.of(fields);
+  }
+
+  @Override
+  public MappedFields union(Type type, Schema union, List<MappedFields> optionResults) {
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      for (int i = 0; i < optionResults.size(); i += 1) {
+        if (union.getTypes().get(i).getType() != Schema.Type.NULL) {
+          return optionResults.get(i);
+        }
+      }
+    } else { // Complex union
+      Preconditions.checkArgument(
+          type instanceof Types.StructType,
+          "Cannot visit invalid Iceberg type: %s for Avro complex union type: %s",
+          type,
+          union);
+      Types.StructType struct = (Types.StructType) type;
+      List<MappedField> fields = Lists.newArrayListWithExpectedSize(optionResults.size());
+      int index = 0;
+      // Avro spec for union types states that unions may not contain more than one schema with the
+      // same type, except for the named types record, fixed and enum. For example, unions
+      // containing two array types or two map types are not permitted, but two types with different
+      // names are permitted.
+      // Therefore, for non-named types, use the Avro type toString() as the field mapping key. For
+      // named types, use the record name of the Avro type as the field mapping key.
+      for (Schema option : union.getTypes()) {
+        if (option.getType() != Schema.Type.NULL) {
+          // Check if current option is a named type, i.e., a RECORD, ENUM, or FIXED type. If so,
+          // use the record name of the Avro type as the field name. Otherwise, use the Avro
+          // type toString().
+          if (option.getType() == Schema.Type.RECORD
+              || option.getType() == Schema.Type.ENUM
+              || option.getType() == Schema.Type.FIXED) {
+            fields.add(
+                MappedField.of(
+                    struct.fields().get(index).fieldId(),
+                    option.getName(),
+                    optionResults.get(index)));
+          } else {
+            fields.add(
+                MappedField.of(
+                    struct.fields().get(index).fieldId(),
+                    option.toString(),

Review Comment:
   Good point. Fixed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wmoustafa commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "wmoustafa (via GitHub)" <gi...@apache.org>.
wmoustafa commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1210481363


##########
core/src/main/java/org/apache/iceberg/avro/SchemaToType.java:
##########
@@ -104,13 +104,23 @@ public Type record(Schema record, List<String> names, List<Type> fieldTypes) {
 
   @Override
   public Type union(Schema union, List<Type> options) {
-    Preconditions.checkArgument(
-        AvroSchemaUtil.isOptionSchema(union), "Unsupported type: non-option union: %s", union);
-    // records, arrays, and maps will check nullability later
-    if (options.get(0) == null) {
-      return options.get(1);
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      if (options.get(0) == null) {
+        return options.get(1);
+      } else {
+        return options.get(0);
+      }
     } else {
-      return options.get(0);
+      // Create list of Iceberg schema fields
+      List<Types.NestedField> fields = Lists.newArrayListWithExpectedSize(options.size());
+      int tagIndex = 0;
+      fields.add(Types.NestedField.required(allocateId(), "tag", Types.IntegerType.get()));
+      for (Type option : options) {
+        if (option != null) {

Review Comment:
   Also, the `tag` field value will align with the non-optional branch indexes (in subsequent read PR).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1199826105


##########
core/src/main/java/org/apache/iceberg/avro/AvroWithPartnerByStructureVisitor.java:
##########
@@ -93,19 +94,41 @@ private static <P, T> T visitRecord(
   private static <P, T> T visitUnion(
       P type, Schema union, AvroWithPartnerByStructureVisitor<P, T> visitor) {
     List<Schema> types = union.getTypes();
-    Preconditions.checkArgument(
-        AvroSchemaUtil.isOptionSchema(union), "Cannot visit non-option union: %s", union);
     List<T> options = Lists.newArrayListWithExpectedSize(types.size());
-    for (Schema branch : types) {
-      if (branch.getType() == Schema.Type.NULL) {
-        options.add(visit(visitor.nullType(), branch, visitor));
-      } else {
-        options.add(visit(type, branch, visitor));
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      for (Schema branch : types) {
+        if (branch.getType() == Schema.Type.NULL) {
+          options.add(visit(visitor.nullType(), branch, visitor));
+        } else {
+          options.add(visit(type, branch, visitor));
+        }
+      }
+    } else {
+      Types.StructType struct = (Types.StructType) type;
+      int index = 0;
+      for (Schema branch : types) {
+        if (branch.getType() != Schema.Type.NULL) {
+          options.add(visit(fieldTypeByName(type, "field" + index, visitor), branch, visitor));

Review Comment:
   This visitor is by structure and not by name. That should be sufficient for the purpose of this PR, because we know the structure of the schemas match (one was just converted from the other).
   
   Can you remove the by-name code here? It doesn't make sense for this visitor and I don't think we need it for the PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1224537891


##########
core/src/main/java/org/apache/iceberg/avro/AvroWithPartnerByStructureVisitor.java:
##########
@@ -93,14 +94,23 @@ private static <P, T> T visitRecord(
   private static <P, T> T visitUnion(
       P type, Schema union, AvroWithPartnerByStructureVisitor<P, T> visitor) {
     List<Schema> types = union.getTypes();
-    Preconditions.checkArgument(
-        AvroSchemaUtil.isOptionSchema(union), "Cannot visit non-option union: %s", union);
     List<T> options = Lists.newArrayListWithExpectedSize(types.size());
-    for (Schema branch : types) {
-      if (branch.getType() == Schema.Type.NULL) {
-        options.add(visit(visitor.nullType(), branch, visitor));
-      } else {
-        options.add(visit(type, branch, visitor));
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      for (Schema branch : types) {
+        if (branch.getType() == Schema.Type.NULL) {
+          options.add(visit(visitor.nullType(), branch, visitor));
+        } else {
+          options.add(visit(type, branch, visitor));
+        }
+      }
+    } else {
+      List<Schema> nonNullTypes =
+          types.stream().filter(t -> t.getType() != Schema.Type.NULL).collect(Collectors.toList());
+      for (int i = 0; i < nonNullTypes.size(); i++) {
+        // In the case of complex union, the corresponding "type" is a struct. Non-null type i in
+        // the union maps to struct filed i + 1 because the first struct field is the "tag".

Review Comment:
   Typo: filed -> field.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1199221593


##########
core/src/main/java/org/apache/iceberg/avro/SchemaToType.java:
##########
@@ -104,13 +104,23 @@ public Type record(Schema record, List<String> names, List<Type> fieldTypes) {
 
   @Override
   public Type union(Schema union, List<Type> options) {
-    Preconditions.checkArgument(
-        AvroSchemaUtil.isOptionSchema(union), "Unsupported type: non-option union: %s", union);
-    // records, arrays, and maps will check nullability later
-    if (options.get(0) == null) {
-      return options.get(1);
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      if (options.get(0) == null) {
+        return options.get(1);
+      } else {
+        return options.get(0);
+      }
     } else {
-      return options.get(0);
+      // Create list of Iceberg schema fields
+      List<Types.NestedField> fields = Lists.newArrayListWithExpectedSize(options.size());
+      int tagIndex = 0;
+      fields.add(Types.NestedField.required(allocateId(), "tag", Types.IntegerType.get()));
+      for (Type option : options) {
+        if (option != null) {

Review Comment:
   I don't think we should skip the optional branch. That makes the field names no longer aligned with the tag.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1174457243


##########
core/src/main/java/org/apache/iceberg/avro/NameMappingWithAvroSchema.java:
##########
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.avro;
+
+import java.util.List;
+import org.apache.avro.Schema;
+import org.apache.iceberg.mapping.MappedField;
+import org.apache.iceberg.mapping.MappedFields;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.Types;
+
+public class NameMappingWithAvroSchema extends AvroSchemaWithDerivedTypeVisitor<MappedFields> {
+
+  @Override
+  public MappedFields record(
+      Types.StructType iStruct,
+      Schema record,
+      List<String> names,
+      List<MappedFields> fieldResults) {
+    List<MappedField> fields = Lists.newArrayListWithExpectedSize(fieldResults.size());
+
+    for (int i = 0; i < fieldResults.size(); i += 1) {
+      Types.NestedField field = iStruct.fields().get(i);
+      MappedFields result = fieldResults.get(i);
+      fields.add(MappedField.of(field.fieldId(), field.name(), result));
+    }
+
+    return MappedFields.of(fields);
+  }
+
+  @Override
+  public MappedFields union(Type iType, Schema union, List<MappedFields> optionResults) {
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      for (int i = 0; i < optionResults.size(); i += 1) {
+        if (union.getTypes().get(i).getType() != Schema.Type.NULL) {
+          return optionResults.get(i);
+        }
+      }
+    } else { // Complex union
+      Preconditions.checkArgument(
+          iType instanceof Types.StructType,
+          "Cannot visit invalid Iceberg type: %s for Avro complex union type: %s",
+          iType,
+          union);
+      Types.StructType iStruct = (Types.StructType) iType;
+      List<MappedField> fields = Lists.newArrayListWithExpectedSize(optionResults.size());
+      int index = 0;
+      for (Schema option : union.getTypes()) {
+        if (option.getType() != Schema.Type.NULL) {
+          // Check if current option is a named type, i.e., a RECORD, ENUM, or FIXED type. If so,
+          // use the full record name of the Avro type as the field name. Otherwise, use the Avro
+          // type toString().
+          if (option.getType() == Schema.Type.RECORD
+              || option.getType() == Schema.Type.ENUM
+              || option.getType() == Schema.Type.FIXED) {
+            fields.add(
+                MappedField.of(
+                    iStruct.fields().get(index).fieldId(),
+                    option.getFullName(),
+                    optionResults.get(index)));
+          } else {
+            fields.add(
+                MappedField.of(
+                    iStruct.fields().get(index).fieldId(),
+                    option.toString(),
+                    optionResults.get(index)));
+          }
+          // Both iStruct and optionResults do not contain an entry for the NULL type, so we need to

Review Comment:
   We generally add whitespace after control flow blocks if there is a statement following the block, like there is here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1174456739


##########
core/src/test/java/org/apache/iceberg/avro/TestNameMappingWithAvroSchema.java:
##########
@@ -0,0 +1,169 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.avro;
+
+import org.apache.avro.Schema;
+import org.apache.avro.SchemaBuilder;
+import org.apache.iceberg.mapping.MappedField;
+import org.apache.iceberg.mapping.MappedFields;
+import org.junit.Assert;
+import org.junit.Test;
+
+public class TestNameMappingWithAvroSchema {
+  @Test
+  public void testNameMappingWithAvroSchema() {
+    NameMappingWithAvroSchema nameMappingWithAvroSchema = new NameMappingWithAvroSchema();
+    // Create example Avro schema
+    Schema schema =
+        SchemaBuilder.record("test")

Review Comment:
   Can you avoid using the schema builder? I think direct schema creation is not only more clear, it is also a LOT shorter.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wmoustafa commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "wmoustafa (via GitHub)" <gi...@apache.org>.
wmoustafa commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1193229818


##########
core/src/main/java/org/apache/iceberg/avro/AvroSchemaWithDerivedTypeVisitor.java:
##########
@@ -0,0 +1,169 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.avro;
+
+import java.util.Deque;
+import java.util.List;
+import org.apache.avro.Schema;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.Types;
+
+/**
+ * Avro schema with Type visitor, where Type is guaranteed to be derived from Avro schema, and Avro
+ * schema does not necessarily have field IDs (unlike {@link AvroSchemaWithTypeVisitor}). Avro
+ * schema and Iceberg schema fields are matched by position.
+ */
+public abstract class AvroSchemaWithDerivedTypeVisitor<T> {

Review Comment:
   Moved to `AvroWithPartnerByStructureVisitor`. Introduced intermediate class `AvroWithTypeByStructureVisitor` which could be helpful for other Avro + Iceberg schema structural traversals.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wmoustafa commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "wmoustafa (via GitHub)" <gi...@apache.org>.
wmoustafa commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1201676629


##########
core/src/main/java/org/apache/iceberg/avro/SchemaToType.java:
##########
@@ -104,13 +104,23 @@ public Type record(Schema record, List<String> names, List<Type> fieldTypes) {
 
   @Override
   public Type union(Schema union, List<Type> options) {
-    Preconditions.checkArgument(
-        AvroSchemaUtil.isOptionSchema(union), "Unsupported type: non-option union: %s", union);
-    // records, arrays, and maps will check nullability later
-    if (options.get(0) == null) {
-      return options.get(1);
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      if (options.get(0) == null) {
+        return options.get(1);
+      } else {
+        return options.get(0);
+      }
     } else {
-      return options.get(0);
+      // Create list of Iceberg schema fields
+      List<Types.NestedField> fields = Lists.newArrayListWithExpectedSize(options.size());
+      int tagIndex = 0;
+      fields.add(Types.NestedField.required(allocateId(), "tag", Types.IntegerType.get()));
+      for (Type option : options) {
+        if (option != null) {

Review Comment:
   The union null branch is to denote that the complex union as optional. The corresponding field will always be null, and even when the union is null, the whole corresponding struct is null (so it will not be the case that all fields in the struct are null). Typically that is also how other engines go about union to struct conversion. Here are some examples in [Hive](https://github.com/apache/hive/blob/4f1dba31b25fb2c3b071492ff06b54f76a55c99a/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFExtractUnion.java#L193) and [Trino](https://github.com/trinodb/trino/blob/1621cb6d92b3e96b8204ba71ceca06e8090f0a58/plugin/trino-hive/src/main/java/io/trino/plugin/hive/util/HiveTypeTranslator.java#L217) (they convert from Hive schema but it is always nullable). Also Spark `SchemaConverters` [explicitly drops the null before mapping to struct](https://github.com/apache/spark/blob/a5c53384def22b01b8ef28bee6f2d10648bce1a1/connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConve
 rters.scala#L129).
   
   We would want to be compatible with other other conventions because:
   * It actually makes sense to view the union type as optional + the main types, so the indexes map to the main types.
   * Continue the convention adopted elsewhere and not create multiple conventions/standards.
   * Avoid adding data conversion adapters (between two schema conventions) during migrations from other formats to Iceberg, especially in the Trino case, where the schemas exactly match (`(tag, field_0, field_1,...)` with no dedicated null field). Trino set the precedent that converts union to struct at the table level (unlike Hive which uses explicit UDF, and Spark that does it at the file level). In addition to avoiding migrations, we could establish the schema field names/convention adopted by both Iceberg and Trino as the standard for other engines to follow.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1174455775


##########
core/src/main/java/org/apache/iceberg/avro/AvroSchemaWithDerivedTypeVisitor.java:
##########
@@ -0,0 +1,169 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.avro;
+
+import java.util.Deque;
+import java.util.List;
+import org.apache.avro.Schema;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.Types;
+
+/**
+ * Avro schema with Type visitor, where Type is guaranteed to be derived from Avro schema, and Avro
+ * schema does not necessarily have field IDs (unlike {@link AvroSchemaWithTypeVisitor}). Avro
+ * schema and Iceberg schema fields are matched by position.
+ */
+public abstract class AvroSchemaWithDerivedTypeVisitor<T> {

Review Comment:
   How is this different from the `AvroWithPartnerByStructureVisitor`? Couldn't we use that instead?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wmoustafa commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "wmoustafa (via GitHub)" <gi...@apache.org>.
wmoustafa commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1179939865


##########
core/src/main/java/org/apache/iceberg/avro/SchemaToType.java:
##########
@@ -104,13 +104,23 @@ public Type record(Schema record, List<String> names, List<Type> fieldTypes) {
 
   @Override
   public Type union(Schema union, List<Type> options) {
-    Preconditions.checkArgument(
-        AvroSchemaUtil.isOptionSchema(union), "Unsupported type: non-option union: %s", union);
-    // records, arrays, and maps will check nullability later
-    if (options.get(0) == null) {
-      return options.get(1);
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      if (options.get(0) == null) {
+        return options.get(1);
+      } else {
+        return options.get(0);
+      }
     } else {
-      return options.get(0);
+      // Create list of Iceberg schema fields
+      List<Types.NestedField> fields = Lists.newArrayListWithExpectedSize(options.size());
+      int tagIndex = 0;
+      fields.add(Types.NestedField.required(allocateId(), "tag", Types.IntegerType.get()));
+      for (Type option : options) {
+        if (option != null) {

Review Comment:
   To clarify:
   
   1- Does this depend on whether there is a `null` option or not? I think the behavior is the same. The current logic creates `field_i` corresponding to non-null union options, starting from 0. The behavior sounds the same whether `null` is there (in the options) or not. This means that `tag` convention does not depend on whether there is a `null` option or not. Do you agree?
   
   2- `tag` is set at read time. In the read PR, it is set in such a way that it is zero-based index as well. The test cases are [here](https://github.com/apache/iceberg/pull/5704/files#diff-409a0a443647e843af30276987f07ddd067c2ef7660d632e89bfce0d783c18e3R94). So in both cases, whether there is a null option or not, it will match `i` in `field_i`.
   
   3- I think when the union value is null, the struct will be null as well (although this does not affect the two points above).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wmoustafa commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "wmoustafa (via GitHub)" <gi...@apache.org>.
wmoustafa commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1179922145


##########
core/src/main/java/org/apache/iceberg/avro/AvroSchemaWithDerivedTypeVisitor.java:
##########
@@ -0,0 +1,169 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.avro;
+
+import java.util.Deque;
+import java.util.List;
+import org.apache.avro.Schema;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.Types;
+
+/**
+ * Avro schema with Type visitor, where Type is guaranteed to be derived from Avro schema, and Avro
+ * schema does not necessarily have field IDs (unlike {@link AvroSchemaWithTypeVisitor}). Avro
+ * schema and Iceberg schema fields are matched by position.
+ */
+public abstract class AvroSchemaWithDerivedTypeVisitor<T> {

Review Comment:
   I tried `AvroWithPartnerByStructureVisitor`, but could not modify `visitUnion` to call `fieldNameAndType` since the former is static and the latter is not. `fieldNameAndType` is required to access a union option's respective Iceberg type.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7392: Avro: Add Avro-assisted name mapping

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1224853756


##########
core/src/main/java/org/apache/iceberg/avro/NameMappingWithAvroSchema.java:
##########
@@ -0,0 +1,123 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.avro;
+
+import java.util.List;
+import org.apache.avro.Schema;
+import org.apache.iceberg.mapping.MappedField;
+import org.apache.iceberg.mapping.MappedFields;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.types.Type;
+import org.apache.iceberg.types.Types;
+
+public class NameMappingWithAvroSchema extends AvroWithTypeByStructureVisitor<MappedFields> {
+  @Override
+  public MappedFields record(
+      Type struct, Schema record, List<String> names, List<MappedFields> fieldResults) {
+    List<MappedField> fields = Lists.newArrayListWithExpectedSize(fieldResults.size());
+
+    for (int i = 0; i < fieldResults.size(); i += 1) {
+      Types.NestedField field = struct.asStructType().fields().get(i);
+      MappedFields result = fieldResults.get(i);
+      fields.add(MappedField.of(field.fieldId(), field.name(), result));
+    }
+
+    return MappedFields.of(fields);
+  }
+
+  @Override
+  public MappedFields union(Type type, Schema union, List<MappedFields> optionResults) {
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      for (int i = 0; i < optionResults.size(); i += 1) {
+        if (union.getTypes().get(i).getType() != Schema.Type.NULL) {
+          return optionResults.get(i);
+        }
+      }
+    } else { // Complex union
+      Preconditions.checkArgument(
+          type instanceof Types.StructType,
+          "Cannot visit invalid Iceberg type: %s for Avro complex union type: %s",
+          type,
+          union);
+      Types.StructType struct = (Types.StructType) type;
+      List<MappedField> fields = Lists.newArrayListWithExpectedSize(optionResults.size());
+      int index = 0;
+      // Avro spec for union types states that unions may not contain more than one schema with the
+      // same type, except for the named types record, fixed and enum. For example, unions
+      // containing two array types or two map types are not permitted, but two types with different
+      // names are permitted.
+      // Therefore, for non-named types, use the Avro type toString() as the field mapping key. For
+      // named types, use the record name of the Avro type as the field mapping key.
+      for (Schema option : union.getTypes()) {
+        if (option.getType() != Schema.Type.NULL) {
+          // Check if current option is a named type, i.e., a RECORD, ENUM, or FIXED type. If so,
+          // use the record name of the Avro type as the field name. Otherwise, use the Avro
+          // type toString().
+          if (option.getType() == Schema.Type.RECORD
+              || option.getType() == Schema.Type.ENUM
+              || option.getType() == Schema.Type.FIXED) {
+            fields.add(
+                MappedField.of(
+                    struct.fields().get(index).fieldId(),
+                    option.getName(),
+                    optionResults.get(index)));
+          } else {
+            fields.add(
+                MappedField.of(
+                    struct.fields().get(index).fieldId(),
+                    option.toString(),

Review Comment:
   I think this should be `option.getType().getName()` instead of `toString`. Using `toString` could cause problems when a type has extra metadata. We want to use the type name specifically, so we shouldn't rely on the schema's representation being the same as the type name.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Avro: Add Avro-assisted name mapping [iceberg]

Posted by "wmoustafa (via GitHub)" <gi...@apache.org>.
wmoustafa commented on code in PR #7392:
URL: https://github.com/apache/iceberg/pull/7392#discussion_r1353271608


##########
core/src/main/java/org/apache/iceberg/avro/AvroWithPartnerByStructureVisitor.java:
##########
@@ -93,14 +94,23 @@ private static <P, T> T visitRecord(
   private static <P, T> T visitUnion(
       P type, Schema union, AvroWithPartnerByStructureVisitor<P, T> visitor) {
     List<Schema> types = union.getTypes();
-    Preconditions.checkArgument(
-        AvroSchemaUtil.isOptionSchema(union), "Cannot visit non-option union: %s", union);
     List<T> options = Lists.newArrayListWithExpectedSize(types.size());
-    for (Schema branch : types) {
-      if (branch.getType() == Schema.Type.NULL) {
-        options.add(visit(visitor.nullType(), branch, visitor));
-      } else {
-        options.add(visit(type, branch, visitor));
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      for (Schema branch : types) {
+        if (branch.getType() == Schema.Type.NULL) {
+          options.add(visit(visitor.nullType(), branch, visitor));
+        } else {
+          options.add(visit(type, branch, visitor));
+        }
+      }
+    } else {
+      List<Schema> nonNullTypes =
+          types.stream().filter(t -> t.getType() != Schema.Type.NULL).collect(Collectors.toList());
+      for (int i = 0; i < nonNullTypes.size(); i++) {
+        // In the case of complex union, the corresponding "type" is a struct. Non-null type i in
+        // the union maps to struct filed i + 1 because the first struct field is the "tag".
+        options.add(
+            visit(visitor.fieldNameAndType(type, i + 1).second(), nonNullTypes.get(i), visitor));

Review Comment:
   Visited null but did not add to the returned options since it does not correspond to a struct field.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org