You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/10/07 17:33:36 UTC

[GitHub] [iceberg] yiqiangin commented on a diff in pull request #5704: Support non-optional union types and column projection in complex union for Avro

yiqiangin commented on code in PR #5704:
URL: https://github.com/apache/iceberg/pull/5704#discussion_r990346173


##########
core/src/main/java/org/apache/iceberg/avro/AvroSchemaWithTypeVisitor.java:
##########
@@ -82,11 +83,45 @@ private static <T> T visitRecord(
   private static <T> T visitUnion(Type type, Schema union, AvroSchemaWithTypeVisitor<T> visitor) {
     List<Schema> types = union.getTypes();
     List<T> options = Lists.newArrayListWithExpectedSize(types.size());
-    for (Schema branch : types) {
-      if (branch.getType() == Schema.Type.NULL) {
-        options.add(visit((Type) null, branch, visitor));
-      } else {
-        options.add(visit(type, branch, visitor));
+
+    // simple union case
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      for (Schema branch : types) {
+        if (branch.getType() == Schema.Type.NULL) {
+          options.add(visit((Type) null, branch, visitor));
+        } else {
+          options.add(visit(type, branch, visitor));
+        }
+      }
+    } else { // complex union case
+      Preconditions.checkArgument(
+          type instanceof Types.StructType,
+          "Cannot visit invalid Iceberg type: %s for Avro complex union type: %s",
+          type,
+          union);
+      Map<String, Integer> fieldNameToId =
+          (Map) union.getObjectProp(SchemaToType.AVRO_FIELD_NAME_TO_ICEBERG_ID);
+      for (Schema branch : types) {
+        if (branch.getType() == Schema.Type.NULL) {
+          options.add(visit((Type) null, branch, visitor));
+        } else {
+          String name =
+              branch.getType().equals(Schema.Type.RECORD)
+                  ? branch.getName()
+                  : branch.getType().getName();
+          if (fieldNameToId.containsKey(name)) {
+            int fieldId = fieldNameToId.get(name);
+            Types.NestedField branchType = type.asStructType().field(fieldId);
+            if (branchType != null) {
+              options.add(visit(branchType.type(), branch, visitor));
+            } else {
+              Type pseudoBranchType = AvroSchemaUtil.convert(branch);
+              options.add(visit(pseudoBranchType, branch, visitor));
+            }
+          } else {
+            options.add(visit((Type) null, branch, visitor));
+          }
+        }
       }
     }
     return visitor.union(type, union, options);

Review Comment:
   I am afraid not. The readers for all branch types need to pass into `UnionReader`, as `UnionReader` needs to read all types of data from the union in Avro file to read the all the records successfully. Filtering the data based on the types projected in expected Iceberg schema can only happens after the data are read from Avro file  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org