You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/05/02 18:29:05 UTC

[GitHub] [iceberg] funcheetah commented on a diff in pull request #4242: Support non-optional union types for Avro

funcheetah commented on code in PR #4242:
URL: https://github.com/apache/iceberg/pull/4242#discussion_r863071723


##########
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/data/SparkValueReaders.java:
##########
@@ -81,6 +83,10 @@ static ValueReader<InternalRow> struct(List<ValueReader<?>> readers, Types.Struc
     return new StructReader(readers, struct, idToConstant);
   }
 
+  static ValueReader<InternalRow> union(Schema schema, List<ValueReader<?>> readers) {
+    return new UnionReader(schema, readers);

Review Comment:
   Thanks for the suggestions. Refactored as suggested.



##########
core/src/main/java/org/apache/iceberg/avro/AvroSchemaVisitor.java:
##########
@@ -52,8 +52,21 @@ public static <T> T visit(Schema schema, AvroSchemaVisitor<T> visitor) {
       case UNION:
         List<Schema> types = schema.getTypes();
         List<T> options = Lists.newArrayListWithExpectedSize(types.size());
-        for (Schema type : types) {
-          options.add(visit(type, visitor));
+        if (AvroSchemaUtil.isOptionSchema(schema)) {
+          for (Schema type : types) {
+            options.add(visit(type, visitor));
+          }
+        } else {
+          // complex union case
+          int idx = 0;

Review Comment:
   Thanks for the suggestion. Refactored the variable name to `nonNullIdx`



##########
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/data/SparkValueReaders.java:
##########
@@ -285,4 +291,58 @@ protected void set(InternalRow struct, int pos, Object value) {
       }
     }
   }
+
+  private static class UnionReader implements ValueReader<InternalRow> {
+    private final Schema schema;
+    private final ValueReader[] readers;
+
+    private UnionReader(Schema schema, List<ValueReader<?>> readers) {
+      this.schema = schema;
+      this.readers = new ValueReader[readers.size()];
+      for (int i = 0; i < this.readers.length; i += 1) {
+        this.readers[i] = readers.get(i);
+      }
+    }
+
+    @Override
+    public InternalRow read(Decoder decoder, Object reuse) throws IOException {
+      // first we need to filter out NULL alternative if it exists in the union schema
+      int nullIndex = -1;
+      List<Schema> alts = schema.getTypes();

Review Comment:
   Thanks for the suggestions. Refactored as suggested.



##########
core/src/main/java/org/apache/iceberg/avro/AvroSchemaWithTypeVisitor.java:
##########
@@ -79,11 +79,31 @@ private static <T> T visitRecord(Types.StructType struct, Schema record, AvroSch
   private static <T> T visitUnion(Type type, Schema union, AvroSchemaWithTypeVisitor<T> visitor) {
     List<Schema> types = union.getTypes();
     List<T> options = Lists.newArrayListWithExpectedSize(types.size());
-    for (Schema branch : types) {
-      if (branch.getType() == Schema.Type.NULL) {
-        options.add(visit((Type) null, branch, visitor));
-      } else {
-        options.add(visit(type, branch, visitor));
+
+    // simple union case
+    if (AvroSchemaUtil.isOptionSchema(union)) {
+      for (Schema branch : types) {
+        if (branch.getType() == Schema.Type.NULL) {
+          options.add(visit((Type) null, branch, visitor));
+        } else {
+          options.add(visit(type, branch, visitor));
+        }
+      }
+    } else { // complex union case
+      Preconditions.checkArgument(type instanceof Types.StructType,
+          "Cannot visit invalid Iceberg type: %s for Avro complex union type: %s", type, union);
+
+
+      int index = 1;

Review Comment:
   Thanks for the suggestions. Added comment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org