You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/07/23 04:12:28 UTC

[GitHub] [iceberg] JingsongLi opened a new pull request #1235: avro: Abstract AvroWithPartnerSchemaVisitor

JingsongLi opened a new pull request #1235:
URL: https://github.com/apache/iceberg/pull/1235


   Abstract `AvroWithPartnerSchemaVisitor` to extract specific avro logical from `AvroSchemaWithTypeVisitor` and `AvroWithSparkSchemaVisitor`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1235: avro: Abstract AvroWithPartnerSchemaVisitor

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1235:
URL: https://github.com/apache/iceberg/pull/1235#issuecomment-664048809


   Thanks @JingsongLi! This looks like a good thing to do, but I would keep the two cases (id- or structure-based traversal) separate to simplify implementations and make it easier to read.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] openinx commented on a change in pull request #1235: avro: Abstract AvroWithPartnerSchemaVisitor

Posted by GitBox <gi...@apache.org>.
openinx commented on a change in pull request #1235:
URL: https://github.com/apache/iceberg/pull/1235#discussion_r459921033



##########
File path: core/src/main/java/org/apache/iceberg/avro/AvroWithPartnerSchemaVisitor.java
##########
@@ -0,0 +1,188 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.avro;
+
+import java.util.Deque;
+import java.util.List;
+import org.apache.avro.Schema;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+
+/**
+ * A abstract avro schema visitor with partner type. This class is for both reading and writing:
+ * - For reading, the avro schema could evolve into the partner type. (schema evolution)
+ * - For writing, the avro schema should be consistent with partner type.
+ *
+ * @param <P> Partner type.
+ * @param <T> Return T.
+ */
+public abstract class AvroWithPartnerSchemaVisitor<P, T> {
+
+  public static <P, T> T visit(P partner, Schema schema, AvroWithPartnerSchemaVisitor<P, T> visitor) {
+    switch (schema.getType()) {
+      case RECORD:
+        return visitRecord(partner, schema, visitor);
+
+      case UNION:
+        return visitUnion(partner, schema, visitor);
+
+      case ARRAY:
+        return visitArray(partner, schema, visitor);
+
+      case MAP:
+        P keyType = visitor.mapKeyType(partner);
+        Preconditions.checkArgument(
+            visitor.isValidMapKey(keyType),
+            "Invalid map: %s is not a string", keyType);
+        return visitor.map(partner, schema, visit(visitor.mapValueType(partner), schema.getValueType(), visitor));
+
+      default:
+        return visitor.primitive(partner, schema);
+    }
+  }
+
+  // ---------------------------------- Static helpers ---------------------------------------------
+
+  private static <P, T> T visitRecord(P struct, Schema record, AvroWithPartnerSchemaVisitor<P, T> visitor) {
+    // check to make sure this hasn't been visited before
+    String name = record.getFullName();
+    Preconditions.checkState(!visitor.recordLevels.contains(name),
+        "Cannot process recursive Avro record %s", name);
+    List<Schema.Field> fields = record.getFields();
+    visitor.recordLevels.push(name);
+
+    List<String> names = Lists.newArrayListWithExpectedSize(fields.size());
+    List<T> results = Lists.newArrayListWithExpectedSize(fields.size());
+
+    if (visitor.schemaEvolution()) {
+      for (Schema.Field field : fields) {
+        int fieldId = AvroSchemaUtil.getFieldId(field);
+        names.add(field.name());
+        results.add(visit(visitor.structFieldTypeById(struct, fieldId), field.schema(), visitor));
+      }
+    } else {
+      String[] fieldNames = visitor.structFieldNames(struct);
+      P[] fieldTypes = visitor.structFieldTypes(struct);
+      Preconditions.checkArgument(fieldTypes.length == fields.size(),
+          "Structs do not match: %s != %s", struct, record);
+      for (int i = 0; i < fieldTypes.length; i += 1) {
+        String fieldName = fieldNames[i];
+        Schema.Field field = fields.get(i);
+        Preconditions.checkArgument(AvroSchemaUtil.makeCompatibleName(fieldName).equals(field.name()),
+            "Structs do not match: field %s != %s", fieldName, field.name());
+        results.add(visit(fieldTypes[i], field.schema(), visitor));

Review comment:
       So the difference between this sub field visit and the previous sub field [visit](https://github.com/apache/iceberg/pull/1235/files#diff-e12cc7b2561a2a909b41535f4eb86a04R78) is:  we use the different methods to get the data type of inner field.   I think we don't need both the structFieldTypeById and structFieldTypes,   is it possible to abstract them to be one method and then we could remove the `if (visitor.schemaEvolution()) {} else {...}` finally ? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1235: avro: Abstract AvroWithPartnerSchemaVisitor

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1235:
URL: https://github.com/apache/iceberg/pull/1235#discussion_r460570755



##########
File path: core/src/main/java/org/apache/iceberg/avro/AvroSchemaWithTypeVisitor.java
##########
@@ -19,117 +19,57 @@
 
 package org.apache.iceberg.avro;
 
-import java.util.Deque;
-import java.util.List;
 import org.apache.avro.Schema;
-import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
-import org.apache.iceberg.relocated.com.google.common.collect.Lists;
 import org.apache.iceberg.types.Type;
 import org.apache.iceberg.types.Types;
 
-public abstract class AvroSchemaWithTypeVisitor<T> {
+/**
+ * Avro {@link Schema} with expected Iceberg schema visitor. See {@link #schemaEvolution}, this class is for schema
+ * evolution reading. The avro schema can evolve into the expected Iceberg type.
+ */
+public abstract class AvroSchemaWithTypeVisitor<T> extends AvroWithPartnerSchemaVisitor<Type, T> {
   public static <T> T visit(org.apache.iceberg.Schema iSchema, Schema schema, AvroSchemaWithTypeVisitor<T> visitor) {
     return visit(iSchema.asStruct(), schema, visitor);
   }
 
-  public static <T> T visit(Type iType, Schema schema, AvroSchemaWithTypeVisitor<T> visitor) {
-    switch (schema.getType()) {
-      case RECORD:
-        return visitRecord(iType != null ? iType.asStructType() : null, schema, visitor);
-
-      case UNION:
-        return visitUnion(iType, schema, visitor);
-
-      case ARRAY:
-        return visitArray(iType, schema, visitor);
-
-      case MAP:
-        Types.MapType map = iType != null ? iType.asMapType() : null;
-        return visitor.map(map, schema,
-            visit(map != null ? map.valueType() : null, schema.getValueType(), visitor));
-
-      default:
-        return visitor.primitive(iType != null ? iType.asPrimitiveType() : null, schema);
-    }
-  }
-
-  private static <T> T visitRecord(Types.StructType struct, Schema record, AvroSchemaWithTypeVisitor<T> visitor) {
-    // check to make sure this hasn't been visited before
-    String name = record.getFullName();
-    Preconditions.checkState(!visitor.recordLevels.contains(name),
-        "Cannot process recursive Avro record %s", name);
-
-    visitor.recordLevels.push(name);
-
-    List<Schema.Field> fields = record.getFields();
-    List<String> names = Lists.newArrayListWithExpectedSize(fields.size());
-    List<T> results = Lists.newArrayListWithExpectedSize(fields.size());
-    for (Schema.Field field : fields) {
-      int fieldId = AvroSchemaUtil.getFieldId(field);
-      Types.NestedField iField = struct != null ? struct.field(fieldId) : null;
-      names.add(field.name());
-      results.add(visit(iField != null ? iField.type() : null, field.schema(), visitor));
-    }
-
-    visitor.recordLevels.pop();
-
-    return visitor.record(struct, record, names, results);
-  }
-
-  private static <T> T visitUnion(Type type, Schema union, AvroSchemaWithTypeVisitor<T> visitor) {
-    List<Schema> types = union.getTypes();
-    List<T> options = Lists.newArrayListWithExpectedSize(types.size());
-    for (Schema branch : types) {
-      if (branch.getType() == Schema.Type.NULL) {
-        options.add(visit((Type) null, branch, visitor));
-      } else {
-        options.add(visit(type, branch, visitor));
-      }
-    }
-    return visitor.union(type, union, options);
+  @Override
+  public boolean schemaEvolution() {
+    return true;
   }
 
-  private static <T> T visitArray(Type type, Schema array, AvroSchemaWithTypeVisitor<T> visitor) {
-    if (array.getLogicalType() instanceof LogicalMap || (type != null && type.isMapType())) {
-      Preconditions.checkState(
-          AvroSchemaUtil.isKeyValueSchema(array.getElementType()),
-          "Cannot visit invalid logical map type: %s", array);
-      Types.MapType map = type != null ? type.asMapType() : null;
-      List<Schema.Field> keyValueFields = array.getElementType().getFields();
-      return visitor.map(map, array,
-          visit(map != null ? map.keyType() : null, keyValueFields.get(0).schema(), visitor),
-          visit(map != null ? map.valueType() : null, keyValueFields.get(1).schema(), visitor));
-
-    } else {
-      Types.ListType list = type != null ? type.asListType() : null;
-      return visitor.array(list, array,
-          visit(list != null ? list.elementType() : null, array.getElementType(), visitor));
-    }
+  @Override
+  public boolean isMapType(Type type) {
+    return type != null && type.isMapType();
   }
 
-  private Deque<String> recordLevels = Lists.newLinkedList();
-
-  public T record(Types.StructType iStruct, Schema record, List<String> names, List<T> fields) {
-    return null;
+  @Override
+  public boolean isValidMapKey(Type type) {
+    return type == null || type instanceof Types.StringType;
   }
 
-  public T union(Type iType, Schema union, List<T> options) {
-    return null;
+  @Override
+  public Type arrayElementType(Type arrayType) {
+    return arrayType == null ? null : arrayType.asListType().elementType();
   }
 
-  public T array(Types.ListType iList, Schema array, T element) {
-    return null;
+  @Override
+  public Type mapKeyType(Type mapType) {
+    return mapType == null ? null : mapType.asMapType().keyType();
   }
 
-  public T map(Types.MapType iMap, Schema map, T key, T value) {
-    return null;
+  @Override
+  public Type mapValueType(Type mapType) {
+    return mapType == null ? null : mapType.asMapType().valueType();

Review comment:
       If this defined `isNullType`, then the visit method could do these checks instead:
   
   ```java
     if (isNullType(mapType)) {
       return nullType();
     } else {
       return mapValueType(mapType);
     }
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1235: avro: Abstract AvroWithPartnerSchemaVisitor

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1235:
URL: https://github.com/apache/iceberg/pull/1235#discussion_r460581572



##########
File path: core/src/main/java/org/apache/iceberg/avro/AvroWithPartnerSchemaVisitor.java
##########
@@ -0,0 +1,188 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.avro;
+
+import java.util.Deque;
+import java.util.List;
+import org.apache.avro.Schema;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+
+/**
+ * A abstract avro schema visitor with partner type. This class is for both reading and writing:
+ * - For reading, the avro schema could evolve into the partner type. (schema evolution)
+ * - For writing, the avro schema should be consistent with partner type.
+ *
+ * @param <P> Partner type.
+ * @param <T> Return T.
+ */
+public abstract class AvroWithPartnerSchemaVisitor<P, T> {
+
+  public static <P, T> T visit(P partner, Schema schema, AvroWithPartnerSchemaVisitor<P, T> visitor) {
+    switch (schema.getType()) {
+      case RECORD:
+        return visitRecord(partner, schema, visitor);
+
+      case UNION:
+        return visitUnion(partner, schema, visitor);
+
+      case ARRAY:
+        return visitArray(partner, schema, visitor);
+
+      case MAP:
+        P keyType = visitor.mapKeyType(partner);
+        Preconditions.checkArgument(
+            visitor.isValidMapKey(keyType),
+            "Invalid map: %s is not a string", keyType);
+        return visitor.map(partner, schema, visit(visitor.mapValueType(partner), schema.getValueType(), visitor));
+
+      default:
+        return visitor.primitive(partner, schema);
+    }
+  }
+
+  // ---------------------------------- Static helpers ---------------------------------------------
+
+  private static <P, T> T visitRecord(P struct, Schema record, AvroWithPartnerSchemaVisitor<P, T> visitor) {
+    // check to make sure this hasn't been visited before
+    String name = record.getFullName();
+    Preconditions.checkState(!visitor.recordLevels.contains(name),
+        "Cannot process recursive Avro record %s", name);
+    List<Schema.Field> fields = record.getFields();
+    visitor.recordLevels.push(name);
+
+    List<String> names = Lists.newArrayListWithExpectedSize(fields.size());
+    List<T> results = Lists.newArrayListWithExpectedSize(fields.size());
+
+    if (visitor.schemaEvolution()) {
+      for (Schema.Field field : fields) {
+        int fieldId = AvroSchemaUtil.getFieldId(field);
+        names.add(field.name());
+        results.add(visit(visitor.structFieldTypeById(struct, fieldId), field.schema(), visitor));
+      }
+    } else {
+      String[] fieldNames = visitor.structFieldNames(struct);
+      P[] fieldTypes = visitor.structFieldTypes(struct);
+      Preconditions.checkArgument(fieldTypes.length == fields.size(),
+          "Structs do not match: %s != %s", struct, record);
+      for (int i = 0; i < fieldTypes.length; i += 1) {
+        String fieldName = fieldNames[i];
+        Schema.Field field = fields.get(i);
+        Preconditions.checkArgument(AvroSchemaUtil.makeCompatibleName(fieldName).equals(field.name()),
+            "Structs do not match: field %s != %s", fieldName, field.name());
+        results.add(visit(fieldTypes[i], field.schema(), visitor));
+      }
+    }
+
+    visitor.recordLevels.pop();
+
+    return visitor.record(struct, record, names, results);
+  }
+
+  private static <P, T> T visitUnion(P type, Schema union, AvroWithPartnerSchemaVisitor<P, T> visitor) {
+    List<Schema> types = union.getTypes();
+    Preconditions.checkArgument(AvroSchemaUtil.isOptionSchema(union),
+        "Cannot visit non-option union: %s", union);
+    List<T> options = Lists.newArrayListWithExpectedSize(types.size());
+    for (Schema branch : types) {
+      if (branch.getType() == Schema.Type.NULL) {
+        options.add(visit(visitor.nullType(), branch, visitor));
+      } else {
+        options.add(visit(type, branch, visitor));
+      }
+    }
+    return visitor.union(type, union, options);
+  }
+
+  private static <P, T> T visitArray(P type, Schema array, AvroWithPartnerSchemaVisitor<P, T> visitor) {
+    if (array.getLogicalType() instanceof LogicalMap || visitor.isMapType(type)) {
+      Preconditions.checkState(
+          AvroSchemaUtil.isKeyValueSchema(array.getElementType()),
+          "Cannot visit invalid logical map type: %s", array);
+      List<Schema.Field> keyValueFields = array.getElementType().getFields();
+      return visitor.map(type, array,
+          visit(visitor.mapKeyType(type), keyValueFields.get(0).schema(), visitor),
+          visit(visitor.mapValueType(type), keyValueFields.get(1).schema(), visitor));
+
+    } else {
+      return visitor.array(type, array, visit(visitor.arrayElementType(type), array.getElementType(), visitor));
+    }
+  }
+
+  /**
+   * Just for checking state.
+   */
+  private Deque<String> recordLevels = Lists.newLinkedList();
+
+  // ---------------------------------- Partner type methods ---------------------------------------------
+
+  public boolean schemaEvolution() {
+    return false;
+  }
+
+  public abstract boolean isMapType(P type);
+
+  public abstract boolean isValidMapKey(P type);
+
+  public abstract P arrayElementType(P arrayType);
+
+  public abstract P mapKeyType(P mapType);
+  public abstract P mapValueType(P mapType);
+
+  public String[] structFieldNames(P structType) {
+    throw new UnsupportedOperationException();
+  }
+
+  public P[] structFieldTypes(P structType) {

Review comment:
       Arrays are a bit difficult to work with. For the structure-based traversal, how about combining `structFieldNames` and `structFieldTypes` into a single method indexed by position in the struct?
   
   ```java
     Pair<String, P> fieldNameAndType(P structType, int pos);
   ```
   
   That corresponds to the `fieldType(int id)` for the id-based lookup.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] JingsongLi commented on pull request #1235: avro: Abstract AvroWithPartnerSchemaVisitor

Posted by GitBox <gi...@apache.org>.
JingsongLi commented on pull request #1235:
URL: https://github.com/apache/iceberg/pull/1235#issuecomment-664820529


   > Thanks @JingsongLi! This looks like a good thing to do, but I would keep the two cases (id- or structure-based traversal) separate to simplify implementations and make it easier to read.
   
   Thanks @rdblue for your review, I think we can keep `AvroSchemaWithTypeVisitor` as it is (No other implementation), and introduce `AvroWithPartnerByStructureVisitor` for Flink implementation.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #1235: avro: Abstract AvroWithPartnerSchemaVisitor

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #1235:
URL: https://github.com/apache/iceberg/pull/1235


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1235: avro: Abstract AvroWithPartnerSchemaVisitor

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1235:
URL: https://github.com/apache/iceberg/pull/1235#discussion_r460581241



##########
File path: core/src/main/java/org/apache/iceberg/avro/AvroWithPartnerSchemaVisitor.java
##########
@@ -0,0 +1,188 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.avro;
+
+import java.util.Deque;
+import java.util.List;
+import org.apache.avro.Schema;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+
+/**
+ * A abstract avro schema visitor with partner type. This class is for both reading and writing:
+ * - For reading, the avro schema could evolve into the partner type. (schema evolution)
+ * - For writing, the avro schema should be consistent with partner type.
+ *
+ * @param <P> Partner type.
+ * @param <T> Return T.

Review comment:
       I like that this standardizes the logic to traverse a schema with a partner, but I don't think that it makes sense to mix the two cases together into a single class for a few reasons:
   
   * The meaning of `schemaEvolution` is hard to understand. The case where the schemas must have the same structure is more related to when we don't have IDs for one type of schema, like Spark. In that case, we rely on the structure matching exactly and are guaranteed that because both schemas are derived from the same Iceberg schema. We prefer to match up schemas by ID, even for the write path, but require it for the read path (because of evolution as you correctly noted).
   * It isn't clear which methods should be implemented for a visitor. Even if we added documentation, that's not going to be as easy to understand as having two types, one for traversing by IDs and the other for traversing by structure.
   * There isn't much benefit to sharing because record visiting is very different between cases. The union, array, and map methods aren't very complicated.
   
   I think it would be better to have the two cases broken out into `AvroWithPartnerByIDVisitor` and `AvroWithPartnerByStructureVisitor`. Then it is clear what needs to be implemented in both cases and there is no `schemaEvolution` flag.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] JingsongLi commented on a change in pull request #1235: avro: Abstract AvroWithPartnerSchemaVisitor

Posted by GitBox <gi...@apache.org>.
JingsongLi commented on a change in pull request #1235:
URL: https://github.com/apache/iceberg/pull/1235#discussion_r461335501



##########
File path: core/src/main/java/org/apache/iceberg/avro/AvroWithPartnerSchemaVisitor.java
##########
@@ -0,0 +1,188 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.avro;
+
+import java.util.Deque;
+import java.util.List;
+import org.apache.avro.Schema;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+
+/**
+ * A abstract avro schema visitor with partner type. This class is for both reading and writing:
+ * - For reading, the avro schema could evolve into the partner type. (schema evolution)
+ * - For writing, the avro schema should be consistent with partner type.
+ *
+ * @param <P> Partner type.
+ * @param <T> Return T.

Review comment:
       +1
   I also tangled for a long time, it is more confusing to put them together by force.

##########
File path: core/src/main/java/org/apache/iceberg/avro/AvroWithPartnerSchemaVisitor.java
##########
@@ -0,0 +1,188 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.avro;
+
+import java.util.Deque;
+import java.util.List;
+import org.apache.avro.Schema;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+
+/**
+ * A abstract avro schema visitor with partner type. This class is for both reading and writing:
+ * - For reading, the avro schema could evolve into the partner type. (schema evolution)
+ * - For writing, the avro schema should be consistent with partner type.
+ *
+ * @param <P> Partner type.
+ * @param <T> Return T.
+ */
+public abstract class AvroWithPartnerSchemaVisitor<P, T> {
+
+  public static <P, T> T visit(P partner, Schema schema, AvroWithPartnerSchemaVisitor<P, T> visitor) {
+    switch (schema.getType()) {
+      case RECORD:
+        return visitRecord(partner, schema, visitor);
+
+      case UNION:
+        return visitUnion(partner, schema, visitor);
+
+      case ARRAY:
+        return visitArray(partner, schema, visitor);
+
+      case MAP:
+        P keyType = visitor.mapKeyType(partner);
+        Preconditions.checkArgument(
+            visitor.isValidMapKey(keyType),
+            "Invalid map: %s is not a string", keyType);
+        return visitor.map(partner, schema, visit(visitor.mapValueType(partner), schema.getValueType(), visitor));
+
+      default:
+        return visitor.primitive(partner, schema);
+    }
+  }
+
+  // ---------------------------------- Static helpers ---------------------------------------------
+
+  private static <P, T> T visitRecord(P struct, Schema record, AvroWithPartnerSchemaVisitor<P, T> visitor) {
+    // check to make sure this hasn't been visited before
+    String name = record.getFullName();
+    Preconditions.checkState(!visitor.recordLevels.contains(name),
+        "Cannot process recursive Avro record %s", name);
+    List<Schema.Field> fields = record.getFields();
+    visitor.recordLevels.push(name);
+
+    List<String> names = Lists.newArrayListWithExpectedSize(fields.size());
+    List<T> results = Lists.newArrayListWithExpectedSize(fields.size());
+
+    if (visitor.schemaEvolution()) {
+      for (Schema.Field field : fields) {
+        int fieldId = AvroSchemaUtil.getFieldId(field);
+        names.add(field.name());
+        results.add(visit(visitor.structFieldTypeById(struct, fieldId), field.schema(), visitor));
+      }
+    } else {
+      String[] fieldNames = visitor.structFieldNames(struct);
+      P[] fieldTypes = visitor.structFieldTypes(struct);
+      Preconditions.checkArgument(fieldTypes.length == fields.size(),
+          "Structs do not match: %s != %s", struct, record);
+      for (int i = 0; i < fieldTypes.length; i += 1) {
+        String fieldName = fieldNames[i];
+        Schema.Field field = fields.get(i);
+        Preconditions.checkArgument(AvroSchemaUtil.makeCompatibleName(fieldName).equals(field.name()),
+            "Structs do not match: field %s != %s", fieldName, field.name());
+        results.add(visit(fieldTypes[i], field.schema(), visitor));

Review comment:
       The core difference is matching up schemas by ID or not.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org