Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/10/09 15:19:58 UTC

[GitHub] [hudi] lw309637554 opened a new pull request #2160: [HUDI-865] Improve Hive Syncing by directly translating avro schema to Hive types

lw309637554 opened a new pull request #2160:
URL: https://github.com/apache/hudi/pull/2160


   …Schema transformations, skip the extra hop to Parquet schema
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a pull request.*
   
   ## What is the purpose of the pull request
   
   Hive Sync integration currently resorts to a chain of translations to determine the table schema:
   Avro schema -> Parquet schema -> Hive schema.
   This change implements the logic to skip the extra hop through the Parquet schema when generating the Hive schema.
   
   This is mostly a cleanup that standardizes Hive schema syncing.
   It also keeps Hive schema syncing standardized in the future when we support other base file formats such as ORC.
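
   To illustrate the extra hop being removed, here is a minimal, hypothetical sketch (not the PR code); `AvroSchemaConverter` is the parquet-avro class used for the old Avro -> Parquet -> Hive path:

   ```java
   import org.apache.avro.Schema;
   import org.apache.hadoop.conf.Configuration;
   import org.apache.parquet.avro.AvroSchemaConverter;
   import org.apache.parquet.schema.MessageType;

   public class SchemaHopSketch {
     public static void main(String[] args) {
       Schema avroSchema = new Schema.Parser().parse(
           "{\"type\":\"record\",\"name\":\"rec\",\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]}");

       // Old path: Avro schema -> Parquet MessageType -> Hive column types (two translations).
       MessageType parquetSchema = new AvroSchemaConverter(new Configuration()).convert(avroSchema);
       System.out.println(parquetSchema);

       // New path: walk the Avro schema directly and emit Hive types,
       // e.g. Schema.Type.LONG -> "bigint", without building a Parquet schema first.
     }
   }
   ```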
   
   
   
   ## Brief change log
   
   *(for example:)*
     - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
     - *Added integration tests for end-to-end.*
     - *Added HoodieClientWriteTest to verify the change.*
     - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on pull request #2160: [HUDI-865] Improve Hive Syncing by directly translating avro schema to Hive types

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on pull request #2160:
URL: https://github.com/apache/hudi/pull/2160#issuecomment-777649047


   @lw309637554 : Can you please check the feedback and address it? Would be nice to have this in.





[GitHub] [hudi] umehrot2 commented on a change in pull request #2160: [HUDI-865] Improve Hive Syncing by directly translating avro schema to Hive types

Posted by GitBox <gi...@apache.org>.
umehrot2 commented on a change in pull request #2160:
URL: https://github.com/apache/hudi/pull/2160#discussion_r512377446



##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java
##########
@@ -155,113 +150,75 @@ private static boolean isFieldExistsInSchema(Map<String, String> newTableSchema,
    * @param parquetType : Single paruet field
    * @return : Equivalent sHive schema
    */
-  private static String convertField(final Type parquetType) {
+  private static String convertFieldFromAvro(final Schema schema) {
     StringBuilder field = new StringBuilder();
-    if (parquetType.isPrimitive()) {
-      final PrimitiveType.PrimitiveTypeName parquetPrimitiveTypeName =
-          parquetType.asPrimitiveType().getPrimitiveTypeName();
-      final OriginalType originalType = parquetType.getOriginalType();
-      if (originalType == OriginalType.DECIMAL) {
-        final DecimalMetadata decimalMetadata = parquetType.asPrimitiveType().getDecimalMetadata();
-        return field.append("DECIMAL(").append(decimalMetadata.getPrecision()).append(" , ")
-            .append(decimalMetadata.getScale()).append(")").toString();
-      } else if (originalType == OriginalType.DATE) {
+    Schema.Type type = schema.getType();
+    LogicalType logicalType = schema.getLogicalType();
+    if (logicalType != null) {
+      if (logicalType instanceof LogicalTypes.Decimal) {
+        return field.append("DECIMAL(").append(((LogicalTypes.Decimal) logicalType).getPrecision()).append(" , ")
+            .append(((LogicalTypes.Decimal) logicalType).getScale()).append(")").toString();
+      } else if (logicalType instanceof LogicalTypes.Date) {
         return field.append("DATE").toString();
+      } else {
+        Log.info("not handle the type transform");
       }
-      // TODO - fix the method naming here
-      return parquetPrimitiveTypeName.convert(new PrimitiveType.PrimitiveTypeNameConverter<String, RuntimeException>() {
-        @Override
-        public String convertBOOLEAN(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "boolean";
-        }
-
-        @Override
-        public String convertINT32(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "int";
-        }
-
-        @Override
-        public String convertINT64(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "bigint";
-        }
-
-        @Override
-        public String convertINT96(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "timestamp-millis";
-        }
-
-        @Override
-        public String convertFLOAT(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "float";
-        }
-
-        @Override
-        public String convertDOUBLE(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "double";
-        }
-
-        @Override
-        public String convertFIXED_LEN_BYTE_ARRAY(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "binary";
-        }
-
-        @Override
-        public String convertBINARY(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          if (originalType == OriginalType.UTF8 || originalType == OriginalType.ENUM) {
-            return "string";
-          } else {
-            return "binary";
-          }
+    }
+    if (type.equals(Schema.Type.BOOLEAN)) {

Review comment:
       I think this is a good case where we should use a switch case, as the old code did, instead of if/else.







[GitHub] [hudi] vinothchandar commented on pull request #2160: [HUDI-865] Improve Hive Syncing by directly translating avro schema to Hive types

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #2160:
URL: https://github.com/apache/hudi/pull/2160#issuecomment-713266011


   @umehrot2 are you able to review this?





[GitHub] [hudi] umehrot2 commented on a change in pull request #2160: [HUDI-865] Improve Hive Syncing by directly translating avro schema to Hive types

Posted by GitBox <gi...@apache.org>.
umehrot2 commented on a change in pull request #2160:
URL: https://github.com/apache/hudi/pull/2160#discussion_r513795592



##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java
##########
@@ -155,113 +150,75 @@ private static boolean isFieldExistsInSchema(Map<String, String> newTableSchema,
    * @param parquetType : Single paruet field
    * @return : Equivalent sHive schema
    */
-  private static String convertField(final Type parquetType) {
+  private static String convertFieldFromAvro(final Schema schema) {
     StringBuilder field = new StringBuilder();
-    if (parquetType.isPrimitive()) {
-      final PrimitiveType.PrimitiveTypeName parquetPrimitiveTypeName =
-          parquetType.asPrimitiveType().getPrimitiveTypeName();
-      final OriginalType originalType = parquetType.getOriginalType();
-      if (originalType == OriginalType.DECIMAL) {
-        final DecimalMetadata decimalMetadata = parquetType.asPrimitiveType().getDecimalMetadata();
-        return field.append("DECIMAL(").append(decimalMetadata.getPrecision()).append(" , ")
-            .append(decimalMetadata.getScale()).append(")").toString();
-      } else if (originalType == OriginalType.DATE) {
+    Schema.Type type = schema.getType();
+    LogicalType logicalType = schema.getLogicalType();
+    if (logicalType != null) {
+      if (logicalType instanceof LogicalTypes.Decimal) {
+        return field.append("DECIMAL(").append(((LogicalTypes.Decimal) logicalType).getPrecision()).append(" , ")
+            .append(((LogicalTypes.Decimal) logicalType).getScale()).append(")").toString();
+      } else if (logicalType instanceof LogicalTypes.Date) {
         return field.append("DATE").toString();
+      } else {
+        Log.info("not handle the type transform");
       }
-      // TODO - fix the method naming here
-      return parquetPrimitiveTypeName.convert(new PrimitiveType.PrimitiveTypeNameConverter<String, RuntimeException>() {
-        @Override
-        public String convertBOOLEAN(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "boolean";
-        }
-
-        @Override
-        public String convertINT32(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "int";
-        }
-
-        @Override
-        public String convertINT64(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "bigint";
-        }
-
-        @Override
-        public String convertINT96(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "timestamp-millis";
-        }
-
-        @Override
-        public String convertFLOAT(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "float";
-        }
-
-        @Override
-        public String convertDOUBLE(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "double";
-        }
-
-        @Override
-        public String convertFIXED_LEN_BYTE_ARRAY(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "binary";
-        }
-
-        @Override
-        public String convertBINARY(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          if (originalType == OriginalType.UTF8 || originalType == OriginalType.ENUM) {
-            return "string";
-          } else {
-            return "binary";
-          }
+    }
+    if (type.equals(Schema.Type.BOOLEAN)) {
+      return field.append("boolean").toString();
+    } else if (type.equals(Schema.Type.INT)) {
+      return field.append("int").toString();
+    } else if (type.equals(Schema.Type.LONG)) {
+      return field.append("bigint").toString();
+    } else if (type.equals(Schema.Type.FLOAT)) {
+      return field.append("float").toString();
+    } else if (type.equals(Schema.Type.DOUBLE)) {
+      return field.append("double").toString();
+    } else if (type.equals(Schema.Type.BYTES)) {
+      return field.append("binary").toString();
+    } else if (type.equals(Schema.Type.STRING)) {
+      return field.append("string").toString();
+    } else if (type.equals(Schema.Type.RECORD)) {
+      List<Pair<String, Schema>> noNullSchemaFields = new ArrayList<>();
+      for (Schema.Field fieldItem : schema.getFields()) {
+        if (fieldItem.schema().getType().equals(Schema.Type.NULL)) {
+          continue; // Avro nulls are not encoded, unless they are null unions
         }
-      });
-    } else {
-      GroupType parquetGroupType = parquetType.asGroupType();
-      OriginalType originalType = parquetGroupType.getOriginalType();
-      if (originalType != null) {
-        switch (originalType) {
-          case LIST:
-            if (parquetGroupType.getFieldCount() != 1) {
-              throw new UnsupportedOperationException("Invalid list type " + parquetGroupType);
-            }
-            Type elementType = parquetGroupType.getType(0);
-            if (!elementType.isRepetition(Type.Repetition.REPEATED)) {
-              throw new UnsupportedOperationException("Invalid list type " + parquetGroupType);
-            }
-            return createHiveArray(elementType, parquetGroupType.getName());
-          case MAP:
-            if (parquetGroupType.getFieldCount() != 1 || parquetGroupType.getType(0).isPrimitive()) {
-              throw new UnsupportedOperationException("Invalid map type " + parquetGroupType);
-            }
-            GroupType mapKeyValType = parquetGroupType.getType(0).asGroupType();
-            if (!mapKeyValType.isRepetition(Type.Repetition.REPEATED)
-                || !mapKeyValType.getOriginalType().equals(OriginalType.MAP_KEY_VALUE)
-                || mapKeyValType.getFieldCount() != 2) {
-              throw new UnsupportedOperationException("Invalid map type " + parquetGroupType);
-            }
-            Type keyType = mapKeyValType.getType(0);
-            if (!keyType.isPrimitive()
-                || !keyType.asPrimitiveType().getPrimitiveTypeName().equals(PrimitiveType.PrimitiveTypeName.BINARY)
-                || !keyType.getOriginalType().equals(OriginalType.UTF8)) {
-              throw new UnsupportedOperationException("Map key type must be binary (UTF8): " + keyType);
-            }
-            Type valueType = mapKeyValType.getType(1);
-            return createHiveMap(convertField(keyType), convertField(valueType));
-          case ENUM:
-          case UTF8:
-            return "string";
-          case MAP_KEY_VALUE:
-            // MAP_KEY_VALUE was supposed to be used to annotate key and
-            // value group levels in a
-            // MAP. However, that is always implied by the structure of
-            // MAP. Hence, PARQUET-113
-            // dropped the requirement for having MAP_KEY_VALUE.
-          default:
-            throw new UnsupportedOperationException("Cannot convert Parquet type " + parquetType);
+        noNullSchemaFields.add(new ImmutablePair<String, Schema>(fieldItem.name(), fieldItem.schema()));
+      }
+      return createHiveStructFromAvro(noNullSchemaFields);
+    } else if (type.equals(Schema.Type.ENUM)) {
+      return field.append("string").toString();
+    } else if (type.equals(Schema.Type.ARRAY)) {
+      Schema elementType = schema.getElementType();
+      String schemaName = schema.getName();

Review comment:
       Perhaps `elementName` would be more appropriate?







[GitHub] [hudi] umehrot2 commented on a change in pull request #2160: [HUDI-865] Improve Hive Syncing by directly translating avro schema to Hive types

Posted by GitBox <gi...@apache.org>.
umehrot2 commented on a change in pull request #2160:
URL: https://github.com/apache/hudi/pull/2160#discussion_r512379798



##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java
##########
@@ -155,113 +150,75 @@ private static boolean isFieldExistsInSchema(Map<String, String> newTableSchema,
    * @param parquetType : Single paruet field
    * @return : Equivalent sHive schema
    */
-  private static String convertField(final Type parquetType) {
+  private static String convertFieldFromAvro(final Schema schema) {
     StringBuilder field = new StringBuilder();
-    if (parquetType.isPrimitive()) {
-      final PrimitiveType.PrimitiveTypeName parquetPrimitiveTypeName =
-          parquetType.asPrimitiveType().getPrimitiveTypeName();
-      final OriginalType originalType = parquetType.getOriginalType();
-      if (originalType == OriginalType.DECIMAL) {
-        final DecimalMetadata decimalMetadata = parquetType.asPrimitiveType().getDecimalMetadata();
-        return field.append("DECIMAL(").append(decimalMetadata.getPrecision()).append(" , ")
-            .append(decimalMetadata.getScale()).append(")").toString();
-      } else if (originalType == OriginalType.DATE) {
+    Schema.Type type = schema.getType();
+    LogicalType logicalType = schema.getLogicalType();
+    if (logicalType != null) {
+      if (logicalType instanceof LogicalTypes.Decimal) {
+        return field.append("DECIMAL(").append(((LogicalTypes.Decimal) logicalType).getPrecision()).append(" , ")
+            .append(((LogicalTypes.Decimal) logicalType).getScale()).append(")").toString();
+      } else if (logicalType instanceof LogicalTypes.Date) {
         return field.append("DATE").toString();
+      } else {
+        Log.info("not handle the type transform");
       }
-      // TODO - fix the method naming here
-      return parquetPrimitiveTypeName.convert(new PrimitiveType.PrimitiveTypeNameConverter<String, RuntimeException>() {
-        @Override
-        public String convertBOOLEAN(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "boolean";
-        }
-
-        @Override
-        public String convertINT32(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "int";
-        }
-
-        @Override
-        public String convertINT64(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "bigint";
-        }
-
-        @Override
-        public String convertINT96(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "timestamp-millis";
-        }
-
-        @Override
-        public String convertFLOAT(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "float";
-        }
-
-        @Override
-        public String convertDOUBLE(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "double";
-        }
-
-        @Override
-        public String convertFIXED_LEN_BYTE_ARRAY(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "binary";
-        }
-
-        @Override
-        public String convertBINARY(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          if (originalType == OriginalType.UTF8 || originalType == OriginalType.ENUM) {
-            return "string";
-          } else {
-            return "binary";
-          }
+    }
+    if (type.equals(Schema.Type.BOOLEAN)) {
+      return field.append("boolean").toString();
+    } else if (type.equals(Schema.Type.INT)) {
+      return field.append("int").toString();
+    } else if (type.equals(Schema.Type.LONG)) {
+      return field.append("bigint").toString();
+    } else if (type.equals(Schema.Type.FLOAT)) {
+      return field.append("float").toString();
+    } else if (type.equals(Schema.Type.DOUBLE)) {
+      return field.append("double").toString();
+    } else if (type.equals(Schema.Type.BYTES)) {
+      return field.append("binary").toString();
+    } else if (type.equals(Schema.Type.STRING)) {
+      return field.append("string").toString();
+    } else if (type.equals(Schema.Type.RECORD)) {
+      List<Pair<String, Schema>> noNullSchemaFields = new ArrayList<>();
+      for (Schema.Field fieldItem : schema.getFields()) {
+        if (fieldItem.schema().getType().equals(Schema.Type.NULL)) {

Review comment:
       I would suggest moving this logic inside `createHiveStructFromAvro`, since you have a dedicated function just for that. This is how the code was organized earlier. You can then just pass the `List<Schema.Field>` from here to `createHiveStructFromAvro` and let it do the handling.
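
       A rough sketch of that suggestion (assuming it lives in `HiveSchemaUtil` next to `convertFieldFromAvro`; the exact `STRUCT< name : type >` rendering is illustrative):

       ```java
       // Sketch only: takes the raw field list and filters NULL-typed fields itself.
       private static String createHiveStructFromAvro(List<Schema.Field> fields) {
         StringBuilder struct = new StringBuilder("STRUCT< ");
         boolean first = true;
         for (Schema.Field field : fields) {
           // Avro nulls are not encoded unless they appear inside a union,
           // so skip them here instead of at the call site.
           if (field.schema().getType() == Schema.Type.NULL) {
             continue;
           }
           if (!first) {
             struct.append(", ");
           }
           struct.append(field.name()).append(" : ").append(convertFieldFromAvro(field.schema()));
           first = false;
         }
         return struct.append(">").toString();
       }
       ```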







[GitHub] [hudi] nsivabalan commented on a change in pull request #2160: [HUDI-865] Improve Hive Syncing by directly translating avro schema to Hive types

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #2160:
URL: https://github.com/apache/hudi/pull/2160#discussion_r577552355



##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java
##########
@@ -155,113 +150,75 @@ private static boolean isFieldExistsInSchema(Map<String, String> newTableSchema,
    * @param parquetType : Single paruet field

Review comment:
       Minor: fix the Javadoc. The `@param` still talks about a Parquet type.

##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java
##########
@@ -155,113 +150,75 @@ private static boolean isFieldExistsInSchema(Map<String, String> newTableSchema,
    * @param parquetType : Single paruet field
    * @return : Equivalent sHive schema
    */
-  private static String convertField(final Type parquetType) {
+  private static String convertFieldFromAvro(final Schema schema) {
     StringBuilder field = new StringBuilder();
-    if (parquetType.isPrimitive()) {
-      final PrimitiveType.PrimitiveTypeName parquetPrimitiveTypeName =
-          parquetType.asPrimitiveType().getPrimitiveTypeName();
-      final OriginalType originalType = parquetType.getOriginalType();
-      if (originalType == OriginalType.DECIMAL) {
-        final DecimalMetadata decimalMetadata = parquetType.asPrimitiveType().getDecimalMetadata();
-        return field.append("DECIMAL(").append(decimalMetadata.getPrecision()).append(" , ")
-            .append(decimalMetadata.getScale()).append(")").toString();
-      } else if (originalType == OriginalType.DATE) {
+    Schema.Type type = schema.getType();
+    LogicalType logicalType = schema.getLogicalType();
+    if (logicalType != null) {
+      if (logicalType instanceof LogicalTypes.Decimal) {
+        return field.append("DECIMAL(").append(((LogicalTypes.Decimal) logicalType).getPrecision()).append(" , ")
+            .append(((LogicalTypes.Decimal) logicalType).getScale()).append(")").toString();
+      } else if (logicalType instanceof LogicalTypes.Date) {
         return field.append("DATE").toString();
+      } else {
+        Log.info("not handle the type transform");

Review comment:
       Do we need INFO logging here? Won't this be printed for every primitive type? Can you help me understand.

##########
File path: hudi-sync/hudi-hive-sync/src/test/java/org/apache/hudi/hive/TestHiveSyncTool.java
##########
@@ -84,31 +87,32 @@ public void testSchemaConvertArray() throws IOException {
     MessageType schema = Types.buildMessage().optionalGroup().as(OriginalType.LIST).repeatedGroup()

Review comment:
       Minor: do we need to fix line 79 (old) / 82 (new) of the Javadoc? It refers to a Parquet list, and I guess we are moving to Avro with this diff.

##########
File path: hudi-sync/hudi-hive-sync/src/test/java/org/apache/hudi/hive/TestHiveSyncTool.java
##########
@@ -84,31 +87,32 @@ public void testSchemaConvertArray() throws IOException {
     MessageType schema = Types.buildMessage().optionalGroup().as(OriginalType.LIST).repeatedGroup()
         .optional(PrimitiveType.PrimitiveTypeName.INT32).named("element").named("list").named("int_list")
         .named("ArrayOfInts");
-
-    String schemaString = HiveSchemaUtil.generateSchemaString(schema);
+    AvroSchemaConverter avroSchemaConverter = new AvroSchemaConverter(new Configuration());

Review comment:
       I guess we don't have good test coverage for `convertFieldFromAvro`. All I see are tests for `testSchemaConvertArray` and `testSchemaConvertTimestampMicros`. Would you mind adding tests for `convertFieldFromAvro` to cover all field types (primitives, list, enum, map, nested records)?
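
       For reference, a sketch of what such coverage could look like using Avro's `SchemaBuilder`; the `convert(...)` helper below is a placeholder for whichever entry point `HiveSchemaUtil` ends up exposing:

       ```java
       import org.apache.avro.Schema;
       import org.apache.avro.SchemaBuilder;
       import org.junit.jupiter.api.Test;
       import static org.junit.jupiter.api.Assertions.assertEquals;

       public class TestConvertFieldFromAvroSketch {

         @Test
         public void testPrimitiveFields() {
           Schema record = SchemaBuilder.record("rec").fields()
               .requiredLong("id")
               .requiredString("name")
               .requiredBoolean("flag")
               .endRecord();

           // Expected Hive types follow the mapping in this diff (LONG -> bigint, etc.).
           assertEquals("bigint", convert(record.getField("id").schema()));
           assertEquals("string", convert(record.getField("name").schema()));
           assertEquals("boolean", convert(record.getField("flag").schema()));
           // Similar cases would cover list, enum, map and nested record fields.
         }

         // Placeholder: wire this to the actual conversion method in HiveSchemaUtil.
         private static String convert(Schema schema) {
           throw new UnsupportedOperationException("replace with the real HiveSchemaUtil call");
         }
       }
       ```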

##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java
##########
@@ -155,113 +150,75 @@ private static boolean isFieldExistsInSchema(Map<String, String> newTableSchema,
    * @param parquetType : Single paruet field
    * @return : Equivalent sHive schema
    */
-  private static String convertField(final Type parquetType) {
+  private static String convertFieldFromAvro(final Schema schema) {

Review comment:
       Did you take inspiration from somewhere for converting an Avro schema to Hive field types? Can you attach any references?

##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java
##########
@@ -155,113 +150,75 @@ private static boolean isFieldExistsInSchema(Map<String, String> newTableSchema,
    * @param parquetType : Single paruet field
    * @return : Equivalent sHive schema
    */
-  private static String convertField(final Type parquetType) {
+  private static String convertFieldFromAvro(final Schema schema) {
     StringBuilder field = new StringBuilder();
-    if (parquetType.isPrimitive()) {
-      final PrimitiveType.PrimitiveTypeName parquetPrimitiveTypeName =
-          parquetType.asPrimitiveType().getPrimitiveTypeName();
-      final OriginalType originalType = parquetType.getOriginalType();
-      if (originalType == OriginalType.DECIMAL) {
-        final DecimalMetadata decimalMetadata = parquetType.asPrimitiveType().getDecimalMetadata();
-        return field.append("DECIMAL(").append(decimalMetadata.getPrecision()).append(" , ")
-            .append(decimalMetadata.getScale()).append(")").toString();
-      } else if (originalType == OriginalType.DATE) {
+    Schema.Type type = schema.getType();
+    LogicalType logicalType = schema.getLogicalType();
+    if (logicalType != null) {
+      if (logicalType instanceof LogicalTypes.Decimal) {
+        return field.append("DECIMAL(").append(((LogicalTypes.Decimal) logicalType).getPrecision()).append(" , ")
+            .append(((LogicalTypes.Decimal) logicalType).getScale()).append(")").toString();
+      } else if (logicalType instanceof LogicalTypes.Date) {
         return field.append("DATE").toString();
+      } else {
+        Log.info("not handle the type transform");
       }
-      // TODO - fix the method naming here
-      return parquetPrimitiveTypeName.convert(new PrimitiveType.PrimitiveTypeNameConverter<String, RuntimeException>() {
-        @Override
-        public String convertBOOLEAN(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "boolean";
-        }
-
-        @Override
-        public String convertINT32(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "int";
-        }
-
-        @Override
-        public String convertINT64(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "bigint";
-        }
-
-        @Override
-        public String convertINT96(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "timestamp-millis";
-        }
-
-        @Override
-        public String convertFLOAT(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "float";
-        }
-
-        @Override
-        public String convertDOUBLE(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "double";
-        }
-
-        @Override
-        public String convertFIXED_LEN_BYTE_ARRAY(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "binary";
-        }
-
-        @Override
-        public String convertBINARY(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          if (originalType == OriginalType.UTF8 || originalType == OriginalType.ENUM) {
-            return "string";
-          } else {
-            return "binary";
-          }
+    }
+    if (type.equals(Schema.Type.BOOLEAN)) {

Review comment:
       +1

##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java
##########
@@ -327,20 +284,14 @@ private static String createHiveMap(String keyType, String valueType) {
   /**
    * Create an Array Hive schema from equivalent parquet list type.
    */
-  private static String createHiveArray(Type elementType, String elementName) {
+  private static String createHiveArrayFromAvro(String schemaName, Schema elementType) {
     StringBuilder array = new StringBuilder();
     array.append("ARRAY< ");
-    if (elementType.isPrimitive()) {
-      array.append(convertField(elementType));
+    if (elementType.getType().equals(Schema.Type.RECORD) && (elementType.getFields().size() == 1
+        && !elementType.getName().equals("array") && !elementType.getName().endsWith("_tuple"))) {

Review comment:
       I am not very conversant with these conversions. But if we can ensure full test coverage for all scenarios, it should be easy to validate whether these checks are required.

##########
File path: hudi-sync/hudi-hive-sync/src/test/java/org/apache/hudi/hive/TestHiveSyncTool.java
##########
@@ -510,7 +514,7 @@ public void testReadSchemaForMOR(boolean useJdbc) throws Exception {
     assertEquals(hiveClientRT.getTableSchema(snapshotTableName).size(),
         SchemaTestUtil.getSimpleSchema().getFields().size() + HiveTestUtil.hiveSyncConfig.partitionFields.size()
             + HoodieRecord.HOODIE_META_COLUMNS.size(),
-        "Hive Schema should match the table schema + partition field");
+        "Hive Schema should matcHoodieDLAClient.javah the table schema + partition field");

Review comment:
       Minor typo.







[GitHub] [hudi] lw309637554 commented on pull request #2160: [HUDI-865] Improve Hive Syncing by directly translating avro schema to Hive types

Posted by GitBox <gi...@apache.org>.
lw309637554 commented on pull request #2160:
URL: https://github.com/apache/hudi/pull/2160#issuecomment-787467468


   > @lw309637554 : Can you please check the feedback and address it? Would be nice to have this in.
   
   okay





[GitHub] [hudi] umehrot2 commented on a change in pull request #2160: [HUDI-865] Improve Hive Syncing by directly translating avro schema to Hive types

Posted by GitBox <gi...@apache.org>.
umehrot2 commented on a change in pull request #2160:
URL: https://github.com/apache/hudi/pull/2160#discussion_r513820996



##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java
##########
@@ -155,113 +150,75 @@ private static boolean isFieldExistsInSchema(Map<String, String> newTableSchema,
    * @param parquetType : Single paruet field
    * @return : Equivalent sHive schema
    */
-  private static String convertField(final Type parquetType) {
+  private static String convertFieldFromAvro(final Schema schema) {
     StringBuilder field = new StringBuilder();
-    if (parquetType.isPrimitive()) {
-      final PrimitiveType.PrimitiveTypeName parquetPrimitiveTypeName =
-          parquetType.asPrimitiveType().getPrimitiveTypeName();
-      final OriginalType originalType = parquetType.getOriginalType();
-      if (originalType == OriginalType.DECIMAL) {
-        final DecimalMetadata decimalMetadata = parquetType.asPrimitiveType().getDecimalMetadata();
-        return field.append("DECIMAL(").append(decimalMetadata.getPrecision()).append(" , ")
-            .append(decimalMetadata.getScale()).append(")").toString();
-      } else if (originalType == OriginalType.DATE) {
+    Schema.Type type = schema.getType();
+    LogicalType logicalType = schema.getLogicalType();
+    if (logicalType != null) {
+      if (logicalType instanceof LogicalTypes.Decimal) {
+        return field.append("DECIMAL(").append(((LogicalTypes.Decimal) logicalType).getPrecision()).append(" , ")
+            .append(((LogicalTypes.Decimal) logicalType).getScale()).append(")").toString();
+      } else if (logicalType instanceof LogicalTypes.Date) {
         return field.append("DATE").toString();
+      } else {
+        Log.info("not handle the type transform");
       }
-      // TODO - fix the method naming here
-      return parquetPrimitiveTypeName.convert(new PrimitiveType.PrimitiveTypeNameConverter<String, RuntimeException>() {
-        @Override
-        public String convertBOOLEAN(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "boolean";
-        }
-
-        @Override
-        public String convertINT32(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "int";
-        }
-
-        @Override
-        public String convertINT64(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "bigint";
-        }
-
-        @Override
-        public String convertINT96(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "timestamp-millis";
-        }
-
-        @Override
-        public String convertFLOAT(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "float";
-        }
-
-        @Override
-        public String convertDOUBLE(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "double";
-        }
-
-        @Override
-        public String convertFIXED_LEN_BYTE_ARRAY(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "binary";
-        }
-
-        @Override
-        public String convertBINARY(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          if (originalType == OriginalType.UTF8 || originalType == OriginalType.ENUM) {
-            return "string";
-          } else {
-            return "binary";
-          }
+    }
+    if (type.equals(Schema.Type.BOOLEAN)) {
+      return field.append("boolean").toString();
+    } else if (type.equals(Schema.Type.INT)) {
+      return field.append("int").toString();
+    } else if (type.equals(Schema.Type.LONG)) {
+      return field.append("bigint").toString();
+    } else if (type.equals(Schema.Type.FLOAT)) {
+      return field.append("float").toString();
+    } else if (type.equals(Schema.Type.DOUBLE)) {
+      return field.append("double").toString();
+    } else if (type.equals(Schema.Type.BYTES)) {
+      return field.append("binary").toString();
+    } else if (type.equals(Schema.Type.STRING)) {
+      return field.append("string").toString();
+    } else if (type.equals(Schema.Type.RECORD)) {
+      List<Pair<String, Schema>> noNullSchemaFields = new ArrayList<>();
+      for (Schema.Field fieldItem : schema.getFields()) {
+        if (fieldItem.schema().getType().equals(Schema.Type.NULL)) {
+          continue; // Avro nulls are not encoded, unless they are null unions
         }
-      });
-    } else {
-      GroupType parquetGroupType = parquetType.asGroupType();
-      OriginalType originalType = parquetGroupType.getOriginalType();
-      if (originalType != null) {
-        switch (originalType) {
-          case LIST:
-            if (parquetGroupType.getFieldCount() != 1) {
-              throw new UnsupportedOperationException("Invalid list type " + parquetGroupType);
-            }
-            Type elementType = parquetGroupType.getType(0);
-            if (!elementType.isRepetition(Type.Repetition.REPEATED)) {
-              throw new UnsupportedOperationException("Invalid list type " + parquetGroupType);
-            }
-            return createHiveArray(elementType, parquetGroupType.getName());
-          case MAP:
-            if (parquetGroupType.getFieldCount() != 1 || parquetGroupType.getType(0).isPrimitive()) {
-              throw new UnsupportedOperationException("Invalid map type " + parquetGroupType);
-            }
-            GroupType mapKeyValType = parquetGroupType.getType(0).asGroupType();
-            if (!mapKeyValType.isRepetition(Type.Repetition.REPEATED)
-                || !mapKeyValType.getOriginalType().equals(OriginalType.MAP_KEY_VALUE)
-                || mapKeyValType.getFieldCount() != 2) {
-              throw new UnsupportedOperationException("Invalid map type " + parquetGroupType);
-            }
-            Type keyType = mapKeyValType.getType(0);
-            if (!keyType.isPrimitive()
-                || !keyType.asPrimitiveType().getPrimitiveTypeName().equals(PrimitiveType.PrimitiveTypeName.BINARY)
-                || !keyType.getOriginalType().equals(OriginalType.UTF8)) {
-              throw new UnsupportedOperationException("Map key type must be binary (UTF8): " + keyType);
-            }
-            Type valueType = mapKeyValType.getType(1);
-            return createHiveMap(convertField(keyType), convertField(valueType));
-          case ENUM:
-          case UTF8:
-            return "string";
-          case MAP_KEY_VALUE:
-            // MAP_KEY_VALUE was supposed to be used to annotate key and
-            // value group levels in a
-            // MAP. However, that is always implied by the structure of
-            // MAP. Hence, PARQUET-113
-            // dropped the requirement for having MAP_KEY_VALUE.
-          default:
-            throw new UnsupportedOperationException("Cannot convert Parquet type " + parquetType);
+        noNullSchemaFields.add(new ImmutablePair<String, Schema>(fieldItem.name(), fieldItem.schema()));
+      }
+      return createHiveStructFromAvro(noNullSchemaFields);
+    } else if (type.equals(Schema.Type.ENUM)) {
+      return field.append("string").toString();
+    } else if (type.equals(Schema.Type.ARRAY)) {
+      Schema elementType = schema.getElementType();
+      String schemaName = schema.getName();
+      return createHiveArrayFromAvro(schemaName, elementType);
+    } else if (type.equals(Schema.Type.MAP)) {
+      String keyType = "string";

Review comment:
       Why fix the key type for the Map data type as string, instead of converting the original key type?
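
       For context: per the Avro specification, map keys are always strings, so an Avro map schema only models the value type. A small sketch:

       ```java
       import org.apache.avro.Schema;

       public class AvroMapKeySketch {
         public static void main(String[] args) {
           // Avro only lets you specify the value type of a map; keys are implicitly strings.
           Schema mapSchema = Schema.createMap(Schema.create(Schema.Type.LONG));

           System.out.println(mapSchema.getType());      // MAP
           System.out.println(mapSchema.getValueType()); // "long"
           // There is no getKeyType(), so a direct Hive translation would be MAP< string, bigint >.
         }
       }
       ```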







[GitHub] [hudi] nsivabalan commented on pull request #2160: [HUDI-865] Improve Hive Syncing by directly translating avro schema to Hive types

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on pull request #2160:
URL: https://github.com/apache/hudi/pull/2160#issuecomment-751317488


   @vinothchandar : do you think we need to tag this as a release blocker for the upcoming release?





[GitHub] [hudi] umehrot2 commented on a change in pull request #2160: [HUDI-865] Improve Hive Syncing by directly translating avro schema to Hive types

Posted by GitBox <gi...@apache.org>.
umehrot2 commented on a change in pull request #2160:
URL: https://github.com/apache/hudi/pull/2160#discussion_r512377446



##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java
##########
@@ -155,113 +150,75 @@ private static boolean isFieldExistsInSchema(Map<String, String> newTableSchema,
    * @param parquetType : Single paruet field
    * @return : Equivalent sHive schema
    */
-  private static String convertField(final Type parquetType) {
+  private static String convertFieldFromAvro(final Schema schema) {
     StringBuilder field = new StringBuilder();
-    if (parquetType.isPrimitive()) {
-      final PrimitiveType.PrimitiveTypeName parquetPrimitiveTypeName =
-          parquetType.asPrimitiveType().getPrimitiveTypeName();
-      final OriginalType originalType = parquetType.getOriginalType();
-      if (originalType == OriginalType.DECIMAL) {
-        final DecimalMetadata decimalMetadata = parquetType.asPrimitiveType().getDecimalMetadata();
-        return field.append("DECIMAL(").append(decimalMetadata.getPrecision()).append(" , ")
-            .append(decimalMetadata.getScale()).append(")").toString();
-      } else if (originalType == OriginalType.DATE) {
+    Schema.Type type = schema.getType();
+    LogicalType logicalType = schema.getLogicalType();
+    if (logicalType != null) {
+      if (logicalType instanceof LogicalTypes.Decimal) {
+        return field.append("DECIMAL(").append(((LogicalTypes.Decimal) logicalType).getPrecision()).append(" , ")
+            .append(((LogicalTypes.Decimal) logicalType).getScale()).append(")").toString();
+      } else if (logicalType instanceof LogicalTypes.Date) {
         return field.append("DATE").toString();
+      } else {
+        Log.info("not handle the type transform");
       }
-      // TODO - fix the method naming here
-      return parquetPrimitiveTypeName.convert(new PrimitiveType.PrimitiveTypeNameConverter<String, RuntimeException>() {
-        @Override
-        public String convertBOOLEAN(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "boolean";
-        }
-
-        @Override
-        public String convertINT32(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "int";
-        }
-
-        @Override
-        public String convertINT64(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "bigint";
-        }
-
-        @Override
-        public String convertINT96(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "timestamp-millis";
-        }
-
-        @Override
-        public String convertFLOAT(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "float";
-        }
-
-        @Override
-        public String convertDOUBLE(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "double";
-        }
-
-        @Override
-        public String convertFIXED_LEN_BYTE_ARRAY(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "binary";
-        }
-
-        @Override
-        public String convertBINARY(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          if (originalType == OriginalType.UTF8 || originalType == OriginalType.ENUM) {
-            return "string";
-          } else {
-            return "binary";
-          }
+    }
+    if (type.equals(Schema.Type.BOOLEAN)) {

Review comment:
       I think this is a good case where we should use a `switch` statement, as the old code did, instead of if/else. Can you change this to a `switch` statement?
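
       A minimal sketch of that suggestion, using a hypothetical helper name for the primitive branch (illustrative only, not the final PR code):

       ```java
       // Sketch: the primitive branch of convertFieldFromAvro expressed as a switch.
       private static String convertPrimitiveFromAvro(Schema schema) {
         switch (schema.getType()) {
           case BOOLEAN: return "boolean";
           case INT:     return "int";
           case LONG:    return "bigint";
           case FLOAT:   return "float";
           case DOUBLE:  return "double";
           case BYTES:   return "binary";
           case STRING:
           case ENUM:    return "string";
           default:
             // RECORD, ARRAY, MAP, UNION, etc. would fall through to the complex-type handling.
             throw new UnsupportedOperationException("Complex type handled elsewhere: " + schema.getType());
         }
       }
       ```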







[GitHub] [hudi] umehrot2 commented on a change in pull request #2160: [HUDI-865] Improve Hive Syncing by directly translating avro schema to Hive types

Posted by GitBox <gi...@apache.org>.
umehrot2 commented on a change in pull request #2160:
URL: https://github.com/apache/hudi/pull/2160#discussion_r512377446



##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java
##########
@@ -155,113 +150,75 @@ private static boolean isFieldExistsInSchema(Map<String, String> newTableSchema,
    * @param parquetType : Single paruet field
    * @return : Equivalent sHive schema
    */
-  private static String convertField(final Type parquetType) {
+  private static String convertFieldFromAvro(final Schema schema) {
     StringBuilder field = new StringBuilder();
-    if (parquetType.isPrimitive()) {
-      final PrimitiveType.PrimitiveTypeName parquetPrimitiveTypeName =
-          parquetType.asPrimitiveType().getPrimitiveTypeName();
-      final OriginalType originalType = parquetType.getOriginalType();
-      if (originalType == OriginalType.DECIMAL) {
-        final DecimalMetadata decimalMetadata = parquetType.asPrimitiveType().getDecimalMetadata();
-        return field.append("DECIMAL(").append(decimalMetadata.getPrecision()).append(" , ")
-            .append(decimalMetadata.getScale()).append(")").toString();
-      } else if (originalType == OriginalType.DATE) {
+    Schema.Type type = schema.getType();
+    LogicalType logicalType = schema.getLogicalType();
+    if (logicalType != null) {
+      if (logicalType instanceof LogicalTypes.Decimal) {
+        return field.append("DECIMAL(").append(((LogicalTypes.Decimal) logicalType).getPrecision()).append(" , ")
+            .append(((LogicalTypes.Decimal) logicalType).getScale()).append(")").toString();
+      } else if (logicalType instanceof LogicalTypes.Date) {
         return field.append("DATE").toString();
+      } else {
+        Log.info("not handle the type transform");
       }
-      // TODO - fix the method naming here
-      return parquetPrimitiveTypeName.convert(new PrimitiveType.PrimitiveTypeNameConverter<String, RuntimeException>() {
-        @Override
-        public String convertBOOLEAN(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "boolean";
-        }
-
-        @Override
-        public String convertINT32(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "int";
-        }
-
-        @Override
-        public String convertINT64(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "bigint";
-        }
-
-        @Override
-        public String convertINT96(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "timestamp-millis";
-        }
-
-        @Override
-        public String convertFLOAT(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "float";
-        }
-
-        @Override
-        public String convertDOUBLE(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "double";
-        }
-
-        @Override
-        public String convertFIXED_LEN_BYTE_ARRAY(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "binary";
-        }
-
-        @Override
-        public String convertBINARY(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          if (originalType == OriginalType.UTF8 || originalType == OriginalType.ENUM) {
-            return "string";
-          } else {
-            return "binary";
-          }
+    }
+    if (type.equals(Schema.Type.BOOLEAN)) {

Review comment:
       Can you use a switch case instead, just for better code organization?







[GitHub] [hudi] umehrot2 commented on a change in pull request #2160: [HUDI-865] Improve Hive Syncing by directly translating avro schema to Hive types

Posted by GitBox <gi...@apache.org>.
umehrot2 commented on a change in pull request #2160:
URL: https://github.com/apache/hudi/pull/2160#discussion_r513819850



##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java
##########
@@ -327,20 +284,14 @@ private static String createHiveMap(String keyType, String valueType) {
   /**
    * Create an Array Hive schema from equivalent parquet list type.
    */
-  private static String createHiveArray(Type elementType, String elementName) {
+  private static String createHiveArrayFromAvro(String schemaName, Schema elementType) {
     StringBuilder array = new StringBuilder();
     array.append("ARRAY< ");
-    if (elementType.isPrimitive()) {
-      array.append(convertField(elementType));
+    if (elementType.getType().equals(Schema.Type.RECORD) && (elementType.getFields().size() == 1
+        && !elementType.getName().equals("array") && !elementType.getName().endsWith("_tuple"))) {

Review comment:
       Wondering if this condition is really needed?







[GitHub] [hudi] umehrot2 commented on a change in pull request #2160: [HUDI-865] Improve Hive Syncing by directly translating avro schema to Hive types

Posted by GitBox <gi...@apache.org>.
umehrot2 commented on a change in pull request #2160:
URL: https://github.com/apache/hudi/pull/2160#discussion_r512379798



##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java
##########
@@ -155,113 +150,75 @@ private static boolean isFieldExistsInSchema(Map<String, String> newTableSchema,
    * @param parquetType : Single paruet field
    * @return : Equivalent sHive schema
    */
-  private static String convertField(final Type parquetType) {
+  private static String convertFieldFromAvro(final Schema schema) {
     StringBuilder field = new StringBuilder();
-    if (parquetType.isPrimitive()) {
-      final PrimitiveType.PrimitiveTypeName parquetPrimitiveTypeName =
-          parquetType.asPrimitiveType().getPrimitiveTypeName();
-      final OriginalType originalType = parquetType.getOriginalType();
-      if (originalType == OriginalType.DECIMAL) {
-        final DecimalMetadata decimalMetadata = parquetType.asPrimitiveType().getDecimalMetadata();
-        return field.append("DECIMAL(").append(decimalMetadata.getPrecision()).append(" , ")
-            .append(decimalMetadata.getScale()).append(")").toString();
-      } else if (originalType == OriginalType.DATE) {
+    Schema.Type type = schema.getType();
+    LogicalType logicalType = schema.getLogicalType();
+    if (logicalType != null) {
+      if (logicalType instanceof LogicalTypes.Decimal) {
+        return field.append("DECIMAL(").append(((LogicalTypes.Decimal) logicalType).getPrecision()).append(" , ")
+            .append(((LogicalTypes.Decimal) logicalType).getScale()).append(")").toString();
+      } else if (logicalType instanceof LogicalTypes.Date) {
         return field.append("DATE").toString();
+      } else {
+        Log.info("not handle the type transform");
       }
-      // TODO - fix the method naming here
-      return parquetPrimitiveTypeName.convert(new PrimitiveType.PrimitiveTypeNameConverter<String, RuntimeException>() {
-        @Override
-        public String convertBOOLEAN(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "boolean";
-        }
-
-        @Override
-        public String convertINT32(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "int";
-        }
-
-        @Override
-        public String convertINT64(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "bigint";
-        }
-
-        @Override
-        public String convertINT96(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "timestamp-millis";
-        }
-
-        @Override
-        public String convertFLOAT(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "float";
-        }
-
-        @Override
-        public String convertDOUBLE(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "double";
-        }
-
-        @Override
-        public String convertFIXED_LEN_BYTE_ARRAY(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          return "binary";
-        }
-
-        @Override
-        public String convertBINARY(PrimitiveType.PrimitiveTypeName primitiveTypeName) {
-          if (originalType == OriginalType.UTF8 || originalType == OriginalType.ENUM) {
-            return "string";
-          } else {
-            return "binary";
-          }
+    }
+    if (type.equals(Schema.Type.BOOLEAN)) {
+      return field.append("boolean").toString();
+    } else if (type.equals(Schema.Type.INT)) {
+      return field.append("int").toString();
+    } else if (type.equals(Schema.Type.LONG)) {
+      return field.append("bigint").toString();
+    } else if (type.equals(Schema.Type.FLOAT)) {
+      return field.append("float").toString();
+    } else if (type.equals(Schema.Type.DOUBLE)) {
+      return field.append("double").toString();
+    } else if (type.equals(Schema.Type.BYTES)) {
+      return field.append("binary").toString();
+    } else if (type.equals(Schema.Type.STRING)) {
+      return field.append("string").toString();
+    } else if (type.equals(Schema.Type.RECORD)) {
+      List<Pair<String, Schema>> noNullSchemaFields = new ArrayList<>();
+      for (Schema.Field fieldItem : schema.getFields()) {
+        if (fieldItem.schema().getType().equals(Schema.Type.NULL)) {

Review comment:
       I would suggest moving this logic inside `createHiveStructFromAvro`, since you have a dedicated function just for that. This is how the code was organized earlier. So you can just pass the `List<Schema.Field>` from here to `createHiveStructFromAvro` and let it do the handling.



