You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/01/10 06:03:23 UTC

[GitHub] gianm commented on a change in pull request #6833: fix parquet parse performance issue

gianm commented on a change in pull request #6833: fix parquet parse performance issue 
URL: https://github.com/apache/incubator-druid/pull/6833#discussion_r246643395
 
 

 ##########
 File path: extensions-core/parquet-extensions/src/main/java/org/apache/druid/data/input/parquet/simple/ParquetGroupConverter.java
 ##########
 @@ -62,36 +62,39 @@ private static Object convertField(Group g, String fieldName, boolean binaryAsSt
 
     final int fieldIndex = g.getType().getFieldIndex(fieldName);
 
-    Type fieldType = g.getType().getFields().get(fieldIndex);
-
-    // primitive field
-    if (fieldType.isPrimitive()) {
-      // primitive list
-      if (fieldType.getRepetition().equals(Type.Repetition.REPEATED)) {
-        int repeated = g.getFieldRepetitionCount(fieldIndex);
-        List<Object> vals = new ArrayList<>();
-        for (int i = 0; i < repeated; i++) {
-          vals.add(convertPrimitiveField(g, fieldIndex, i, binaryAsString));
+    if (g.getFieldRepetitionCount(fieldIndex) > 0) {
+      Type fieldType = g.getType().getFields().get(fieldIndex);
+
+      // primitive field
+      if (fieldType.isPrimitive()) {
+        // primitive list
+        if (fieldType.getRepetition().equals(Type.Repetition.REPEATED)) {
+          int repeated = g.getFieldRepetitionCount(fieldIndex);
+          List<Object> vals = new ArrayList<>();
+          for (int i = 0; i < repeated; i++) {
+            vals.add(convertPrimitiveField(g, fieldIndex, i, binaryAsString));
+          }
+          return vals;
+        }
+        return convertPrimitiveField(g, fieldIndex, binaryAsString);
+      } else {
+        if (fieldType.isRepetition(Type.Repetition.REPEATED)) {
+          return convertRepeatedFieldToList(g, fieldIndex, binaryAsString);
         }
-        return vals;
-      }
-      return convertPrimitiveField(g, fieldIndex, binaryAsString);
-    } else {
-      if (fieldType.isRepetition(Type.Repetition.REPEATED)) {
-        return convertRepeatedFieldToList(g, fieldIndex, binaryAsString);
-      }
 
-      if (isLogicalMapType(fieldType)) {
-        return convertLogicalMap(g.getGroup(fieldIndex, 0), binaryAsString);
-      }
+        if (isLogicalMapType(fieldType)) {
+          return convertLogicalMap(g.getGroup(fieldIndex, 0), binaryAsString);
+        }
 
-      if (isLogicalListType(fieldType)) {
-        return convertLogicalList(g.getGroup(fieldIndex, 0), binaryAsString);
-      }
+        if (isLogicalListType(fieldType)) {
+          return convertLogicalList(g.getGroup(fieldIndex, 0), binaryAsString);
+        }
 
-      // not a list, but not a primtive, return the nested group type
-      return g.getGroup(fieldIndex, 0);
+        // not a list, but not a primtive, return the nested group type
+        return g.getGroup(fieldIndex, 0);
+      }
     }
+    return null;
 
 Review comment:
   nit, but, this would be clearer if it were in an `else` block with a comment about why we're returning null here. Otherwise it's not clear that the (relatively complex) block above is not meant to be able to fall through.
   
   Alternatively, flip things such that this block is first:
   
   ```java
   if (g.getFieldRepetitionCount(fieldIndex) == 0) {
     return null;
   }
   ```
   
   It's simple enough that it's clear that it won't fall through. It's also clear that it's a "skip if there is no data" kind of check.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org