You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by GitBox <gi...@apache.org> on 2021/05/26 21:36:37 UTC

[GitHub] [drill] vdiravka commented on a change in pull request #2232: DRILL-7919: Fix reading parquet with decimal dictionary encoding

vdiravka commented on a change in pull request #2232:
URL: https://github.com/apache/drill/pull/2232#discussion_r640131705



##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/NullableFixedByteAlignedReaders.java
##########
@@ -318,6 +318,27 @@ protected void readField(long recordsToReadInThisPass) {
             }
           }
           break;
+        case FIXED_LEN_BYTE_ARRAY:
+        case BINARY:
+          if (usingDictionary) {
+            NullableVarDecimalVector.Mutator mutator = valueVec.getMutator();
+            for (int i = 0; i < recordsReadInThisIteration; i++) {
+              Binary currDictValToWrite = pageReader.dictionaryValueReader.readBytes();
+              mutator.setSafe(valuesReadInCurrentPass + i, currDictValToWrite.toByteBuffer().slice(), 0,
+                  currDictValToWrite.length());
+            }
+            // Set the write Index. The next page that gets read might be a page that does not use dictionary encoding
+            // and we will go into the else condition below. The readField method of the parent class requires the
+            // writer index to be set correctly.
+            int writerIndex = valueVec.getBuffer().writerIndex();
+            valueVec.getBuffer().setIndex(0, writerIndex + (int) readLength);
+          } else {
+            for (int i = 0; i < recordsToReadInThisPass; i++) {
+              Binary valueToWrite = pageReader.valueReader.readBytes();
+              valueVec.getMutator().setSafe(valuesReadInCurrentPass + i, valueToWrite.toByteBuffer().slice(), 0,

Review comment:
       Originally it was `valueVec.getMutator().setSafe(index, 1, start, start + dataTypeLengthInBytes, bytebuf);`. Why it is different now?

##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/NullableFixedByteAlignedReaders.java
##########
@@ -318,6 +318,27 @@ protected void readField(long recordsToReadInThisPass) {
             }
           }
           break;
+        case FIXED_LEN_BYTE_ARRAY:
+        case BINARY:
+          if (usingDictionary) {

Review comment:
       Am I right, this is the main code to solve the ticket issue and all other changes from this PR are just refactoring?

##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ColumnReaderFactory.java
##########
@@ -252,91 +222,110 @@
     }
   }
 
-  public static NullableColumnReader<?> getNullableColumnReader(ParquetRecordReader parentReader,
+  public static ColumnReader<?> getNullableColumnReader(ParquetRecordReader parentReader,

Review comment:
       The method is names as `getNullableColumnReader`, so why the returned type is not `NullableColumnReader` anymore?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org