You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by GitBox <gi...@apache.org> on 2021/05/26 21:54:37 UTC

[GitHub] [drill] vvysotskyi commented on a change in pull request #2232: DRILL-7919: Fix reading parquet with decimal dictionary encoding

vvysotskyi commented on a change in pull request #2232:
URL: https://github.com/apache/drill/pull/2232#discussion_r640146168



##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/NullableFixedByteAlignedReaders.java
##########
@@ -318,6 +318,27 @@ protected void readField(long recordsToReadInThisPass) {
             }
           }
           break;
+        case FIXED_LEN_BYTE_ARRAY:
+        case BINARY:
+          if (usingDictionary) {
+            NullableVarDecimalVector.Mutator mutator = valueVec.getMutator();
+            for (int i = 0; i < recordsReadInThisIteration; i++) {
+              Binary currDictValToWrite = pageReader.dictionaryValueReader.readBytes();
+              mutator.setSafe(valuesReadInCurrentPass + i, currDictValToWrite.toByteBuffer().slice(), 0,
+                  currDictValToWrite.length());
+            }
+            // Set the write Index. The next page that gets read might be a page that does not use dictionary encoding
+            // and we will go into the else condition below. The readField method of the parent class requires the
+            // writer index to be set correctly.
+            int writerIndex = valueVec.getBuffer().writerIndex();
+            valueVec.getBuffer().setIndex(0, writerIndex + (int) readLength);
+          } else {
+            for (int i = 0; i < recordsToReadInThisPass; i++) {
+              Binary valueToWrite = pageReader.valueReader.readBytes();
+              valueVec.getMutator().setSafe(valuesReadInCurrentPass + i, valueToWrite.toByteBuffer().slice(), 0,

Review comment:
       Because it was a different reader. This class extends `NullableColumnReader`, but that one was `NullableConvertedReader`, so it uses a slightly different approach for calculating required indexes.

##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ColumnReaderFactory.java
##########
@@ -252,91 +222,110 @@
     }
   }
 
-  public static NullableColumnReader<?> getNullableColumnReader(ParquetRecordReader parentReader,
+  public static ColumnReader<?> getNullableColumnReader(ParquetRecordReader parentReader,

Review comment:
       It is because of the workaround for `NullableBitReader`, that actually extends `ColumnReader` and doesn't work properly with `NullableColumnReader` because of its internal representation. Initially, I also tried to extend `NullableBitReader` from `NullableColumnReader`, but it creates a lot of issues and is actually out of the scope for this PR.

##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/NullableFixedByteAlignedReaders.java
##########
@@ -318,6 +318,27 @@ protected void readField(long recordsToReadInThisPass) {
             }
           }
           break;
+        case FIXED_LEN_BYTE_ARRAY:
+        case BINARY:
+          if (usingDictionary) {

Review comment:
       Not only this change but also some changes in `ColumnReaderFactory`. Initially one of those methods returned the wrong reader for decimals with a dictionary encoding. But I have made additional cleanup to combine the logic of choosing the reader a little bit.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org