You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@orc.apache.org by "pavibhai (via GitHub)" <gi...@apache.org> on 2023/05/01 23:35:23 UTC
[GitHub] [orc] pavibhai commented on a diff in pull request #1482: ORC-1413 fix for ORC row level filter issue with ACID table

pavibhai commented on code in PR #1482:
URL: https://github.com/apache/orc/pull/1482#discussion_r1181965794


##########
java/core/src/java/org/apache/orc/impl/ParserUtils.java:
##########
@@ -459,10 +459,31 @@ public static ColumnVector[] findColumnVectors(TypeDescription schema,
                                                  boolean isCaseSensitive,
                                                  VectorizedRowBatch batch) {
     List<String> names = ParserUtils.splitName(source);
-    ColumnFinder result = new ColumnFinder(schema, batch, names.size());
+    ColumnFinder result = new ColumnFinder(schema, removeAcidColumnsFromVectorizedRowBatch(schema, batch), names.size());
     findColumn(removeAcid(schema), names, isCaseSensitive, result);
     return result.result;
   }
+  

Review Comment:
   I would think we should go a different route. My proposal would be something like.
   
   If the schema is acid schema then when you are looking up column names prefix them with `row.` and just use the full schema. This will ensure that we respect the nesting of the vectors under the struct column vector related to `row`



##########
java/core/src/test/org/apache/orc/TestOrcFilterContext.java:
##########
@@ -225,4 +230,12 @@ public void testRepeatingVector() {
     assertTrue(OrcFilterContext.isNull(vectorBranch, 1));
     assertTrue(OrcFilterContext.isNull(vectorBranch, 2));
   }
+  
+  @Test
+  public void testACIDTable(){

Review Comment:
   I would also add an actual filter test with data to ensure that the behavior is as expected.



##########
java/core/src/java/org/apache/orc/impl/ParserUtils.java:
##########
@@ -459,10 +459,31 @@ public static ColumnVector[] findColumnVectors(TypeDescription schema,
                                                  boolean isCaseSensitive,
                                                  VectorizedRowBatch batch) {
     List<String> names = ParserUtils.splitName(source);
-    ColumnFinder result = new ColumnFinder(schema, batch, names.size());
+    ColumnFinder result = new ColumnFinder(schema, removeAcidColumnsFromVectorizedRowBatch(schema, batch), names.size());
     findColumn(removeAcid(schema), names, isCaseSensitive, result);
     return result.result;
   }
+  
+  private static VectorizedRowBatch removeAcidColumnsFromVectorizedRowBatch(TypeDescription schema, VectorizedRowBatch batch) {
+    final int acidColumnOffset = 5;
+    if(SchemaEvolution.checkAcidSchema(schema)) {
+      VectorizedRowBatch vectorizedRowBatch = new VectorizedRowBatch(batch.cols.length - acidColumnOffset, batch.size);
+      System.arraycopy(batch.selected, 0, vectorizedRowBatch.selected, 0, vectorizedRowBatch.selected.length);
+      System.arraycopy(batch.projectedColumns, acidColumnOffset, vectorizedRowBatch.projectedColumns, 0, batch.projectedColumns.length - acidColumnOffset);
+      vectorizedRowBatch.setPartitionInfo(batch.getDataColumnCount(), batch.getPartitionColumnCount());
+      vectorizedRowBatch.selectedInUse = batch.selectedInUse;
+      vectorizedRowBatch.endOfFile = batch.endOfFile;
+
+      StructColumnVector structColumnVector = (StructColumnVector)(batch.cols[acidColumnOffset]);
+      ColumnVector[] columnVectors = new ColumnVector[structColumnVector.fields.length];
+      for (int i = 0; i < columnVectors.length; i++) {
+        columnVectors[i] = structColumnVector.fields[i];
+      }
+      vectorizedRowBatch.cols = columnVectors;
+      return vectorizedRowBatch;
+    }
+    return batch;

Review Comment:
   I might be missing the logic of where we are updating back the changes that are not referenced example batch.size back to the batch after the filter processing is complete.
   
   I go back to the earlier comment that we should do this differently.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org