Posted to dev@orc.apache.org by GitBox <gi...@apache.org> on 2021/11/01 03:24:59 UTC

[GitHub] [orc] guiyanakuang opened a new pull request #952: ORC-1004: Make orc writer support the selected vector

guiyanakuang opened a new pull request #952:
URL: https://github.com/apache/orc/pull/952

### What changes were proposed in this pull request?

This PR makes the ORC writer support writing batches that use the selected vector.

Added the InternalVectorizedRowBatch and InternalColumnVector classes to encapsulate and replace VectorizedRowBatch and ColumnVector.
WholeVectorizedRowBatch/WholeColumnVector represent a batch/vector that does not use `selected`.
SelectedVectorizedRowBatch/SelectedColumnVector represent a batch/vector that does use `selected`.

During the write process, both kinds are accessed through the same interface (`public int getValueOffset(int offset)`) to get the offsets, which keeps the changes minimal and avoids redundant work.

![SelectedColumnVector](https://user-images.githubusercontent.com/4069905/139616377-51eb8e21-fd21-4c59-b324-e1aefd3d83a1.png)
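
To make the offset mapping concrete, here is a minimal sketch of the idea rather than the code in this PR. Only the method signature `getValueOffset(int offset)` comes from the description above. The names `OffsetMapper`, `WholeOffsets`, `SelectedOffsets`, `size()` and `collect` are hypothetical stand-ins, and the sketch assumes that a "whole" vector maps each position to itself while a "selected" vector maps position `i` to `selected[i]`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the shared offset interface; not an ORC class. */
interface OffsetMapper {
    /** Maps a logical row position to the physical index in the value array. */
    int getValueOffset(int offset);

    /** Number of logical rows to write. */
    int size();
}

/** Assumed behaviour when `selected` is not in use: identity mapping. */
final class WholeOffsets implements OffsetMapper {
    private final int size;

    WholeOffsets(int size) {
        this.size = size;
    }

    @Override
    public int getValueOffset(int offset) {
        return offset;
    }

    @Override
    public int size() {
        return size;
    }
}

/** Assumed behaviour when `selected` is in use: look the index up in selected[]. */
final class SelectedOffsets implements OffsetMapper {
    private final int[] selected;
    private final int size;

    SelectedOffsets(int[] selected, int size) {
        this.selected = selected;
        this.size = size;
    }

    @Override
    public int getValueOffset(int offset) {
        return selected[offset];
    }

    @Override
    public int size() {
        return size;
    }
}

final class SelectedVectorSketch {
    /** A writer-style loop: it never needs to know which mapping it was given. */
    static List<Long> collect(long[] values, OffsetMapper mapper) {
        List<Long> out = new ArrayList<>();
        for (int i = 0; i < mapper.size(); i++) {
            out.add(values[mapper.getValueOffset(i)]);
        }
        return out;
    }

    public static void main(String[] args) {
        long[] values = {10, 11, 12, 13, 14};
        // All five rows are written.
        System.out.println(collect(values, new WholeOffsets(values.length)));
        // Only physical rows 1 and 3 are written.
        System.out.println(collect(values, new SelectedOffsets(new int[] {1, 3}, 2)));
    }
}
```

Because the loop in `collect` depends only on the interface, the same write path serves both cases, which is the property the description relies on.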

When writing to an encrypted column, the selected vector is first copied into a vector that does not use `selected`, so the maskData interface does not need to change. After that, processing is the same as before.
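
The copy step can be pictured as compacting the selected rows into a dense array. This is a sketch under that assumption only: `CompactSelectedSketch` and `compact` are hypothetical names, and the actual copy in the PR operates on column vectors rather than a plain `long[]`.

```java
import java.util.Arrays;

final class CompactSelectedSketch {
    /** Copies the rows named by selected[0..size) into a new dense array. */
    static long[] compact(long[] values, int[] selected, int size) {
        long[] dense = new long[size];
        for (int i = 0; i < size; i++) {
            dense[i] = values[selected[i]];
        }
        return dense;
    }

    public static void main(String[] args) {
        long[] values = {10, 11, 12, 13, 14};
        int[] selected = {0, 2, 4};
        // The dense copy no longer needs a selected array to be read correctly.
        System.out.println(Arrays.toString(compact(values, selected, 3)));
    }
}
```

Code that only understands dense vectors, such as a masking routine, can then consume the result unchanged.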

### Why are the changes needed?

Currently, the ORC writer does not support the selected vector. Clients that expect it to be supported can end up with trash rows in the output.

### How was this patch tested?

Added a unit test, TestSelectedVector.java.
