Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/05/10 02:38:02 UTC

[GitHub] [iceberg] openinx commented on a change in pull request #2566: Flink : vectorized read of orc format in flink

openinx commented on a change in pull request #2566:
URL: https://github.com/apache/iceberg/pull/2566#discussion_r629012019



##########
File path: flink/src/main/java/org/apache/iceberg/flink/source/FlinkInputFormat.java
##########
@@ -91,11 +102,77 @@ public void configure(Configuration parameters) {
 
   @Override
   public void open(FlinkInputSplit split) {
+    boolean enableVectorizedRead = readableConfig.get(FlinkTableOptions.ENABLE_VECTORIZED_READ);
+
+    if (enableVectorizedRead) {
+      if (useOrcVectorizedRead()) {

Review comment:
There are other prerequisites that must hold before we can apply vectorized read:
   1.  All the files in the `CombinedScanTask` must be data files. If there is a delete file, the current delete-application process compares row by row, which effectively disables vectorized read.
   2.  All the files in the `CombinedScanTask` must be ORC files.
   3.  All the columns to read should be primitives, which means each column's values have a fixed byte width.
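
The three conditions above could be folded into a single guard, roughly as sketched below. This is only an illustration, not the code in this PR: `Format`, `FileTask`, `Column`, and `canUseOrcVectorizedRead` are hypothetical stand-ins so that the snippet compiles on its own. In the real code the same information would presumably come from the file scan tasks inside the `CombinedScanTask` and from the projected schema.

```java
import java.util.Arrays;
import java.util.List;

public class VectorizedReadCheck {
  // Hypothetical stand-in for the file format of a data file.
  enum Format { ORC, PARQUET, AVRO }

  // Hypothetical stand-in for one file task inside a CombinedScanTask.
  static final class FileTask {
    final Format format;
    final int deleteFileCount;

    FileTask(Format format, int deleteFileCount) {
      this.format = format;
      this.deleteFileCount = deleteFileCount;
    }
  }

  // Hypothetical stand-in for a projected column.
  static final class Column {
    final String name;
    final boolean primitive;

    Column(String name, boolean primitive) {
      this.name = name;
      this.primitive = primitive;
    }
  }

  // Returns true only when all three conditions from the review comment hold.
  static boolean canUseOrcVectorizedRead(List<FileTask> tasks, List<Column> projected) {
    for (FileTask task : tasks) {
      // 1. No delete files: deletes are applied row by row.
      if (task.deleteFileCount > 0) {
        return false;
      }
      // 2. Every file must be an ORC file.
      if (task.format != Format.ORC) {
        return false;
      }
    }
    // 3. Every projected column must be a primitive.
    for (Column column : projected) {
      if (!column.primitive) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<FileTask> tasks = Arrays.asList(new FileTask(Format.ORC, 0), new FileTask(Format.ORC, 0));
    List<Column> columns = Arrays.asList(new Column("id", true), new Column("price", true));
    System.out.println(canUseOrcVectorizedRead(tasks, columns));
  }
}
```

When the guard returns false, `open` would fall back to the existing row-based reader instead of failing.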



