Posted to commits@doris.apache.org by "kaka11chen (via GitHub)" <gi...@apache.org> on 2023/04/25 06:32:40 UTC

[GitHub] [doris] kaka11chen commented on a diff in pull request #19039: [optimize](parquet-reader) Skip whole row group in the parquet lazy read situation if data has been filtered out.

kaka11chen commented on code in PR #19039:
URL: https://github.com/apache/doris/pull/19039#discussion_r1176062026


##########
be/src/vec/exec/format/parquet/vparquet_group_reader.cpp:
##########
@@ -450,6 +448,17 @@ Status RowGroupReader::_do_lazy_read(Block* block, size_t batch_size, size_t* re
                 block->get_by_name(col.first).column->assume_mutable()->clear();
             }
             Block::erase_useless_column(block, origin_column_num);
+
+            if (!pre_eof) {
+                // If continuous batches are skipped, we can cache them to skip a whole page
+                _cached_filtered_rows += pre_read_rows;
+            } else { // pre_eof
+                // If select_vector_ptr->filter_all() and pre_eof, we can skip whole row group.
+                *read_rows = 0;
+                *batch_eof = true;
+                _lazy_read_filtered_rows += pre_read_rows;

Review Comment:
   It has already been removed in the code above.
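
   For readers following the hunk, here is a minimal standalone sketch of the branch under discussion, assuming it runs only when every row of the pre-read batch was filtered out. `LazyReadState` and `on_batch_fully_filtered` are hypothetical names used for illustration and are not part of the Doris code base. The sketch mirrors only the lines visible in the hunk; how the cached count is consumed later is not shown there and is left out here.

```cpp
#include <cstddef>

// Hypothetical stand-in for the reader members touched by the hunk;
// this is not the actual RowGroupReader class.
struct LazyReadState {
    std::size_t cached_filtered_rows = 0;    // rows of consecutive fully filtered batches
    std::size_t lazy_read_filtered_rows = 0; // rows whose lazy columns were never read
};

// Handles a pre-read batch in which every row was filtered out.
// Returns true when the row group is exhausted and the caller can stop.
bool on_batch_fully_filtered(LazyReadState& state, std::size_t pre_read_rows, bool pre_eof,
                             std::size_t* read_rows, bool* batch_eof) {
    if (!pre_eof) {
        // More batches remain: remember the skipped rows so that a later
        // step can skip them in bulk, then keep reading.
        state.cached_filtered_rows += pre_read_rows;
        return false;
    }
    // Last batch of the row group and nothing survived the filter:
    // report an empty result and mark the row group as finished.
    *read_rows = 0;
    *batch_eof = true;
    state.lazy_read_filtered_rows += pre_read_rows;
    return true;
}
```

   In this sketch the output parameters are written only on the end-of-row-group path, matching the hunk, so a caller that keeps looping sees `read_rows` and `batch_eof` unchanged.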



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

