Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/05/18 09:06:05 UTC

[GitHub] [incubator-doris] HappenLee opened a new pull request, #9666: [Refactor][Bug-Fix][Load Vec] Refactor code of basescanner and vjson/vparquet/vbroker scanner

HappenLee opened a new pull request, #9666:
URL: https://github.com/apache/incubator-doris/pull/9666

   1. Fix a bug where the vjson scanner did not support `range_from_file_path`.
   2. Fix a core dump in the vjson/vbroker scanners when the nullability of a source slot differs from that of its destination slot.
   3. Fix a bug in the vparquet scanner where `filter_block` was applied to a column whose reference count is not 1.
   4. Refactor the scanner code to simplify it.
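Item 2 concerns a source slot and its destination slot disagreeing on nullability. The C++ sketch below is a hypothetical illustration of reconciling the two before materialization; `Column` and `adapt_nullability` are invented names, not Doris types, and the sketch does not reproduce the PR's actual fix. Doris code would report a `Status` rather than throw.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical column model (not Doris's ColumnNullable): values plus an optional
// null map in which null_map[i] == 1 marks row i as NULL.
struct Column {
    std::vector<int64_t> values;
    std::optional<std::vector<uint8_t>> null_map;  // std::nullopt => column is not nullable

    bool is_nullable() const { return null_map.has_value(); }
};

// Adapts `src` to the nullability the destination slot expects instead of
// assuming that both sides agree.
Column adapt_nullability(Column src, bool dest_nullable) {
    if (dest_nullable && !src.is_nullable()) {
        // Wrap: every existing row is non-NULL.
        src.null_map = std::vector<uint8_t>(src.values.size(), 0);
    } else if (!dest_nullable && src.is_nullable()) {
        // Unwrap: only valid if no row is actually NULL.
        for (uint8_t is_null : *src.null_map) {
            if (is_null != 0) {
                throw std::invalid_argument("NULL value for a non-nullable destination slot");
            }
        }
        src.null_map.reset();
    }
    return src;
}
```

Wrapping is always safe; unwrapping has to reject real NULLs, which is where a silent mismatch could otherwise read an invalid column layout.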
   
   # Proposed changes
   
   Issue Number: close #xxx
   
   ## Problem Summary:
   
   Describe the overview of changes.
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: (Yes/No/I Don't know)
   2. Have unit tests been added: (Yes/No/No Need)
   3. Has documentation been added or modified: (Yes/No/No Need)
   4. Does it need to update dependencies: (Yes/No)
   5. Are there any changes that cannot be rolled back: (Yes/No)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] HappenLee commented on a diff in pull request #9666: [Refactor][Bug-Fix][Load Vec] Refactor code of basescanner and vjson/vparquet/vbroker scanner

Posted by GitBox <gi...@apache.org>.
HappenLee commented on code in PR #9666:
URL: https://github.com/apache/incubator-doris/pull/9666#discussion_r876704180


##########
be/src/vec/core/block.cpp:
##########
@@ -402,6 +402,20 @@ std::string Block::dump_data(size_t begin, size_t row_limit) const {
     return out.str();
 }
 
+std::string Block::dump_one_line(size_t row, int column_end) const {
+    assert(column_end < columns());
+    fmt::memory_buffer line;
+    for (int i = 0; i < column_end; ++i) {
+        if (LIKELY(i != 0)) {
+            // TODO: need more effective function of to string. now the impl is slow
+            fmt::format_to(line, " {}", data[i].to_string(row));

Review Comment:
   There is an assert at line 406, and no null pointer should be in the block.





[GitHub] [incubator-doris] carlvinhust2012 commented on a diff in pull request #9666: [Refactor][Bug-Fix][Load Vec] Refactor code of basescanner and vjson/vparquet/vbroker scanner

Posted by GitBox <gi...@apache.org>.
carlvinhust2012 commented on code in PR #9666:
URL: https://github.com/apache/incubator-doris/pull/9666#discussion_r876534567


##########
be/src/vec/exec/vbroker_scanner.cpp:
##########
@@ -44,18 +44,14 @@ VBrokerScanner::VBrokerScanner(RuntimeState* state, RuntimeProfile* profile,
     _text_converter.reset(new (std::nothrow) TextConverter('\\'));
 }
 
-VBrokerScanner::~VBrokerScanner() {}
+VBrokerScanner::~VBrokerScanner() = default;

Review Comment:
   Why not move this line to "vbroker_scanner.h"?





[GitHub] [incubator-doris] yiguolei merged pull request #9666: [Refactor][Bug-Fix][Load Vec] Refactor code of basescanner and vjson/vparquet/vbroker scanner

Posted by GitBox <gi...@apache.org>.
yiguolei merged PR #9666:
URL: https://github.com/apache/incubator-doris/pull/9666




[GitHub] [incubator-doris] carlvinhust2012 commented on a diff in pull request #9666: [Refactor][Bug-Fix][Load Vec] Refactor code of basescanner and vjson/vparquet/vbroker scanner

Posted by GitBox <gi...@apache.org>.
carlvinhust2012 commented on code in PR #9666:
URL: https://github.com/apache/incubator-doris/pull/9666#discussion_r876556821


##########
be/src/vec/core/block.cpp:
##########
@@ -402,6 +402,20 @@ std::string Block::dump_data(size_t begin, size_t row_limit) const {
     return out.str();
 }
 
+std::string Block::dump_one_line(size_t row, int column_end) const {
+    assert(column_end < columns());
+    fmt::memory_buffer line;
+    for (int i = 0; i < column_end; ++i) {
+        if (LIKELY(i != 0)) {
+            // TODO: need more effective function of to string. now the impl is slow
+            fmt::format_to(line, " {}", data[i].to_string(row));

Review Comment:
   Why not check whether `data[i]` is valid first, e.g. `if (data[i].column) { s = data[i].to_string(row_num); }`?
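A minimal sketch of the reviewer's suggestion, assuming a simplified model: `ColumnEntry` is a hypothetical stand-in for a block's column entry holding a possibly null pointer, and the `<null column>` placeholder is an arbitrary choice. The author's reply in this thread relies instead on the invariant that a block never contains a null column, in which case the check is only a defensive fallback.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a block's column entry; not the Doris
// ColumnWithTypeAndName / Block types.
struct ColumnEntry {
    std::shared_ptr<const std::vector<std::string>> column;  // may be null
    std::string to_string(std::size_t row) const { return (*column)[row]; }
};

// Renders columns [0, column_end) of `row`, separated by single spaces.
// `column_end` is an exclusive bound here, so it may equal the column count.
std::string dump_one_line(const std::vector<ColumnEntry>& data, std::size_t row,
                          std::size_t column_end) {
    assert(column_end <= data.size());
    std::string line;
    for (std::size_t i = 0; i < column_end; ++i) {
        if (i != 0) {
            line += ' ';
        }
        // Reviewer's suggestion: only call to_string() when the column is set.
        line += data[i].column ? data[i].to_string(row) : std::string("<null column>");
    }
    return line;
}
```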





[GitHub] [incubator-doris] xiepengcheng01 commented on a diff in pull request #9666: [Refactor][Bug-Fix][Load Vec] Refactor code of basescanner and vjson/vparquet/vbroker scanner

Posted by GitBox <gi...@apache.org>.
xiepengcheng01 commented on code in PR #9666:
URL: https://github.com/apache/incubator-doris/pull/9666#discussion_r876574860


##########
be/src/exec/base_scanner.cpp:
##########
@@ -272,59 +293,132 @@ Status BaseScanner::_fill_dest_tuple(Tuple* dest_tuple, MemPool* mem_pool) {
         }
         void* slot = dest_tuple->get_slot(slot_desc->tuple_offset());
         RawValue::write(value, slot, slot_desc->type(), mem_pool);
-        continue;
     }
     _success = true;
     return Status::OK();
 }
 
-Status BaseScanner::filter_block(vectorized::Block* temp_block, size_t slot_num) {
+Status BaseScanner::_filter_src_block() {
+    auto origin_column_num = _src_block.columns();
     // filter block
     if (!_vpre_filter_ctxs.empty()) {
         for (auto _vpre_filter_ctx : _vpre_filter_ctxs) {
-            auto old_rows = temp_block->rows();
-            RETURN_IF_ERROR(
-                    vectorized::VExprContext::filter_block(_vpre_filter_ctx, temp_block, slot_num));
-            _counter->num_rows_unselected += old_rows - temp_block->rows();
+            auto old_rows = _src_block.rows();
+            RETURN_IF_ERROR(vectorized::VExprContext::filter_block(_vpre_filter_ctx, &_src_block,
+                                                                   origin_column_num));
+            _counter->num_rows_unselected += old_rows - _src_block.rows();
         }
     }
     return Status::OK();
 }
 
-Status BaseScanner::execute_exprs(vectorized::Block* output_block, vectorized::Block* temp_block) {
+Status BaseScanner::_materialize_dest_block(vectorized::Block* dest_block) {
     // Do vectorized expr here
-    Status status;
-    if (!_dest_vexpr_ctx.empty()) {
-        *output_block = vectorized::VExprContext::get_output_block_after_execute_exprs(
-                _dest_vexpr_ctx, *temp_block, status);
-        if (UNLIKELY(output_block->rows() == 0)) {
-            return status;
+    int ctx_idx = 0;
+    size_t rows = _src_block.rows();
+    auto filter_column = vectorized::ColumnUInt8::create(rows, 1);
+    auto& filter_map = filter_column->get_data();
+
+    for (auto slot_desc : _dest_tuple_desc->slots()) {
+        if (!slot_desc->is_materialized()) {
+            continue;
+        }
+        int dest_index = ctx_idx++;
+
+        auto* ctx = _dest_vexpr_ctx[dest_index];
+        int result_column_id = 0;
+        // PT1 => dest primitive type
+        RETURN_IF_ERROR(ctx->execute(&_src_block, &result_column_id));
+        auto column_ptr = _src_block.get_by_position(result_column_id).column;
+
+        if (column_ptr->is_nullable()) {
+            auto nullable_column =
+                    reinterpret_cast<const vectorized::ColumnNullable*>(column_ptr.get());
+            for (int i = 0; i < rows; ++i) {
+                if (filter_map[i] && nullable_column->is_null_at(i)) {
+                    if (_strict_mode && (_src_slot_descs_order_by_dest[ctx_idx]) &&
+                        !_src_block.get_by_position(ctx_idx).column->is_null_at(i)) {
+                        RETURN_IF_ERROR(_state->append_error_msg_to_file(
+                                [&]() -> std::string {
+                                    return _src_block.dump_one_line(i, _num_of_columns_from_file);
+                                },
+                                [&]() -> std::string {
+                                    // Type of the slot is must be Varchar in _temp_block.

Review Comment:
   Should this comment be updated? It still refers to `_temp_block`.
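For context, the strict-mode branch in the hunk above appears to reject a row when the converted destination value is NULL even though the source value was not, i.e. the conversion failed. A simplified model of that rule, assuming one source column of raw strings and one converted `int64_t` column (std::nullopt stands for NULL). `apply_strict_mode` and `StrictModeResult` are hypothetical names, and the sketch omits the `_src_slot_descs_order_by_dest` check from the real code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical result type: a keep/drop filter plus one error per rejected row.
struct StrictModeResult {
    std::vector<uint8_t> filter;      // 1 = keep row, 0 = drop row
    std::vector<std::string> errors;  // messages for rejected rows
};

StrictModeResult apply_strict_mode(const std::vector<std::optional<std::string>>& src,
                                   const std::vector<std::optional<int64_t>>& dest,
                                   bool strict_mode) {
    assert(src.size() == dest.size());
    StrictModeResult result{std::vector<uint8_t>(dest.size(), 1), {}};
    for (std::size_t i = 0; i < dest.size(); ++i) {
        // NULL produced from a non-NULL source value means the conversion failed;
        // a NULL that was already NULL in the source passes through.
        if (result.filter[i] && !dest[i].has_value() && strict_mode && src[i].has_value()) {
            result.filter[i] = 0;
            result.errors.push_back("row " + std::to_string(i) + ": cannot convert '" +
                                    *src[i] + "'");
        }
    }
    return result;
}
```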





[GitHub] [incubator-doris] HappenLee commented on a diff in pull request #9666: [Refactor][Bug-Fix][Load Vec] Refactor code of basescanner and vjson/vparquet/vbroker scanner

Posted by GitBox <gi...@apache.org>.
HappenLee commented on code in PR #9666:
URL: https://github.com/apache/incubator-doris/pull/9666#discussion_r876727851


##########
be/src/vec/exec/vbroker_scanner.cpp:
##########
@@ -44,18 +44,14 @@ VBrokerScanner::VBrokerScanner(RuntimeState* state, RuntimeProfile* profile,
     _text_converter.reset(new (std::nothrow) TextConverter('\\'));
 }
 
-VBrokerScanner::~VBrokerScanner() {}
+VBrokerScanner::~VBrokerScanner() = default;

Review Comment:
   Just a clang-tidy cleanup.
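For reference, clang-tidy's `modernize-use-equals-default` check rewrites an empty `{}` body to `= default`. Where the defaulted destructor is written also matters for the header question: a destructor defaulted on its first declaration (in the class body) is not user-provided and can stay trivial, while one declared in the class and defaulted out of line (as in a `.cpp` file) is user-provided. The hypothetical types below show this with standard type traits. If `BaseScanner` declares a virtual destructor, as base classes typically do, `VBrokerScanner`'s destructor is non-trivial either way and the choice is mostly stylistic.

```cpp
#include <type_traits>

// Hypothetical types for illustration only; they are not Doris classes.
struct BracesDtor {
    ~BracesDtor() {}  // empty user-provided body: never trivial
};

struct InClassDefault {
    ~InClassDefault() = default;  // defaulted on first declaration: trivial here
};

struct OutOfLineDefault {
    ~OutOfLineDefault();  // declared only; defaulted below
};

// Defaulted after its first declaration, so it is user-provided and not trivial.
OutOfLineDefault::~OutOfLineDefault() = default;

static_assert(!std::is_trivially_destructible<BracesDtor>::value, "user-provided body");
static_assert(std::is_trivially_destructible<InClassDefault>::value, "defaulted in class");
static_assert(!std::is_trivially_destructible<OutOfLineDefault>::value, "defaulted out of line");
```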





[GitHub] [incubator-doris] github-actions[bot] commented on pull request #9666: [Refactor][Bug-Fix][Load Vec] Refactor code of basescanner and vjson/vparquet/vbroker scanner

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #9666:
URL: https://github.com/apache/incubator-doris/pull/9666#issuecomment-1132426465

   PR approved by at least one committer and no changes requested.




[GitHub] [incubator-doris] cambyzju commented on a diff in pull request #9666: [Refactor][Bug-Fix][Load Vec] Refactor code of basescanner and vjson/vparquet/vbroker scanner

Posted by GitBox <gi...@apache.org>.
cambyzju commented on code in PR #9666:
URL: https://github.com/apache/incubator-doris/pull/9666#discussion_r875939891


##########
be/src/exec/base_scanner.cpp:
##########
@@ -272,59 +293,132 @@ Status BaseScanner::_fill_dest_tuple(Tuple* dest_tuple, MemPool* mem_pool) {
         }
         void* slot = dest_tuple->get_slot(slot_desc->tuple_offset());
         RawValue::write(value, slot, slot_desc->type(), mem_pool);
-        continue;
     }
     _success = true;
     return Status::OK();
 }
 
-Status BaseScanner::filter_block(vectorized::Block* temp_block, size_t slot_num) {
+Status BaseScanner::_filter_src_block() {
+    auto origin_column_num = _src_block.columns();
     // filter block
     if (!_vpre_filter_ctxs.empty()) {
         for (auto _vpre_filter_ctx : _vpre_filter_ctxs) {
-            auto old_rows = temp_block->rows();
-            RETURN_IF_ERROR(
-                    vectorized::VExprContext::filter_block(_vpre_filter_ctx, temp_block, slot_num));
-            _counter->num_rows_unselected += old_rows - temp_block->rows();
+            auto old_rows = _src_block.rows();
+            RETURN_IF_ERROR(vectorized::VExprContext::filter_block(_vpre_filter_ctx, &_src_block,
+                                                                   origin_column_num));
+            _counter->num_rows_unselected += old_rows - _src_block.rows();
         }
     }
     return Status::OK();
 }
 
-Status BaseScanner::execute_exprs(vectorized::Block* output_block, vectorized::Block* temp_block) {
+Status BaseScanner::_materialize_dest_block(vectorized::Block* dest_block) {
     // Do vectorized expr here
-    Status status;
-    if (!_dest_vexpr_ctx.empty()) {
-        *output_block = vectorized::VExprContext::get_output_block_after_execute_exprs(
-                _dest_vexpr_ctx, *temp_block, status);
-        if (UNLIKELY(output_block->rows() == 0)) {
-            return status;
+    int ctx_idx = 0;
+    size_t rows = _src_block.rows();
+    auto filter_column = vectorized::ColumnUInt8::create(rows, 1);
+    auto& filter_map = filter_column->get_data();
+
+    for (auto slot_desc : _dest_tuple_desc->slots()) {
+        if (!slot_desc->is_materialized()) {
+            continue;
+        }
+        int dest_index = ctx_idx++;
+
+        auto* ctx = _dest_vexpr_ctx[dest_index];
+        int result_column_id = 0;

Review Comment:
   ```suggestion
           int result_column_id = -1;
   ```
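The suggestion matters because 0 is a valid column position (the first source column), so an out-parameter initialized to 0 and never written would silently select the wrong column. A sketch with a hypothetical runner `run_expr` standing in for `ctx->execute`; its `skip_output` flag simulates a path that reports success without writing the result position.

```cpp
#include <cassert>
#include <vector>

// Hypothetical expression runner (not VExprContext::execute): appends its result
// to `block` and reports the new column's position through `result_column_id`.
bool run_expr(std::vector<int>& block, bool skip_output, int* result_column_id) {
    if (skip_output) {
        return true;  // simulated bug: success without writing the out-parameter
    }
    block.push_back(42);
    *result_column_id = static_cast<int>(block.size()) - 1;
    return true;
}

// Starting from -1 (never a valid position) lets the caller detect an unset result.
int result_position(std::vector<int>& block, bool skip_output) {
    int result_column_id = -1;
    if (!run_expr(block, skip_output, &result_column_id)) {
        return -1;
    }
    if (result_column_id < 0 || result_column_id >= static_cast<int>(block.size())) {
        return -1;  // out-parameter never written, or out of range
    }
    return result_column_id;
}
```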





[GitHub] [incubator-doris] github-actions[bot] commented on pull request #9666: [Refactor][Bug-Fix][Load Vec] Refactor code of basescanner and vjson/vparquet/vbroker scanner

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #9666:
URL: https://github.com/apache/incubator-doris/pull/9666#issuecomment-1132426486

   PR approved by anyone and no changes requested.




[GitHub] [incubator-doris] cambyzju commented on a diff in pull request #9666: [Refactor][Bug-Fix][Load Vec] Refactor code of basescanner and vjson/vparquet/vbroker scanner

Posted by GitBox <gi...@apache.org>.
cambyzju commented on code in PR #9666:
URL: https://github.com/apache/incubator-doris/pull/9666#discussion_r875935509


##########
be/src/exec/base_scanner.cpp:
##########
@@ -71,6 +77,21 @@ Status BaseScanner::open() {
     _rows_read_counter = ADD_COUNTER(_profile, "RowsRead", TUnit::UNIT);
     _read_timer = ADD_TIMER(_profile, "TotalRawReadTime(*)");
     _materialize_timer = ADD_TIMER(_profile, "MaterializeTupleTime(*)");
+
+    const auto& range = _ranges[0];

Review Comment:
   Should we add `DCHECK(!_ranges.empty());` here?
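A sketch of the guard being suggested, assuming a hypothetical `ScanRange` type standing in for one entry of `_ranges`. `DCHECK`, like `assert`, only fires in debug builds, so the sketch also returns `nullptr` in release builds rather than reading `ranges[0]` out of bounds; whether `open()` should instead return an error `Status` is a design choice this thread does not settle.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of `_ranges`.
struct ScanRange {
    std::string path;
};

// Returns the first range, or nullptr when there is none.
const ScanRange* first_range(const std::vector<ScanRange>& ranges) {
    // DCHECK-style precondition: aborts only in builds without NDEBUG.
    assert(!ranges.empty() && "scanner opened without any scan range");
    if (ranges.empty()) {
        return nullptr;  // release builds: avoid reading ranges[0] out of bounds
    }
    return &ranges[0];
}
```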





[GitHub] [incubator-doris] HappenLee commented on a diff in pull request #9666: [Refactor][Bug-Fix][Load Vec] Refactor code of basescanner and vjson/vparquet/vbroker scanner

Posted by GitBox <gi...@apache.org>.
HappenLee commented on code in PR #9666:
URL: https://github.com/apache/incubator-doris/pull/9666#discussion_r876702627


##########
be/src/exec/base_scanner.cpp:
##########
@@ -71,6 +77,21 @@ Status BaseScanner::open() {
     _rows_read_counter = ADD_COUNTER(_profile, "RowsRead", TUnit::UNIT);
     _read_timer = ADD_TIMER(_profile, "TotalRawReadTime(*)");
     _materialize_timer = ADD_TIMER(_profile, "MaterializeTupleTime(*)");
+
+    const auto& range = _ranges[0];

Review Comment:
   ok


