You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/10/21 02:21:36 UTC

[GitHub] [doris] Jibing-Li opened a new pull request, #13535: [Fix](multi-catalog)Fix partition external table query bug.

Jibing-Li opened a new pull request, #13535:
URL: https://github.com/apache/doris/pull/13535

   # Proposed changes
   
   Issue Number: close #xxx
   
   ## Problem summary
   
   The index for external table columns from path is incorrect in new scanner. This is fix for it.
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: 
       - [ ] Yes
       - [ ] No
       - [ ] I don't know
   2. Has unit tests been added:
       - [ ] Yes
       - [ ] No
       - [ ] No Need
   3. Has document been added or modified:
       - [ ] Yes
       - [ ] No
       - [ ] No Need
   4. Does it need to update dependencies:
       - [ ] Yes
       - [ ] No
   5. Are there any changes that cannot be rolled back:
       - [ ] Yes (If Yes, please explain WHY)
       - [ ] No
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] hello-stephen commented on pull request #13535: [Fix](multi-catalog)Fix partition external table query bug.

Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #13535:
URL: https://github.com/apache/doris/pull/13535#issuecomment-1288128227

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 40.24 seconds
    load time: 585 seconds
    storage size: 17154768685 Bytes
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221023224013_clickbench_pr_32866.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] morningman commented on a diff in pull request #13535: [Fix](multi-catalog)Fix partition external table query bug.

Posted by GitBox <gi...@apache.org>.
morningman commented on code in PR #13535:
URL: https://github.com/apache/doris/pull/13535#discussion_r1002428145


##########
be/src/vec/exec/scan/vfile_scanner.cpp:
##########
@@ -256,7 +256,20 @@ Status VFileScanner::_fill_columns_from_path(size_t rows) {
                 ss << "Unknown source slot descriptor, slot_id=" << slot_desc->id();
                 return Status::InternalError(ss.str());
             }
-            const std::string& column_from_path = range.columns_from_path[it->second];
+
+            // For external table query, find the index of column in path.
+            // Reason is _partition_slot_index_map may be incorrect for query
+            // Because query doesn't always search for all columns in a table and the select columns order is random.
+            vector<std::string>::const_iterator iit =

Review Comment:
   Better do this in prepare phase once. Otherwise, we have to find keys for each `_fill_columns_from_path()` call, which is inefficient.
   You can use the structure to save the mapping from `_partition_slot_descs` to the index of `columns_from_path_keys`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] hello-stephen commented on pull request #13535: [Fix](multi-catalog)Fix partition external table query bug.

Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #13535:
URL: https://github.com/apache/doris/pull/13535#issuecomment-1288135416

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 41.05 seconds
    load time: 584 seconds
    storage size: 17154823315 Bytes
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221023151451_clickbench_pr_32868.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] hello-stephen commented on pull request #13535: [Fix](multi-catalog)Fix partition external table query bug.

Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #13535:
URL: https://github.com/apache/doris/pull/13535#issuecomment-1286421279

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 41.32 seconds
    load time: 587 seconds
    storage size: 17154743025 Bytes
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221021114752_clickbench_pr_32301.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] hello-stephen commented on pull request #13535: [Fix](multi-catalog)Fix partition external table query bug.

Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #13535:
URL: https://github.com/apache/doris/pull/13535#issuecomment-1291416240

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 38.57 seconds
    load time: 571 seconds
    storage size: 17154887192 Bytes
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221026103404_clickbench_pr_34023.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] hello-stephen commented on pull request #13535: [Fix](multi-catalog)Fix partition external table query bug.

Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #13535:
URL: https://github.com/apache/doris/pull/13535#issuecomment-1286656215

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 40.27 seconds
    load time: 588 seconds
    storage size: 17154823382 Bytes
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221021164706_clickbench_pr_32457.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] morningman commented on a diff in pull request #13535: [Fix](multi-catalog)Fix partition external table query bug.

Posted by GitBox <gi...@apache.org>.
morningman commented on code in PR #13535:
URL: https://github.com/apache/doris/pull/13535#discussion_r1002428145


##########
be/src/vec/exec/scan/vfile_scanner.cpp:
##########
@@ -256,7 +256,20 @@ Status VFileScanner::_fill_columns_from_path(size_t rows) {
                 ss << "Unknown source slot descriptor, slot_id=" << slot_desc->id();
                 return Status::InternalError(ss.str());
             }
-            const std::string& column_from_path = range.columns_from_path[it->second];
+
+            // For external table query, find the index of column in path.
+            // Reason is _partition_slot_index_map may be incorrect for query
+            // Because query doesn't always search for all columns in a table and the select columns order is random.
+            vector<std::string>::const_iterator iit =

Review Comment:
   Better do this in prepare phase once. Otherwise, we have to find keys for each `_fill_columns_from_path()` call, which is inefficient.
   You can use the structure to same the mapping from `_partition_slot_descs` to the index of `columns_from_path_keys`



##########
be/src/vec/exec/format/parquet/vparquet_group_reader.h:
##########
@@ -35,12 +35,16 @@ class RowGroupReader {
 
     ParquetColumnReader::Statistics statistics();
 
+private:
+    Status read_empty_batch(size_t batch_size, size_t* read_rows, bool* _batch_eof);

Review Comment:
   ```suggestion
       Status _read_empty_batch(size_t batch_size, size_t* read_rows, bool* _batch_eof);
   ```



##########
be/src/vec/exec/format/parquet/vparquet_reader.cpp:
##########
@@ -156,12 +156,13 @@ Status ParquetReader::_init_read_columns() {
             _missing_cols.push_back(file_col_name);
         }
     }
+    // It is legal to get empty include_column_ids in query task.
+    _read_columns.clear();

Review Comment:
   It seems we don't need to call this `clear` for `_read_columns`, because `_read_columns` is only initialized once



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] hello-stephen commented on pull request #13535: [Fix](multi-catalog)Fix partition external table query bug.

Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #13535:
URL: https://github.com/apache/doris/pull/13535#issuecomment-1287259357

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 40.4 seconds
    load time: 590 seconds
    storage size: 17154743814 Bytes
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221022014117_clickbench_pr_32662.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] morningman commented on pull request #13535: [Fix](multi-catalog)Fix partition external table query bug.

Posted by GitBox <gi...@apache.org>.
morningman commented on PR #13535:
URL: https://github.com/apache/doris/pull/13535#issuecomment-1287813336

   Please rebase to merge with #13515


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] morningman merged pull request #13535: [Fix](multi-catalog)Fix partition external table query bug.

Posted by GitBox <gi...@apache.org>.
morningman merged PR #13535:
URL: https://github.com/apache/doris/pull/13535


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org