You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/12/11 17:19:50 UTC

[GitHub] [doris] Jibing-Li opened a new pull request, #14997: [fix](multi catalog)Return emtpy block while external table scanner couldn't find the file.

Jibing-Li opened a new pull request, #14997:
URL: https://github.com/apache/doris/pull/14997

   # Proposed changes
   
   Issue Number: close #xxx
   
   ## Problem summary
   
   FE file path cache for external table may out of date. In this case, BE may fail to find the not exist file from FE cache. This pr is to handle this case: instead of throw an error message to the user, we return empty result set to the user.
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: 
       - [ ] Yes
       - [ ] No
       - [ ] I don't know
   2. Has unit tests been added:
       - [ ] Yes
       - [ ] No
       - [ ] No Need
   3. Has document been added or modified:
       - [ ] Yes
       - [ ] No
       - [ ] No Need
   4. Does it need to update dependencies:
       - [ ] Yes
       - [ ] No
   5. Are there any changes that cannot be rolled back:
       - [ ] Yes (If Yes, please explain WHY)
       - [ ] No
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #14997: [fix](multi catalog)Return emtpy block while external table scanner couldn't find the file.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14997:
URL: https://github.com/apache/doris/pull/14997#issuecomment-1348055981

   clang-tidy review says "All clean, LGTM! :+1:"


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] morningman merged pull request #14997: [fix](multi catalog)Return emtpy block while external table scanner couldn't find the file.

Posted by GitBox <gi...@apache.org>.
morningman merged PR #14997:
URL: https://github.com/apache/doris/pull/14997


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #14997: [fix](multi catalog)Return emtpy block while external table scanner couldn't find the file.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14997:
URL: https://github.com/apache/doris/pull/14997#issuecomment-1352674972

   clang-tidy review says "All clean, LGTM! :+1:"


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #14997: [fix](multi catalog)Return emtpy block while external table scanner couldn't find the file.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14997:
URL: https://github.com/apache/doris/pull/14997#issuecomment-1353035333

   clang-tidy review says "All clean, LGTM! :+1:"


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] hello-stephen commented on pull request #14997: [fix](multi catalog)Return emtpy block while external table scanner couldn't find the file.

Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #14997:
URL: https://github.com/apache/doris/pull/14997#issuecomment-1345614079

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 36.28 seconds
    load time: 466 seconds
    storage size: 17123356827 Bytes
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221211174125_clickbench_pr_61671.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #14997: [fix](multi catalog)Return emtpy block while external table scanner couldn't find the file.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14997:
URL: https://github.com/apache/doris/pull/14997#issuecomment-1345610574

   clang-tidy review says "All clean, LGTM! :+1:"


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on a diff in pull request #14997: [fix](multi catalog)Return emtpy block while external table scanner couldn't find the file.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on code in PR #14997:
URL: https://github.com/apache/doris/pull/14997#discussion_r1049200544


##########
be/src/vec/exec/scan/vfile_scanner.cpp:
##########
@@ -531,6 +531,9 @@ Status VFileScanner::_get_next_reader() {
         if (init_status.is<END_OF_FILE>()) {
             continue;
         } else if (!init_status.ok()) {
+            if (init_status.is_not_found()) {

Review Comment:
   warning: no member named 'is_not_found' in 'doris::Status' [clang-diagnostic-error]
   ```cpp
               if (init_status.is_not_found()) {
                               ^
   ```
   



##########
be/src/vec/exec/scan/scanner_scheduler.cpp:
##########
@@ -232,12 +233,18 @@ void ScannerScheduler::_scanner_scan(ScannerScheduler* scheduler, ScannerContext
         auto block = ctx->get_free_block(&get_free_block);
         status = scanner->get_block(state, block, &eos);
         VLOG_ROW << "VOlapScanNode input rows: " << block->rows() << ", eos: " << eos;
-        if (!status.ok()) {
+        // The VFileScanner for external table may try to open not exist files,
+        // Because FE file cache for external table may out of date.
+        if (!status.ok() && (typeid(*scanner) == typeid(doris::vectorized::VFileScanner) &&
+                             !status.is_not_found())) {

Review Comment:
   warning: no member named 'is_not_found' in 'doris::Status' [clang-diagnostic-error]
   ```cpp
                                !status.is_not_found())) {
                                        ^
   ```
   



##########
be/src/vec/exec/scan/scanner_scheduler.cpp:
##########
@@ -232,12 +233,18 @@
         auto block = ctx->get_free_block(&get_free_block);
         status = scanner->get_block(state, block, &eos);
         VLOG_ROW << "VOlapScanNode input rows: " << block->rows() << ", eos: " << eos;
-        if (!status.ok()) {
+        // The VFileScanner for external table may try to open not exist files,
+        // Because FE file cache for external table may out of date.
+        if (!status.ok() && (typeid(*scanner) == typeid(doris::vectorized::VFileScanner) &&
+                             !status.is_not_found())) {
             LOG(WARNING) << "Scan thread read VOlapScanner failed: " << status.to_string();
             // Add block ptr in blocks, prevent mem leak in read failed
             blocks.push_back(block);
             break;
         }
+        if (status.is_not_found()) {

Review Comment:
   warning: no member named 'is_not_found' in 'doris::Status' [clang-diagnostic-error]
   ```cpp
           if (status.is_not_found()) {
                      ^
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #14997: [fix](multi catalog)Return emtpy block while external table scanner couldn't find the file.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14997:
URL: https://github.com/apache/doris/pull/14997#issuecomment-1353079082

   clang-tidy review says "All clean, LGTM! :+1:"


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #14997: [fix](multi catalog)Return emtpy block while external table scanner couldn't find the file.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14997:
URL: https://github.com/apache/doris/pull/14997#issuecomment-1345761052

   clang-tidy review says "All clean, LGTM! :+1:"


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] morningman commented on a diff in pull request #14997: [fix](multi catalog)Return emtpy block while external table scanner couldn't find the file.

Posted by GitBox <gi...@apache.org>.
morningman commented on code in PR #14997:
URL: https://github.com/apache/doris/pull/14997#discussion_r1046780512


##########
be/src/vec/exec/scan/scanner_scheduler.cpp:
##########
@@ -232,12 +233,18 @@ void ScannerScheduler::_scanner_scan(ScannerScheduler* scheduler, ScannerContext
         auto block = ctx->get_free_block(&get_free_block);
         status = scanner->get_block(state, block, &eos);
         VLOG_ROW << "VOlapScanNode input rows: " << block->rows() << ", eos: " << eos;
-        if (!status.ok()) {
+        // The VFileScanner for external table may try to open not exist files,
+        // Because FE file cache for external table may out of date.
+        if (!status.ok() && (typeid(*scanner) == typeid(doris::vectorized::VFileScanner) &&
+                             !status.is_not_found())) {
             LOG(WARNING) << "Scan thread read VOlapScanner failed: " << status.to_string();
             // Add block ptr in blocks, prevent mem leak in read failed
             blocks.push_back(block);
             break;
         }
+        if (status.is_not_found()) {

Review Comment:
   I think we should move this logic into the `VFileScanner::_get_next_reader()`?
   And `VFileScanner::_get_block_impl` will return `eos = true` when file is not found



##########
be/src/io/hdfs_file_reader.cpp:
##########
@@ -72,6 +72,9 @@ Status HdfsFileReader::open() {
 
     RETURN_IF_ERROR(HdfsFsCache::instance()->get_connection(_hdfs_params, &_fs_handle));
     _hdfs_fs = _fs_handle->hdfs_fs;
+    if (!hdfsExists(_hdfs_fs, _path.c_str())) {

Review Comment:
   Also need to modify other file readers



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org