You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2021/11/07 17:09:52 UTC

[GitHub] [incubator-doris] xinyiZzz opened a new pull request #7032: [Heartbeat] Support cancel outdated query after FE restart

xinyiZzz opened a new pull request #7032:
URL: https://github.com/apache/incubator-doris/pull/7032


   ## Proposed changes
   
   Support cancel outdated query after FE restart, through three steps:
   1. Add startTime to Frontend Info and pass it in the heartbeat between Frontend.
   2. Add the startTime of all Frontends to the heartbeat between Master and Backend.
   3. Backend identifies whether the query is out of date according to the startTime of Frontend.
   
   Recognizable obsolete select query: Master restart, other Frontend stop or restart.
   Recognizable obsolete load query: Master restart.
   
   In the future, it is expected to support Backend to cancel outdated queries after failing to receive Master's heartbeat for a long time.
   
   ## Types of changes
   
   What types of changes does your code introduce to Doris?
   _Put an `x` in the boxes that apply_
   
   - [ ] Bugfix (non-breaking change which fixes an issue)
   - [x] New feature (non-breaking change which adds functionality)
   - [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
   - [ ] Documentation Update (if none of the other choices apply)
   - [ ] Code refactor (Modify the code structure, format the code, etc...)
   - [ ] Optimization. Including functional usability improvements and performance improvements.
   - [ ] Dependency. Such as changes related to third-party components.
   - [ ] Other.
   
   ## Checklist
   
   _Put an `x` in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code._
   
   - [x] I have created an issue on (Fix #ISSUE) and described the bug/feature there in detail
   - [x] Compiling and unit tests pass locally with my changes
   - [x] I have added tests that prove my fix is effective or that my feature works
   - [ ] If these changes need document changes, I have updated the document
   - [ ] Any dependent changes have been merged
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] xinyiZzz commented on a change in pull request #7032: [Heartbeat] Support cancel outdated query after FE restart

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on a change in pull request #7032:
URL: https://github.com/apache/incubator-doris/pull/7032#discussion_r751896740



##########
File path: be/src/agent/heartbeat_server.cpp
##########
@@ -78,6 +78,32 @@ void HeartbeatServer::heartbeat(THeartbeatResult& heartbeat_result,
 Status HeartbeatServer::_heartbeat(const TMasterInfo& master_info) {
     std::lock_guard<std::mutex> lk(_hb_mtx);
 
+    if (master_info.__isset.frontends_info) {
+        std::stringstream ss;
+        ss << "Heartbeat frontends info len: " << master_info.frontends_info.size();
+        for (auto info: master_info.frontends_info) {
+            ss << "; host:" << info.network_address.hostname
+            << ", port:" << info.network_address.port
+            << ", fe_start_time:" << info.fe_start_time
+            << ", is_alive:" << info.is_alive;
+            std::string coord_addr_str = info.network_address.hostname + ":" + std::to_string(info.network_address.port);
+            auto fsi_ptr = _exec_env->frontends_start_time().find(coord_addr_str);
+            if (LIKELY(fsi_ptr != _exec_env->frontends_start_time().end())) {
+                fsi_ptr->second->is_alive = info.is_alive;

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] morningman commented on pull request #7032: [Heartbeat] Support cancel outdated query after FE restart

Posted by GitBox <gi...@apache.org>.
morningman commented on pull request #7032:
URL: https://github.com/apache/incubator-doris/pull/7032#issuecomment-963904504


   And FE ut failed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] xinyiZzz commented on a change in pull request #7032: [Heartbeat] Support cancel outdated query after FE restart

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on a change in pull request #7032:
URL: https://github.com/apache/incubator-doris/pull/7032#discussion_r745535484



##########
File path: gensrc/thrift/FrontendService.thrift
##########
@@ -695,6 +695,7 @@ struct TFrontendPingFrontendResult {
     4: required i32 rpcPort
     5: required i64 replayedJournalId
     6: required string version
+    7: required i64 startTime

Review comment:
       done

##########
File path: gensrc/thrift/HeartbeatService.thrift
##########
@@ -31,6 +31,13 @@ struct TMasterInfo {
     8: optional i64 backend_id
 }
 
+// Contain master
+struct TFrontendInfo {
+    1: required Types.TNetworkAddress network_address

Review comment:
       done

##########
File path: gensrc/thrift/HeartbeatService.thrift
##########
@@ -46,5 +53,5 @@ struct THeartbeatResult {
 }
 
 service HeartbeatService {
-    THeartbeatResult heartbeat(1:TMasterInfo master_info);
-}
+    THeartbeatResult heartbeat(1:TMasterInfo master_info, 2:list<TFrontendInfo> frontend_info);

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] xinyiZzz commented on pull request #7032: [Heartbeat] Support cancel outdated query after FE restart

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on pull request #7032:
URL: https://github.com/apache/incubator-doris/pull/7032#issuecomment-965005560


   > And FE ut failed.
   
   done


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] xinyiZzz commented on a change in pull request #7032: [Heartbeat] Support cancel outdated query after FE restart

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on a change in pull request #7032:
URL: https://github.com/apache/incubator-doris/pull/7032#discussion_r746277869



##########
File path: be/src/runtime/exec_env.h
##########
@@ -219,6 +225,9 @@ class ExecEnv {
     HeartbeatFlags* _heartbeat_flags = nullptr;
 
     PluginMgr* _plugin_mgr = nullptr;
+
+    std::map<std::string, FrontendStartInfo*> _frontends_start_time;

Review comment:
       done, Thats right




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] xinyiZzz commented on a change in pull request #7032: [Heartbeat] Support cancel outdated query after FE restart

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on a change in pull request #7032:
URL: https://github.com/apache/incubator-doris/pull/7032#discussion_r745379083



##########
File path: be/src/runtime/exec_env.h
##########
@@ -219,6 +225,9 @@ class ExecEnv {
     HeartbeatFlags* _heartbeat_flags = nullptr;
 
     PluginMgr* _plugin_mgr = nullptr;
+
+    std::map<std::string, FrontendStartInfo*> _frontends_start_time;

Review comment:
       Thanks for your advice,
   But `_frontends_start_time` I understand that there is no need to lock, it will only be modified when the BE is processing the heartbeat, and a BE process has only one thread to process the heartbeat.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] morningman commented on a change in pull request #7032: [Heartbeat] Support cancel outdated query after FE restart

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #7032:
URL: https://github.com/apache/incubator-doris/pull/7032#discussion_r749890518



##########
File path: be/src/agent/heartbeat_server.cpp
##########
@@ -78,6 +78,32 @@ void HeartbeatServer::heartbeat(THeartbeatResult& heartbeat_result,
 Status HeartbeatServer::_heartbeat(const TMasterInfo& master_info) {
     std::lock_guard<std::mutex> lk(_hb_mtx);
 
+    if (master_info.__isset.frontends_info) {
+        std::stringstream ss;
+        ss << "Heartbeat frontends info len: " << master_info.frontends_info.size();
+        for (auto info: master_info.frontends_info) {

Review comment:
       ```suggestion
           for (auto& info: master_info.frontends_info) {
   ```

##########
File path: be/src/agent/heartbeat_server.h
##########
@@ -34,9 +35,22 @@ class StorageEngine;
 class Status;
 class ThriftServer;
 
+struct FrontendStartInfo {
+    int64_t start_time;
+    bool is_alive;
+    DateTimeValue* last_heartbeat; // Invalid time of Info

Review comment:
       why not just use int64?

##########
File path: be/src/agent/heartbeat_server.cpp
##########
@@ -78,6 +78,32 @@ void HeartbeatServer::heartbeat(THeartbeatResult& heartbeat_result,
 Status HeartbeatServer::_heartbeat(const TMasterInfo& master_info) {
     std::lock_guard<std::mutex> lk(_hb_mtx);
 
+    if (master_info.__isset.frontends_info) {
+        std::stringstream ss;
+        ss << "Heartbeat frontends info len: " << master_info.frontends_info.size();
+        for (auto info: master_info.frontends_info) {
+            ss << "; host:" << info.network_address.hostname
+            << ", port:" << info.network_address.port
+            << ", fe_start_time:" << info.fe_start_time
+            << ", is_alive:" << info.is_alive;
+            std::string coord_addr_str = info.network_address.hostname + ":" + std::to_string(info.network_address.port);
+            auto fsi_ptr = _exec_env->frontends_start_time().find(coord_addr_str);
+            if (LIKELY(fsi_ptr != _exec_env->frontends_start_time().end())) {
+                fsi_ptr->second->is_alive = info.is_alive;

Review comment:
       Need lock to update




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] xinyiZzz commented on a change in pull request #7032: [Heartbeat] Support cancel outdated query after FE restart

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on a change in pull request #7032:
URL: https://github.com/apache/incubator-doris/pull/7032#discussion_r751867135



##########
File path: be/src/agent/heartbeat_server.h
##########
@@ -34,9 +35,22 @@ class StorageEngine;
 class Status;
 class ThriftServer;
 
+struct FrontendStartInfo {
+    int64_t start_time;
+    bool is_alive;
+    DateTimeValue* last_heartbeat; // Invalid time of Info

Review comment:
       Doris use int_64 uniformly?

##########
File path: be/src/agent/heartbeat_server.cpp
##########
@@ -78,6 +78,32 @@ void HeartbeatServer::heartbeat(THeartbeatResult& heartbeat_result,
 Status HeartbeatServer::_heartbeat(const TMasterInfo& master_info) {
     std::lock_guard<std::mutex> lk(_hb_mtx);
 
+    if (master_info.__isset.frontends_info) {
+        std::stringstream ss;
+        ss << "Heartbeat frontends info len: " << master_info.frontends_info.size();
+        for (auto info: master_info.frontends_info) {

Review comment:
       done, Thats good




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] xinyiZzz commented on a change in pull request #7032: [Heartbeat] Support cancel outdated query after FE restart

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on a change in pull request #7032:
URL: https://github.com/apache/incubator-doris/pull/7032#discussion_r751956364



##########
File path: be/src/agent/heartbeat_server.h
##########
@@ -34,9 +35,22 @@ class StorageEngine;
 class Status;
 class ThriftServer;
 
+struct FrontendStartInfo {
+    int64_t start_time;
+    bool is_alive;
+    DateTimeValue* last_heartbeat; // Invalid time of Info

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] morningman commented on a change in pull request #7032: [Heartbeat] Support cancel outdated query after FE restart

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #7032:
URL: https://github.com/apache/incubator-doris/pull/7032#discussion_r745358245



##########
File path: gensrc/thrift/HeartbeatService.thrift
##########
@@ -46,5 +53,5 @@ struct THeartbeatResult {
 }
 
 service HeartbeatService {
-    THeartbeatResult heartbeat(1:TMasterInfo master_info);
-}
+    THeartbeatResult heartbeat(1:TMasterInfo master_info, 2:list<TFrontendInfo> frontend_info);

Review comment:
       This may not be compatible.

##########
File path: gensrc/thrift/FrontendService.thrift
##########
@@ -695,6 +695,7 @@ struct TFrontendPingFrontendResult {
     4: required i32 rpcPort
     5: required i64 replayedJournalId
     6: required string version
+    7: required i64 startTime

Review comment:
       Should be optional

##########
File path: be/src/runtime/exec_env.h
##########
@@ -219,6 +225,9 @@ class ExecEnv {
     HeartbeatFlags* _heartbeat_flags = nullptr;
 
     PluginMgr* _plugin_mgr = nullptr;
+
+    std::map<std::string, FrontendStartInfo*> _frontends_start_time;

Review comment:
       use unordered map.
   And this structure need to be protected by lock, you can try using `phmap::parallel_flat_hash_map`, which is a concurrent hash map of C++. You can refer to `be/src/util/brpc_stub_cache.h`

##########
File path: gensrc/thrift/HeartbeatService.thrift
##########
@@ -31,6 +31,13 @@ struct TMasterInfo {
     8: optional i64 backend_id
 }
 
+// Contain master
+struct TFrontendInfo {
+    1: required Types.TNetworkAddress network_address

Review comment:
       use optional




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org