You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pegasus.apache.org by "empiredan (via GitHub)" <gi...@apache.org> on 2023/04/18 04:17:07 UTC

[GitHub] [incubator-pegasus] empiredan opened a new pull request, #1453: fix(ddl_client): after receiving ERR_BUSY_CREATING or ERR_BUSY_DROPPING sleep for a while before retry

empiredan opened a new pull request, #1453:
URL: https://github.com/apache/incubator-pegasus/pull/1453

   https://github.com/apache/incubator-pegasus/issues/1449


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org


[GitHub] [incubator-pegasus] acelyc111 commented on a diff in pull request #1453: fix(ddl_client): sleep for a while before retry once meta server is busy

Posted by "acelyc111 (via GitHub)" <gi...@apache.org>.
acelyc111 commented on code in PR #1453:
URL: https://github.com/apache/incubator-pegasus/pull/1453#discussion_r1174854643


##########
src/client/replication_ddl_client.cpp:
##########
@@ -1434,23 +1422,52 @@ bool replication_ddl_client::valid_app_char(int c)
     return (bool)std::isalnum(c) || c == '_' || c == '.' || c == ':';
 }
 
+namespace {
+
+bool need_retry(uint32_t attempt_count, const dsn::error_code &err)
+{
+    // For successful request, no need to retry.
+    if (err == dsn::ERR_OK) {
+        return false;
+    }
+
+    // As long as the max attempt count has not been reached, just do retry;
+    // otherwise, do not attempt again.
+    return attempt_count < FLAGS_ddl_client_max_attempt_count;
+}
+
+} // anonymous namespace
+
 void replication_ddl_client::end_meta_request(const rpc_response_task_ptr &callback,
-                                              int retry_times,
-                                              error_code err,
+                                              uint32_t attempt_count,

Review Comment:
   How about `attempted_count`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org


[GitHub] [incubator-pegasus] levy5307 commented on a diff in pull request #1453: fix(ddl_client): after receiving ERR_BUSY_CREATING or ERR_BUSY_DROPPING sleep for a while before retry

Posted by "levy5307 (via GitHub)" <gi...@apache.org>.
levy5307 commented on code in PR #1453:
URL: https://github.com/apache/incubator-pegasus/pull/1453#discussion_r1172034079


##########
src/client/replication_ddl_client.cpp:
##########
@@ -1434,23 +1445,83 @@ bool replication_ddl_client::valid_app_char(int c)
     return (bool)std::isalnum(c) || c == '_' || c == '.' || c == ':';
 }
 
+namespace {
+
+inline bool is_busy(const dsn::error_code &err)

Review Comment:
   inline bool is_busy(const dsn::error_code &err) const



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org


[GitHub] [incubator-pegasus] empiredan commented on a diff in pull request #1453: fix(ddl_client): after receiving ERR_BUSY_CREATING or ERR_BUSY_DROPPING sleep for a while before retry

Posted by "empiredan (via GitHub)" <gi...@apache.org>.
empiredan commented on code in PR #1453:
URL: https://github.com/apache/incubator-pegasus/pull/1453#discussion_r1172627671


##########
src/client/replication_ddl_client.cpp:
##########
@@ -1434,23 +1445,83 @@ bool replication_ddl_client::valid_app_char(int c)
     return (bool)std::isalnum(c) || c == '_' || c == '.' || c == ':';
 }
 
+namespace {
+
+inline bool is_busy(const dsn::error_code &err)

Review Comment:
   Well, actually this function is not a member of a class ~



##########
src/client/test/CMakeLists.txt:
##########
@@ -0,0 +1,37 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+set(MY_PROJ_NAME dsn_client_test)

Review Comment:
   OK, sure.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org


[GitHub] [incubator-pegasus] empiredan commented on a diff in pull request #1453: fix(ddl_client): after receiving ERR_BUSY_CREATING or ERR_BUSY_DROPPING sleep for a while before retry

Posted by "empiredan (via GitHub)" <gi...@apache.org>.
empiredan commented on code in PR #1453:
URL: https://github.com/apache/incubator-pegasus/pull/1453#discussion_r1169961465


##########
src/client/replication_ddl_client.cpp:
##########
@@ -1434,23 +1446,83 @@ bool replication_ddl_client::valid_app_char(int c)
     return (bool)std::isalnum(c) || c == '_' || c == '.' || c == ':';
 }
 
+namespace {
+
+bool is_busy(const dsn::error_code &err)
+{
+    return err == dsn::ERR_BUSY_CREATING || err == dsn::ERR_BUSY_DROPPING;
+}
+
+bool need_retry(uint32_t attempt_count, uint32_t busy_attempt_count, const dsn::error_code &err)
+{
+    if (err == dsn::ERR_OK) {
+        return false;
+    }
+
+    if (attempt_count < FLAGS_ddl_client_max_attempt_count) {
+        return true;
+    }
+
+    if (!is_busy(err)) {
+        return false;
+    }
+
+    return busy_attempt_count < FLAGS_ddl_client_max_attempt_count;
+}
+
+bool should_sleep_before_retry(const dsn::error_code &err, uint32_t &sleep_ms)
+{
+    if (is_busy(err)) {
+        sleep_ms = FLAGS_ddl_client_busy_retry_interval_ms;
+        return true;
+    }
+
+    return false;
+}
+
+} // anonymous namespace
+
 void replication_ddl_client::end_meta_request(const rpc_response_task_ptr &callback,
-                                              int retry_times,
-                                              error_code err,
+                                              uint32_t attempt_count,
+                                              uint32_t busy_attempt_count,
+                                              const error_code &err,
                                               dsn::message_ex *request,
                                               dsn::message_ex *resp)
 {
-    if (err != dsn::ERR_OK && retry_times < 2) {
-        rpc::call(_meta_server,
-                  request,
-                  &_tracker,
-                  [this, retry_times, callback](
-                      error_code err, dsn::message_ex *request, dsn::message_ex *response) mutable {
-                      end_meta_request(callback, retry_times + 1, err, request, response);
-                  });
-    } else {
-        callback->enqueue(err, (message_ex *)resp);
+    ++attempt_count;
+    if (is_busy(err)) {
+        ++busy_attempt_count;
     }
+
+    LOG_INFO("attempt {}: attempt_count={}, busy_attempt_count={}, max_replica_count={}, err={}",
+             request->local_rpc_code,
+             attempt_count,
+             busy_attempt_count,
+             FLAGS_ddl_client_max_attempt_count,
+             err);
+
+    if (!need_retry(attempt_count, busy_attempt_count, err)) {
+        callback->enqueue(err, (message_ex *)resp);
+        return;
+    }
+
+    uint32_t sleep_ms = 0;

Review Comment:
   OK, I'll use `FLAGS_ddl_client_busy_retry_interval_ms` instead.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org


[GitHub] [incubator-pegasus] acelyc111 commented on a diff in pull request #1453: fix(ddl_client): after receiving ERR_BUSY_CREATING or ERR_BUSY_DROPPING sleep for a while before retry

Posted by "acelyc111 (via GitHub)" <gi...@apache.org>.
acelyc111 commented on code in PR #1453:
URL: https://github.com/apache/incubator-pegasus/pull/1453#discussion_r1171492086


##########
src/client/test/CMakeLists.txt:
##########
@@ -0,0 +1,37 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+set(MY_PROJ_NAME dsn_client_test)

Review Comment:
   The new test should be added to CIs.



##########
src/client/replication_ddl_client.h:
##########
@@ -346,6 +356,22 @@ class replication_ddl_client
     dsn::task_tracker _tracker;
     uint32_t _max_wait_secs = 3600; // Wait at most 1 hour by default.
 
+#ifdef PEGASUS_UNIT_TEST
+    FRIEND_TEST(DDLClientTest, RetryEndMetaRequest);
+    void set_mock_errors(const std::vector<dsn::error_code> &mock_errors)
+    {
+        _mock_errors.assign(mock_errors.begin(), mock_errors.end());
+    }
+    dsn::error_code get_mock_error()

Review Comment:
   It seems `FAIL_POINT_INJECT_F` is able to achieve this aim?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org


[GitHub] [incubator-pegasus] acelyc111 commented on a diff in pull request #1453: fix(ddl_client): after receiving ERR_BUSY_CREATING or ERR_BUSY_DROPPING sleep for a while before retry

Posted by "acelyc111 (via GitHub)" <gi...@apache.org>.
acelyc111 commented on code in PR #1453:
URL: https://github.com/apache/incubator-pegasus/pull/1453#discussion_r1169705893


##########
src/client/replication_ddl_client.cpp:
##########
@@ -1434,23 +1446,83 @@ bool replication_ddl_client::valid_app_char(int c)
     return (bool)std::isalnum(c) || c == '_' || c == '.' || c == ':';
 }
 
+namespace {
+
+bool is_busy(const dsn::error_code &err)
+{
+    return err == dsn::ERR_BUSY_CREATING || err == dsn::ERR_BUSY_DROPPING;
+}
+
+bool need_retry(uint32_t attempt_count, uint32_t busy_attempt_count, const dsn::error_code &err)
+{
+    if (err == dsn::ERR_OK) {
+        return false;
+    }
+
+    if (attempt_count < FLAGS_ddl_client_max_attempt_count) {
+        return true;
+    }
+
+    if (!is_busy(err)) {
+        return false;
+    }
+
+    return busy_attempt_count < FLAGS_ddl_client_max_attempt_count;
+}
+
+bool should_sleep_before_retry(const dsn::error_code &err, uint32_t &sleep_ms)
+{
+    if (is_busy(err)) {
+        sleep_ms = FLAGS_ddl_client_busy_retry_interval_ms;
+        return true;
+    }
+
+    return false;
+}
+
+} // anonymous namespace
+
 void replication_ddl_client::end_meta_request(const rpc_response_task_ptr &callback,
-                                              int retry_times,
-                                              error_code err,
+                                              uint32_t attempt_count,
+                                              uint32_t busy_attempt_count,
+                                              const error_code &err,
                                               dsn::message_ex *request,
                                               dsn::message_ex *resp)
 {
-    if (err != dsn::ERR_OK && retry_times < 2) {
-        rpc::call(_meta_server,
-                  request,
-                  &_tracker,
-                  [this, retry_times, callback](
-                      error_code err, dsn::message_ex *request, dsn::message_ex *response) mutable {
-                      end_meta_request(callback, retry_times + 1, err, request, response);
-                  });
-    } else {
-        callback->enqueue(err, (message_ex *)resp);
+    ++attempt_count;
+    if (is_busy(err)) {
+        ++busy_attempt_count;
     }
+
+    LOG_INFO("attempt {}: attempt_count={}, busy_attempt_count={}, max_replica_count={}, err={}",
+             request->local_rpc_code,
+             attempt_count,
+             busy_attempt_count,
+             FLAGS_ddl_client_max_attempt_count,
+             err);
+
+    if (!need_retry(attempt_count, busy_attempt_count, err)) {
+        callback->enqueue(err, (message_ex *)resp);
+        return;
+    }
+
+    uint32_t sleep_ms = 0;

Review Comment:
   This variable seems not very useful, how about remove it and use FLAGS_ddl_client_busy_retry_interval_ms directly?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org


[GitHub] [incubator-pegasus] empiredan commented on a diff in pull request #1453: fix(ddl_client): after receiving ERR_BUSY_CREATING or ERR_BUSY_DROPPING sleep for a while before retry

Posted by "empiredan (via GitHub)" <gi...@apache.org>.
empiredan commented on code in PR #1453:
URL: https://github.com/apache/incubator-pegasus/pull/1453#discussion_r1172640085


##########
src/client/replication_ddl_client.h:
##########
@@ -346,6 +356,22 @@ class replication_ddl_client
     dsn::task_tracker _tracker;
     uint32_t _max_wait_secs = 3600; // Wait at most 1 hour by default.
 
+#ifdef PEGASUS_UNIT_TEST
+    FRIEND_TEST(DDLClientTest, RetryEndMetaRequest);
+    void set_mock_errors(const std::vector<dsn::error_code> &mock_errors)
+    {
+        _mock_errors.assign(mock_errors.begin(), mock_errors.end());
+    }
+    dsn::error_code get_mock_error()

Review Comment:
   OK, good idea. Actually I've tried to use macro `#ifdef` to introduce extra member variables for `replication_ddl_client` which is dedicated to unit tests, however many strange errors happened (for example, a `std::deque` typed member variable was uninitialized and had a super large size()), and it also violated [ODR](https://en.cppreference.com/w/cpp/language/definition).
   
   Thus I decide to use `FAIL_POINT_INJECT_NOT_RETURN_F` instead.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org


[GitHub] [incubator-pegasus] empiredan merged pull request #1453: fix(ddl_client): sleep for a while before retry once meta server is busy

Posted by "empiredan (via GitHub)" <gi...@apache.org>.
empiredan merged PR #1453:
URL: https://github.com/apache/incubator-pegasus/pull/1453


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org