Posted to notifications@apisix.apache.org by GitBox <gi...@apache.org> on 2021/09/29 10:35:30 UTC

[GitHub] [apisix] shuaijinchao opened a new pull request #5158: bugfix: etcd cluster single node failure APISIX startup failure

shuaijinchao opened a new pull request #5158:
URL: https://github.com/apache/apisix/pull/5158


   ### What this PR does / why we need it:
   Fixes #5115: APISIX fails to start when a single node of the etcd cluster is unavailable.
   
   ### Pre-submission checklist:
   
   * [x] Did you explain what problem this PR solves, or what new features have been added?
   * [x] Have you added corresponding test cases?
   * [ ] Have you modified the corresponding document?
   * [x] Is this PR backward compatible? **If it is not backward compatible, please discuss on the [mailing list](https://github.com/apache/apisix/tree/master#community) first**
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@apisix.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [apisix] shuaijinchao commented on a change in pull request #5158: feat: etcd cluster single node failure APISIX startup failure

Posted by GitBox <gi...@apache.org>.
shuaijinchao commented on a change in pull request #5158:
URL: https://github.com/apache/apisix/pull/5158#discussion_r728604884



##########
File path: apisix/cli/etcd.lua
##########
@@ -206,29 +209,38 @@ function _M.init(env, args)
                              version_url, err, retry_time))
         end
 
-        if not res then
-            errmsg = str_format("request etcd endpoint \'%s\' error, %s\n", version_url, err)
-            util.die(errmsg)
-        end
+        if res then
+            local body, _, err = dkjson.decode(res)
+            if err or (body and not body["etcdcluster"]) then
+                errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
+                        version_url)
+                util.die(errmsg)
+            end
 
-        local body, _, err = dkjson.decode(res)
-        if err or (body and not body["etcdcluster"]) then
-            errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
-                                version_url)
-            util.die(errmsg)
-        end
+            local cluster_version = body["etcdcluster"]
+            if compare_semantic_version(cluster_version, env.min_etcd_version) then
+                util.die("etcd cluster version ", cluster_version,
+                        " is less than the required version ", env.min_etcd_version,

Review comment:
       OK thx~







[GitHub] [apisix] spacewander commented on a change in pull request #5158: feat: etcd cluster single node failure APISIX startup failure

Posted by GitBox <gi...@apache.org>.
spacewander commented on a change in pull request #5158:
URL: https://github.com/apache/apisix/pull/5158#discussion_r728586349



##########
File path: apisix/cli/etcd.lua
##########
@@ -206,29 +209,38 @@ function _M.init(env, args)
                              version_url, err, retry_time))
         end
 
-        if not res then
-            errmsg = str_format("request etcd endpoint \'%s\' error, %s\n", version_url, err)
-            util.die(errmsg)
-        end
+        if res then
+            local body, _, err = dkjson.decode(res)
+            if err or (body and not body["etcdcluster"]) then
+                errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
+                        version_url)
+                util.die(errmsg)
+            end
 
-        local body, _, err = dkjson.decode(res)
-        if err or (body and not body["etcdcluster"]) then
-            errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
-                                version_url)
-            util.die(errmsg)
-        end
+            local cluster_version = body["etcdcluster"]
+            if compare_semantic_version(cluster_version, env.min_etcd_version) then
+                util.die("etcd cluster version ", cluster_version,
+                        " is less than the required version ", env.min_etcd_version,

Review comment:
       I have added a new suggestion







[GitHub] [apisix] spacewander commented on a change in pull request #5158: feat: etcd cluster single node failure APISIX startup failure

Posted by GitBox <gi...@apache.org>.
spacewander commented on a change in pull request #5158:
URL: https://github.com/apache/apisix/pull/5158#discussion_r728562595



##########
File path: t/cli/test_etcd_ha.sh
##########
@@ -0,0 +1,68 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+ETCD_NAME_0=etcd0

Review comment:
       But it is still a health check test.







[GitHub] [apisix] spacewander commented on a change in pull request #5158: feat: etcd cluster single node failure APISIX startup failure

Posted by GitBox <gi...@apache.org>.
spacewander commented on a change in pull request #5158:
URL: https://github.com/apache/apisix/pull/5158#discussion_r727948592



##########
File path: apisix/cli/etcd.lua
##########
@@ -206,29 +209,38 @@ function _M.init(env, args)
                              version_url, err, retry_time))
         end
 
-        if not res then
-            errmsg = str_format("request etcd endpoint \'%s\' error, %s\n", version_url, err)
-            util.die(errmsg)
-        end
+        if res then
+            local body, _, err = dkjson.decode(res)
+            if err or (body and not body["etcdcluster"]) then
+                errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
+                        version_url)
+                util.die(errmsg)
+            end
 
-        local body, _, err = dkjson.decode(res)
-        if err or (body and not body["etcdcluster"]) then
-            errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
-                                version_url)
-            util.die(errmsg)
-        end
+            local cluster_version = body["etcdcluster"]
+            if compare_semantic_version(cluster_version, env.min_etcd_version) then
+                util.die("etcd cluster version ", cluster_version,
+                        " is less than the required version ", env.min_etcd_version,
+                        ", please upgrade your etcd cluster\n")
+            end
 
-        local cluster_version = body["etcdcluster"]
-        if compare_semantic_version(cluster_version, env.min_etcd_version) then
-            util.die("etcd cluster version ", cluster_version,
-                     " is less than the required version ",
-                     env.min_etcd_version,
-                     ", please upgrade your etcd cluster\n")
+            table_insert(etcd_healthy_hosts, host)
+        else
+            io_stderr:write(str_format("request etcd endpoint \'%s\' error, %s\n", version_url,
+                    err))
         end
     end
 
+    if #etcd_healthy_hosts <= 0 then
+        util.die("all etcd nodes are unavailable\n")
+    end
+
+    if host_count >= 2 and (#etcd_healthy_hosts / host_count * 100) < 50 then

Review comment:
       Is `host_count >= 2` necessary?

##########
File path: apisix/cli/etcd.lua
##########
@@ -206,29 +209,38 @@ function _M.init(env, args)
                              version_url, err, retry_time))
         end
 
-        if not res then
-            errmsg = str_format("request etcd endpoint \'%s\' error, %s\n", version_url, err)
-            util.die(errmsg)
-        end
+        if res then
+            local body, _, err = dkjson.decode(res)
+            if err or (body and not body["etcdcluster"]) then
+                errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
+                        version_url)
+                util.die(errmsg)
+            end
 
-        local body, _, err = dkjson.decode(res)
-        if err or (body and not body["etcdcluster"]) then
-            errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
-                                version_url)
-            util.die(errmsg)
-        end
+            local cluster_version = body["etcdcluster"]
+            if compare_semantic_version(cluster_version, env.min_etcd_version) then
+                util.die("etcd cluster version ", cluster_version,
+                        " is less than the required version ", env.min_etcd_version,

Review comment:
       Bad indent

##########
File path: t/cli/test_etcd_ha.sh
##########
@@ -0,0 +1,68 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+ETCD_NAME_0=etcd0

Review comment:
       Should we merge it into test_etcd_healthcheck?

##########
File path: apisix/cli/etcd.lua
##########
@@ -206,29 +209,38 @@ function _M.init(env, args)
                              version_url, err, retry_time))
         end
 
-        if not res then
-            errmsg = str_format("request etcd endpoint \'%s\' error, %s\n", version_url, err)
-            util.die(errmsg)
-        end
+        if res then
+            local body, _, err = dkjson.decode(res)
+            if err or (body and not body["etcdcluster"]) then
+                errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
+                        version_url)
+                util.die(errmsg)
+            end
 
-        local body, _, err = dkjson.decode(res)
-        if err or (body and not body["etcdcluster"]) then
-            errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
-                                version_url)
-            util.die(errmsg)
-        end
+            local cluster_version = body["etcdcluster"]
+            if compare_semantic_version(cluster_version, env.min_etcd_version) then
+                util.die("etcd cluster version ", cluster_version,
+                        " is less than the required version ", env.min_etcd_version,
+                        ", please upgrade your etcd cluster\n")
+            end
 
-        local cluster_version = body["etcdcluster"]
-        if compare_semantic_version(cluster_version, env.min_etcd_version) then
-            util.die("etcd cluster version ", cluster_version,
-                     " is less than the required version ",
-                     env.min_etcd_version,
-                     ", please upgrade your etcd cluster\n")
+            table_insert(etcd_healthy_hosts, host)
+        else
+            io_stderr:write(str_format("request etcd endpoint \'%s\' error, %s\n", version_url,
+                    err))
         end
     end
 
+    if #etcd_healthy_hosts <= 0 then
+        util.die("all etcd nodes are unavailable\n")
+    end
+
+    if host_count >= 2 and (#etcd_healthy_hosts / host_count * 100) < 50 then
+        util.die("the etcd cluster needs at least 50% and above healthy nodes\n")

Review comment:
       We need > 50%, not just >= 50%
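
For illustration, the strict-majority rule requested here can be sketched in Lua as follows. This is a minimal sketch under stated assumptions, not the code that was merged: `has_quorum`, `passes_percent_check`, and their parameter names are hypothetical. Only the two counts (`#etcd_healthy_hosts` and `host_count`) and the percentage expression come from the diff above.

```lua
-- Hypothetical helper: true only when strictly more than half of the
-- configured etcd nodes answered the version probe.
local function has_quorum(healthy_count, host_count)
    -- Integer comparison; no float division is needed.
    return healthy_count * 2 > host_count
end

-- The percentage check from the diff, reproduced for comparison.
-- It returns true when the diff's condition would NOT call util.die.
local function passes_percent_check(healthy_count, host_count)
    return not (host_count >= 2 and (healthy_count / host_count * 100) < 50)
end

print(has_quorum(1, 1))            -- true:  single node, healthy
print(has_quorum(1, 2))            -- false: exactly 50% is not a majority
print(passes_percent_check(1, 2))  -- true:  the "< 50" check lets 50% through
print(has_quorum(2, 3))            -- true:  two of three nodes
print(has_quorum(1, 3))            -- false: one of three nodes
```

With a strict comparison, a single-node deployment (1 of 1) passes and a fully unavailable cluster (0 of N) fails, so in this sketch no separate `host_count >= 2` guard is needed.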







[GitHub] [apisix] spacewander commented on a change in pull request #5158: feat: etcd cluster single node failure APISIX startup failure

Posted by GitBox <gi...@apache.org>.
spacewander commented on a change in pull request #5158:
URL: https://github.com/apache/apisix/pull/5158#discussion_r728584212



##########
File path: apisix/cli/etcd.lua
##########
@@ -206,29 +209,38 @@ function _M.init(env, args)
                              version_url, err, retry_time))
         end
 
-        if not res then
-            errmsg = str_format("request etcd endpoint \'%s\' error, %s\n", version_url, err)
-            util.die(errmsg)
-        end
+        if res then
+            local body, _, err = dkjson.decode(res)
+            if err or (body and not body["etcdcluster"]) then
+                errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
+                        version_url)
+                util.die(errmsg)
+            end
 
-        local body, _, err = dkjson.decode(res)
-        if err or (body and not body["etcdcluster"]) then
-            errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
-                                version_url)
-            util.die(errmsg)
-        end
+            local cluster_version = body["etcdcluster"]
+            if compare_semantic_version(cluster_version, env.min_etcd_version) then
+                util.die("etcd cluster version ", cluster_version,
+                        " is less than the required version ",

Review comment:
       ```suggestion
                            " is less than the required version ",
   ```







[GitHub] [apisix] shuaijinchao commented on a change in pull request #5158: feat: etcd cluster single node failure APISIX startup failure

Posted by GitBox <gi...@apache.org>.
shuaijinchao commented on a change in pull request #5158:
URL: https://github.com/apache/apisix/pull/5158#discussion_r727983403



##########
File path: apisix/cli/etcd.lua
##########
@@ -206,29 +208,38 @@ function _M.init(env, args)
                              version_url, err, retry_time))
         end
 
-        if not res then
-            errmsg = str_format("request etcd endpoint \'%s\' error, %s\n", version_url, err)
-            util.die(errmsg)
-        end
+        if res then
+            local body, _, err = dkjson.decode(res)
+            if err or (body and not body["etcdcluster"]) then
+                errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
+                        version_url)
+                util.die(errmsg)
+            end
 
-        local body, _, err = dkjson.decode(res)
-        if err or (body and not body["etcdcluster"]) then
-            errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
-                                version_url)
-            util.die(errmsg)
-        end
+            local cluster_version = body["etcdcluster"]
+            if compare_semantic_version(cluster_version, env.min_etcd_version) then
+                util.die("etcd cluster version ", cluster_version,
+                        " is less than the required version ",
+                        env.min_etcd_version,
+                        ", please upgrade your etcd cluster\n")
+            end
 
-        local cluster_version = body["etcdcluster"]
-        if compare_semantic_version(cluster_version, env.min_etcd_version) then
-            util.die("etcd cluster version ", cluster_version,
-                     " is less than the required version ",
-                     env.min_etcd_version,
-                     ", please upgrade your etcd cluster\n")
+            table_insert(etcd_healthy_hosts, host)
+        else
+            print(str_format("request etcd endpoint \'%s\' error, %s\n", version_url, err))

Review comment:
       done







[GitHub] [apisix] spacewander commented on a change in pull request #5158: feat: etcd cluster single node failure APISIX startup failure

Posted by GitBox <gi...@apache.org>.
spacewander commented on a change in pull request #5158:
URL: https://github.com/apache/apisix/pull/5158#discussion_r720660409



##########
File path: apisix/cli/etcd.lua
##########
@@ -206,29 +208,38 @@ function _M.init(env, args)
                              version_url, err, retry_time))
         end
 
-        if not res then
-            errmsg = str_format("request etcd endpoint \'%s\' error, %s\n", version_url, err)
-            util.die(errmsg)
-        end
+        if res then
+            local body, _, err = dkjson.decode(res)
+            if err or (body and not body["etcdcluster"]) then
+                errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
+                        version_url)
+                util.die(errmsg)
+            end
 
-        local body, _, err = dkjson.decode(res)
-        if err or (body and not body["etcdcluster"]) then
-            errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
-                                version_url)
-            util.die(errmsg)
-        end
+            local cluster_version = body["etcdcluster"]
+            if compare_semantic_version(cluster_version, env.min_etcd_version) then
+                util.die("etcd cluster version ", cluster_version,
+                        " is less than the required version ",
+                        env.min_etcd_version,
+                        ", please upgrade your etcd cluster\n")
+            end
 
-        local cluster_version = body["etcdcluster"]
-        if compare_semantic_version(cluster_version, env.min_etcd_version) then
-            util.die("etcd cluster version ", cluster_version,
-                     " is less than the required version ",
-                     env.min_etcd_version,
-                     ", please upgrade your etcd cluster\n")
+            table_insert(etcd_healthy_hosts, host)
+        else
+            print(str_format("request etcd endpoint \'%s\' error, %s\n", version_url, err))

Review comment:
       Should write to stderr
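
A minimal sketch of the difference, using only the standard Lua `io` library: `print` writes to stdout and appends a newline, while `io.stderr:write` writes to stderr and appends nothing. The `report_unreachable` helper, its `out` parameter, and the example URL are hypothetical and used for illustration only. The format string is the one from the diff.

```lua
local str_format = string.format

-- Hypothetical helper: report an unreachable etcd endpoint on the given
-- stream (stderr by default) instead of stdout.
local function report_unreachable(version_url, err, out)
    out = out or io.stderr
    -- write() adds no trailing newline, so the format string carries "\n".
    out:write(str_format("request etcd endpoint '%s' error, %s\n",
                         version_url, err))
end

-- Example values; the address matches the test configuration in this PR.
report_unreachable("http://127.0.0.1:23790/version", "connection refused")
```

Keeping diagnostics on stderr means a caller that captures stdout does not see the warning mixed into normal output, while `2>&1` (as used by the test script in this PR) still captures it.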

##########
File path: apisix/cli/etcd.lua
##########
@@ -206,29 +208,38 @@ function _M.init(env, args)
                              version_url, err, retry_time))
         end
 
-        if not res then
-            errmsg = str_format("request etcd endpoint \'%s\' error, %s\n", version_url, err)
-            util.die(errmsg)
-        end
+        if res then
+            local body, _, err = dkjson.decode(res)
+            if err or (body and not body["etcdcluster"]) then
+                errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
+                        version_url)
+                util.die(errmsg)
+            end
 
-        local body, _, err = dkjson.decode(res)
-        if err or (body and not body["etcdcluster"]) then
-            errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
-                                version_url)
-            util.die(errmsg)
-        end
+            local cluster_version = body["etcdcluster"]
+            if compare_semantic_version(cluster_version, env.min_etcd_version) then
+                util.die("etcd cluster version ", cluster_version,
+                        " is less than the required version ",
+                        env.min_etcd_version,
+                        ", please upgrade your etcd cluster\n")
+            end
 
-        local cluster_version = body["etcdcluster"]
-        if compare_semantic_version(cluster_version, env.min_etcd_version) then
-            util.die("etcd cluster version ", cluster_version,
-                     " is less than the required version ",
-                     env.min_etcd_version,
-                     ", please upgrade your etcd cluster\n")
+            table_insert(etcd_healthy_hosts, host)
+        else
+            print(str_format("request etcd endpoint \'%s\' error, %s\n", version_url, err))
         end
     end
 
+    if #etcd_healthy_hosts <= 0 then
+        util.die("all etcd nodes are unavailable\n")
+    end
+
+    if host_count >= 2 and #etcd_healthy_hosts < 2 then

Review comment:
       Better to check if more than half of the nodes are healthy. 

##########
File path: apisix/cli/etcd.lua
##########
@@ -206,29 +208,38 @@ function _M.init(env, args)
                              version_url, err, retry_time))
         end
 
-        if not res then
-            errmsg = str_format("request etcd endpoint \'%s\' error, %s\n", version_url, err)
-            util.die(errmsg)
-        end
+        if res then
+            local body, _, err = dkjson.decode(res)
+            if err or (body and not body["etcdcluster"]) then
+                errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
+                        version_url)
+                util.die(errmsg)
+            end
 
-        local body, _, err = dkjson.decode(res)
-        if err or (body and not body["etcdcluster"]) then
-            errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
-                                version_url)
-            util.die(errmsg)
-        end
+            local cluster_version = body["etcdcluster"]
+            if compare_semantic_version(cluster_version, env.min_etcd_version) then
+                util.die("etcd cluster version ", cluster_version,
+                        " is less than the required version ",

Review comment:
       Bad indent







[GitHub] [apisix] shuaijinchao commented on a change in pull request #5158: feat: etcd cluster single node failure APISIX startup failure

Posted by GitBox <gi...@apache.org>.
shuaijinchao commented on a change in pull request #5158:
URL: https://github.com/apache/apisix/pull/5158#discussion_r728649749



##########
File path: apisix/cli/etcd.lua
##########
@@ -218,24 +219,22 @@ function _M.init(env, args)
 
             local cluster_version = body["etcdcluster"]
             if compare_semantic_version(cluster_version, env.min_etcd_version) then
-                util.die("etcd cluster version ", cluster_version,
-                        " is less than the required version ",
-                        env.min_etcd_version,
-                        ", please upgrade your etcd cluster\n")
+                util.die("etcd cluster version ", cluster_version, " is less than the required version ",
+                        env.min_etcd_version, ", please upgrade your etcd cluster\n")

Review comment:
       OK, thx~







[GitHub] [apisix] shuaijinchao commented on a change in pull request #5158: feat: etcd cluster single node failure APISIX startup failure

Posted by GitBox <gi...@apache.org>.
shuaijinchao commented on a change in pull request #5158:
URL: https://github.com/apache/apisix/pull/5158#discussion_r727983110



##########
File path: t/cli/test_etcd_ha.sh
##########
@@ -0,0 +1,68 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+ETCD_NAME_0=etcd0
+ETCD_NAME_1=etcd1
+ETCD_NAME_2=etcd2
+
+echo '
+etcd:
+  host:
+    - "http://127.0.0.1:23790"
+    - "http://127.0.0.1:23791"
+    - "http://127.0.0.1:23792"
+' > conf/config.yaml
+
+docker-compose -f ./t/cli/docker-compose-etcd-cluster.yaml up -d
+
+# case 1: stop one etcd nodes (result: start successful)

Review comment:
       done







[GitHub] [apisix] shuaijinchao commented on a change in pull request #5158: feat: etcd cluster single node failure APISIX startup failure

Posted by GitBox <gi...@apache.org>.
shuaijinchao commented on a change in pull request #5158:
URL: https://github.com/apache/apisix/pull/5158#discussion_r727982860



##########
File path: t/cli/test_etcd_ha.sh
##########
@@ -0,0 +1,68 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+ETCD_NAME_0=etcd0
+ETCD_NAME_1=etcd1
+ETCD_NAME_2=etcd2
+
+echo '
+etcd:
+  host:
+    - "http://127.0.0.1:23790"
+    - "http://127.0.0.1:23791"
+    - "http://127.0.0.1:23792"
+' > conf/config.yaml
+
+docker-compose -f ./t/cli/docker-compose-etcd-cluster.yaml up -d
+
+# case 1: stop one etcd nodes (result: start successful)
+docker stop ${ETCD_NAME_0}
+
+out=$(make init 2>&1)
+if echo "$out" | grep "23790" | grep "connection refused"; then
+    echo "passed: APISIX successfully to start, stop only one etcd node"
+else
+    echo "failed: stop only one etcd node APISIX should start normally"
+    exit 1
+fi
+
+# case 2: stop two etcd nodes (result: start failure)
+docker stop ${ETCD_NAME_1}
+
+out=$(make init 2>&1)
+if echo "$out" | grep "23791" | grep "connection refused"; then
+    echo "passed: APISIX failed to start, etcd cluster must have two or more healthy nodes"
+else
+    echo "failed: etcd has stopped two nodes, APISIX should fail to start"

Review comment:
       done







[GitHub] [apisix] spacewander commented on a change in pull request #5158: feat: etcd cluster single node failure APISIX startup failure

Posted by GitBox <gi...@apache.org>.
spacewander commented on a change in pull request #5158:
URL: https://github.com/apache/apisix/pull/5158#discussion_r728562595



##########
File path: t/cli/test_etcd_ha.sh
##########
@@ -0,0 +1,68 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+ETCD_NAME_0=etcd0

Review comment:
       But it is still a health node check.







[GitHub] [apisix] shuaijinchao commented on pull request #5158: bugfix: etcd cluster single node failure APISIX startup failure

Posted by GitBox <gi...@apache.org>.
shuaijinchao commented on pull request #5158:
URL: https://github.com/apache/apisix/pull/5158#issuecomment-930070731


   This PR depends on lua-resty-etcd supporting round-robin checks when the health check is not enabled.





[GitHub] [apisix] shuaijinchao commented on a change in pull request #5158: feat: etcd cluster single node failure APISIX startup failure

Posted by GitBox <gi...@apache.org>.
shuaijinchao commented on a change in pull request #5158:
URL: https://github.com/apache/apisix/pull/5158#discussion_r727982709



##########
File path: apisix/cli/etcd.lua
##########
@@ -206,29 +209,38 @@ function _M.init(env, args)
                              version_url, err, retry_time))
         end
 
-        if not res then
-            errmsg = str_format("request etcd endpoint \'%s\' error, %s\n", version_url, err)
-            util.die(errmsg)
-        end
+        if res then
+            local body, _, err = dkjson.decode(res)
+            if err or (body and not body["etcdcluster"]) then
+                errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
+                        version_url)
+                util.die(errmsg)
+            end
 
-        local body, _, err = dkjson.decode(res)
-        if err or (body and not body["etcdcluster"]) then
-            errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
-                                version_url)
-            util.die(errmsg)
-        end
+            local cluster_version = body["etcdcluster"]
+            if compare_semantic_version(cluster_version, env.min_etcd_version) then
+                util.die("etcd cluster version ", cluster_version,
+                        " is less than the required version ", env.min_etcd_version,

Review comment:
       I don't quite understand. What is the correct indentation here?







[GitHub] [apisix] spacewander commented on a change in pull request #5158: feat: etcd cluster single node failure APISIX startup failure

Posted by GitBox <gi...@apache.org>.
spacewander commented on a change in pull request #5158:
URL: https://github.com/apache/apisix/pull/5158#discussion_r728587511



##########
File path: t/cli/test_etcd_ha.sh
##########
@@ -0,0 +1,68 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+ETCD_NAME_0=etcd0

Review comment:
       Maybe we can rename test_etcd_healthcheck to test_etcd_ha, and add this test in it?







[GitHub] [apisix] tokers commented on a change in pull request #5158: feat: etcd cluster single node failure APISIX startup failure

Posted by GitBox <gi...@apache.org>.
tokers commented on a change in pull request #5158:
URL: https://github.com/apache/apisix/pull/5158#discussion_r727910452



##########
File path: apisix/cli/etcd.lua
##########
@@ -218,24 +219,22 @@ function _M.init(env, args)
 
             local cluster_version = body["etcdcluster"]
             if compare_semantic_version(cluster_version, env.min_etcd_version) then
-                util.die("etcd cluster version ", cluster_version,
-                        " is less than the required version ",
-                        env.min_etcd_version,
-                        ", please upgrade your etcd cluster\n")
+                util.die("etcd cluster version ", cluster_version, " is less than the required version ",
+                        env.min_etcd_version, ", please upgrade your etcd cluster\n")

Review comment:
       ```suggestion
                            env.min_etcd_version, ", please upgrade your etcd cluster\n")
   ```







[GitHub] [apisix] shuaijinchao commented on a change in pull request #5158: feat: etcd cluster single node failure APISIX startup failure

Posted by GitBox <gi...@apache.org>.
shuaijinchao commented on a change in pull request #5158:
URL: https://github.com/apache/apisix/pull/5158#discussion_r727980750



##########
File path: t/cli/test_etcd_ha.sh
##########
@@ -0,0 +1,68 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+ETCD_NAME_0=etcd0

Review comment:
       I don't think there is a need to merge. This test file only needs to test the etcd part of the CLI; it does not use the resty.etcd or resty.health_check clients, and it does not need to run e2e tests the way test_etcd_healthcheck.sh does.

##########
File path: apisix/cli/etcd.lua
##########
@@ -206,29 +209,38 @@ function _M.init(env, args)
                              version_url, err, retry_time))
         end
 
-        if not res then
-            errmsg = str_format("request etcd endpoint \'%s\' error, %s\n", version_url, err)
-            util.die(errmsg)
-        end
+        if res then
+            local body, _, err = dkjson.decode(res)
+            if err or (body and not body["etcdcluster"]) then
+                errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
+                        version_url)
+                util.die(errmsg)
+            end
 
-        local body, _, err = dkjson.decode(res)
-        if err or (body and not body["etcdcluster"]) then
-            errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
-                                version_url)
-            util.die(errmsg)
-        end
+            local cluster_version = body["etcdcluster"]
+            if compare_semantic_version(cluster_version, env.min_etcd_version) then
+                util.die("etcd cluster version ", cluster_version,
+                        " is less than the required version ", env.min_etcd_version,
+                        ", please upgrade your etcd cluster\n")
+            end
 
-        local cluster_version = body["etcdcluster"]
-        if compare_semantic_version(cluster_version, env.min_etcd_version) then
-            util.die("etcd cluster version ", cluster_version,
-                     " is less than the required version ",
-                     env.min_etcd_version,
-                     ", please upgrade your etcd cluster\n")
+            table_insert(etcd_healthy_hosts, host)
+        else
+            io_stderr:write(str_format("request etcd endpoint \'%s\' error, %s\n", version_url,
+                    err))
         end
     end
 
+    if #etcd_healthy_hosts <= 0 then
+        util.die("all etcd nodes are unavailable\n")
+    end
+
+    if host_count >= 2 and (#etcd_healthy_hosts / host_count * 100) < 50 then
+        util.die("the etcd cluster needs at least 50% and above healthy nodes\n")

Review comment:
       OK

##########
File path: apisix/cli/etcd.lua
##########
@@ -206,29 +209,38 @@ function _M.init(env, args)
                              version_url, err, retry_time))
         end
 
-        if not res then
-            errmsg = str_format("request etcd endpoint \'%s\' error, %s\n", version_url, err)
-            util.die(errmsg)
-        end
+        if res then
+            local body, _, err = dkjson.decode(res)
+            if err or (body and not body["etcdcluster"]) then
+                errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
+                        version_url)
+                util.die(errmsg)
+            end
 
-        local body, _, err = dkjson.decode(res)
-        if err or (body and not body["etcdcluster"]) then
-            errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
-                                version_url)
-            util.die(errmsg)
-        end
+            local cluster_version = body["etcdcluster"]
+            if compare_semantic_version(cluster_version, env.min_etcd_version) then
+                util.die("etcd cluster version ", cluster_version,
+                        " is less than the required version ", env.min_etcd_version,
+                        ", please upgrade your etcd cluster\n")
+            end
 
-        local cluster_version = body["etcdcluster"]
-        if compare_semantic_version(cluster_version, env.min_etcd_version) then
-            util.die("etcd cluster version ", cluster_version,
-                     " is less than the required version ",
-                     env.min_etcd_version,
-                     ", please upgrade your etcd cluster\n")
+            table_insert(etcd_healthy_hosts, host)
+        else
+            io_stderr:write(str_format("request etcd endpoint \'%s\' error, %s\n", version_url,
+                    err))
         end
     end
 
+    if #etcd_healthy_hosts <= 0 then
+        util.die("all etcd nodes are unavailable\n")
+    end
+
+    if host_count >= 2 and (#etcd_healthy_hosts / host_count * 100) < 50 then

Review comment:
       yes, it can be deleted

##########
File path: apisix/cli/etcd.lua
##########
@@ -206,29 +209,38 @@ function _M.init(env, args)
                              version_url, err, retry_time))
         end
 
-        if not res then
-            errmsg = str_format("request etcd endpoint \'%s\' error, %s\n", version_url, err)
-            util.die(errmsg)
-        end
+        if res then
+            local body, _, err = dkjson.decode(res)
+            if err or (body and not body["etcdcluster"]) then
+                errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
+                        version_url)
+                util.die(errmsg)
+            end
 
-        local body, _, err = dkjson.decode(res)
-        if err or (body and not body["etcdcluster"]) then
-            errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
-                                version_url)
-            util.die(errmsg)
-        end
+            local cluster_version = body["etcdcluster"]
+            if compare_semantic_version(cluster_version, env.min_etcd_version) then
+                util.die("etcd cluster version ", cluster_version,
+                        " is less than the required version ", env.min_etcd_version,

Review comment:
       I don't quite understand. What is the correct indentation here?

##########
File path: t/cli/test_etcd_ha.sh
##########
@@ -0,0 +1,68 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+ETCD_NAME_0=etcd0
+ETCD_NAME_1=etcd1
+ETCD_NAME_2=etcd2
+
+echo '
+etcd:
+  host:
+    - "http://127.0.0.1:23790"
+    - "http://127.0.0.1:23791"
+    - "http://127.0.0.1:23792"
+' > conf/config.yaml
+
+docker-compose -f ./t/cli/docker-compose-etcd-cluster.yaml up -d
+
+# case 1: stop one etcd nodes (result: start successful)
+docker stop ${ETCD_NAME_0}
+
+out=$(make init 2>&1)
+if echo "$out" | grep "23790" | grep "connection refused"; then
+    echo "passed: APISIX successfully to start, stop only one etcd node"
+else
+    echo "failed: stop only one etcd node APISIX should start normally"
+    exit 1
+fi
+
+# case 2: stop two etcd nodes (result: start failure)
+docker stop ${ETCD_NAME_1}
+
+out=$(make init 2>&1)
+if echo "$out" | grep "23791" | grep "connection refused"; then
+    echo "passed: APISIX failed to start, etcd cluster must have two or more healthy nodes"
+else
+    echo "failed: etcd has stopped two nodes, APISIX should fail to start"

Review comment:
       done

##########
File path: t/cli/test_etcd_ha.sh
##########
@@ -0,0 +1,68 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+ETCD_NAME_0=etcd0
+ETCD_NAME_1=etcd1
+ETCD_NAME_2=etcd2
+
+echo '
+etcd:
+  host:
+    - "http://127.0.0.1:23790"
+    - "http://127.0.0.1:23791"
+    - "http://127.0.0.1:23792"
+' > conf/config.yaml
+
+docker-compose -f ./t/cli/docker-compose-etcd-cluster.yaml up -d
+
+# case 1: stop one etcd nodes (result: start successful)

Review comment:
       done

##########
File path: apisix/cli/etcd.lua
##########
@@ -206,29 +208,38 @@ function _M.init(env, args)
                              version_url, err, retry_time))
         end
 
-        if not res then
-            errmsg = str_format("request etcd endpoint \'%s\' error, %s\n", version_url, err)
-            util.die(errmsg)
-        end
+        if res then
+            local body, _, err = dkjson.decode(res)
+            if err or (body and not body["etcdcluster"]) then
+                errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
+                        version_url)
+                util.die(errmsg)
+            end
 
-        local body, _, err = dkjson.decode(res)
-        if err or (body and not body["etcdcluster"]) then
-            errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
-                                version_url)
-            util.die(errmsg)
-        end
+            local cluster_version = body["etcdcluster"]
+            if compare_semantic_version(cluster_version, env.min_etcd_version) then
+                util.die("etcd cluster version ", cluster_version,
+                        " is less than the required version ",
+                        env.min_etcd_version,
+                        ", please upgrade your etcd cluster\n")
+            end
 
-        local cluster_version = body["etcdcluster"]
-        if compare_semantic_version(cluster_version, env.min_etcd_version) then
-            util.die("etcd cluster version ", cluster_version,
-                     " is less than the required version ",
-                     env.min_etcd_version,
-                     ", please upgrade your etcd cluster\n")
+            table_insert(etcd_healthy_hosts, host)
+        else
+            print(str_format("request etcd endpoint \'%s\' error, %s\n", version_url, err))

Review comment:
       done

##########
File path: apisix/cli/etcd.lua
##########
@@ -206,29 +208,38 @@ function _M.init(env, args)
                              version_url, err, retry_time))
         end
 
-        if not res then
-            errmsg = str_format("request etcd endpoint \'%s\' error, %s\n", version_url, err)
-            util.die(errmsg)
-        end
+        if res then
+            local body, _, err = dkjson.decode(res)
+            if err or (body and not body["etcdcluster"]) then
+                errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
+                        version_url)
+                util.die(errmsg)
+            end
 
-        local body, _, err = dkjson.decode(res)
-        if err or (body and not body["etcdcluster"]) then
-            errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
-                                version_url)
-            util.die(errmsg)
-        end
+            local cluster_version = body["etcdcluster"]
+            if compare_semantic_version(cluster_version, env.min_etcd_version) then
+                util.die("etcd cluster version ", cluster_version,
+                        " is less than the required version ",
+                        env.min_etcd_version,
+                        ", please upgrade your etcd cluster\n")
+            end
 
-        local cluster_version = body["etcdcluster"]
-        if compare_semantic_version(cluster_version, env.min_etcd_version) then
-            util.die("etcd cluster version ", cluster_version,
-                     " is less than the required version ",
-                     env.min_etcd_version,
-                     ", please upgrade your etcd cluster\n")
+            table_insert(etcd_healthy_hosts, host)
+        else
+            print(str_format("request etcd endpoint \'%s\' error, %s\n", version_url, err))
         end
     end
 
+    if #etcd_healthy_hosts <= 0 then
+        util.die("all etcd nodes are unavailable\n")
+    end
+
+    if host_count >= 2 and #etcd_healthy_hosts < 2 then

Review comment:
       done













[GitHub] [apisix] spacewander commented on a change in pull request #5158: feat: etcd cluster single node failure APISIX startup failure

Posted by GitBox <gi...@apache.org>.
spacewander commented on a change in pull request #5158:
URL: https://github.com/apache/apisix/pull/5158#discussion_r727948592



##########
File path: apisix/cli/etcd.lua
##########
@@ -206,29 +209,38 @@ function _M.init(env, args)
                              version_url, err, retry_time))
         end
 
-        if not res then
-            errmsg = str_format("request etcd endpoint \'%s\' error, %s\n", version_url, err)
-            util.die(errmsg)
-        end
+        if res then
+            local body, _, err = dkjson.decode(res)
+            if err or (body and not body["etcdcluster"]) then
+                errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
+                        version_url)
+                util.die(errmsg)
+            end
 
-        local body, _, err = dkjson.decode(res)
-        if err or (body and not body["etcdcluster"]) then
-            errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
-                                version_url)
-            util.die(errmsg)
-        end
+            local cluster_version = body["etcdcluster"]
+            if compare_semantic_version(cluster_version, env.min_etcd_version) then
+                util.die("etcd cluster version ", cluster_version,
+                        " is less than the required version ", env.min_etcd_version,
+                        ", please upgrade your etcd cluster\n")
+            end
 
-        local cluster_version = body["etcdcluster"]
-        if compare_semantic_version(cluster_version, env.min_etcd_version) then
-            util.die("etcd cluster version ", cluster_version,
-                     " is less than the required version ",
-                     env.min_etcd_version,
-                     ", please upgrade your etcd cluster\n")
+            table_insert(etcd_healthy_hosts, host)
+        else
+            io_stderr:write(str_format("request etcd endpoint \'%s\' error, %s\n", version_url,
+                    err))
         end
     end
 
+    if #etcd_healthy_hosts <= 0 then
+        util.die("all etcd nodes are unavailable\n")
+    end
+
+    if host_count >= 2 and (#etcd_healthy_hosts / host_count * 100) < 50 then

Review comment:
       Is `host_count >= 2` necessary?
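
To see why the guard looks redundant, here is a minimal sketch that restates the two startup checks from the diff above in shell. `startup_check` is a hypothetical helper written for this illustration, not APISIX code; its third argument switches the `host_count >= 2` guard on or off, and the percentage test is rewritten as an integer comparison.

```shell
# Hypothetical restatement of the two checks in the diff (not APISIX code).
# $1 = healthy endpoints, $2 = configured endpoints, $3 = "guard" or "noguard"
startup_check() {
    # First check from the diff: no healthy endpoint at all.
    if [ "$1" -le 0 ]; then
        echo "die: all etcd nodes are unavailable"
        return
    fi
    # Optional guard: skip the percentage check for a single endpoint.
    if [ "$3" = "guard" ] && [ "$2" -lt 2 ]; then
        echo "ok"
        return
    fi
    # healthy / total * 100 < 50, kept in integer arithmetic.
    if [ $(( $1 * 100 )) -lt $(( $2 * 50 )) ]; then
        echo "die: not enough healthy nodes"
    else
        echo "ok"
    fi
}

# Compare both variants for every healthy/total combination up to 3 endpoints.
total=1
while [ "$total" -le 3 ]; do
    healthy=0
    while [ "$healthy" -le "$total" ]; do
        with=$(startup_check "$healthy" "$total" guard)
        without=$(startup_check "$healthy" "$total" noguard)
        echo "$healthy/$total: with guard -> $with; without guard -> $without"
        healthy=$((healthy + 1))
    done
    total=$((total + 1))
done
```

With a single endpoint there are only two cases: zero healthy nodes is already rejected by the first check, and one healthy node is 100%, so both variants should reach the same decision for every row.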

##########
File path: apisix/cli/etcd.lua
##########
@@ -206,29 +209,38 @@ function _M.init(env, args)
                              version_url, err, retry_time))
         end
 
-        if not res then
-            errmsg = str_format("request etcd endpoint \'%s\' error, %s\n", version_url, err)
-            util.die(errmsg)
-        end
+        if res then
+            local body, _, err = dkjson.decode(res)
+            if err or (body and not body["etcdcluster"]) then
+                errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
+                        version_url)
+                util.die(errmsg)
+            end
 
-        local body, _, err = dkjson.decode(res)
-        if err or (body and not body["etcdcluster"]) then
-            errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
-                                version_url)
-            util.die(errmsg)
-        end
+            local cluster_version = body["etcdcluster"]
+            if compare_semantic_version(cluster_version, env.min_etcd_version) then
+                util.die("etcd cluster version ", cluster_version,
+                        " is less than the required version ", env.min_etcd_version,

Review comment:
       Bad indent

##########
File path: t/cli/test_etcd_ha.sh
##########
@@ -0,0 +1,68 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+ETCD_NAME_0=etcd0

Review comment:
       Should we merge it into test_etcd_healthcheck?

##########
File path: apisix/cli/etcd.lua
##########
@@ -206,29 +209,38 @@ function _M.init(env, args)
                              version_url, err, retry_time))
         end
 
-        if not res then
-            errmsg = str_format("request etcd endpoint \'%s\' error, %s\n", version_url, err)
-            util.die(errmsg)
-        end
+        if res then
+            local body, _, err = dkjson.decode(res)
+            if err or (body and not body["etcdcluster"]) then
+                errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
+                        version_url)
+                util.die(errmsg)
+            end
 
-        local body, _, err = dkjson.decode(res)
-        if err or (body and not body["etcdcluster"]) then
-            errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
-                                version_url)
-            util.die(errmsg)
-        end
+            local cluster_version = body["etcdcluster"]
+            if compare_semantic_version(cluster_version, env.min_etcd_version) then
+                util.die("etcd cluster version ", cluster_version,
+                        " is less than the required version ", env.min_etcd_version,
+                        ", please upgrade your etcd cluster\n")
+            end
 
-        local cluster_version = body["etcdcluster"]
-        if compare_semantic_version(cluster_version, env.min_etcd_version) then
-            util.die("etcd cluster version ", cluster_version,
-                     " is less than the required version ",
-                     env.min_etcd_version,
-                     ", please upgrade your etcd cluster\n")
+            table_insert(etcd_healthy_hosts, host)
+        else
+            io_stderr:write(str_format("request etcd endpoint \'%s\' error, %s\n", version_url,
+                    err))
         end
     end
 
+    if #etcd_healthy_hosts <= 0 then
+        util.die("all etcd nodes are unavailable\n")
+    end
+
+    if host_count >= 2 and (#etcd_healthy_hosts / host_count * 100) < 50 then
+        util.die("the etcd cluster needs at least 50% and above healthy nodes\n")

Review comment:
       We need > 50%, not just >= 50%
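
A minimal sketch of the strict-majority rule this comment asks for, written in shell for illustration. `has_quorum` and `report` are hypothetical helpers, not part of APISIX; the equivalent Lua condition in etcd.lua is left to the PR.

```shell
# Hypothetical helper (not APISIX code): succeeds only when a strict
# majority of the configured etcd endpoints is healthy.
# $1 = healthy endpoints, $2 = configured endpoints
has_quorum() {
    # healthy / total > 1/2, rewritten without division: 2 * healthy > total
    [ $(( $1 * 2 )) -gt "$2" ]
}

report() {
    if has_quorum "$1" "$2"; then
        echo "$1/$2 healthy: quorum"
    else
        echo "$1/$2 healthy: no quorum"
    fi
}

report 1 2
report 2 3
report 2 4
report 3 4
```

The difference from the `< 50` comparison in the diff shows up at exactly half: 1 of 2 and 2 of 4 healthy nodes pass the percentage test but do not form a majority.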







[GitHub] [apisix] spacewander merged pull request #5158: feat: etcd cluster single node failure APISIX startup failure

Posted by GitBox <gi...@apache.org>.
spacewander merged pull request #5158:
URL: https://github.com/apache/apisix/pull/5158


   





[GitHub] [apisix] shuaijinchao commented on a change in pull request #5158: feat: etcd cluster single node failure APISIX startup failure

Posted by GitBox <gi...@apache.org>.
shuaijinchao commented on a change in pull request #5158:
URL: https://github.com/apache/apisix/pull/5158#discussion_r728594956



##########
File path: t/cli/test_etcd_ha.sh
##########
@@ -0,0 +1,68 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+ETCD_NAME_0=etcd0

Review comment:
       Let's use test_etcd_healthcheck for now. I merged the cases from test_etcd_ha into test_etcd_healthcheck. What do you think?







[GitHub] [apisix] okaybase commented on a change in pull request #5158: feat: etcd cluster single node failure APISIX startup failure

Posted by GitBox <gi...@apache.org>.
okaybase commented on a change in pull request #5158:
URL: https://github.com/apache/apisix/pull/5158#discussion_r720650204



##########
File path: t/cli/test_etcd_ha.sh
##########
@@ -0,0 +1,68 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+ETCD_NAME_0=etcd0
+ETCD_NAME_1=etcd1
+ETCD_NAME_2=etcd2
+
+echo '
+etcd:
+  host:
+    - "http://127.0.0.1:23790"
+    - "http://127.0.0.1:23791"
+    - "http://127.0.0.1:23792"
+' > conf/config.yaml
+
+docker-compose -f ./t/cli/docker-compose-etcd-cluster.yaml up -d
+
+# case 1: stop one etcd nodes (result: start successful)
+docker stop ${ETCD_NAME_0}
+
+out=$(make init 2>&1)
+if echo "$out" | grep "23790" | grep "connection refused"; then
+    echo "passed: APISIX successfully to start, stop only one etcd node"
+else
+    echo "failed: stop only one etcd node APISIX should start normally"
+    exit 1
+fi
+
+# case 2: stop two etcd nodes (result: start failure)
+docker stop ${ETCD_NAME_1}
+
+out=$(make init 2>&1)
+if echo "$out" | grep "23791" | grep "connection refused"; then
+    echo "passed: APISIX failed to start, etcd cluster must have two or more healthy nodes"
+else
+    echo "failed: etcd has stopped two nodes, APISIX should fail to start"

Review comment:
       ```suggestion
       echo "failed: two etcd nodes have been stopped, APISIX should fail to start"
   ```

##########
File path: t/cli/test_etcd_ha.sh
##########
@@ -0,0 +1,68 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+ETCD_NAME_0=etcd0
+ETCD_NAME_1=etcd1
+ETCD_NAME_2=etcd2
+
+echo '
+etcd:
+  host:
+    - "http://127.0.0.1:23790"
+    - "http://127.0.0.1:23791"
+    - "http://127.0.0.1:23792"
+' > conf/config.yaml
+
+docker-compose -f ./t/cli/docker-compose-etcd-cluster.yaml up -d
+
+# case 1: stop one etcd nodes (result: start successful)

Review comment:
       ```suggestion
   # case 1: stop one etcd node (result: start successful)
   ```







[GitHub] [apisix] shuaijinchao commented on a change in pull request #5158: feat: etcd cluster single node failure APISIX startup failure

Posted by GitBox <gi...@apache.org>.
shuaijinchao commented on a change in pull request #5158:
URL: https://github.com/apache/apisix/pull/5158#discussion_r728649340



##########
File path: t/cli/test_etcd_ha.sh
##########
@@ -0,0 +1,68 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+ETCD_NAME_0=etcd0

Review comment:
       fixed







[GitHub] [apisix] shuaijinchao commented on a change in pull request #5158: feat: etcd cluster single node failure APISIX startup failure

Posted by GitBox <gi...@apache.org>.
shuaijinchao commented on a change in pull request #5158:
URL: https://github.com/apache/apisix/pull/5158#discussion_r727981487



##########
File path: apisix/cli/etcd.lua
##########
@@ -206,29 +209,38 @@ function _M.init(env, args)
                              version_url, err, retry_time))
         end
 
-        if not res then
-            errmsg = str_format("request etcd endpoint \'%s\' error, %s\n", version_url, err)
-            util.die(errmsg)
-        end
+        if res then
+            local body, _, err = dkjson.decode(res)
+            if err or (body and not body["etcdcluster"]) then
+                errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
+                        version_url)
+                util.die(errmsg)
+            end
 
-        local body, _, err = dkjson.decode(res)
-        if err or (body and not body["etcdcluster"]) then
-            errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
-                                version_url)
-            util.die(errmsg)
-        end
+            local cluster_version = body["etcdcluster"]
+            if compare_semantic_version(cluster_version, env.min_etcd_version) then
+                util.die("etcd cluster version ", cluster_version,
+                        " is less than the required version ", env.min_etcd_version,
+                        ", please upgrade your etcd cluster\n")
+            end
 
-        local cluster_version = body["etcdcluster"]
-        if compare_semantic_version(cluster_version, env.min_etcd_version) then
-            util.die("etcd cluster version ", cluster_version,
-                     " is less than the required version ",
-                     env.min_etcd_version,
-                     ", please upgrade your etcd cluster\n")
+            table_insert(etcd_healthy_hosts, host)
+        else
+            io_stderr:write(str_format("request etcd endpoint \'%s\' error, %s\n", version_url,
+                    err))
         end
     end
 
+    if #etcd_healthy_hosts <= 0 then
+        util.die("all etcd nodes are unavailable\n")
+    end
+
+    if host_count >= 2 and (#etcd_healthy_hosts / host_count * 100) < 50 then

Review comment:
       yes, it can be deleted







[GitHub] [apisix] shuaijinchao commented on a change in pull request #5158: feat: etcd cluster single node failure APISIX startup failure

Posted by GitBox <gi...@apache.org>.
shuaijinchao commented on a change in pull request #5158:
URL: https://github.com/apache/apisix/pull/5158#discussion_r727980875



##########
File path: apisix/cli/etcd.lua
##########
@@ -206,29 +209,38 @@ function _M.init(env, args)
                              version_url, err, retry_time))
         end
 
-        if not res then
-            errmsg = str_format("request etcd endpoint \'%s\' error, %s\n", version_url, err)
-            util.die(errmsg)
-        end
+        if res then
+            local body, _, err = dkjson.decode(res)
+            if err or (body and not body["etcdcluster"]) then
+                errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
+                        version_url)
+                util.die(errmsg)
+            end
 
-        local body, _, err = dkjson.decode(res)
-        if err or (body and not body["etcdcluster"]) then
-            errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
-                                version_url)
-            util.die(errmsg)
-        end
+            local cluster_version = body["etcdcluster"]
+            if compare_semantic_version(cluster_version, env.min_etcd_version) then
+                util.die("etcd cluster version ", cluster_version,
+                        " is less than the required version ", env.min_etcd_version,
+                        ", please upgrade your etcd cluster\n")
+            end
 
-        local cluster_version = body["etcdcluster"]
-        if compare_semantic_version(cluster_version, env.min_etcd_version) then
-            util.die("etcd cluster version ", cluster_version,
-                     " is less than the required version ",
-                     env.min_etcd_version,
-                     ", please upgrade your etcd cluster\n")
+            table_insert(etcd_healthy_hosts, host)
+        else
+            io_stderr:write(str_format("request etcd endpoint \'%s\' error, %s\n", version_url,
+                    err))
         end
     end
 
+    if #etcd_healthy_hosts <= 0 then
+        util.die("all etcd nodes are unavailable\n")
+    end
+
+    if host_count >= 2 and (#etcd_healthy_hosts / host_count * 100) < 50 then
+        util.die("the etcd cluster needs at least 50% and above healthy nodes\n")

Review comment:
       OK







[GitHub] [apisix] shuaijinchao commented on pull request #5158: bugfix: etcd cluster single node failure APISIX startup failure

Posted by GitBox <gi...@apache.org>.
shuaijinchao commented on pull request #5158:
URL: https://github.com/apache/apisix/pull/5158#issuecomment-930070731


   This PR depends on lua-resty-etcd supporting round-robin checking of endpoints when health check is not enabled.





[GitHub] [apisix] shuaijinchao commented on a change in pull request #5158: feat: etcd cluster single node failure APISIX startup failure

Posted by GitBox <gi...@apache.org>.
shuaijinchao commented on a change in pull request #5158:
URL: https://github.com/apache/apisix/pull/5158#discussion_r728604699



##########
File path: apisix/cli/etcd.lua
##########
@@ -206,29 +209,38 @@ function _M.init(env, args)
                              version_url, err, retry_time))
         end
 
-        if not res then
-            errmsg = str_format("request etcd endpoint \'%s\' error, %s\n", version_url, err)
-            util.die(errmsg)
-        end
+        if res then
+            local body, _, err = dkjson.decode(res)
+            if err or (body and not body["etcdcluster"]) then
+                errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
+                        version_url)
+                util.die(errmsg)
+            end
 
-        local body, _, err = dkjson.decode(res)
-        if err or (body and not body["etcdcluster"]) then
-            errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
-                                version_url)
-            util.die(errmsg)
-        end
+            local cluster_version = body["etcdcluster"]
+            if compare_semantic_version(cluster_version, env.min_etcd_version) then
+                util.die("etcd cluster version ", cluster_version,
+                        " is less than the required version ",

Review comment:
       OK thx~







[GitHub] [apisix] shuaijinchao commented on a change in pull request #5158: feat: etcd cluster single node failure APISIX startup failure

Posted by GitBox <gi...@apache.org>.
shuaijinchao commented on a change in pull request #5158:
URL: https://github.com/apache/apisix/pull/5158#discussion_r728610392



##########
File path: t/cli/test_etcd_ha.sh
##########
@@ -0,0 +1,68 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+ETCD_NAME_0=etcd0

Review comment:
       @spacewander After merging the test cases, the util.die call in the CLI exits the process with an error, which causes the Makefile to terminate abnormally and aborts the test run. Do you have a suggested solution?
   https://github.com/apache/apisix/pull/5158/checks?check_run_id=3890204660#step:8:2226
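
One possible way to keep an expected failure from aborting a test script is sketched below. It assumes the script runs under `set -e`, which this thread does not confirm, and it uses a hypothetical `failing_init` function as a stand-in for `make init` so the example is self-contained.

```shell
#!/usr/bin/env bash
set -e

# Hypothetical stand-in for `make init`: prints an error and returns
# non-zero, mimicking a CLI that dies on an unavailable etcd cluster.
failing_init() {
    echo "all etcd nodes are unavailable" >&2
    return 1
}

# `|| rc=$?` records the exit status and stops `set -e` from aborting
# the script when the failure is the behaviour under test.
rc=0
out=$(failing_init 2>&1) || rc=$?

if [ "$rc" -ne 0 ] && echo "$out" | grep -q "all etcd nodes are unavailable"; then
    echo "passed: init failed as expected"
else
    echo "failed: init should have failed"
    exit 1
fi
```

Checking both the exit status and the message keeps the test from passing on an unrelated failure.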







[GitHub] [apisix] shuaijinchao commented on a change in pull request #5158: feat: etcd cluster single node failure APISIX startup failure

Posted by GitBox <gi...@apache.org>.
shuaijinchao commented on a change in pull request #5158:
URL: https://github.com/apache/apisix/pull/5158#discussion_r727983525



##########
File path: apisix/cli/etcd.lua
##########
@@ -206,29 +208,38 @@ function _M.init(env, args)
                              version_url, err, retry_time))
         end
 
-        if not res then
-            errmsg = str_format("request etcd endpoint \'%s\' error, %s\n", version_url, err)
-            util.die(errmsg)
-        end
+        if res then
+            local body, _, err = dkjson.decode(res)
+            if err or (body and not body["etcdcluster"]) then
+                errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
+                        version_url)
+                util.die(errmsg)
+            end
 
-        local body, _, err = dkjson.decode(res)
-        if err or (body and not body["etcdcluster"]) then
-            errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
-                                version_url)
-            util.die(errmsg)
-        end
+            local cluster_version = body["etcdcluster"]
+            if compare_semantic_version(cluster_version, env.min_etcd_version) then
+                util.die("etcd cluster version ", cluster_version,
+                        " is less than the required version ",
+                        env.min_etcd_version,
+                        ", please upgrade your etcd cluster\n")
+            end
 
-        local cluster_version = body["etcdcluster"]
-        if compare_semantic_version(cluster_version, env.min_etcd_version) then
-            util.die("etcd cluster version ", cluster_version,
-                     " is less than the required version ",
-                     env.min_etcd_version,
-                     ", please upgrade your etcd cluster\n")
+            table_insert(etcd_healthy_hosts, host)
+        else
+            print(str_format("request etcd endpoint \'%s\' error, %s\n", version_url, err))
         end
     end
 
+    if #etcd_healthy_hosts <= 0 then
+        util.die("all etcd nodes are unavailable\n")
+    end
+
+    if host_count >= 2 and #etcd_healthy_hosts < 2 then

Review comment:
       done







[GitHub] [apisix] shuaijinchao commented on a change in pull request #5158: feat: etcd cluster single node failure APISIX startup failure

Posted by GitBox <gi...@apache.org>.
shuaijinchao commented on a change in pull request #5158:
URL: https://github.com/apache/apisix/pull/5158#discussion_r728604732



##########
File path: apisix/cli/etcd.lua
##########
@@ -206,29 +209,38 @@ function _M.init(env, args)
                              version_url, err, retry_time))
         end
 
-        if not res then
-            errmsg = str_format("request etcd endpoint \'%s\' error, %s\n", version_url, err)
-            util.die(errmsg)
-        end
+        if res then
+            local body, _, err = dkjson.decode(res)
+            if err or (body and not body["etcdcluster"]) then
+                errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
+                        version_url)
+                util.die(errmsg)
+            end
 
-        local body, _, err = dkjson.decode(res)
-        if err or (body and not body["etcdcluster"]) then
-            errmsg = str_format("got malformed version message: \"%s\" from etcd \"%s\"\n", res,
-                                version_url)
-            util.die(errmsg)
-        end
+            local cluster_version = body["etcdcluster"]
+            if compare_semantic_version(cluster_version, env.min_etcd_version) then
+                util.die("etcd cluster version ", cluster_version,
+                        " is less than the required version ",

Review comment:
       Fixed



