Posted to notifications@apisix.apache.org by GitBox <gi...@apache.org> on 2022/09/14 01:35:39 UTC

[GitHub] [apisix] SylviaBABY commented on a diff in pull request #7906: docs: add an FAQ item for apisix high latency due to etcd

SylviaBABY commented on code in PR #7906:
URL: https://github.com/apache/apisix/pull/7906#discussion_r970212495


##########
docs/zh/latest/FAQ.md:
##########
@@ -627,6 +627,58 @@ curl http://127.0.0.1:9180/apisix/admin/routes/health-info \
 
 :::
 
+## APISIX 与 ETCD 相关的延迟较高的问题有哪些,如何修复?
+
+ETCD 作为 APISIX 的数据存储组件,它的稳定性关乎 APISIX 的稳定性。在实际场景中,如果 APISIX 使用证书通过 HTTPS 的方式连接 ETCD,可能会出现以下 2 种数据查询或写入延迟较高的问题:
+
+1. 通过接口操作 APISIX Admin API 进行数据的查询或写入,延迟较高。
+2. 在监控系统中,Prometheus 抓取 APISIX 数据面 Metrics 接口超时。
+
+这些延迟问题,严重影响了 APISIX 的服务稳定性,而之所以会出现这类问题,主要是因为 ETCD 对外提供了 2 种操作方式:HTTP(HTTPS)、gRPC。而 APISIX 是基于 HTTP(HTTPS)协议来操作 ETCD 的。
+在这个场景中,ETCD 存在一个关于 HTTP/2 的 BUG:如果通过 HTTPS 操作 ETCD(HTTP 不受影响),HTTP/2 的连接数上限为 Golang 默认的 `250` 个。
+所以,当 APISIX 数据面节点数较多时,一旦所有 APISIX 节点与 ETCD 连接数超过这个上限,则 APISIX 的接口响应会非常的慢。
+
+Golang 中,默认的 HTTP/2 上限为 `250`,代码如下:
+
+```go
+package http2
+
+import ...
+
+const (
+	prefaceTimeout         = 10 * time.Second
+	firstSettingsTimeout   = 2 * time.Second // should be in-flight with preface anyway
+	handlerChunkWriteSize  = 4 << 10
+	defaultMaxStreams      = 250 // TODO: make this 100 as the GFE seems to?
+	maxQueuedControlFrames = 10000
+)
+
+```
+
+目前,ETCD 官方主要维护了 `3.4` 和 `3.5` 2 个主要版本。
+而 `3.4` 已有近期发布的 `3.4.20` 修复了这个问题。
+至于 `3.5` 版本,其实,官方很早之前就在筹备发布 `3.5.5` 版本了,但截止目前(2022.09.13)也尚未发布。所以,如果你使用的是 ETCD 的版本小于 `3.5.5`,可以有几种方式解决这个问题:
+
+1. APISIX 与 ETCD 的通讯方式,由 HTTPS 改为 HTTP。
+2. 回退版本到 `3.4.20`。

Review Comment:
   ```suggestion
   2. 将 ETCD 版本回退到 `3.4.20`。
   ```



##########
docs/zh/latest/FAQ.md:
##########
@@ -627,6 +627,58 @@ curl http://127.0.0.1:9180/apisix/admin/routes/health-info \
 
 :::
 
+## APISIX 与 ETCD 相关的延迟较高的问题有哪些,如何修复?
+
+ETCD 作为 APISIX 的数据存储组件,它的稳定性关乎 APISIX 的稳定性。在实际场景中,如果 APISIX 使用证书通过 HTTPS 的方式连接 ETCD,可能会出现以下 2 种数据查询或写入延迟较高的问题:
+
+1. 通过接口操作 APISIX Admin API 进行数据的查询或写入,延迟较高。
+2. 在监控系统中,Prometheus 抓取 APISIX 数据面 Metrics 接口超时。
+
+这些延迟问题,严重影响了 APISIX 的服务稳定性,而之所以会出现这类问题,主要是因为 ETCD 对外提供了 2 种操作方式:HTTP(HTTPS)、gRPC。而 APISIX 是基于 HTTP(HTTPS)协议来操作 ETCD 的。
+在这个场景中,ETCD 存在一个关于 HTTP/2 的 BUG:如果通过 HTTPS 操作 ETCD(HTTP 不受影响),HTTP/2 的连接数上限为 Golang 默认的 `250` 个。
+所以,当 APISIX 数据面节点数较多时,一旦所有 APISIX 节点与 ETCD 连接数超过这个上限,则 APISIX 的接口响应会非常的慢。
+
+Golang 中,默认的 HTTP/2 上限为 `250`,代码如下:
+
+```go
+package http2
+
+import ...
+
+const (
+	prefaceTimeout         = 10 * time.Second
+	firstSettingsTimeout   = 2 * time.Second // should be in-flight with preface anyway
+	handlerChunkWriteSize  = 4 << 10
+	defaultMaxStreams      = 250 // TODO: make this 100 as the GFE seems to?
+	maxQueuedControlFrames = 10000
+)
+
+```
+
+目前,ETCD 官方主要维护了 `3.4` 和 `3.5` 2 个主要版本。
+而 `3.4` 已有近期发布的 `3.4.20` 修复了这个问题。
+至于 `3.5` 版本,其实,官方很早之前就在筹备发布 `3.5.5` 版本了,但截止目前(2022.09.13)也尚未发布。所以,如果你使用的是 ETCD 的版本小于 `3.5.5`,可以有几种方式解决这个问题:
+
+1. APISIX 与 ETCD 的通讯方式,由 HTTPS 改为 HTTP。

Review Comment:
   ```suggestion
   1. 将 APISIX 与 ETCD 的通讯方式由 HTTPS 改为 HTTP。
   ```



##########
docs/en/latest/FAQ.md:
##########
@@ -626,6 +626,59 @@ This method only detects whether the APISIX data plane is alive or not. It does
 
 :::
 
+## What are the scenarios with high APISIX latency related to ETCD and how to fix them?
+
+ETCD is the data storage component of apisix, and its stability is related to the stability of APISIX.
+
+In actual scenarios, if APISIX uses a certificate to connect to ETCD through HTTPS, the following two problems of high latency for data query or writing may occur:
+
+1. Query or write data through APISIX Admin API.
+2. In the monitoring scenario, Prometheus crawls the APISIX data plane Metrics API timeout.
+
+These problems related to higher latency seriously affect the service stability of APISIX, and the reason why such problems occur is mainly because ETCD provides two modes of operation: HTTP (HTTPS) and gRPC. And APISIX uses the HTTP (HTTPS) protocol to operate ETCD.
+In this scenario, ETCD has a bug about HTTP/2: if ETCD is operated over HTTPS (HTTP is not affected), the upper limit of HTTP/2 connections is the default `250` in Golang. Therefore, when the number of APISIX data plane nodes is large, once the number of connections between all APISIX nodes and ETCD exceeds this upper limit, the response of APISIX API interface will be very slow.
+
+In Golang, the default upper limit of HTTP/2 connections is `250`, the code is as follows:
+
+```go
+package http2
+
+import ...
+
+const (
+	prefaceTimeout         = 10 * time.Second
+	firstSettingsTimeout   = 2 * time.Second // should be in-flight with preface anyway
+	handlerChunkWriteSize  = 4 << 10
+	defaultMaxStreams      = 250 // TODO: make this 100 as the GFE seems to?
+	maxQueuedControlFrames = 10000
+)
+
+```
+
+At present, ETCD officially maintains two main branches, `3.4` and `3.5`.
+The `3.4` branch has the recently released `3.4.20` which fixes this issue.
+As for the `3.5` branch, in fact, the official is preparing to release the `3.5.5` version a long time ago, but it has not been released so far. So, if you are using a version of ETCD less than `3.5.5`, there are several ways to solve this problem:
+
+1. Change the communication method between APISIX and ETCD from HTTPS to HTTP.
+2. Fallback version to `3.4.20`.

Review Comment:
   ```suggestion
   2. Roll back the ETCD version to `3.4.20`.
   ```
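
For context on the `250` figure quoted in the hunk above, here is a minimal sketch of the threshold arithmetic. The constant name `defaultMaxStreams` comes from the quoted Go source and suggests the cap applies to concurrent HTTP/2 streams. The helper `streamsLeft` and the workload numbers (nodes, requests per node) are hypothetical and chosen only for illustration. How etcd actually multiplexes requests is not spelled out in this thread.

```go
package main

import "fmt"

// defaultMaxStreams mirrors the constant quoted from Go's http2 package.
const defaultMaxStreams = 250

// streamsLeft is a hypothetical helper: it reports how many more concurrent
// streams fit under limit, and whether the limit has already been reached.
func streamsLeft(inUse, limit int) (remaining int, saturated bool) {
	if inUse >= limit {
		return 0, true
	}
	return limit - inUse, false
}

func main() {
	// Assumption: every APISIX node keeps 9 long-lived requests open.
	// The real number depends on the deployment.
	const requestsPerNode = 9

	for _, nodes := range []int{10, 30} {
		inUse := nodes * requestsPerNode
		left, saturated := streamsLeft(inUse, defaultMaxStreams)
		fmt.Printf("nodes=%d inUse=%d left=%d saturated=%v\n",
			nodes, inUse, left, saturated)
	}
}
```

Under these assumed numbers, 10 nodes stay well under the cap while 30 nodes exceed it, which matches the FAQ's point that the slowdown only appears once the data plane grows large.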



##########
docs/en/latest/FAQ.md:
##########
@@ -626,6 +626,59 @@ This method only detects whether the APISIX data plane is alive or not. It does
 
 :::
 
+## What are the scenarios with high APISIX latency related to ETCD and how to fix them?
+
+ETCD is the data storage component of apisix, and its stability is related to the stability of APISIX.
+
+In actual scenarios, if APISIX uses a certificate to connect to ETCD through HTTPS, the following two problems of high latency for data query or writing may occur:
+
+1. Query or write data through APISIX Admin API.
+2. In the monitoring scenario, Prometheus crawls the APISIX data plane Metrics API timeout.
+
+These problems related to higher latency seriously affect the service stability of APISIX, and the reason why such problems occur is mainly because ETCD provides two modes of operation: HTTP (HTTPS) and gRPC. And APISIX uses the HTTP (HTTPS) protocol to operate ETCD.
+In this scenario, ETCD has a bug about HTTP/2: if ETCD is operated over HTTPS (HTTP is not affected), the upper limit of HTTP/2 connections is the default `250` in Golang. Therefore, when the number of APISIX data plane nodes is large, once the number of connections between all APISIX nodes and ETCD exceeds this upper limit, the response of APISIX API interface will be very slow.
+
+In Golang, the default upper limit of HTTP/2 connections is `250`, the code is as follows:
+
+```go
+package http2
+
+import ...
+
+const (
+	prefaceTimeout         = 10 * time.Second
+	firstSettingsTimeout   = 2 * time.Second // should be in-flight with preface anyway
+	handlerChunkWriteSize  = 4 << 10
+	defaultMaxStreams      = 250 // TODO: make this 100 as the GFE seems to?
+	maxQueuedControlFrames = 10000
+)
+
+```
+
+At present, ETCD officially maintains two main branches, `3.4` and `3.5`.

Review Comment:
   ```suggestion
   ETCD officially maintains two main versions, `3.4` and `3.5`. In the `3.4` series, the recently released `3.4.20` fixes this issue. For the `3.5` series, the maintainers have been preparing the `3.5.5` release for a long time, but it has not been released as of now (2022.09.13). So, if you are using an ETCD version earlier than `3.5.5`, you can use one of the following ways to solve this problem:
   ```
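
The version rule in this suggestion can be sketched as a small check. This is only an illustration: `parseVersion` and `needsWorkaround` are hypothetical helpers, not part of APISIX or etcd. The sketch assumes that `3.5.5`, unreleased at the time of this thread, will carry the fix, as the FAQ text implies. Versions outside the `3.4` and `3.5` series are reported as unknown because the thread says nothing about them.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits a version such as "3.5.4" or "v3.5.4" into its parts.
func parseVersion(v string) (major, minor, patch int, err error) {
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return 0, 0, 0, fmt.Errorf("unexpected version %q", v)
	}
	nums := make([]int, 3)
	for i, p := range parts {
		if nums[i], err = strconv.Atoi(p); err != nil {
			return 0, 0, 0, fmt.Errorf("unexpected version %q: %w", v, err)
		}
	}
	return nums[0], nums[1], nums[2], nil
}

// needsWorkaround reports whether a version predates the fixes named in
// the thread: 3.4.20 for the 3.4 series, 3.5.5 (assumed) for the 3.5 series.
func needsWorkaround(v string) (bool, error) {
	major, minor, patch, err := parseVersion(v)
	if err != nil {
		return false, err
	}
	switch {
	case major == 3 && minor == 4:
		return patch < 20, nil
	case major == 3 && minor == 5:
		return patch < 5, nil
	default:
		// The thread only covers the two maintained series.
		return false, fmt.Errorf("version %q is outside the 3.4/3.5 series", v)
	}
}

func main() {
	for _, v := range []string{"3.4.19", "3.4.20", "3.5.4", "3.5.5"} {
		affected, err := needsWorkaround(v)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		fmt.Printf("etcd %s needs workaround: %v\n", v, affected)
	}
}
```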



##########
docs/zh/latest/FAQ.md:
##########
@@ -627,6 +627,58 @@ curl http://127.0.0.1:9180/apisix/admin/routes/health-info \
 
 :::
 
+## APISIX 与 ETCD 相关的延迟较高的问题有哪些,如何修复?
+
+ETCD 作为 APISIX 的数据存储组件,它的稳定性关乎 APISIX 的稳定性。在实际场景中,如果 APISIX 使用证书通过 HTTPS 的方式连接 ETCD,可能会出现以下 2 种数据查询或写入延迟较高的问题:
+
+1. 通过接口操作 APISIX Admin API 进行数据的查询或写入,延迟较高。
+2. 在监控系统中,Prometheus 抓取 APISIX 数据面 Metrics 接口超时。
+
+这些延迟问题,严重影响了 APISIX 的服务稳定性,而之所以会出现这类问题,主要是因为 ETCD 对外提供了 2 种操作方式:HTTP(HTTPS)、gRPC。而 APISIX 是基于 HTTP(HTTPS)协议来操作 ETCD 的。
+在这个场景中,ETCD 存在一个关于 HTTP/2 的 BUG:如果通过 HTTPS 操作 ETCD(HTTP 不受影响),HTTP/2 的连接数上限为 Golang 默认的 `250` 个。
+所以,当 APISIX 数据面节点数较多时,一旦所有 APISIX 节点与 ETCD 连接数超过这个上限,则 APISIX 的接口响应会非常的慢。

Review Comment:
   ```suggestion
   这些延迟问题,严重影响了 APISIX 的服务稳定性。之所以会出现这类问题,主要是因为 ETCD 对外提供了 2 种操作方式:HTTP(HTTPS)、gRPC,而 APISIX 是基于 HTTP(HTTPS)协议来操作 ETCD 的。
   
   在上述场景中,ETCD 存在一个关于 HTTP/2 的 BUG:如果通过 HTTPS 操作 ETCD(HTTP 不受影响),HTTP/2 的连接数上限为 Golang 默认的 `250` 个。
   
   所以,当 APISIX 数据面节点数较多时,一旦所有 APISIX 节点与 ETCD 连接数超过这个上限,则 APISIX 的接口响应会变得非常慢。
   ```



##########
docs/zh/latest/FAQ.md:
##########
@@ -627,6 +627,58 @@ curl http://127.0.0.1:9180/apisix/admin/routes/health-info \
 
 :::
 
+## APISIX 与 ETCD 相关的延迟较高的问题有哪些,如何修复?
+
+ETCD 作为 APISIX 的数据存储组件,它的稳定性关乎 APISIX 的稳定性。在实际场景中,如果 APISIX 使用证书通过 HTTPS 的方式连接 ETCD,可能会出现以下 2 种数据查询或写入延迟较高的问题:
+
+1. 通过接口操作 APISIX Admin API 进行数据的查询或写入,延迟较高。
+2. 在监控系统中,Prometheus 抓取 APISIX 数据面 Metrics 接口超时。
+
+这些延迟问题,严重影响了 APISIX 的服务稳定性,而之所以会出现这类问题,主要是因为 ETCD 对外提供了 2 种操作方式:HTTP(HTTPS)、gRPC。而 APISIX 是基于 HTTP(HTTPS)协议来操作 ETCD 的。
+在这个场景中,ETCD 存在一个关于 HTTP/2 的 BUG:如果通过 HTTPS 操作 ETCD(HTTP 不受影响),HTTP/2 的连接数上限为 Golang 默认的 `250` 个。
+所以,当 APISIX 数据面节点数较多时,一旦所有 APISIX 节点与 ETCD 连接数超过这个上限,则 APISIX 的接口响应会非常的慢。
+
+Golang 中,默认的 HTTP/2 上限为 `250`,代码如下:
+
+```go
+package http2
+
+import ...
+
+const (
+	prefaceTimeout         = 10 * time.Second
+	firstSettingsTimeout   = 2 * time.Second // should be in-flight with preface anyway
+	handlerChunkWriteSize  = 4 << 10
+	defaultMaxStreams      = 250 // TODO: make this 100 as the GFE seems to?
+	maxQueuedControlFrames = 10000
+)
+
+```
+
+目前,ETCD 官方主要维护了 `3.4` 和 `3.5` 2 个主要版本。

Review Comment:
   ```suggestion
   目前,ETCD 官方主要维护了 `3.4` 和 `3.5` 这两个主要版本。在 `3.4` 系列中,近期发布的 `3.4.20` 版本已修复了这个问题。至于 `3.5` 版本,其实,官方很早之前就在筹备发布 `3.5.5` 版本了,但截止目前(2022.09.13)仍尚未发布。所以,如果你使用的是 ETCD 的版本小于 `3.5.5`,可以参考以下几种方式解决这个问题:
   ```



##########
docs/zh/latest/FAQ.md:
##########
@@ -627,6 +627,58 @@ curl http://127.0.0.1:9180/apisix/admin/routes/health-info \
 
 :::
 
+## APISIX 与 ETCD 相关的延迟较高的问题有哪些,如何修复?
+
+ETCD 作为 APISIX 的数据存储组件,它的稳定性关乎 APISIX 的稳定性。在实际场景中,如果 APISIX 使用证书通过 HTTPS 的方式连接 ETCD,可能会出现以下 2 种数据查询或写入延迟较高的问题:
+
+1. 通过接口操作 APISIX Admin API 进行数据的查询或写入,延迟较高。
+2. 在监控系统中,Prometheus 抓取 APISIX 数据面 Metrics 接口超时。
+
+这些延迟问题,严重影响了 APISIX 的服务稳定性,而之所以会出现这类问题,主要是因为 ETCD 对外提供了 2 种操作方式:HTTP(HTTPS)、gRPC。而 APISIX 是基于 HTTP(HTTPS)协议来操作 ETCD 的。
+在这个场景中,ETCD 存在一个关于 HTTP/2 的 BUG:如果通过 HTTPS 操作 ETCD(HTTP 不受影响),HTTP/2 的连接数上限为 Golang 默认的 `250` 个。
+所以,当 APISIX 数据面节点数较多时,一旦所有 APISIX 节点与 ETCD 连接数超过这个上限,则 APISIX 的接口响应会非常的慢。
+
+Golang 中,默认的 HTTP/2 上限为 `250`,代码如下:
+
+```go
+package http2
+
+import ...
+
+const (
+	prefaceTimeout         = 10 * time.Second
+	firstSettingsTimeout   = 2 * time.Second // should be in-flight with preface anyway
+	handlerChunkWriteSize  = 4 << 10
+	defaultMaxStreams      = 250 // TODO: make this 100 as the GFE seems to?
+	maxQueuedControlFrames = 10000
+)
+
+```
+
+目前,ETCD 官方主要维护了 `3.4` 和 `3.5` 2 个主要版本。
+而 `3.4` 已有近期发布的 `3.4.20` 修复了这个问题。
+至于 `3.5` 版本,其实,官方很早之前就在筹备发布 `3.5.5` 版本了,但截止目前(2022.09.13)也尚未发布。所以,如果你使用的是 ETCD 的版本小于 `3.5.5`,可以有几种方式解决这个问题:

Review Comment:
   ```suggestion
   ```



##########
docs/en/latest/FAQ.md:
##########
@@ -626,6 +626,59 @@ This method only detects whether the APISIX data plane is alive or not. It does
 
 :::
 
+## What are the scenarios with high APISIX latency related to ETCD and how to fix them?
+
+ETCD is the data storage component of apisix, and its stability is related to the stability of APISIX.
+
+In actual scenarios, if APISIX uses a certificate to connect to ETCD through HTTPS, the following two problems of high latency for data query or writing may occur:
+
+1. Query or write data through APISIX Admin API.
+2. In the monitoring scenario, Prometheus crawls the APISIX data plane Metrics API timeout.
+
+These problems related to higher latency seriously affect the service stability of APISIX, and the reason why such problems occur is mainly because ETCD provides two modes of operation: HTTP (HTTPS) and gRPC. And APISIX uses the HTTP (HTTPS) protocol to operate ETCD.
+In this scenario, ETCD has a bug about HTTP/2: if ETCD is operated over HTTPS (HTTP is not affected), the upper limit of HTTP/2 connections is the default `250` in Golang. Therefore, when the number of APISIX data plane nodes is large, once the number of connections between all APISIX nodes and ETCD exceeds this upper limit, the response of APISIX API interface will be very slow.
+
+In Golang, the default upper limit of HTTP/2 connections is `250`, the code is as follows:
+
+```go
+package http2
+
+import ...
+
+const (
+	prefaceTimeout         = 10 * time.Second
+	firstSettingsTimeout   = 2 * time.Second // should be in-flight with preface anyway
+	handlerChunkWriteSize  = 4 << 10
+	defaultMaxStreams      = 250 // TODO: make this 100 as the GFE seems to?
+	maxQueuedControlFrames = 10000
+)
+
+```
+
+At present, ETCD officially maintains two main branches, `3.4` and `3.5`.
+The `3.4` branch has the recently released `3.4.20` which fixes this issue.
+As for the `3.5` branch, in fact, the official is preparing to release the `3.5.5` version a long time ago, but it has not been released so far. So, if you are using a version of ETCD less than `3.5.5`, there are several ways to solve this problem:

Review Comment:
   ```suggestion
   ```


