Posted to notifications@apisix.apache.org by GitBox <gi...@apache.org> on 2022/06/29 10:13:08 UTC

[GitHub] [apisix] hansedong opened a new issue, #7353: the prometheus metrics API is too slow

hansedong opened a new issue, #7353:
URL: https://github.com/apache/apisix/issues/7353

   ### Description
   
   I use APISIX in our microservice platform, which has thousands of microservices; that is, APISIX holds thousands of Route and Upstream resources.
   
   When I switched production traffic to APISIX and our monitoring platform, Prometheus, scraped time-series data from APISIX's metrics API, APISIX took a long time to respond, which in turn caused Prometheus's scrapes to time out.
   
   To rule out the network, I fetched the metrics data with curl directly on the APISIX node. It was still very slow, so the root of the problem lies in APISIX itself.
   
   ```shell
   curl "http://127.0.0.1:9091/apisix/prometheus/metrics" > /tmp/metrics
     % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                    Dload  Upload   Total   Spent    Left  Speed
   100 7092k    0 7092k    0     0   351k      0 --:--:--  0:00:22 --:--:-- 1710k
   ```
   
   As shown above:
   1. The metrics data is only about 7 MB (7092 KB) in size.
   2. The response time of the APISIX metrics API is 22 seconds.
   
   How should I troubleshoot this issue?
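   One simple way to turn the measurement above into a single number that is easy to compare across runs is to let curl print its own transfer statistics with `-w`. This is a sketch. The function name is hypothetical and the default URL is the one used in this issue. The 60-second `--max-time` cap is an arbitrary choice so that a stuck scrape fails instead of hanging.

   ```shell
   # Hypothetical helper: time one scrape of a metrics endpoint and report its size.
   time_scrape() {
     local url="${1:-http://127.0.0.1:9091/apisix/prometheus/metrics}"
     # -s: no progress meter; -o /dev/null: discard the body;
     # -w: print curl's transfer statistics once the transfer finishes.
     curl -s -o /dev/null --max-time 60 \
       -w 'bytes=%{size_download} seconds=%{time_total}\n' \
       "$url"
   }
   ```

   Running it a few times before and after a change (for example, after removing unneeded metrics) gives a rough before/after comparison. It measures one scrape from the client side, not APISIX's internal timing.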
   
   ### Environment
   
   - APISIX version (run `apisix version`): 2.13.2
   - Operating system (run `uname -a`): Linux knode10-132-14-174 4.19.206 #1 SMP Wed Sep 15 16:18:07 CST 2021 x86_64 x86_64 x86_64 GNU/Linux
   - OpenResty / Nginx version (run `openresty -V` or `nginx -V`):
   ```shell
   nginx version: openresty/1.21.4.1
   built by gcc 9.3.1 20200408 (Red Hat 9.3.1-2) (GCC)
   built with OpenSSL 1.1.1n  15 Mar 2022
   TLS SNI support enabled
   configure arguments: --prefix=/usr/local/openresty/nginx --with-cc-opt='-O2 -DAPISIX_BASE_VER=1.21.4.1.0 -DNGX_LUA_ABORT_AT_PANIC -I/usr/local/openresty/zlib/include -I/usr/local/openresty/pcre/include -I/usr/local/openresty/openssl111/include' --add-module=../ngx_devel_kit-0.3.1 --add-module=../echo-nginx-module-0.62 --add-module=../xss-nginx-module-0.06 --add-module=../ngx_coolkit-0.2 --add-module=../set-misc-nginx-module-0.33 --add-module=../form-input-nginx-module-0.12 --add-module=../encrypted-session-nginx-module-0.09 --add-module=../srcache-nginx-module-0.32 --add-module=../ngx_lua-0.10.21 --add-module=../ngx_lua_upstream-0.07 --add-module=../headers-more-nginx-module-0.33 --add-module=../array-var-nginx-module-0.05 --add-module=../memc-nginx-module-0.19 --add-module=../redis2-nginx-module-0.15 --add-module=../redis-nginx-module-0.3.9 --add-module=../ngx_stream_lua-0.0.11 --with-ld-opt='-Wl,-rpath,/usr/local/openresty/luajit/lib -Wl,-rpath,/usr/local/openresty/wasmtime-c-api/lib -L/usr/local/openresty/zlib/lib -L/usr/local/openresty/pcre/lib -L/usr/local/openresty/openssl111/lib -Wl,-rpath,/usr/local/openresty/zlib/lib:/usr/local/openresty/pcre/lib:/usr/local/openresty/openssl111/lib' --add-module=/tmp/tmp.jxGTHHB5bC/openresty-1.21.4.1/../mod_dubbo-1.0.2 --add-module=/tmp/tmp.jxGTHHB5bC/openresty-1.21.4.1/../ngx_multi_upstream_module-1.1.0 --add-module=/tmp/tmp.jxGTHHB5bC/openresty-1.21.4.1/../apisix-nginx-module-1.9.0 --add-module=/tmp/tmp.jxGTHHB5bC/openresty-1.21.4.1/../apisix-nginx-module-1.9.0/src/stream --add-module=/tmp/tmp.jxGTHHB5bC/openresty-1.21.4.1/../apisix-nginx-module-1.9.0/src/meta --add-module=/tmp/tmp.jxGTHHB5bC/openresty-1.21.4.1/../wasm-nginx-module-0.6.1 --add-module=/tmp/tmp.jxGTHHB5bC/openresty-1.21.4.1/../lua-var-nginx-module-v0.5.2 --with-poll_module --with-pcre-jit --with-stream --with-stream_ssl_module --with-stream_ssl_preread_module --with-http_v2_module --without-mail_pop3_module --without-mail_imap_module --without-mail_smtp_module --with-http_stub_status_module --with-http_realip_module --with-http_addition_module --with-http_auth_request_module --with-http_secure_link_module --with-http_random_index_module --with-http_gzip_static_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-threads --with-compat --with-stream --with-http_ssl_module
   ```
   - etcd version, if relevant (run `curl http://127.0.0.1:9090/v1/server_info`): 3.5.4
   - APISIX Dashboard version, if relevant: 2.13
   - Plugin runner version, for issues related to plugin runners: no
   - LuaRocks version, for installation issues (run `luarocks --version`): no
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@apisix.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [apisix] tokers commented on issue #7353: the prometheus metrics API is too slow

Posted by GitBox <gi...@apache.org>.
tokers commented on issue #7353:
URL: https://github.com/apache/apisix/issues/7353#issuecomment-1241396117

   > 
   
   Oops, you're right, that's the correct place.


[GitHub] [apisix] hansedong commented on issue #7353: the prometheus metrics API is too slow

hansedong commented on issue #7353:
URL: https://github.com/apache/apisix/issues/7353#issuecomment-1175852845

   @xuminwlt In my opinion, the Prometheus plugin retains historical data, so disabling the plugin does not solve the problem. Even when an upstream's nodes change, the Prometheus plugin keeps the data for the old nodes.
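   To check whether series for old upstream nodes are still being exported, count the distinct values of the node label in a saved metrics dump. This is a sketch with three assumptions. First, the upstream node is exposed as a label named `node`, as on metrics such as `apisix_http_status` in APISIX 2.x; verify this against your own dump. Second, label values contain no spaces. Third, the helper name is hypothetical.

   ```shell
   # Hypothetical helper: print each distinct value of label $2 in the
   # Prometheus text dump $1, with how many series carry it (most first).
   # Assumes label values contain no spaces (awk splits on whitespace).
   count_label_values() {
     local file="$1" label="$2"
     grep -v '^#' "$file" \
       | grep -o "[{,]${label}=\"[^\"]*\"" \
       | sed -E 's/^[{,][^=]*="//; s/"$//' \
       | sort | uniq -c | sort -rn \
       | awk '{ printf "%s %s\n", $2, $1 }'
   }
   ```

   For example, `count_label_values /tmp/metrics node` lists every upstream node address that still has series. Addresses that no longer belong to any upstream point to the retained historical data described above.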


[GitHub] [apisix] tzssangglass commented on issue #7353: the prometheus metrics API is too slow

tzssangglass commented on issue #7353:
URL: https://github.com/apache/apisix/issues/7353#issuecomment-1176022509

   > Etcd version 3.5.5 will fix this issue. And, I rebuilt and deployed Etcd based on its fixed PR and tested, and it has been confirmed that this problem can be fixed.
   
   Thank you for your research!
   
   > 2. Because the APISIX Prometheus plugin will query the Modified Index of Etcd in real time every time, the response time of the Prometheus plugin will also increase significantly. Therefore, Prometheus will time out when fetching metrics.
   
   This is caused by the large number of metrics (tens of thousands). I made some optimizations to the upstream nginx-lua-prometheus library, but they did not solve the problem completely.
   
   ref: https://github.com/knyar/nginx-lua-prometheus/pull/139
   
   One idea in the community now is to provide options that control which types of metrics are collected, to reduce the total number of metrics; see: https://github.com/apache/apisix/issues/7211#issuecomment-1165669868


[GitHub] [apisix] zuiyangqingzhou commented on issue #7353: the prometheus metrics API is too slow

zuiyangqingzhou commented on issue #7353:
URL: https://github.com/apache/apisix/issues/7353#issuecomment-1177008843

   This problem does exist when the amount of metrics data is large, and we found that it also leads to abnormally high CPU usage. 
   
   So we modified the Prometheus plugin to record only the information we need, and the streamlined plugin works well.


[GitHub] [apisix] tzssangglass commented on issue #7353: the prometheus metrics API is too slow

tzssangglass commented on issue #7353:
URL: https://github.com/apache/apisix/issues/7353#issuecomment-1170145158

   > I plan to change the exporter.lua script to reduce the metrics dataset and see if the problem can be solved. I
   
   The approach I have seen work so far is to reduce the number of metrics that are not needed (ref: #4273).
   No one has claimed that issue yet, so feel free to try it if you want.
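   Before deciding which metrics to drop, it helps to see which metric families contribute the most series. This is a minimal sketch that assumes a dump saved in the Prometheus text format, for example `/tmp/metrics` from the curl command in the original report. The function name is hypothetical. Histogram families appear as separate `_bucket`, `_sum`, and `_count` names.

   ```shell
   # Hypothetical helper: summarize a Prometheus text-format dump.
   # Prints the total number of series, then the $2 (default 10) metric
   # names with the most series. "# HELP"/"# TYPE" and blank lines are skipped.
   summarize_metrics() {
     local file="$1" top="${2:-10}"
     # Every remaining line is exactly one time series.
     printf 'total_series %s\n' "$(grep -cv -e '^#' -e '^$' "$file")"
     # Cut each line at the first "{" or space to keep only the metric name.
     grep -v -e '^#' -e '^$' "$file" \
       | sed -E 's/[{ ].*$//' \
       | sort | uniq -c | sort -rn | head -n "$top" \
       | awk '{ printf "%s %s\n", $2, $1 }'
   }
   ```

   For example, `summarize_metrics /tmp/metrics 20` prints the total followed by the 20 largest metric names. These are the candidates for being disabled or trimmed.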


[GitHub] [apisix] xuminwlt commented on issue #7353: the prometheus metrics API is too slow

xuminwlt commented on issue #7353:
URL: https://github.com/apache/apisix/issues/7353#issuecomment-1175773141

   I use the Prometheus plugin in the global config. When I switch it on, it collects route-level metrics.
   But when I switch it off, the route-level metrics are still exposed at the Prometheus endpoint, and there are still a huge number of them.
   The only thing I can do is restart the APISIX service. Is that the expected behavior?


[GitHub] [apisix] hansedong commented on issue #7353: the prometheus metrics API is too slow

hansedong commented on issue #7353:
URL: https://github.com/apache/apisix/issues/7353#issuecomment-1175831285

   @tzssangglass 
   
   I followed up further on this issue. The root cause of this problem is in Etcd.
   
   The problem can be reproduced when APISIX nodes open many connections (for example, more than 200) to the same Etcd node and communicate with Etcd over TLS.
   
   This problem has two consequences:
   
   1. When APISIX sends an HTTP request to Etcd, Etcd's response time increases significantly (up to tens of seconds).
   2. Because the APISIX Prometheus plugin queries Etcd's modified index in real time every time metrics are collected, the plugin's response time also increases significantly, so Prometheus times out when fetching metrics.
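   To check whether a node is in the many-connections situation described above, count its established connections per etcd peer. This is a sketch with three assumptions. First, etcd listens on its default client port, 2379. Second, the input comes from `ss -tn state established`, which omits the State column when a single state is given, so the peer is the fourth field; confirm this with your `ss` version. Third, the helper name is hypothetical.

   ```shell
   # Hypothetical helper: count established TCP connections per remote peer
   # whose remote port equals $1. Reads `ss -tn state established` output on
   # stdin and assumes column 4 is "Peer Address:Port" (no State column).
   count_conns_to_port() {
     local port="$1"
     awk -v port="$port" '
       NR > 1 {                          # skip the header line
         peer = $4
         n = split(peer, parts, ":")    # last piece is the port, also for [::1]:2379
         if (parts[n] == port) {
           sub(":" port "$", "", peer)  # drop ":port" and keep the address
           count[peer]++
         }
       }
       END { for (p in count) printf "%s %d\n", p, count[p] }
     ' | sort
   }
   # Example on a live APISIX node:
   #   ss -tn state established | count_conns_to_port 2379
   ```

   A count in the hundreds for a single etcd address matches the reproduction condition described above (more than 200 TLS connections to the same Etcd node).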
   
   I have opened an issue in the Etcd project that reproduces the problem: https://github.com/etcd-io/etcd/issues/14185 (related issue: #7078).
   
   Etcd version 3.5.5 will fix this issue. I rebuilt and deployed Etcd with the fix from its PR, tested it, and confirmed that it resolves this problem.
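   Because the fix is tied to specific etcd releases, a quick version check can show whether a cluster is still on an affected 3.5 build. This is a sketch. The only cutoff it encodes is 3.5.5, the release named in this thread. For the 3.4 line, the thread says a fixed release exists but does not give its number, so no 3.4 cutoff is hard-coded. The check relies on GNU `sort -V`, and the helper name is hypothetical.

   ```shell
   # Hypothetical helper: classify an etcd version string (e.g. from
   # `etcd --version` or `etcdctl version`) against the fix for
   # etcd-io/etcd#14185 as described in this thread.
   etcd_fix_status() {
     local v="${1#v}"   # accept both "v3.5.4" and "3.5.4"
     case "$v" in
       3.5.*)
         # If 3.5.5 sorts first (or is equal), then v >= 3.5.5.
         if [ "$(printf '%s\n' 3.5.5 "$v" | sort -V | head -n1)" = "3.5.5" ]; then
           echo fixed
         else
           echo affected
         fi ;;
       3.4.*) echo "check-3.4-release-notes" ;;
       *) echo unknown ;;
     esac
   }
   ```

   For the etcd 3.5.4 reported in this issue's environment, `etcd_fix_status 3.5.4` prints `affected`.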
   


[GitHub] [apisix] hansedong commented on issue #7353: the prometheus metrics API is too slow

hansedong commented on issue #7353:
URL: https://github.com/apache/apisix/issues/7353#issuecomment-1240466041

   @tokers Thanks a lot, I'll give it a try.


[GitHub] [apisix] tokers commented on issue #7353: the prometheus metrics API is too slow

tokers commented on issue #7353:
URL: https://github.com/apache/apisix/issues/7353#issuecomment-1240450874

   > @tzssangglass @tokers Sorry for taking so long to reply. The root cause of this problem is Etcd's bug. Etcd's HTTP2-based https connections are limited. The official version 3.5 has not yet been released, but it has been fixed in version 3.4 and a new version has been released. For version 3.5, I have hacked Etcd's source code, recompiled it, and ran it in production stably for nearly a month. This bug of Etcd can refer to: [etcd-io/etcd#14185](https://github.com/etcd-io/etcd/issues/14185)
   
   @hansedong Would you like to submit a PR to add this important fact to the FAQ?


[GitHub] [apisix] hansedong commented on issue #7353: the prometheus metrics API is too slow

hansedong commented on issue #7353:
URL: https://github.com/apache/apisix/issues/7353#issuecomment-1240508742

   @tokers I'm a little confused. The content of the FAQ page doesn't seem to live in the apache/apisix-website project but in apache/apisix, specifically https://github.com/apache/apisix/blob/master/docs/en/latest/FAQ.md. Is that right?


[GitHub] [apisix] hansedong closed issue #7353: the prometheus metrics API is too slow

hansedong closed issue #7353: the prometheus metrics API is too slow
URL: https://github.com/apache/apisix/issues/7353


[GitHub] [apisix] hansedong commented on issue #7353: the prometheus metrics API is too slow

hansedong commented on issue #7353:
URL: https://github.com/apache/apisix/issues/7353#issuecomment-1169873693

   @tzssangglass 
   
   Thanks for the reply. I have read those two issues, and they helped me somewhat with the problem.
   
   However, they do not solve the problem at its source in APISIX. I plan to change the exporter.lua script to reduce the metrics dataset and see whether that solves the problem. Still, requiring users to modify the code to work around this issue is not a good solution.
   
   I ran into this problem while migrating our gateway from Envoy to APISIX. With Envoy, even when the metrics data is several hundred MB, the response is not slow.
   
   As mentioned in other issues, I think APISIX's Prometheus plugin could provide some configurable options to help people with similar problems.
   
   Thank you very much for your reply. If there is any progress, I will post it in this issue.


[GitHub] [apisix] tokers commented on issue #7353: the prometheus metrics API is too slow

tokers commented on issue #7353:
URL: https://github.com/apache/apisix/issues/7353#issuecomment-1240458651

   > @tokers I'd love to do this, please, how do I add this to the FAQ?
   
   The FAQ page is https://apisix.apache.org/docs/apisix/FAQ/, and you can submit a PR to apisix-website: https://github.com/apache/apisix-website


[GitHub] [apisix] hansedong commented on issue #7353: the prometheus metrics API is too slow

hansedong commented on issue #7353:
URL: https://github.com/apache/apisix/issues/7353#issuecomment-1245129573

   @tokers I've added an FAQ item in #7906. Could you help review it?


[GitHub] [apisix] tzssangglass commented on issue #7353: the prometheus metrics API is too slow

tzssangglass commented on issue #7353:
URL: https://github.com/apache/apisix/issues/7353#issuecomment-1169835573

   This is a known issue, the same as #7211 and #5755.


[GitHub] [apisix] tokers commented on issue #7353: the prometheus metrics API is too slow

tokers commented on issue #7353:
URL: https://github.com/apache/apisix/issues/7353#issuecomment-1177310090

   @zuiyangqingzhou Have you tried the nginx-lua-prometheus optimization introduced by @tzssangglass ?


[GitHub] [apisix] tzssangglass commented on issue #7353: the prometheus metrics API is too slow

tzssangglass commented on issue #7353:
URL: https://github.com/apache/apisix/issues/7353#issuecomment-1178458483

   > @zuiyangqingzhou Have you tried the nginx-lua-prometheus optimization introduced by @tzssangglass ?
   
   This optimization is also limited, because some steps cannot be removed, such as sorting tens of thousands of keys, regular-expression matching, and string concatenation. 😅


[GitHub] [apisix] hansedong commented on issue #7353: the prometheus metrics API is too slow

hansedong commented on issue #7353:
URL: https://github.com/apache/apisix/issues/7353#issuecomment-1240148358

   @tzssangglass @tokers 
   Sorry for taking so long to reply. The root cause of this problem is a bug in Etcd: its HTTP/2-based HTTPS connections are limited. A fixed 3.5 release has not been published yet, but the bug has been fixed in the 3.4 line and a new 3.4 version has been released. For 3.5, I patched Etcd's source code, recompiled it, and have run it stably in production for nearly a month.
   For details of this Etcd bug, see: https://github.com/etcd-io/etcd/issues/14185


[GitHub] [apisix] hansedong commented on issue #7353: the prometheus metrics API is too slow

hansedong commented on issue #7353:
URL: https://github.com/apache/apisix/issues/7353#issuecomment-1240456863

   @tokers I'd love to do this, please, how do I add this to the FAQ?

