Posted to notifications@apisix.apache.org by GitBox <gi...@apache.org> on 2021/06/22 06:22:59 UTC

[GitHub] [apisix] kuberxy opened a new issue #4461: request help: How much lua_max_pending_timers should be set to

kuberxy opened a new issue #4461:
URL: https://github.com/apache/apisix/issues/4461


   ### Issue description
   My current settings are:
   ```
   lua_max_pending_timers 40960;
   lua_max_running_timers 256;
   ```
   But I encountered the following error. Our traffic is not large; only about 50 users are online at the same time.
   ```
   2021/06/22 12:22:21 [error] 2115#2115: *59914 [lua] balancer.lua:96: create_obj_fun(): fail to create healthcheck instance: failed to create 'healthy' timer: too many pending timers while connecting to upstream
   ```
   What I want to know is: is my setting too small, or are the timer resources that APISIX creates not being released in time?
   If my setting is too small, how should I calculate a reasonable value?
   
   ### Environment
   * apisix version (cmd: `apisix version`): 2.2
   * OS (cmd: `uname -a`): 4.15.0-111-generic Ubuntu 18.04
   * OpenResty / Nginx version (cmd: `nginx -V` or `openresty -V`):  openresty/1.15.8.3
   * etcd version, if have (cmd: run `curl http://127.0.0.1:9090/v1/server_info` to get the info from server-info API): Stand-Alone
   * luarocks version, if the issue is about installation (cmd: `luarocks --version`): 2.4.2
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [apisix] tokers commented on issue #4461: request help: How much lua_max_pending_timers should be set to

Posted by GitBox <gi...@apache.org>.
tokers commented on issue #4461:
URL: https://github.com/apache/apisix/issues/4461#issuecomment-877561729


   > The question is whether APISIX creates too many timers.
   > I remember that older versions of APISIX had a bug where stale health checkers were not removed in time, which may cause extra timers.
   > 
   > We can run APISIX 2.7 and APISIX 2.2 in similar environments and compare the timers they create.
   
   By the way, the consul-kv module creates timers without delay. In particular, when APISIX cannot connect to the Consul servers, timers are created at a high rate (and removed just as quickly).





[GitHub] [apisix] kuberxy commented on issue #4461: request help: How much lua_max_pending_timers should be set to

Posted by GitBox <gi...@apache.org>.
kuberxy commented on issue #4461:
URL: https://github.com/apache/apisix/issues/4461#issuecomment-867263274


   > You can port the patch mechanism to 2.2, for example by putting it at the top of apisix/init.lua.
   
   Thanks, I'll try it.





[GitHub] [apisix] spacewander commented on issue #4461: request help: How much lua_max_pending_timers should be set to

Posted by GitBox <gi...@apache.org>.
spacewander commented on issue #4461:
URL: https://github.com/apache/apisix/issues/4461#issuecomment-867261639


   You can port the patch mechanism to 2.2, for example by putting it at the top of apisix/init.lua.





[GitHub] [apisix] spacewander commented on issue #4461: request help: How much lua_max_pending_timers should be set to

Posted by GitBox <gi...@apache.org>.
spacewander commented on issue #4461:
URL: https://github.com/apache/apisix/issues/4461#issuecomment-876181048


   Can you analyze the stack traces and count how many timers each type of caller creates?
   BTW, your stack trace doesn't match the `balancer.lua` from v2.2. It seems you have modified part of the code? Can this problem be reproduced in v2.7?
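
One possible way to do that counting is sketched below. It assumes the alert format that appears in the log excerpt posted in this thread (a line containing `pending_timer` and `stack traceback:`, followed by one frame per line) and treats the first frame outside `patch.lua` as the caller. The function name `tally_timer_callers` is hypothetical and not part of APISIX.

```lua
-- Hypothetical helper (not part of APISIX): count which call site requested
-- a timer, based on the "pending_timer ... stack traceback:" alerts.
local function tally_timer_callers(log_text)
    local counts = {}
    local expect_caller = false
    for line in log_text:gmatch("[^\n]+") do
        if line:find("pending_timer", 1, true)
           and line:find("stack traceback:", 1, true) then
            -- A new traceback starts; the caller is in the frames below.
            expect_caller = true
        elseif expect_caller then
            -- A frame looks like "<path>:<line>: in function ...".
            local frame = line:match("^%s*(%S+:%d+): in ")
            if not frame then
                expect_caller = false      -- traceback ended without a match
            elseif not frame:find("patch.lua", 1, true) then
                counts[frame] = (counts[frame] or 0) + 1
                expect_caller = false      -- only the first outside frame counts
            end
        end
    end
    return counts
end

-- Small sample in the same shape as the log lines posted in this thread.
local sample = [[
2021/07/08 11:37:31 [alert] 21509#21509: *116288 [lua] patch.lua:326: gctimer(): pending_timer 128 stack: stack traceback:
    /usr/local/share/lua/5.1/apisix/patch.lua:326: in function 'gctimer'
    /usr/local/share/lua/5.1/resty/healthcheck.lua:1138: in function 'start'
    /usr/local/share/lua/5.1/resty/healthcheck.lua:1427: in function 'new'
]]

for frame, n in pairs(tally_timer_callers(sample)) do
    print(n, frame)
end
```

Because the alert is only logged once every 128 timers, these counts are a sample of the callers rather than an exact total.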





[GitHub] [apisix] kuberxy commented on issue #4461: request help: How much lua_max_pending_timers should be set to

Posted by GitBox <gi...@apache.org>.
kuberxy commented on issue #4461:
URL: https://github.com/apache/apisix/issues/4461#issuecomment-876273174


   I only caught the exception, following #3169. Without that change, my users get a 500 error when there are not enough timers.
   
   I don't use v2.7. I reproduced this problem in v2.3. In addition, I found that balancer.lua was rewritten in v2.3. In the error log, I saw only one line of error message:
   ```
   2021/07/08 16:46:17 [error] 31019#31019: *166383 [lua] upstream.lua:88: fetch_healthchecker(): fail to create healthcheck instance: failed to create 'healthy' timer: too many pending timers, client: 127.0.0.1, server: , request: "POST /rpc/account HTTP/1.1", host: "test.xxx.com", referrer: "https://test.xxx.com/circle/"
   ```
   
   How I reproduce the problem:
   1. Reduce the timer limits, like this:
   ```
       lua_max_pending_timers 15;
       lua_max_running_timers 12;
   ```
   2. Open a page that makes a large number of requests, then force-refresh it repeatedly.





[GitHub] [apisix] kuberxy commented on issue #4461: request help: How much lua_max_pending_timers should be set to

Posted by GitBox <gi...@apache.org>.
kuberxy commented on issue #4461:
URL: https://github.com/apache/apisix/issues/4461#issuecomment-867253962


   My version is 2.2; it seems that it has no patch.lua file.





[GitHub] [apisix] kuberxy closed issue #4461: request help: How much lua_max_pending_timers should be set to

Posted by GitBox <gi...@apache.org>.
kuberxy closed issue #4461:
URL: https://github.com/apache/apisix/issues/4461


   





[GitHub] [apisix] spacewander commented on issue #4461: request help: How much lua_max_pending_timers should be set to

Posted by GitBox <gi...@apache.org>.
spacewander commented on issue #4461:
URL: https://github.com/apache/apisix/issues/4461#issuecomment-876871336


   The question is whether APISIX creates too many timers.
   I remember that older versions of APISIX had a bug where stale health checkers were not removed in time, which may cause extra timers.
   
   We can run APISIX 2.7 and APISIX 2.2 in similar environments and compare the timers they create.
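
One way to make that comparison concrete is to read the per-worker timer counters on each version under the same load. The sketch below assumes the `ngx.timer.pending_count()` and `ngx.timer.running_count()` functions from lua-nginx-module are available in the OpenResty build in use. The names `timer_snapshot` and `format_snapshot`, and the stub used when no `ngx` global exists, are illustrative and not part of APISIX.

```lua
-- Hypothetical helper (not part of APISIX): read the timer counters.
-- `api` is the global `ngx` table when running inside OpenResty.
local function timer_snapshot(api)
    return {
        pending = api.timer.pending_count(),
        running = api.timer.running_count(),
    }
end

-- Render a snapshot as one log-friendly line.
local function format_snapshot(label, snap)
    return string.format("%s: pending=%d running=%d",
                         label, snap.pending, snap.running)
end

-- Outside OpenResty there is no `ngx` global, so fall back to a stub with
-- fixed values purely so that this sketch runs under plain Lua.
local api = rawget(_G, "ngx") or {
    timer = {
        pending_count = function() return 3 end,
        running_count = function() return 1 end,
    },
}

print(format_snapshot("worker", timer_snapshot(api)))
```

Inside OpenResty, the formatted line could be written with `ngx.log` from a recurring timer in each worker. The counters describe only the worker process that reads them, so each worker has to report its own values.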
   





[GitHub] [apisix] kuberxy commented on issue #4461: request help: How much lua_max_pending_timers should be set to

Posted by GitBox <gi...@apache.org>.
kuberxy commented on issue #4461:
URL: https://github.com/apache/apisix/issues/4461#issuecomment-876102589


   I got this information in the error log after applying the patch mechanism.
   ```
   2021/07/08 11:37:31 [error] 21507#21507: *116264 [lua] balancer.lua:96: create_obj_fun(): fail to create healthcheck instance: nil while connecting to upstream, client: 127.0.0.1, server: , request: "GET /res/static/ktv/images/close.png HTTP/1.1", host: "test.xxx.com", referrer: "https://test.xxx.com/circle/"
   2021/07/08 11:37:31 [alert] 21509#21509: *116288 [lua] patch.lua:326: gctimer(): pending_timer 128 stack: stack traceback:
   	/usr/local/share/lua/5.1/apisix/patch.lua:326: in function 'gctimer'
   	/usr/local/share/lua/5.1/resty/healthcheck.lua:1138: in function 'start'
   	/usr/local/share/lua/5.1/resty/healthcheck.lua:1427: in function 'new'
   	/usr/local/share/lua/5.1/apisix/balancer.lua:90: in function 'create_obj_fun'
   	/usr/local/share/lua/5.1/apisix/core/lrucache.lua:92: in function 'lrucache_checker'
   	/usr/local/share/lua/5.1/apisix/balancer.lua:139: in function 'fetch_healthchecker'
   	/usr/local/share/lua/5.1/apisix/balancer.lua:212: in function 'pick_server'
   	/usr/local/share/lua/5.1/apisix/balancer.lua:276: in function 'load_balancer'
   	/usr/local/share/lua/5.1/apisix/init.lua:759: in function 'http_balancer_phase'
   	balancer_by_lua:2: in main chunk while connecting to upstream, client: 127.0.0.1, server: , request: "GET /res/static/circle/images/music_default.jpg HTTP/1.1", host: "test.xxx.com", referrer: "https://test.xxx.com/circle/"
   2021/07/08 11:37:31 [error] 21509#21509: *116288 [lua] balancer.lua:96: create_obj_fun(): fail to create healthcheck instance: nil while connecting to upstream, client: 127.0.0.1, server: , request: "GET /res/static/circle/images/music_default.jpg HTTP/1.1", host: "test.xxx.com", referrer: "https://test.xxx.com/circle/"
   2021/07/08 11:37:31 [alert] 21509#21509: *116336 [lua] patch.lua:326: gctimer(): pending_timer 128 stack: stack traceback:
   	/usr/local/share/lua/5.1/apisix/patch.lua:326: in function 'gctimer'
   	/usr/local/share/lua/5.1/resty/healthcheck.lua:1019: in function </usr/local/share/lua/5.1/resty/healthcheck.lua:972>, context: ngx.timer, client: 127.0.0.1, server: 0.0.0.0:808
   2021/07/08 11:37:31 [alert] 21509#21509: *116357 [lua] patch.lua:326: gctimer(): pending_timer 128 stack: stack traceback:
   	/usr/local/share/lua/5.1/apisix/patch.lua:326: in function 'gctimer'
   	/usr/local/share/lua/5.1/resty/healthcheck.lua:1019: in function </usr/local/share/lua/5.1/resty/healthcheck.lua:972>, context: ngx.timer, client: 127.0.0.1, server: 0.0.0.0:808
   2021/07/08 11:37:31 [error] 21509#21509: *116348 [lua] balancer.lua:96: create_obj_fun(): fail to create healthcheck instance: nil while connecting to upstream, client: 127.0.0.1, server: , request: "GET /res/static/circle/images/live/record_stop.png HTTP/1.1", host: "test.xxx.com", referrer: "https://test.xxx.com/circle/"
   
   ```





[GitHub] [apisix] tokers commented on issue #4461: request help: How much lua_max_pending_timers should be set to

Posted by GitBox <gi...@apache.org>.
tokers commented on issue #4461:
URL: https://github.com/apache/apisix/issues/4461#issuecomment-876350778


   > I only caught the exception, following #3169. Without that change, my users get a 500 error when there are not enough timers.
   > 
   > I don't use v2.7. I reproduced this problem in v2.3. In addition, I found that balancer.lua was rewritten in v2.3. In the error log, I saw only one line of error message:
   > 
   > ```
   > 2021/07/08 16:46:17 [error] 31019#31019: *166383 [lua] upstream.lua:88: fetch_healthchecker(): fail to create healthcheck instance: failed to create 'healthy' timer: too many pending timers, client: 127.0.0.1, server: , request: "POST /rpc/account HTTP/1.1", host: "test.xxx.com", referrer: "https://test.xxx.com/circle/"
   > ```
   > 
   > How I reproduce the problem:
   > 
   > 1. Reduce the timer limits, like this:
   > 
   > ```
   >     lua_max_pending_timers 15;
   >     lua_max_running_timers 12;
   > ```
   > 
   > 2. Open a page that makes a large number of requests, then force-refresh it repeatedly.
   
   It seems that `15` is too small a value for the maximum number of pending timers in APISIX.





[GitHub] [apisix] spacewander commented on issue #4461: request help: How much lua_max_pending_timers should be set to

Posted by GitBox <gi...@apache.org>.
spacewander commented on issue #4461:
URL: https://github.com/apache/apisix/issues/4461#issuecomment-865780706


   Maybe you can inject counting to see which part creates the most timers, by applying the patch below.
   
   ```diff
   diff --git apisix/patch.lua apisix/patch.lua
   index a291110b..e610f32a 100644
   --- apisix/patch.lua
   +++ apisix/patch.lua
   @@ -267,6 +267,8 @@ local function luasocket_tcp()
    end
   
   
   +local pending_timer = 0
   +
    function _M.patch()
        -- make linter happy
        -- luacheck: ignore
   @@ -278,6 +280,27 @@ function _M.patch()
   
            return luasocket_tcp()
        end
   +
   +    local old_timer_at = ngx.timer.at
   +    ngx.timer.at = function (delay, f, ...)
   +        pending_timer = pending_timer + 1
   +        if pending_timer % 128 == 0 then
   +            ngx.log(ngx.ALERT, "pending_timer ", pending_timer, " stack: ", debug.traceback())
   +        end
   +        return old_timer_at(delay, function(...)
   +            pending_timer = pending_timer - 1
   +            return f(...)
   +        end, ...)
   +    end
   +
   +    local old_timer_every = ngx.timer.every
   +    ngx.timer.every = function (delay, f, ...)
   +        pending_timer = pending_timer + 1
   +        if pending_timer % 128 == 0 then
   +            ngx.log(ngx.ALERT, "pending_timer ", pending_timer, " stack: ", debug.traceback())
   +        end
   +        return old_timer_every(delay, f, ...)
   +    end
    end
   ```
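
The same counting idea can be sketched as a standalone function that also records which call site asked for each timer, so the per-caller numbers are available without reading stack traces. This is a minimal sketch under stated assumptions, not the patch above: the name `instrument_timer_at` is hypothetical, it only wraps the one-shot `at` function, and the demo uses a fake timer table so it runs under plain Lua. Inside OpenResty the argument would be `ngx.timer`.

```lua
-- Hypothetical helper (not part of APISIX): wrap `timer_api.at` so that
-- pending one-shot timers are counted in total and per call site.
local function instrument_timer_at(timer_api)
    local total = 0
    local by_caller = {}
    local old_at = timer_api.at

    timer_api.at = function(delay, f, ...)
        -- Stack level 2 is the code that called the wrapped `at`.
        local info = debug.getinfo(2, "Sl")
        local caller = info and (info.short_src .. ":" .. info.currentline)
                       or "unknown"

        total = total + 1
        by_caller[caller] = (by_caller[caller] or 0) + 1

        local function done()
            total = total - 1
            by_caller[caller] = by_caller[caller] - 1
        end

        local ok, err = old_at(delay, function(...)
            done()      -- the timer fired, so it is no longer pending
            return f(...)
        end, ...)
        if not ok then
            done()      -- creation failed, e.g. "too many pending timers"
        end
        return ok, err
    end

    -- Accessor returning the current total and the per-caller table.
    return function()
        return total, by_caller
    end
end

-- Demo with a fake timer table that only queues the callbacks.
local demo_queue = {}
local demo_timer = {
    at = function(delay, f, ...)
        demo_queue[#demo_queue + 1] = f
        return true
    end,
}
local demo_stats = instrument_timer_at(demo_timer)
demo_timer.at(0, function() end)
demo_timer.at(0, function() end)
demo_queue[1](false)    -- simulate the first timer firing (premature = false)
local demo_total = demo_stats()
print("pending timers:", demo_total)
```

Incrementing before the call and rolling back on failure keeps the count from drifting when timer creation itself is rejected, which is exactly the situation being debugged here.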

