Posted to users@httpd.apache.org by Dan Udey <da...@communicate.com> on 2008/11/14 00:58:12 UTC

[users@httpd] Proxy balancing weirdness with bybusyness

Hey all,

We're having a strange problem with our 64-bit Apache 2.2.9 + bybusyness
patch proxy-balancing to mongrel app servers. What seems to happen is
that Apache will ignore (or forget about) workers it knows about,
apparently indefinitely. You can see this best in the ps output:

         deploy   27326 14.3  3.2 271728 130632 ?  Sl   23:23   1:03 mongrel_rails [8000/0/289]: idle
         deploy   27329 15.4  3.7 298428 150368 ?  Sl   23:23   1:08 mongrel_rails [8001/0/289]: idle
         deploy   27332 16.6  3.8 296292 154976 ?  Sl   23:23   1:13 mongrel_rails [8002/0/288]: idle
         deploy   27335 15.5  3.3 279404 136820 ?  Sl   23:23   1:08 mongrel_rails [8003/0/289]: idle
         deploy   27338 16.6  3.4 280396 139452 ?  Sl   23:23   1:13 mongrel_rails [8004/0/290]: idle
         deploy   27341 13.6  3.3 275600 134724 ?  Sl   23:23   1:00 mongrel_rails [8005/0/288]: idle
         deploy   27344  1.1  1.5 155708  62616 ?  Sl   23:23   0:04 mongrel_rails [8006/0/7]: idle
         deploy   27347 16.2  3.7 299976 153908 ?  Sl   23:23   1:11 mongrel_rails [8007/0/287]: idle
         deploy   27350  1.3  2.5 241708 104364 ?  Sl   23:23   0:05 mongrel_rails [8008/0/5]: idle
         deploy   27354  1.4  2.6 246368 109044 ?  Sl   23:23   0:06 mongrel_rails [8009/0/4]: idle
         deploy   27359  1.0  1.4 151124  58096 ?  Sl   23:23   0:04 mongrel_rails [8010/0/0]: idle
         deploy   27362  0.9  1.4 151140  58112 ?  Sl   23:23   0:04 mongrel_rails [8011/0/0]: idle

The format of the tuple in the mongrel_rails line is
[port/pending/handled] - 'pending' is mongrel's internal queue of
pending requests, which should always be 0 or 1, and 'handled' is the
number of requests that mongrel has served so far.
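
(In case anyone wants to reproduce this, something along these lines
pulls the counters out of ps - assuming a Linux procps ps whose -C
matches the mongrel processes and that the titles look exactly like
the above; the sed pattern is just illustrative:)

         ps -C mongrel_rails -o args= \
             | sed -n 's!.*\[\([0-9]*\)/\([0-9]*\)/\([0-9]*\)\].*!port=\1 pending=\2 handled=\3!p'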

As you can see from the output, seven of the mongrel processes have
served ~290 requests each, while five of them have served fewer than
ten. This matches balancer-manager's status (captured a few minutes
later, so the numbers aren't identical):

         Worker URL          Route RouteRedir Factor Set Status Elected To   From
         http://cimbar:8000                   1      0   Ok     415     315K 22M
         http://cimbar:8001                   1      0   Ok     416     324K 22M
         http://cimbar:8002                   1      0   Ok     484     392K 27M
         http://cimbar:8003                   1      0   Ok     483     381K 26M
         http://cimbar:8004                   1      0   Ok     484     379K 26M
         http://cimbar:8005                   1      0   Ok     484     374K 25M
         http://cimbar:8006                   1      0   Ok     52      44K  2.6M
         http://cimbar:8007                   1      0   Ok     608     474K 34M
         http://cimbar:8008                   1      0   Ok     53      41K  2.6M
         http://cimbar:8009                   1      0   Ok     53      43K  2.9M
         http://cimbar:8010                   1      0   Ok     5       1.1K 6.6K
         http://cimbar:8011                   1      0   Ok     7       1.2K 62K

My first guess was that the workers were being disabled, but
balancer-manager says 'Ok'. Next I looked at the log files for anything
amiss, and that's when I found something very odd - lbstatus seems to
be skewing quite dramatically, and I can't tell why. Here are the last
lbstatus entries for each port:

         dan@waterdeep:/var/log/apache2$ for port in {8000..8011}; do fgrep "bybusyness selected worker \"http://cimbar:${port}" /tmp/logfile | tail -n1; done
         [Thu Nov 13 23:32:39 2008] [debug] mod_proxy_balancer.c(1173): proxy: bybusyness selected worker "http://cimbar:8000" : busy 2 : lbstatus -1922
         [Thu Nov 13 23:32:45 2008] [debug] mod_proxy_balancer.c(1173): proxy: bybusyness selected worker "http://cimbar:8001" : busy 2 : lbstatus -1910
         [Thu Nov 13 23:34:24 2008] [debug] mod_proxy_balancer.c(1173): proxy: bybusyness selected worker "http://cimbar:8002" : busy 2 : lbstatus -2233
         [Thu Nov 13 23:34:25 2008] [debug] mod_proxy_balancer.c(1173): proxy: bybusyness selected worker "http://cimbar:8003" : busy 2 : lbstatus -2236
         [Thu Nov 13 23:34:23 2008] [debug] mod_proxy_balancer.c(1173): proxy: bybusyness selected worker "http://cimbar:8004" : busy 2 : lbstatus -2234
         [Thu Nov 13 23:34:24 2008] [debug] mod_proxy_balancer.c(1173): proxy: bybusyness selected worker "http://cimbar:8005" : busy 2 : lbstatus -2236
         [Thu Nov 13 23:32:45 2008] [debug] mod_proxy_balancer.c(1173): proxy: bybusyness selected worker "http://cimbar:8006" : busy 3 : lbstatus 2468
         [Thu Nov 13 23:34:25 2008] [debug] mod_proxy_balancer.c(1173): proxy: bybusyness selected worker "http://cimbar:8007" : busy 1 : lbstatus -3444
         [Thu Nov 13 23:33:54 2008] [debug] mod_proxy_balancer.c(1173): proxy: bybusyness selected worker "http://cimbar:8008" : busy 3 : lbstatus 2724
         [Thu Nov 13 23:32:43 2008] [debug] mod_proxy_balancer.c(1173): proxy: bybusyness selected worker "http://cimbar:8009" : busy 3 : lbstatus 2459
         [Thu Nov 13 23:32:39 2008] [debug] mod_proxy_balancer.c(1173): proxy: bybusyness selected worker "http://cimbar:8010" : busy 3 : lbstatus 2987
         [Thu Nov 13 23:32:45 2008] [debug] mod_proxy_balancer.c(1173): proxy: bybusyness selected worker "http://cimbar:8011" : busy 3 : lbstatus 2983

I'm wondering if those 'busy 3' values are the reason. In
mod_proxy_balancer.c, busy is incremented in proxy_balancer_pre_request()
and decremented in proxy_balancer_post_request() - is it possible that
it isn't always being decremented? The code looks straightforward, but
I'll have to have another look at it.
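
As a rough sanity check on that theory, something like this (same log
file path as above, and assuming the debug-line format shown earlier)
prints the lowest and highest busy value each worker was selected with;
if the minimum never drains back toward zero, a decrement is presumably
being lost somewhere. Pointing the same sed pattern at 'lbstatus'
instead would show the drift:

         for port in {8000..8011}; do
             echo -n "${port}: busy min/max = "
             fgrep "cimbar:${port}\"" /tmp/logfile \
                 | sed -n 's/.* busy \([0-9]*\) .*/\1/p' \
                 | sort -n | sed -n '1p;$p' | paste -sd/ -
         done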

Another thought I had was that Rainer Jung's comment on bug #45501 (the
bybusyness patch: https://issues.apache.org/bugzilla/show_bug.cgi?id=45501)
might apply to this situation as well. He refers to counters in mod_jk
skewing negative for some reason under load on 64-bit machines.

Our balancer config is below; everything seems straightforward, so I
would expect it to work. We've also got four mongrels running on the
Apache machine itself, for extra capacity and to handle requests while
the mongrels on cimbar are being restarted. The lbset configuration
works great, but the problem was happening before we added lbset,
maxattempts, and timeout to the configuration.

         <Proxy balancer://nuperfume lbmethod=bybusyness maxattempts=3 timeout=5>
                 BalancerMember http://127.0.0.1:8000 retry=2 lbset=1
                 BalancerMember http://127.0.0.1:8001 retry=2 lbset=1
                 BalancerMember http://127.0.0.1:8002 retry=2 lbset=1
                 BalancerMember http://127.0.0.1:8003 retry=2 lbset=1

                 BalancerMember http://cimbar:8000 retry=2 lbset=0
                 BalancerMember http://cimbar:8001 retry=2 lbset=0
                 BalancerMember http://cimbar:8002 retry=2 lbset=0
                 BalancerMember http://cimbar:8003 retry=2 lbset=0
                 BalancerMember http://cimbar:8004 retry=2 lbset=0
                 BalancerMember http://cimbar:8005 retry=2 lbset=0
                 BalancerMember http://cimbar:8006 retry=2 lbset=0
                 BalancerMember http://cimbar:8007 retry=2 lbset=0
                 BalancerMember http://cimbar:8008 retry=2 lbset=0
                 BalancerMember http://cimbar:8009 retry=2 lbset=0
                 BalancerMember http://cimbar:8010 retry=2 lbset=0
                 BalancerMember http://cimbar:8011 retry=2 lbset=0
         </Proxy>
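
(For completeness: the balancer is mounted with a plain ProxyPass. The
line below is illustrative rather than our exact config:)

         ProxyPass / balancer://nuperfume/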

I've even gone so far as to bring over Apache 2.2.10's mod_proxy code
and compile that in; it runs fine, but the behaviour doesn't change.

Any ideas?

---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
   "   from the digest: users-digest-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org