You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tomcat.apache.org by bu...@apache.org on 2017/11/07 09:56:32 UTC

[Bug 61733] New: lb_mult changes caused by status worker changing a subworker's factor are not propagated to all processes correctly.

https://bz.apache.org/bugzilla/show_bug.cgi?id=61733

            Bug ID: 61733
           Summary: lb_mult changes caused by status worker changing a
                    subworker's factor are not propagated to all processes
                    correctly.
           Product: Tomcat Connectors
           Version: 1.2.42
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Common
          Assignee: dev@tomcat.apache.org
          Reporter: jonathan.oddy@transferwise.com
  Target Milestone: ---

Created attachment 35500
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=35500&action=edit
Patch to recalculate lb_mult after pulling from shm

Steps to reproduce:
Alter the lbfactor of one subworker through jkstatus to a value that causes the
lbmult to need to be updated for other subworkers. You need multiple subworkers
configured and multiple web server processes to be running at the point the
status change is made.

Expected result:
All processes pull in the new lbfactor for the changed subworker, and lbmult
for all the other updated subworkers.
Requests are balanced appropriately for the new factor configuration.

Actual result:
All processes pull in the new lbfactor for the changed subworker, however only
the process processing the status change sees the lbmult values change for the
other subworkers. All other processes keep the original lbmult values.
Requests are not correctly balanced.

This happens because lbmult is expected to be calculated by the status worker
when it updates the factor, and then pushed to shm. The status worker, however,
only updates the sequence for the subworker being edited and not those
receiving new lbmult values. As a result the other processes do not observe a
change to the other subworkers, and do not pull the lbmult change.

This patch moves responsibility for calculating the local mult values to the
jk_lb_pull function. Processes will now recalculate their own view of the mult
values after pulling from shm, rather than obtaining it from shm. Since the
values are based on the current weights all processes will reach the same
conclusion.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 61733] lb_mult changes caused by status worker changing a subworker's factor are not propagated to all processes correctly.

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=61733

--- Comment #9 from Jonathan Oddy <jo...@transferwise.com> ---
We've been successfully running my original 1-liner fix for some time, so
testing an alternative fix wasn't that high up the priority list at the end of
the year. I'll try to get a new build with your fix tested next week.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 61733] lb_mult changes caused by status worker changing a subworker's factor are not propagated to all processes correctly.

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=61733

--- Comment #6 from Jonathan Oddy <jo...@transferwise.com> ---
Hi,
The "Force update all subworkers for multiplier change" from my previous
comment also includes the reordering. Updating the sequence numbers to -1 for
all subworkers, as in my patch, is consistent with how updates are already
pushed by the status worker. It does feel rather ugly though.

Your change to jk_lb_push feels like it might lead to a cleaner implementation,
but if so it'd be good to also get rid of the existing sequence = -1 usage for
lb and wr in a similar fashion. Also you seem to have deleted a wr->sequence =
-1. Without that I think explicit changes to a worker that don't trigger a
multiplier change won't be pushed.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 61733] lb_mult changes caused by status worker changing a subworker's factor are not propagated to all processes correctly.

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=61733

--- Comment #7 from Rainer Jung <ra...@kippdata.de> ---
Created attachment 35550
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=35550&action=edit
Like update_shm_mult.patch, but fix missing sequence nummber setting in single
member update case

Thanks for pointing at the missing sequence number update in the single member
update case. I had removed it intentionally when I originally set
push_all_members always to JK_TRUE there, but then forgot to add it back for
the JK_FALSE case.

Can you actually please test the (v2) patch?

Concerning other places of using sequence number value "-1" as a side effect, I
currently do not plan to work on removing this opaque feature. mod_jk is kept
quite stable since a few years so I think most energy will go into fixing bugs,
not improving code quality.

Thanks again,

Rainer

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 61733] lb_mult changes caused by status worker changing a subworker's factor are not propagated to all processes correctly.

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=61733

Rainer Jung <ra...@kippdata.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #8 from Rainer Jung <ra...@kippdata.de> ---
I have applied version two of my patch in r1820195. It will be part of 1.2.43.
Thanks for your help on this.

It would still be usefull if you could also do a test before the release.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 61733] lb_mult changes caused by status worker changing a subworker's factor are not propagated to all processes correctly.

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=61733

--- Comment #3 from Mark Thomas <ma...@apache.org> ---
That sounds reasonable. That does seem to be consistent with the general
approach which looks to be calculate / update stuff once, put in in shared
memory, have everything that needs to update from shared memory.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 61733] lb_mult changes caused by status worker changing a subworker's factor are not propagated to all processes correctly.

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=61733

--- Comment #5 from Rainer Jung <ra...@kippdata.de> ---
Created attachment 35549
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=35549&action=edit
Reorder update actions and introduce flag for demanding updates for all members

I think both of your analysis are right:

- we only sync via shared memory lb_mult for the worker whose factor changed
- the order of steps in case only one member is changes is wrong

The attached patch should fix both issues, correct the order - similar to what
Mark suggests - plus letting jk_lb_push know, that it should not only push none
or one member, but instead needs to push all members of an lb. To make this
possible we change the signature of jk_lb_push and introduce a new flag.

The patch compiles, but I'm short of time to aczually test it. it would be
great if one or both of you could give it a try.

Comments very welcome.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 61733] lb_mult changes caused by status worker changing a subworker's factor are not propagated to all processes correctly.

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=61733

--- Comment #2 from Jonathan Oddy <jo...@transferwise.com> ---
I hadn't spotted that, but I don't think it helps. jk_lb_push() only pushes
changes for subworkers where the sequence number has changed. update_worker()
sets the sequence number to -1 for the subworker that had its weight explicitly
changed, however update_mult() affects the other subworkers as well.

I guess an alternative solution would be for update_worker() to set sequence to
-1 for all subworkers if JK_STATUS_NEEDS_UPDATE_MULT?

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 61733] lb_mult changes caused by status worker changing a subworker's factor are not propagated to all processes correctly.

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=61733

--- Comment #4 from Jonathan Oddy <jo...@transferwise.com> ---
Created attachment 35547
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=35547&action=edit
Force update all subworkers for multiplier change

Here's an attempt at doing that.

This hasn't been as heavily tested as my original patch.

However, I'm now wondering if it ought to be treated like lb_value, and read
straight from shm rather than being copied to lb_sub_worker. That'd be simpler,
and also get rid of the lag on lb_mult changes being picked up everywhere. Are
there any advantages to it being synced to lb_sub_worker? I wondered if there
was some performance implication, but if that's the case then access to
lb_value would surely be much more of a bottleneck already.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 61733] lb_mult changes caused by status worker changing a subworker's factor are not propagated to all processes correctly.

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=61733

Jonathan Oddy <jo...@transferwise.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|enhancement                 |normal

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: [Bug 61733] lb_mult changes caused by status worker changing a subworker's factor are not propagated to all processes correctly.

Posted by Rainer Jung <ra...@kippdata.de>.
Will look this evening.

Rainer

Am 22.11.2017 um 11:59 schrieb Mark Thomas:
> Hi,
> 
> A review from someone who knows the jk code better than I do would be
> appreciated.
> 
> Does my alternative patch look appropriate? If not, is the OP's patch
> appropriate? If neither is appropriate, what might an appropriate patch
> look like and I'll see what I can do.
> 
> Thanks,
> 
> Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: [Bug 61733] lb_mult changes caused by status worker changing a subworker's factor are not propagated to all processes correctly.

Posted by Mark Thomas <ma...@apache.org>.
Hi,

A review from someone who knows the jk code better than I do would be
appreciated.

Does my alternative patch look appropriate? If not, is the OP's patch
appropriate? If neither is appropriate, what might an appropriate patch
look like and I'll see what I can do.

Thanks,

Mark


On 22/11/17 10:56, bugzilla@apache.org wrote:
> https://bz.apache.org/bugzilla/show_bug.cgi?id=61733
> 
> --- Comment #1 from Mark Thomas <ma...@apache.org> ---
> Created attachment 35544
>   --> https://bz.apache.org/bugzilla/attachment.cgi?id=35544&action=edit
> Possible alternative solution
> 
> Having reviewed the source code it appears that after the lb_factor is changed
> the shared memory is updated before the lb_mult is recalculated. I wonder if
> swapping the order of those actions would be an alternative fix? I have
> attached a possible patch.
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 61733] lb_mult changes caused by status worker changing a subworker's factor are not propagated to all processes correctly.

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=61733

--- Comment #1 from Mark Thomas <ma...@apache.org> ---
Created attachment 35544
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=35544&action=edit
Possible alternative solution

Having reviewed the source code it appears that after the lb_factor is changed
the shared memory is updated before the lb_mult is recalculated. I wonder if
swapping the order of those actions would be an alternative fix? I have
attached a possible patch.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org