Posted to users@httpd.apache.org by Dr James Smith <js...@sanger.ac.uk> on 2020/03/18 16:02:38 UTC

Re: [users@httpd] Bizarre problem with Apache HTTPD, a number of Tomcats, mod_proxy_balancer and mod_jk - any ideas where to look for the root cause welcome [EXT]

Do you see anything different between the users for whom it works and the
users for whom it doesn't? Do they use a different browser (user agent) or HTTP protocol?

On 18/03/2020 12:40, "Jürgen Göres" wrote:
> Hi all,
>
> we are currently observing a really bizarre problem on a customer system.
> Our software runs a number of microservices on individual Tomcats, which we front with an Apache HTTPD (2.4.x) reverse proxy using mod_jk to route the requests by context. There is one exception, though: one of the microservices, which we added to the stack at a later point in time, uses WebSockets, which are not supported over the AJP protocol, so we use mod_proxy_balancer for it.
> We put the ProxyPass etc. rules for mod_proxy_balancer in front of the directives related to mod_jk, and we have been mostly fine with this approach for a few years now. We have two sets of balancer specifications for mod_proxy_balancer and their associated rules, one for regular HTTP traffic and one for WebSocket traffic ("ws:" and "wss:" respectively).
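>
> Roughly, the setup looks like the sketch below. The worker names, hostnames, ports and the WebSocket path are placeholders for this mail, not our actual values:
>
>   # mod_proxy_balancer handles /z; these directives come before the mod_jk ones
>   <Proxy "balancer://z-http">
>       BalancerMember "http://tomcat-z1.example:8080"
>       BalancerMember "http://tomcat-z2.example:8080"
>   </Proxy>
>   <Proxy "balancer://z-ws">
>       BalancerMember "ws://tomcat-z1.example:8080"
>       BalancerMember "ws://tomcat-z2.example:8080"
>   </Proxy>
>   # more specific WebSocket rule first, then the regular HTTP rule
>   ProxyPass        "/z/ws" "balancer://z-ws/z/ws"
>   ProxyPass        "/z"    "balancer://z-http/z"
>   ProxyPassReverse "/z"    "balancer://z-http/z"
>
>   # mod_jk routes everything else by context
>   JkMount /a/* worker_a
>   JkMount /b/* worker_b
>   JkMount /c/* worker_c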
>
> Let's name the microservices that are handled by mod_jk A, B, and C, and let's name the one handled by mod_proxy_balancer Z. Let's further assume that their request contexts are /a, /b, /c and /z, respectively.
>
> Now about the current customer problem: the customer started experiencing very erratic system behaviour. In particular, requests that were meant for one of the microservices A-C handled by mod_jk would randomly get 404 responses. Usually this situation would persist for an affected user for a few seconds, and reloading wouldn't resolve it. At the same time, other users accessing the very same microservice didn't have a problem. Pretty much all users were affected from time to time.
>
> We did several troubleshooting sessions that turned up nothing. At some point, we started to monitor all traffic between HTTPD and the Tomcats with tcpdump, and here we found the bizarre thing:
> When we ran tcpdump and filtered it to only show traffic between HTTPD and microservice Z (handled by mod_proxy_balancer), we sometimes saw requests that, judging by the request URL (/a, /b, /c), were clearly meant for one of the OTHER microservices (A-C) showing up in the traffic to microservice Z. Naturally, microservice Z has no idea what to do with these requests and responds with 404.
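>
> For reference, the capture was done with something along these lines (hostname and port are placeholders here, not the real values):
>
>   tcpdump -i any -nn -A 'host tomcat-z1.example and port 8080'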
>
> What else might be relevant:
> - our microservices are stateless, so we can scale horizontally if we want. On that particular system, we have at least two instances of each microservice (A-C and Z)
> - the installation is spread across multiple nodes
> - the nodes run on Linux
> - Docker is not used ;-)
> - we have never seen this problem on any other system
> - we haven't seen this problem on the customer's test system, but here usage patterns are different
> - the requests with 404 responses wouldn't show up in the HTTPD's access log (where "normal" 404 requests DO show).
> - the customer had recently updated from a version of our product that uses Apache 2.4.34 to one using 2.4.41
> - disabling the microservice Z (= no more balancer workers for mod_proxy_balancer) would resolve the problem
> - putting the rules for mod_proxy_balancer after those of mod_jk (and adding an exclusion for /z there, because one of the other microservices is actually listening on the root context) would NOT change a thing; this variant is sketched after this list
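>
> The variant from the last point looked roughly like this (again, worker names are placeholders), i.e. mod_jk first with /z excluded from the catch-all worker, and the mod_proxy_balancer rules after it:
>
>   # mod_jk first; one microservice listens on the root context
>   JkMount   /*   worker_root
>   # exclude /z so mod_jk leaves it alone
>   JkUnMount /z   worker_root
>   JkUnMount /z/* worker_root
>
>   # mod_proxy_balancer rules after the mod_jk directives
>   ProxyPass        "/z" "balancer://z-http/z"
>   ProxyPassReverse "/z" "balancer://z-http/z"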
>
> From experience, we are pretty sure that the problem is somewhere on our side. ;-)
>
> - One thought is that a bug in microservice Z, triggered only by this customer's use of our product, causes the erratic behaviour of HTTPD/mod_proxy_balancer. Maybe something we do wrong messes up the connection keepalive between Apache and Tomcat, causing requests to go the wrong way? (A possible experiment for this is sketched after these points.)
> - Or maybe it is related to the Apache version update (2.4.34 to 2.4.41)? But why are other installations with the same version not affected?
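>
> If connection reuse between HTTPD and the Z Tomcats is indeed the suspect, one experiment that could rule it out (shown here only as a sketch, with placeholder hosts) would be to disable reuse for the balancer members and see whether the misrouted requests disappear:
>
>   <Proxy "balancer://z-http">
>       # disablereuse=On forces a fresh backend connection per request
>       BalancerMember "http://tomcat-z1.example:8080" disablereuse=On
>       BalancerMember "http://tomcat-z2.example:8080" disablereuse=On
>   </Proxy>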
>
> Any ideas where we should start looking?
>
> Regards
>
> J

-- 
 The Wellcome Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org


Re: [users@httpd] Bizarre problem with Apache HTTPD, a number of Tomcats, mod_proxy_balancer and mod_jk - any ideas where to look for the root cause welcome [EXT]

Posted by Jürgen Göres <jg...@gmx.de>.
Hi,
 
The users are on Chrome and Firefox, so there is no pattern there.
What I didn't mention: this all runs on AWS, and traffic goes through an ALB on which HTTP/2 is enabled. The ALB translates the requests to HTTP/1.1 when talking to Apache. But that is the same in similar environments at other customers, which do not have this problem.
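
To compare affected and unaffected requests more systematically, a log format along these lines (the format name and log path are just placeholders) would record the protocol and user agent for each request in the HTTPD access log:

  LogFormat "%h %l %u %t \"%r\" %>s %b %H \"%{User-Agent}i\"" proto_debug
  CustomLog "logs/access_log" proto_debug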
 
Regards
 
J
 
 

>Sent: Wednesday, 18 March 2020 at 17:02
>From: "Dr James Smith" <js...@sanger.ac.uk>
>To: users@httpd.apache.org
>Subject: Re: [users@httpd] Bizarre problem with Apache HTTPD, a number of Tomcats, mod_proxy_balancer and mod_jk - any ideas where to look for the root cause welcome [EXT]
>Do you see anything different between the users for whom it works and the users
>for whom it doesn't? Do they use a different browser (user agent) or HTTP protocol?

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org