Posted to users@httpd.apache.org by Suvendu Sekhar Mondal <su...@gmail.com> on 2017/11/01 15:39:18 UTC

[users@httpd] Apache "marking down" a back-end server

Hello Everyone,

I am seeing some interesting behavior in Apache httpd.

We have multiple Apache httpd instances in front of a set of Tomcat
JVMs. I found that sometimes *one of the httpds marks one of the JVMs
down* for 180 sec (the "retry" value). As a result, users logged on to
that JVM get 5xx errors. At first I suspected that long GCs were
causing it, but that was not the case: we have a 5 sec "ping" timeout,
and GCs during the problem period took 500-700 ms. There were also
plenty of threads available in the JVM to serve new requests. After
some more drill-down, I found that each of those "mark down" incidents
correlates with some really long processing (800 sec) on the JVM, which
exceeds our "ProxyTimeout" and "ttl" limits. Yes, some of our app's
workflows can take that long when they process a large volume; we are
working on it.

My understanding is that these are not "ping" failure cases where
httpd marks the JVM down. That said, can a "ProxyTimeout" or "ttl"
failure cause httpd to mark the JVM down? Or do you think it is
something else? Please let me know.

httpd version: 2.4.10

httpd setting:
ProxyTimeout 300

<Proxy balancer://mycluster>
ProxySet lbmethod=byrequests
ProxySet stickysession=JSESSIONID|jsessionid
ProxySet scolonpathdelim=On
ProxySet growth=2
ProxySet nofailover=On

BalancerMember http://abc route=abc keepalive=on ttl=300 ping=5 retry=180

</Proxy>
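
For reference, here is how I read the timeout-related settings above. This
is only a sketch: the comments reflect my understanding of the mod_proxy
documentation and are not confirmed behavior for our setup.

```
# ProxyTimeout 300 : how long httpd waits on backend I/O for a proxied request
# ttl=300          : an idle pooled connection older than 300 sec is not reused
# ping=5           : for an HTTP backend, httpd sends "Expect: 100-continue"
#                    and waits up to 5 sec for the reply before forwarding
# retry=180        : a worker in error state gets no requests for 180 sec
BalancerMember http://abc route=abc keepalive=on ttl=300 ping=5 retry=180
```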

Excerpts from httpd Error log:
[Wed Nov 01 08:17:39.221276 2017] [proxy_http:error] [pid 31848:tid
9828] (OS 10060)A connection attempt failed because the connected
party did not properly respond after a period of time, or established
connection failed because connected host has failed to respond.  :
[client 10.254.52.48:13964] AH01102: error reading status line from
remote server abc, referer: xxx
[Wed Nov 01 08:17:39.221276 2017] [proxy:error] [pid 31848:tid 9828]
[client 10.254.52.48:13964] AH00898: Timeout on 100-Continue returned
by /xxx
[Wed Nov 01 08:17:39.221276 2017] [proxy_balancer:error] [pid
31848:tid 9828] [client 10.254.52.48:13964] AH01167:
balancer://mycluster: All workers are in error state for route (abc),
referer: xxx
[Wed Nov 01 08:17:39.346281 2017] [proxy_balancer:error] [pid
31848:tid 9760] [client 10.254.52.48:17783] AH01167:
balancer://mycluster: All workers are in error state for route (abc)

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org

[users@httpd] Re: Apache "marking down" a back-end server

Posted by Suvendu Sekhar Mondal <su...@gmail.com>.

Hello Everyone,

After some investigation I found that Apache is “marking down” a JVM
once ProxyTimeout has elapsed. This is what happens:
 1. A process gets kicked off on a JVM. Let’s assume it is going to
take a long time (10 min) to complete.
 2. While this processing is halfway through, ProxyTimeout (5 min)
elapses.
 3. Apache then ignores the default failontimeout=off setting and
marks the JVM down for the next 180 sec (the retry value).
 4. The problem starts!

This behavior sounds like a bug(?) to me because:
 - If you force an HTTP GET request to fail by letting ProxyTimeout
elapse, Apache *does not* mark the JVM down. It only fails that
long-running request with a 502 error. That is expected.
 - If you do the same thing with an HTTP POST request, Apache *marks
the JVM down*. This is *NOT* the desired behavior.
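
The AH00898 line in the error log mentions 100-Continue, and as far as I
understand the mod_proxy documentation, the ping parameter is what makes
httpd send "Expect: 100-continue" to an HTTP backend. One experiment that
might narrow down the GET/POST difference is to repeat the POST test
without ping. The fragment below is an untested sketch for isolating the
cause, not a confirmed fix. The failontimeout=Off line only restates the
documented default.

```
<Proxy balancer://mycluster>
ProxySet lbmethod=byrequests
ProxySet stickysession=JSESSIONID|jsessionid
ProxySet scolonpathdelim=On
ProxySet growth=2
ProxySet nofailover=On
# Restates the documented default, to rule out an override elsewhere.
ProxySet failontimeout=Off

# Same member as in the original config, with ping=5 removed for the test.
BalancerMember http://abc route=abc keepalive=on ttl=300 retry=180
</Proxy>
```

If the JVM is no longer marked down when a long POST exceeds ProxyTimeout
with this variant, that would point at the ping/100-Continue handling
rather than at ProxyTimeout or ttl themselves.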

I can also reproduce the issue with Apache/2.4.25. Can I open a bug
for this behavior? Or is it already resolved? Please let me know.

Thanks!
Suvendu
