Posted to dev@httpd.apache.org by Brian Hayward <bh...@gmail.com> on 2007/05/04 17:37:08 UTC

mod proxy disabling workers after a single error

We are experiencing intermittent "connect timeout" errors.  The remote
service is really "ok", but it still results in the worker being
disabled.

Setting the ProxyPass disable period using "retry=1" is not good
enough for the application I'm working with (given the number of
requests we do per second).

I couldn't find an option to turn this off completely, so I started
looking at the proxy source code.

I found PROXY_WORKER_IGNORE_ERRORS in proxy_util.c.  I'm not sure
where this should be set though.  Instead I just added a "FALSE" to
the if statement as follows:

    /*
     * Put the entire worker to error state if
     * the PROXY_WORKER_IGNORE_ERRORS flag is not set.
     * Although some connections may be alive
     * no further connections to the worker could be made
     */
    if (FALSE && !connected && PROXY_WORKER_IS_USABLE(worker) &&
        !(worker->s->status & PROXY_WORKER_IGNORE_ERRORS)) {
        worker->s->status |= PROXY_WORKER_IN_ERROR;
        worker->s->error_time = apr_time_now();
        ap_log_error(APLOG_MARK, APLOG_ERR, 0, s,
            "ap_proxy_connect_backend disabling worker for (%s)",
            worker->hostname);
    }
    else {
        worker->s->error_time = 0;
        worker->s->retries = 0;
    }

I have 2 questions:
1) What are the negative implications of disabling this?
2) Is there a cleaner way to accomplish this?

Thanks,
Brian Hayward

Re: mod proxy disabling workers after a single error

Posted by Ruediger Pluem <rp...@apache.org>.
On 05.05.2007 04:25, Brian Hayward wrote:


> BTW, I did test my patch when 1 host was down in a balancer
> configuration.  It still seemed to work well.

I would think so. My point was more that with this setting the response
times of your reverse proxy will increase, as it may try all failed
workers first before it finds a working one. That could be a
time-consuming operation, especially if you have many failed workers.
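
As a rough illustration of that cost, the worst case can be estimated with
simple arithmetic. This is only a sketch under stated assumptions: every
failed worker is tried before a healthy one, and each failed attempt waits
for the full connect timeout. The function and parameter names are
hypothetical and are not httpd settings.

```c
#include <assert.h>

/* Rough worst-case extra delay (in milliseconds) before a request
 * reaches a healthy worker, assuming every failed worker is tried
 * first and each attempt waits for the full connect timeout.
 * Both parameter names are hypothetical, not httpd settings. */
static long worst_case_delay_ms(long failed_workers, long connect_timeout_ms)
{
    assert(failed_workers >= 0 && connect_timeout_ms >= 0);
    return failed_workers * connect_timeout_ms;
}
```

Under these assumptions, three failed workers and a 2000 ms connect timeout
would add up to 6000 ms to a single request.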

> At this time, we lose one or two connections out of thousands.
> According to the network sniffer for these connections, packets are
> getting lost during the setup of the connection.  (For example - our
> sniffer sees a SYN go out, get retransmitted a couple times, but the
> SYN_ACK never returns for that one connection).  It could have
> something to do with the HW load balancer, the NAT firewall between us
> and the destination, remote web servers, or something else.  We're
> still working on the analysis.

Ouch. These are nasty errors. I wish you good luck in tracking them down.

> 
> As requested, here is the error:
> 
> [Tue May 01 14:36:30 2007] [error] (145)Connection timed out: proxy:
> HTTPS: attempt to connect to 1.2.3.4:443 (hostname.com) failed

Just one hint: If you do not have special needs that require
you to encrypt the connection to your backend you should not do this
for the following reasons:

1. Assuming that you are also doing https on the frontend side of your
reverse proxy, encrypting the connection to the backend means that you
need to encrypt / decrypt your data twice: once to the client and once
to the backend.

2. Currently you cannot have keepalive connections with mod_proxy to
a backend. This means that you effectively have *no* connection pooling
in this case which is bad in general. Even worse: You need to do a full
and expensive SSL handshake for *every* request to your backend.

Regards

Rüdiger


Re: mod proxy disabling workers after a single error

Posted by Brian Hayward <bh...@gmail.com>.
> With the following patch from trunk you are able to set retry to 0, which
> should fix your actual problem:

Thanks!

> But I think in general it is not advisable to do this, at least if you are
> load balancing your backend or have a failover configuration. And even if
> you do not have such a configuration, from my perspective it looks
> important to find out why you get these connection timeouts.
> So what protocol do you speak with your backend? HTTP? What are the error
> messages you see in the error log?

BTW, I did test my patch when 1 host was down in a balancer
configuration.  It still seemed to work well.

I agree that there is a trade-off for this.  If the remote system is
really slow or hung, connections can build up and overload Apache
(e.g. you run out of worker threads, run out of file descriptors,
etc).  As soon as we fix the network problem, we'll probably want to
use a higher retry value.

At this time, we lose one or two connections out of thousands.
According to the network sniffer for these connections, packets are
getting lost during the setup of the connection.  (For example - our
sniffer sees a SYN go out, get retransmitted a couple times, but the
SYN_ACK never returns for that one connection).  It could have
something to do with the HW load balancer, the NAT firewall between us
and the destination, remote web servers, or something else.  We're
still working on the analysis.

As requested, here is the error:

[Tue May 01 14:36:30 2007] [error] (145)Connection timed out: proxy:
HTTPS: attempt to connect to 1.2.3.4:443 (hostname.com) failed
[Tue May 01 14:36:30 2007] [error] ap_proxy_connect_backend disabling
worker for (hostname.com)

Thanks again for your help!
Brian

Re: mod proxy disabling workers after a single error

Posted by Ruediger Pluem <rp...@apache.org>.
On 04.05.2007 20:16, Brian Hayward wrote:
> Yea, as it currently stands, one timeout is causing us to lose up to
> 10 more transactions during the next second (with retry=1)

With the following patch from trunk you are able to set retry to 0, which
should fix your actual problem:

http://svn.apache.org/viewvc?view=rev&revision=451575
svn diff -r451574:451575 http://svn.apache.org/repos/asf/httpd/httpd/trunk
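
To illustrate why retry=0 helps here, the retry window can be modelled in a
few lines of C. This is a simplified sketch, not the code from the patch:
the function names are hypothetical, time is in arbitrary ticks, and the
comparison only assumes that a worker in error becomes eligible again once
the retry interval has elapsed since the error was recorded. The second
function assumes a steady request rate, which is an idealisation.

```c
#include <assert.h>

/* Simplified model: a worker in error becomes eligible again once
 * more than retry_ticks have passed since error_time. With
 * retry_ticks == 0 the worker is eligible as soon as time advances. */
static int worker_is_retryable(long now, long error_time, long retry_ticks)
{
    assert(retry_ticks >= 0);
    return now > error_time + retry_ticks;
}

/* Rough upper bound on requests that arrive while the worker is
 * still disabled, assuming a steady rate (hypothetical parameters). */
static long requests_exposed(long requests_per_sec, long retry_secs)
{
    assert(requests_per_sec >= 0 && retry_secs >= 0);
    return requests_per_sec * retry_secs;
}
```

With an assumed rate of about 10 requests per second, retry=1 exposes up to
10 requests to the disabled worker in this model, while retry=0 exposes
none.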


But I think in general it is not advisable to do this, at least if you are
load balancing your backend or have a failover configuration. And even if
you do not have such a configuration, from my perspective it looks
important to find out why you get these connection timeouts.
So what protocol do you speak with your backend? HTTP? What are the error
messages you see in the error log?


Regards

Rüdiger


Re: mod proxy disabling workers after a single error

Posted by Jim Jagielski <ji...@jaguNET.com>.
Well, I doubt that completely bypassing the setting of workers
to the error state is something you want to do
lightly. :)

PROXY_WORKER_IGNORE_ERRORS is used when setting up
the "generic" forward and reverse workers, since they
are "shared" for all requests and not "specific"
to a balancer/url. I guess allowing someone
to set that as an option might be a potential
enhancement...
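
To make the role of that flag concrete, here is a minimal, self-contained
C model of the check from the original post. It is a sketch, not the httpd
source: the function name and the bit values are stand-ins chosen for
illustration (the real definitions are in mod_proxy.h), and it leaves out
the PROXY_WORKER_IS_USABLE test and the reset of error_time and retries.

```c
#include <assert.h>

/* Illustrative bit values only, not the ones httpd uses. */
#define MODEL_IGNORE_ERRORS 0x1u
#define MODEL_IN_ERROR      0x2u

/* Returns the worker status after a connect attempt. A failed
 * connect sets the in-error bit unless the ignore-errors bit is
 * already set, which is the shape of the check in the quoted code. */
static unsigned int status_after_connect(unsigned int status, int connected)
{
    /* This model only knows about the two bits defined above. */
    assert((status & ~(MODEL_IGNORE_ERRORS | MODEL_IN_ERROR)) == 0);
    if (!connected && !(status & MODEL_IGNORE_ERRORS)) {
        status |= MODEL_IN_ERROR;
    }
    return status;
}
```

In this model a worker carrying the ignore-errors bit is never put into the
error state by a failed connect, which is the same outcome the "FALSE"
change produces for every worker.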

On May 4, 2007, at 2:16 PM, Brian Hayward wrote:

> Yea, as it currently stands, one timeout is causing us to lose up to
> 10 more transactions during the next second (with retry=1)
>
> Thanks,
> Brian Hayward
>
>
>
>
> On 5/4/07, Jim Jagielski <ji...@jagunet.com> wrote:
>>
>> On May 4, 2007, at 11:37 AM, Brian Hayward wrote:
>>
>> >
>> > I have 2 questions:
>> > 1) What are the negative implications of disabling this?
>> > 2) Is there a cleaner way to accomplish this?
>>
>> So you just want to set up Apache so that, even if it
>> thinks there's an error, it just ignores it?
>>
>>
>


Re: mod proxy disabling workers after a single error

Posted by Brian Hayward <bh...@gmail.com>.
Yea, as it currently stands, one timeout is causing us to lose up to
10 more transactions during the next second (with retry=1)

Thanks,
Brian Hayward




On 5/4/07, Jim Jagielski <ji...@jagunet.com> wrote:
>
> On May 4, 2007, at 11:37 AM, Brian Hayward wrote:
>
> >
> > I have 2 questions:
> > 1) What are the negative implications of disabling this?
> > 2) Is there a cleaner way to accomplish this?
>
> So you just want to set up Apache so that, even if it
> thinks there's an error, it just ignores it?
>
>

Re: mod proxy disabling workers after a single error

Posted by Jim Jagielski <ji...@jaguNET.com>.
On May 4, 2007, at 11:37 AM, Brian Hayward wrote:

>
> I have 2 questions:
> 1) What are the negative implications of disabling this?
> 2) Is there a cleaner way to accomplish this?

So you just want to set up Apache so that, even if it
thinks there's an error, it just ignores it?