You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Arkadiusz Miśkiewicz <ar...@maven.pl> on 2013/04/11 13:43:03 UTC

2.4.4 graceful restart taking long time

Hello,

On apache 2.2.22, 2.2.23 and 2.4.4 I'm able to reproduce a problem
where graceful restart takes very long time. Linux 3.7.10, glibc 2.17 here.

keepalive is off, ServerLimit  600; StartServers  64; MinSpareServers 32;
MaxSpareServers 64; MaxClients 600; MaxRequestsPerChild  2000;
minimal number of apache modules

I'm running ab where 192.168.1.121 is my local ip:
/usr/bin/ab -n 100000 -c 64 http://192.168.1.121/
(ususally I'm interrupting ab in first 1-2 seconds of its run, starting it again and leaving running)

and then doing graceful restart while ab is still running:
[Thu Apr 11 13:27:32.624875 2013] [mpm_prefork:notice] [pid 10202] AH00171: Graceful restart requested, doing restart
[Thu Apr 11 13:28:15.836068 2013] [mpm_prefork:notice] [pid 10202] AH00163: Apache/2.4.4 (Unix) configured -- resuming normal operations

It takes almost 1 minute to resume operations. Now if I don't do
ab "magic" then resume is immediate (1-2s).

The whole time is spend in prefork ap_mpm_pod_killpg

        ap_log_error(APLOG_MARK, APLOG_NOTICE, 0, ap_server_conf, APLOGNO(00171)
                    "Graceful restart requested, doing restart");

        /* kill off the idle ones */
        ap_mpm_pod_killpg(pod, retained->max_daemons_limit);

where ap_mpm_pod_killpg iterates over childrens (?) and that takes ~1 minute.

I also did testing with StartServers 128. ap_mpm_pod_killpg had to iterate
over 128 childrens where 72 of them were done in the same second while
the rest in over minute - log here http://pastebin.com/3tP5hW1f.

Example strace of main httpd process while doing graceful restart:
http://pastebin.com/QFH5TjT6

The whole problem is that until ap_mpm_pod_killpg finishes apache
is not responding to new requests and the downtime is between 30s
to even 4 minutes.

-- 
Arkadiusz Miśkiewicz, arekm / maven.pl

Re: 2.4.4 graceful restart taking long time [PATCH]

Posted by Jeff Trawick <tr...@gmail.com>.
On Fri, Apr 12, 2013 at 7:36 AM, Arkadiusz Miśkiewicz <ar...@maven.pl>wrote:

> On Friday 12 of April 2013, Eric Covener wrote:
> > > diff -ur httpd-2.4.4.org/server/mpm_unix.chttpd-2.4.4/server/mpm_unix.c
> > > --- httpd-2.4.4.org/server/mpm_unix.c   2012-07-03 21:38:58.000000000
> > > +0200 +++ httpd-2.4.4/server/mpm_unix.c       2013-04-12
> > > 09:14:58.282929959 +0200 @@ -604,7 +604,17 @@
> > >
> > >          len = strlen(data);
> > >
> > >      }
> > >
> > > -    apr_socket_send(sock, data, &len);
> > > +    rv = apr_socket_send(sock, data, &len);
> > > +    if (rv == APR_SUCCESS) {
> > > +        char *buffer[10]; // dummy buffer to see if child is alive
> > > +
> > > +        len = sizeof(buffer);
> > > +        rv = apr_socket_recv(sock, buffer, &len);
> > > +        if (rv != APR_SUCCESS) {
> > > +            ap_log_error(APLOG_MARK, APLOG_DEBUG, rv, ap_server_conf,
> > > APLOGNO(00057) +                         "receiving response to dummy
> > > data to listener on %pI", lp->bind_addr); +        }
> > > +    }
> > >
> > >      apr_socket_close(sock);
> > >      apr_pool_destroy(p);
>
> End of this is:
>
>     apr_socket_close(sock);
>     apr_pool_destroy(p);
>
>     return rv;
> >
> > Interesting analysis, I'm confused on how the patch helps though if it
> > doesn't actually do anything different when there is no reply (which
> > is more what I expected from the bullets)
>
> It returns error in rv


The implication of the recv was quite confusing to me initially when I
reviewed this again.  Rüdiger had reviewed an strace and determined that
connect() was taking longer and longer over time (presumably as the queue
in the kernel built up).  So with a timeout of 3 seconds you might have
connects that took increasing amounts of time all under 3 seconds that
could add up to extremely long delays.  Add the recv call and you cap the
total delay at essentially 3 seconds since getting connect+recv to work
within the timeout forces there to be a non-zombie, non-busy child on the
other end.

I don't see that Jim's proposal for the 2.4 branch matches Arkadiusz's
exact scenario/PR or suggested fix though I do understand that it could
quicken graceful restart for some scenarios.  I'll remove my note from
2.4.x/STATUS.


> which is propagated up to ap_mpm_pod_killpg
> which does
>     for (i = 0; i < num && rv == APR_SUCCESS; i++) {
>         rv = dummy_connection(pod);
>     }
>
> and on error will break the loop and stop doing dummy connections as there
> are
> no children anyway. graceful restart then can proceed without a delay.
>
> --
> Arkadiusz Miśkiewicz, arekm / maven.pl
>



-- 
Born in Roswell... married an alien...
http://emptyhammock.com/

Re: 2.4.4 graceful restart taking long time [PATCH]

Posted by Arkadiusz Miśkiewicz <ar...@maven.pl>.
On Friday 12 of April 2013, Eric Covener wrote:
> > diff -ur httpd-2.4.4.org/server/mpm_unix.c httpd-2.4.4/server/mpm_unix.c
> > --- httpd-2.4.4.org/server/mpm_unix.c   2012-07-03 21:38:58.000000000
> > +0200 +++ httpd-2.4.4/server/mpm_unix.c       2013-04-12
> > 09:14:58.282929959 +0200 @@ -604,7 +604,17 @@
> > 
> >          len = strlen(data);
> >      
> >      }
> > 
> > -    apr_socket_send(sock, data, &len);
> > +    rv = apr_socket_send(sock, data, &len);
> > +    if (rv == APR_SUCCESS) {
> > +        char *buffer[10]; // dummy buffer to see if child is alive
> > +
> > +        len = sizeof(buffer);
> > +        rv = apr_socket_recv(sock, buffer, &len);
> > +        if (rv != APR_SUCCESS) {
> > +            ap_log_error(APLOG_MARK, APLOG_DEBUG, rv, ap_server_conf,
> > APLOGNO(00057) +                         "receiving response to dummy
> > data to listener on %pI", lp->bind_addr); +        }
> > +    }
> > 
> >      apr_socket_close(sock);
> >      apr_pool_destroy(p);

End of this is:

    apr_socket_close(sock);
    apr_pool_destroy(p);

    return rv;
> 
> Interesting analysis, I'm confused on how the patch helps though if it
> doesn't actually do anything different when there is no reply (which
> is more what I expected from the bullets)

It returns error in rv which is propagated up to ap_mpm_pod_killpg
which does
    for (i = 0; i < num && rv == APR_SUCCESS; i++) {
        rv = dummy_connection(pod);
    }

and on error will break the loop and stop doing dummy connections as there are 
no children anyway. graceful restart then can proceed without a delay.

-- 
Arkadiusz Miśkiewicz, arekm / maven.pl

Re: 2.4.4 graceful restart taking long time [PATCH]

Posted by Eric Covener <co...@gmail.com>.
> diff -ur httpd-2.4.4.org/server/mpm_unix.c httpd-2.4.4/server/mpm_unix.c
> --- httpd-2.4.4.org/server/mpm_unix.c   2012-07-03 21:38:58.000000000 +0200
> +++ httpd-2.4.4/server/mpm_unix.c       2013-04-12 09:14:58.282929959 +0200
> @@ -604,7 +604,17 @@
>          len = strlen(data);
>      }
>
> -    apr_socket_send(sock, data, &len);
> +    rv = apr_socket_send(sock, data, &len);
> +    if (rv == APR_SUCCESS) {
> +        char *buffer[10]; // dummy buffer to see if child is alive
> +
> +        len = sizeof(buffer);
> +        rv = apr_socket_recv(sock, buffer, &len);
> +        if (rv != APR_SUCCESS) {
> +            ap_log_error(APLOG_MARK, APLOG_DEBUG, rv, ap_server_conf, APLOGNO(00057)
> +                         "receiving response to dummy data to listener on %pI", lp->bind_addr);
> +        }
> +    }
>      apr_socket_close(sock);
>      apr_pool_destroy(p);

Interesting analysis, I'm confused on how the patch helps though if it
doesn't actually do anything different when there is no reply (which
is more what I expected from the bullets)

Re: 2.4.4 graceful restart taking long time [PATCH]

Posted by Arkadiusz Miśkiewicz <ar...@maven.pl>.
On Friday 12 of April 2013, Arkadiusz Miśkiewicz wrote:

> 6) possible solution - detect that sending dummy_connection data weren't
> processed by children by trying to read reply
> 
> diff -ur httpd-2.4.4.org/server/mpm_unix.c httpd-2.4.4/server/mpm_unix.c
> --- httpd-2.4.4.org/server/mpm_unix.c	2012-07-03 21:38:58.000000000 +0200
> +++ httpd-2.4.4/server/mpm_unix.c	2013-04-12 09:14:58.282929959 +0200
> @@ -604,7 +604,17 @@
>          len = strlen(data);
>      }
> 
> -    apr_socket_send(sock, data, &len);
> +    rv = apr_socket_send(sock, data, &len);
> +    if (rv == APR_SUCCESS) {
> +        char *buffer[10]; // dummy buffer to see if child is alive
> +
> +        len = sizeof(buffer);
> +        rv = apr_socket_recv(sock, buffer, &len);
> +        if (rv != APR_SUCCESS) {
> +            ap_log_error(APLOG_MARK, APLOG_DEBUG, rv, ap_server_conf,
> APLOGNO(00057) +                         "receiving response to dummy data
> to listener on %pI", lp->bind_addr); +        }
> +    }
>      apr_socket_close(sock);
>      apr_pool_destroy(p);

Not sure how to deal with SSL dummy connection though. With SSL children sends 
no reply in response to tls10_close_notify.

apr_socket_recv method works only for non-SSL connections.

Maybe there is another way to check which children are alive at 
dummy_connection level (and avoiding sending anything if there are no 
children) ?

-- 
Arkadiusz Miśkiewicz, arekm / maven.pl

Re: 2.4.4 graceful restart taking long time [PATCH]

Posted by Arkadiusz Miśkiewicz <ar...@maven.pl>.
On Thursday 11 of April 2013, Arkadiusz Miśkiewicz wrote:
> Hello,
> 
> On apache 2.2.22, 2.2.23 and 2.4.4 I'm able to reproduce a problem
> where graceful restart takes very long time. Linux 3.7.10, glibc 2.17 here.

Summary of my findings

1) main process on gracefull restart calls 
/* kill off the idle ones */
ap_mpm_pod_killpg(pod, retained->max_daemons_limit);

and that does dummy_connection() for each pod. This works fast
(1-2s) in most cases where childrens are alive but not in this
case - children exit earlier.

2) children exit in 
        else if (retained->my_generation !=
                 ap_scoreboard_image->global->running_generation) { /* 
restart? */
            /* yeah, this could be non-graceful restart, in which case the
             * parent will kill us soon enough, but why bother checking?
             */
            die_now = 1;
        }

3) in my case main process works slowly enough that it doesn't finish
doing ap_mpm_pod_killpg before all children exit like in 2)

4) so we are ending up in situation where main process is in middle
of ap_mpm_pod_killpg while all children already exited in 2). Main
process is continuing to do dummy_connection and these connect()s
+ dummy data send, polling is taking very long time (like 1-2s for each
dummy_connection). This is the primary reason why graceful is painfully
slow here (1-5 minutes) sometimes.

5) why connect() and socket sending in dummy_connect succeeds?
If it failed then ap_mpm_pod_killpg loop would break and things would
be fast.

Because when all children exited there is main process that still
holds listening socket and connection go to this socket. Verified that
by doing external telnet connection to the same IP/port dummy_connection
is doing - telnet connect() succeeded and telnet waits until next
generation comes up. This is why connect()/apr_socket_send()
in dummy_connect() dont fail.

6) possible solution - detect that sending dummy_connection data weren't
processed by children by trying to read reply

diff -ur httpd-2.4.4.org/server/mpm_unix.c httpd-2.4.4/server/mpm_unix.c
--- httpd-2.4.4.org/server/mpm_unix.c	2012-07-03 21:38:58.000000000 +0200
+++ httpd-2.4.4/server/mpm_unix.c	2013-04-12 09:14:58.282929959 +0200
@@ -604,7 +604,17 @@
         len = strlen(data);
     }
 
-    apr_socket_send(sock, data, &len);
+    rv = apr_socket_send(sock, data, &len);
+    if (rv == APR_SUCCESS) {
+        char *buffer[10]; // dummy buffer to see if child is alive
+
+        len = sizeof(buffer);
+        rv = apr_socket_recv(sock, buffer, &len);
+        if (rv != APR_SUCCESS) {
+            ap_log_error(APLOG_MARK, APLOG_DEBUG, rv, ap_server_conf, APLOGNO(00057)
+                         "receiving response to dummy data to listener on %pI", lp->bind_addr);
+        }
+    }
     apr_socket_close(sock);
     apr_pool_destroy(p);
 

without patch (40s):
[Fri Apr 12 09:08:40.785226 2013] [mpm_prefork:notice] [pid 6790] AH00171: Graceful restart requested, doing restart
[Fri Apr 12 09:09:20.968224 2013] [mpm_prefork:notice] [pid 6790] AH00163: Apache/2.4.4 (Unix) configured -- resuming normal operations

with patch (3s):
[Fri Apr 12 09:10:01.902779 2013] [mpm_prefork:notice] [pid 7532] AH00171: Graceful restart requested, doing restart
[Fri Apr 12 09:10:04.973715 2013] [mpm_prefork:notice] [pid 7532] AH00163: Apache/2.4.4 (Unix) configured -- resuming normal operations

strace with the patch:
http://ixion.pld-linux.org/~arekm/apache4.txt

7) How to reproduce?
Main process need to be slowed down, so all children have a chance to exit
in 2) before 1) finishes.

In my case "strace -ttT -s 300 -p pido-of-main-apache-process" is enough.

Then run ab -n 100000 -c 100 http://apache.ip and in parallel issue graceful restart before ab finishes.

Look at error_log if graceful time was longer than it should be. Strace should show
dummy_connection connects(), polling etc.

-- 
Arkadiusz Miśkiewicz, arekm / maven.pl

Re: 2.4.4 graceful restart taking long time

Posted by Arkadiusz Miśkiewicz <ar...@maven.pl>.
On Thursday 11 of April 2013, Arkadiusz Miśkiewicz wrote:

> Another strace including children
> http://ixion.pld-linux.org/~arekm/apache1.txt
> 
> Graceful restart and first OPTIONS write at
> 
> 12594 15:10:01.397356 write(7, "OPTIONS * HTTP/1.0\r\nUser-Agent:
> Apache/2.4.4 (Unix) (internal dummy connection)\r\n\r\n", 83 <unfinished
> ...>

Another observation.

When this happens (when dummy_connection() is called, my " XXX dummy 2 
processing X" are printed) then in that time all children are already 
zombie/defunct, so they cannot answer/do anything. The only living process is 
master process.

> 
> but first read is long after write
> 
> 13279 15:10:52.636685 <... read resumed> "OPTIONS * HTTP/1.0\r\nUser-Agent:
> Apache/2.4.4 (Unix) (internal dummy connection)\r\n\r\n", 8000) = 83
> <0.000014>

And that's read by new generation I think

Old generation childs exit in:

        else if (retained->my_generation !=
                 ap_scoreboard_image->global->running_generation) { /* 
restart? */
            /* yeah, this could be non-graceful restart, in which case the
             * parent will kill us soon enough, but why bother checking?
             */
            die_now = 1;
        }

So looks like master process is trying to send data (in dummy_connection()) to 
childrens that already exit()ed leading to whole slowdown. This is how it 
looks http://ixion.pld-linux.org/~arekm/apache3.txt

Opinions?


13:38:03.383798 connect(7, {sa_family=AF_INET, sin_port=htons(80), 
sin_addr=inet_addr("192.168.1.121")}, 16) = -1 EINPROGRESS (Operation now in 
progress) <0.000024>
13:38:03.383841 poll([{fd=7, events=POLLOUT}], 1, 3000) = 1 ([{fd=7, 
revents=POLLOUT}]) <0.000006>
13:38:03.383872 getsockopt(7, SOL_SOCKET, SO_ERROR, [0], [4]) = 0 <0.000006>
13:38:03.383897 write(7, "OPTIONS * HTTP/1.0\r\nUser-Agent: Apache/2.4.4 
(Unix) (internal dummy connection)\r\n\r\n", 83) = 83 <0.000022>

4 in SO_ERROR is a "Interrupted system call", so could support my suspicion

> 
> Huh?
> 
> That's even after resuming operations mesage which was a bit earlier:
> 
> 12594 15:10:52.606253 write(2, "[Thu Apr 11 15:10:52.606236 2013]
> [mpm_prefork:notice] [pid 12594] AH00163: Apache/2.4.4 (Unix) configured --
> resuming normal operations\n", 137) = 137 <0.000009>
> 
> Compare that to case where ab wasn't running:
> http://ixion.pld-linux.org/~arekm/apache-no-ab.txt
> write OPTIONS in main process and read in childrens are interleaving as
> expected.
> 
> Now why these aren't interleaving in apache1.txt where ab was running?
> 
> > As the accept call
> > on httpd side only returns when the first data is send on the socket,
> > IMHO the time the poll takes does take place in the kernel and not in
> > the httpd children code.
> 
> Well, I think kernel is not the one to blame.
> 
> > Have you checked your messages file if the kernel reports
> > something when this happens? How does your run queue and CPU load look
> > like when this happens (top)?
> 
> No kernel messages, no cpu spikes - looks normal.
> 
> > Regards
> > Rüdiger


-- 
Arkadiusz Miśkiewicz, arekm / maven.pl

Re: 2.4.4 graceful restart taking long time

Posted by Arkadiusz Miśkiewicz <ar...@maven.pl>.
On Thursday 11 of April 2013, Plüm, Rüdiger, Vodafone Group wrote:

> > On apache 2.2.22, 2.2.23 and 2.4.4 I'm able to reproduce a problem
> > where graceful restart takes very long time. Linux 3.7.10, glibc 2.17
> > here.
> > Example strace of main httpd process while doing graceful restart:
> > http://pastebin.com/QFH5TjT6
> 
> From the strace

Here is similar strace but without ab running, one second is enough
http://pastebin.com/HKjxxP2p (StartServers 64 this time, ab -c 64)

> it looks like the connect takes a lot of time later on (the
> poll is waiting 1 second for the connect to complete). 

That strace was done with StartServers 128 but ab was using -c 64, so looks 
like idle children responded fast (that first second), busy - too slowly.

Another strace including children
http://ixion.pld-linux.org/~arekm/apache1.txt

Graceful restart and first OPTIONS write at 

12594 15:10:01.397356 write(7, "OPTIONS * HTTP/1.0\r\nUser-Agent: Apache/2.4.4 
(Unix) (internal dummy connection)\r\n\r\n", 83 <unfinished ...>

but first read is long after write

13279 15:10:52.636685 <... read resumed> "OPTIONS * HTTP/1.0\r\nUser-Agent: 
Apache/2.4.4 (Unix) (internal dummy connection)\r\n\r\n", 8000) = 83 
<0.000014>

Huh?

That's even after resuming operations mesage which was a bit earlier:

12594 15:10:52.606253 write(2, "[Thu Apr 11 15:10:52.606236 2013] 
[mpm_prefork:notice] [pid 12594] AH00163: Apache/2.4.4 (Unix) configured -- 
resuming normal operations\n", 137) = 137 <0.000009>

Compare that to case where ab wasn't running:
http://ixion.pld-linux.org/~arekm/apache-no-ab.txt
write OPTIONS in main process and read in childrens are interleaving as 
expected.

Now why these aren't interleaving in apache1.txt where ab was running?

> As the accept call
> on httpd side only returns when the first data is send on the socket, IMHO
> the time the poll takes does take place in the kernel and not in the httpd
> children code.

Well, I think kernel is not the one to blame.

> Have you checked your messages file if the kernel reports
> something when this happens? How does your run queue and CPU load look
> like when this happens (top)?

No kernel messages, no cpu spikes - looks normal.

> Regards
> Rüdiger

-- 
Arkadiusz Miśkiewicz, arekm / maven.pl

RE: 2.4.4 graceful restart taking long time

Posted by Plüm, Rüdiger, Vodafone Group <ru...@vodafone.com>.

> -----Original Message-----
> From: Arkadiusz Miśkiewicz [mailto:arekm@maven.pl]
> Sent: Donnerstag, 11. April 2013 13:43
> To: dev@httpd.apache.org
> Subject: 2.4.4 graceful restart taking long time
> 
> 
> Hello,
> 
> On apache 2.2.22, 2.2.23 and 2.4.4 I'm able to reproduce a problem
> where graceful restart takes very long time. Linux 3.7.10, glibc 2.17
> here.
> Example strace of main httpd process while doing graceful restart:
> http://pastebin.com/QFH5TjT6

From the strace it looks like the connect takes a lot of time later on (the poll is waiting 1 second for
the connect to complete). As the accept call on httpd side only returns when the first data is send on the socket,
IMHO the time the poll takes does take place in the kernel and not in the httpd children code.
Have you checked your messages file if the kernel reports something when this happens?
How does your run queue and CPU load look like when this happens (top)?

Regards

Rüdiger