Posted to dev@httpd.apache.org by Paul Querna <pa...@querna.org> on 2011/09/11 21:10:53 UTC

www.apache.org upgraded to 2.3.15-dev-r116760

Infra has upgraded eos, aka the main webserver for *.apache.org to
2.3.15-dev-r116760

We started with going to 2.3.14-beta, but it was missing all the
range-header changes, so we decided to pull up to trunk at the current
time, which is r116760.

We ran into a few small issues in upgrading from 2.3.8:

  * mod_asis was demoted from 'most', we changed --enable-mods-shared
from 'most' to 'all'. [1]

  * mod_imagemap was also removed from 'most', but we just removed it
from our configuration.

  * We had a bad ServerName value of "bugs.*.apache.org", this caused
an error on startup, but was easily fixed by changing it to a
ServerAlias.

  * We had some painful issues dealing with --with-included-apr.
Basically we always ended up linking to the system APR installed in
/usr/local/lib.  We 'fixed' this for now by explicitly setting
LD_LIBRARY_PATH=/usr/local/apache2-install/www.apache.org/current/lib/
in our init scripts, and when building httpd we temporarily 'hid'
the system APR by running `gzip /usr/local/lib/libapr*`, so that the
linker wouldn't pick it up. (hat tip to danielsh for the gzip idea).
    This issue caused BDB 4.2 to be used by APR-Util, which caused
mod_mbox to not work, since it was using BDB 4.8.  We are currently
looking at ways to fix our system APR, but it would be nice if
--with-included-apr consistently worked.

Bonus points for adding the connection count summary table into mod_status:
  <http://www.apache.org/server-status>

If you notice any issues, please let infrastructure@ know,

Thanks!

[1] - config.nice: https://gist.github.com/4b95f959282820bd6753

Re: www.apache.org upgraded to 2.3.15-dev-r116760

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Stefan Fritsch wrote on Sat, Oct 15, 2011 at 23:58:18 +0200:
> On Tuesday 11 October 2011, Stefan Fritsch wrote:
> > Can you try setting MaxMemFree, maybe to 4000?
> 
> Actually, since you use FreeBSD and FreeBSD has an efficient malloc, 
> you can try something a lot lower, like 400 or even 1.

Thanks for following up.  Sorry for going silent after my last mail;
I'm not particularly familiar with httpd configuration, and found the
discussion inconclusive when I considered the IRC input as well as the
list's.

I'll ping people again and see if I can reach an actionable conclusion.

Thanks,

Daniel

Re: www.apache.org upgraded to 2.3.15-dev-r116760

Posted by Stefan Fritsch <sf...@sfritsch.de>.
On Tuesday 11 October 2011, Stefan Fritsch wrote:
> Can you try setting MaxMemFree, maybe to 4000?

Actually, since you use FreeBSD and FreeBSD has an efficient malloc, 
you can try something a lot lower, like 400 or even 1.

Re: www.apache.org upgraded to 2.3.15-dev-r116760

Posted by Stefan Fritsch <sf...@sfritsch.de>.
On Thursday 20 October 2011, Daniel Shahaf wrote:

> I gathered some stats on Mon/Tue when mail-archives.a.o traffic was
> routed entirely to aurora.a.o (our EU mirror).  This morning I
> installed current HEAD mod_mbox on eos, after flipping the DNS
> back yesterday, and graceful'd the server.
> 
> Right now I see httpd processes with around 3GB memory in top,
> whereas in three different 'while sleep 5 && $i++ < 5000' samples
> taken prior to flipping the DNS I see exactly one process having
> exceeded 1600MB memory (to 1613MB).

mod_mbox seems to use significant memory (i.e. a few MBs) for doing 
threading on large mboxes. Also, it reads the mail it displays into 
memory in one chunk. Both could lead to the maximum memory used per 
request increasing compared to mod_mbox not being used. And because of 
the allocator behaviour I have described in a previous mail, this 
could lead to the process size increasing. I would be really 
interested how the memory usage changes if you set MaxMemFree.

> I'm not aware of differences other than flipping the DNS and
> upgrading the mod_mbox version to include the last week's fixes.

Re: www.apache.org upgraded to 2.3.15-dev-r116760

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Stefan Fritsch wrote on Sun, Oct 16, 2011 at 16:25:28 +0200:
> On Sun, 16 Oct 2011, Daniel Shahaf wrote:
> 
> >sebb wrote on Sun, Oct 16, 2011 at 02:39:42 +0100:
> >>https://issues.apache.org/bugzilla/show_bug.cgi?id=51951
> >>
> >>Certain mails cause mod_mbox to hang; maybe it is chewing on memory.
> 
> There was one obvious memory leak in mod_mbox. It's only a few bytes
> per message displayed, but maybe it adds up. The fix is here
> (compiles, but otherwise untested):
> 
> http://svn.apache.org/viewvc/httpd/mod_mbox/trunk/module-2.0/mod_mbox_mime.c?r1=1184831&r2=1184830&pathrev=1184831

I gathered some stats on Mon/Tue when mail-archives.a.o traffic was
routed entirely to aurora.a.o (our EU mirror).  This morning I installed
current HEAD mod_mbox on eos, after flipping the DNS back yesterday, and
graceful'd the server.

Right now I see httpd processes with around 3GB memory in top, whereas
in three different 'while sleep 5 && $i++ < 5000' samples taken prior to
flipping the DNS I see exactly one process having exceeded 1600MB memory
(to 1613MB).
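
A sampling check like the one described above could be sketched as
follows.  This is a minimal sketch, not the script that was run on eos:
the function name, the "<pid> <size in KB>" input format and the sample
values are assumptions made for illustration.

```python
from typing import List, Tuple


def over_threshold(ps_output: str, threshold_mb: float) -> List[Tuple[int, float]]:
    """Return (pid, size_mb) for every process larger than threshold_mb.

    Expects one "<pid> <size_kb>" pair per line (hypothetical format);
    header lines and anything else non-numeric are skipped.
    """
    hits = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue  # header, blank line, or unexpected content
        pid, size_mb = int(fields[0]), int(fields[1]) / 1024
        if size_mb > threshold_mb:
            hits.append((pid, size_mb))
    return hits


if __name__ == "__main__":
    # Made-up sample: one process at 1613MB, one at 650MB.
    sample = "PID SIZE\n101 1651712\n102 665600\n"
    print(over_threshold(sample, 1600))
```

Running it once per sample and collecting the hits gives the count of
processes that crossed the chosen limit in each run.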

I'm not aware of differences other than flipping the DNS and upgrading
the mod_mbox version to include the last week's fixes.

Re: www.apache.org upgraded to 2.3.15-dev-r116760

Posted by Stefan Fritsch <sf...@sfritsch.de>.
On Sun, 16 Oct 2011, Daniel Shahaf wrote:

> sebb wrote on Sun, Oct 16, 2011 at 02:39:42 +0100:
>> https://issues.apache.org/bugzilla/show_bug.cgi?id=51951
>>
>> Certain mails cause mod_mbox to hang; maybe it is chewing on memory.

There was one obvious memory leak in mod_mbox. It's only a few bytes per 
message displayed, but maybe it adds up. The fix is here (compiles, but 
otherwise untested):

http://svn.apache.org/viewvc/httpd/mod_mbox/trunk/module-2.0/mod_mbox_mime.c?r1=1184831&r2=1184830&pathrev=1184831

mod_mbox issue 51951 Re: www.apache.org upgraded to 2.3.15-dev-r116760

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
sebb wrote on Sun, Oct 16, 2011 at 12:45:26 +0100:
> On 16 October 2011 12:24, sebb <se...@gmail.com> wrote:
> > On 16 October 2011 06:44, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> >> sebb wrote on Sun, Oct 16, 2011 at 02:39:42 +0100:
> >>> https://issues.apache.org/bugzilla/show_bug.cgi?id=51951
> >>>
> >>> Certain mails cause mod_mbox to hang; maybe it is chewing on memory.
> >>
> >> Does that issue still reproduce now that I've cut mail-archives.a.o over
> >> to aurora (which runs a different httpd)?
> >
> > Yes, the Sep 29th mails from press@ don't display in Firefox.
> >
> > However, I just tried one of the same mails in Chrome, and it reports
> > an encoding error.
> 
> Sorry, that was a different mail from ooo-private.
> 
> I need to go back and retrace my steps to be absolutely clear.
> 
> Unfortunately, the only mails where I have noticed this are on
> member-private mailing lists (press and ooo-private), which may be a
> problem when posting screen-shots and contents.
> 
> I don't know how to create a private Bugzilla issue, so should I post
> the evidence to a private JIRA issue?
> 

How about uploading the evidence to a member-only https location?  You
could put them on http://people.apache.org/~sebb/ with a .htaccess
password, and put the password in a sebb:member 0640 file.

Re: www.apache.org upgraded to 2.3.15-dev-r116760

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
sebb wrote on Sun, Oct 16, 2011 at 02:39:42 +0100:
> https://issues.apache.org/bugzilla/show_bug.cgi?id=51951
> 
> Certain mails cause mod_mbox to hang; maybe it is chewing on memory.

Does that issue still reproduce now that I've cut mail-archives.a.o over
to aurora (which runs a different httpd)?

Re: www.apache.org upgraded to 2.3.15-dev-r116760

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
[ Sorry again for the delayed reply. ]

Stefan Fritsch wrote on Tue, Oct 11, 2011 at 21:35:08 +0200:
> On Tuesday 11 October 2011, Jim Jagielski wrote:
> > On Oct 11, 2011, at 3:01 PM, Jim Jagielski wrote:
> > > On Oct 11, 2011, at 2:09 PM, Daniel Shahaf wrote:
> > >> Paul Querna wrote on Sun, Sep 11, 2011 at 12:10:53 -0700:
> > >>> Infra has upgraded eos, aka the main webserver for *.apache.org
> > >>> to 2.3.15-dev-r1167603
> > >> 
> > >> We observe processes whose memory, in trend, grows over time up
> > >> to 2.8GB and more, for example:
> > >> 
> > >> http://www.apache.org/eos/mem.txt
> > > 
> > > which mom?
> > 
> > mpm even :)
> 
> mpm event: http://www.apache.org/server-status
> At least I assume that's eos.
> 

Yes.  eos.apache.org == 140.211.11.131

> Is this a new problem since you upgraded?
> 

I'm told that it's not a new problem.

> 
> I think the problem may be the combination of
> - by default, httpd never gives memory back to the OS
> - mpm event creates transaction pool with a separate allocator for 
> every connection 
> - by default, it never destroys transaction pools
...
> Can you try setting MaxMemFree, maybe to 4000? This will limit the 
> maximum free memory kept in each allocator to 4000K. It will also 
> cause mpm event to destroy transaction pools if more than 128 * 0.75 
> == 96 are unused, but I am not sure that will make a difference if the 
> load is more or less constant.
> 

Thanks for your reply.

Besides the theory you suggest above, #asfinfra residents suggested
isolating mod_mbox.  I started by testing that theory (sorry, it was
only raised on IRC, not on-list).

I changed mail-archives.apache.org DNS entry to point at aurora (in
Amsterdam) rather than eos (in OSUOSL).  Halfway through the one hour
DNS propagation period a graceful restart was done.  Looking at top now
(after DNS finished propagating), the memory use is at around 650MB and
at no point exceeded 750MB (in manual spot checks, not yet in a scripted
every-5-seconds check; I'll do that later).  By comparison, before the
DNS change and the graceful, I saw two processes at 2GB+, and in an
unscientific glimpse it did seem to be growing.

So, this is preliminary data that appears to corroborate the mod_mbox
theory.  It does not rule out the allocators theory, and I plan to
collect more data in the current configuration to see if the processes
continue to behave.

> 
> Cheers,
> Stefan

Thanks,

Daniel

Re: www.apache.org upgraded to 2.3.15-dev-r116760

Posted by Stefan Fritsch <sf...@sfritsch.de>.
On Tuesday 11 October 2011, Jim Jagielski wrote:
> On Oct 11, 2011, at 3:01 PM, Jim Jagielski wrote:
> > On Oct 11, 2011, at 2:09 PM, Daniel Shahaf wrote:
> >> Paul Querna wrote on Sun, Sep 11, 2011 at 12:10:53 -0700:
> >>> Infra has upgraded eos, aka the main webserver for *.apache.org
> >>> to 2.3.15-dev-r1167603
> >> 
> >> We observe processes whose memory, in trend, grows over time up
> >> to 2.8GB and more, for example:
> >> 
> >> http://www.apache.org/eos/mem.txt
> > 
> > which mom?
> 
> mpm even :)

mpm event: http://www.apache.org/server-status
At least I assume that's eos.

Is this a new problem since you upgraded?


I think the problem may be the combination of
- by default, httpd never gives memory back to the OS
- mpm event creates transaction pool with a separate allocator for 
every connection 
- by default, it never destroys transaction pools

This means that every allocator keeps the memory that was allocated 
for that connection (including the request pool). So the process will 
slowly approach the size

(max number of simultaneous connections) * (max memory used for one 
connection)

Max number of simultaneous connections == 3 * 128 == 384 for your 
configuration. While this limit is probably never reached, I have seen 
300 conns for one process in the server-status.

If you divide 2.8GB / 300, that's 9MB. That's not totally off as "max 
memory used for one connection" (depends on the modules/configuration 
you use).
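
The estimate above can be written out as a short calculation.  This is
a minimal sketch: the names are made up for illustration, the factors 3
and 128 are used as quoted without assuming what they stand for in the
configuration, and 1GB is taken as 1024MB.

```python
# Figures quoted in the mail, used as-is.
MAX_CONNS_PER_PROCESS = 3 * 128   # theoretical limit
OBSERVED_CONNS = 300              # seen for one process in server-status
PROCESS_SIZE_MB = 2.8 * 1024      # 2.8GB, assuming 1GB == 1024MB


def per_connection_mb(process_size_mb: float, connections: int) -> float:
    """Memory per connection if the process size is spread evenly."""
    return process_size_mb / connections


if __name__ == "__main__":
    print("limit:", MAX_CONNS_PER_PROCESS)
    print("MB per connection at 300 conns:",
          round(per_connection_mb(PROCESS_SIZE_MB, OBSERVED_CONNS), 1))
    print("MB per connection at the limit:",
          round(per_connection_mb(PROCESS_SIZE_MB, MAX_CONNS_PER_PROCESS), 1))
```

Either divisor lands in the same single-digit-MB range, which is why
the exact connection count matters little for the argument.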



Can you try setting MaxMemFree, maybe to 4000? This will limit the 
maximum free memory kept in each allocator to 4000K. It will also 
cause mpm event to destroy transaction pools if more than 128 * 0.75 
== 96 are unused, but I am not sure that will make a difference if the 
load is more or less constant.
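
To get a feel for the numbers in this suggestion, the retained free
memory can be bounded with a short calculation.  This is only a rough
sketch: it assumes one allocator per connection at the 3 * 128 == 384
limit, counts free memory only (not memory in use), and the function
names are made up for illustration.

```python
def retained_free_upper_bound_mb(allocators: int, max_mem_free_kb: int) -> float:
    """Upper bound on free memory kept across allocators, in MB.

    Assumes each allocator keeps at most max_mem_free_kb of free memory.
    """
    return allocators * max_mem_free_kb / 1024


def pool_destroy_threshold(limit: int = 128, fraction: float = 0.75) -> int:
    """Number of unused pools above which pools are destroyed (128 * 0.75)."""
    return int(limit * fraction)


if __name__ == "__main__":
    allocators = 3 * 128  # connection limit quoted earlier in the mail
    print("bound with MaxMemFree 4000:",
          retained_free_upper_bound_mb(allocators, 4000), "MB")
    print("unused-pool threshold:", pool_destroy_threshold())
```

Under these assumptions the bound scales linearly with the MaxMemFree
value, so a ten times smaller setting gives a ten times smaller bound.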


Cheers,
Stefan

Re: www.apache.org upgraded to 2.3.15-dev-r116760

Posted by Jim Jagielski <ji...@apache.org>.
On Oct 11, 2011, at 3:01 PM, Jim Jagielski wrote:

> 
> On Oct 11, 2011, at 2:09 PM, Daniel Shahaf wrote:
> 
>> Paul Querna wrote on Sun, Sep 11, 2011 at 12:10:53 -0700:
>>> Infra has upgraded eos, aka the main webserver for *.apache.org to
>>> 2.3.15-dev-r1167603
>> 
>> We observe processes whose memory, in trend, grows over time up to 2.8GB
>> and more, for example:
>> 
>> http://www.apache.org/eos/mem.txt
>> 
> 
> which mom?
> 

mpm even :)

Re: www.apache.org upgraded to 2.3.15-dev-r116760

Posted by Jim Jagielski <ji...@jaguNET.com>.
On Oct 11, 2011, at 2:09 PM, Daniel Shahaf wrote:

> Paul Querna wrote on Sun, Sep 11, 2011 at 12:10:53 -0700:
>> Infra has upgraded eos, aka the main webserver for *.apache.org to
>> 2.3.15-dev-r1167603
> 
> We observe processes whose memory, in trend, grows over time up to 2.8GB
> and more, for example:
> 
> http://www.apache.org/eos/mem.txt
> 

which mom?

Re: www.apache.org upgraded to 2.3.15-dev-r116760

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Paul Querna wrote on Sun, Sep 11, 2011 at 12:10:53 -0700:
> Infra has upgraded eos, aka the main webserver for *.apache.org to
> 2.3.15-dev-r1167603

We observe processes whose memory, in trend, grows over time up to 2.8GB
and more, for example:

http://www.apache.org/eos/mem.txt

Re: www.apache.org upgraded to 2.3.15-dev-r116760

Posted by Guenter Knauf <fu...@apache.org>.
Paul,
Am 12.09.2011 02:49, schrieb Paul Querna:
> Oops, that would actually be r1167603, dropped off the last character somewhere.
>
> On Sun, Sep 11, 2011 at 12:10 PM, Paul Querna<pa...@querna.org>  wrote:
>> Infra has upgraded eos, aka the main webserver for *.apache.org to
>> 2.3.15-dev-r116760
any thoughts about upgrading p.a.o to 2.2.21 ?

Gün.



Re: www.apache.org upgraded to 2.3.15-dev-r116760

Posted by Paul Querna <pa...@querna.org>.
Oops, that would actually be r1167603, dropped off the last character somewhere.

On Sun, Sep 11, 2011 at 12:10 PM, Paul Querna <pa...@querna.org> wrote:
> Infra has upgraded eos, aka the main webserver for *.apache.org to
> 2.3.15-dev-r116760
>
> We started with going to 2.3.14-beta, but it was missing all the
> range-header changes, so we decided to pull up to trunk at the current
> time, which is r116760.
>
> We ran into a few small issues in upgrading from 2.3.8:
>
>  * mod_asis was demoted from 'most', we changed --enable-mods-shared
> from 'most' to 'all'. [1]
>
>  * mod_imagemap was also removed from 'most', but we just removed it
> from our configuration.
>
>  * We had a bad ServerName value of "bugs.*.apache.org", this caused
> an error on startup, but was easily fixed by changing it to a
> ServerAlias.
>
>  * We had some painful issues dealing with --with-included-apr.
> Basically we always ended up linking to the system APR installed in
> /usr/local/lib.  We 'fixed' this for now by explicitly setting
> LD_LIBRARY_PATH=/usr/local/apache2-install/www.apache.org/current/lib/
> in our init scripts, and when building httpd we temporarily 'hid'
> the system APR by running `gzip /usr/local/lib/libapr*`, so that the
> linker wouldn't pick it up. (hat tip to danielsh for the gzip idea).
>    This issue caused BDB 4.2 to be used by APR-Util, which caused
> mod_mbox to not work, since it was using BDB 4.8.  We are currently
> looking at ways to fix our system APR, but it would be nice if
> --with-included-apr consistently worked.
>
> Bonus points for adding the connection count summary table into mod_status:
>  <http://www.apache.org/server-status>
>
> If you notice any issues, please let infrastructure@ know,
>
> Thanks!
>
> [1] - config.nice: https://gist.github.com/4b95f959282820bd6753
>