Posted to dev@httpd.apache.org by Stefan Fritsch <sf...@sfritsch.de> on 2010/06/26 20:49:58 UTC

Some memory usage optimizations

Hi,

there are some places where httpd uses more memory than necessary, 
which can increase cache misses and reduce performance on current 
hardware. Things I would like to change:

1) Reorder structs to have fewer holes on 64-bit architectures

httpd.h has this nice comment since the days of 1.3:

/* Things placed at the end of the record to avoid breaking binary
 * compatibility.  It would be nice to remember to reorder the entire
 * record to improve 64bit alignment the next time we need to break
 * binary compatibility for some other reason.
 */

Pro: Examples of space savings on 64-bit:

request_rec:   704 -> 672 bytes
server_rec:    208 -> 192 bytes
proxy_worker:  264 -> 208 bytes

Con: In some cases reordering the struct members makes the code harder 
to read because related members may no longer be grouped together.

Is it worth changing? Would somebody be -1?

(BTW, pahole is useful for finding such structs).
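
To illustrate point 1, here is a minimal sketch of how member order
changes struct size. Both structs are hypothetical and are not taken
from httpd.h; the sizes in the comments assume a typical LP64 ABI
(8-byte pointers, 4-byte int) and will differ on other platforms.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record (NOT httpd's request_rec): small members placed
 * between pointers force padding before each pointer. */
struct rec_holes {
    int   status;       /* 4 bytes, then 4 bytes padding on LP64 */
    char *uri;          /* 8 bytes */
    int   header_only;  /* 4 bytes, then 4 bytes padding on LP64 */
    char *method;       /* 8 bytes */
    char  proto_num;    /* 1 byte, then 7 bytes padding on LP64 */
    void *pool;         /* 8 bytes */
};                      /* 48 bytes on a typical LP64 ABI */

/* Same members, ordered from largest to smallest alignment. */
struct rec_packed {
    char *uri;
    char *method;
    void *pool;
    int   status;
    int   header_only;
    char  proto_num;    /* only tail padding remains */
};                      /* 40 bytes on a typical LP64 ABI */

/* Bytes saved by the reordering; the exact value is ABI-dependent. */
size_t bytes_saved(void)
{
    /* Sorting members by alignment must not make the struct larger. */
    assert(sizeof(struct rec_packed) <= sizeof(struct rec_holes));
    return sizeof(struct rec_holes) - sizeof(struct rec_packed);
}
```

The packed layout also shows the readability cost mentioned above:
status and proto_num are no longer next to the fields they relate to.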


2) Use char instead of int for the module_levels vector in struct 
ap_logconfig. The length of the vector is ~100.

Pro: This may save ~300 bytes per server conf which are read often and 
therefore occupy CPU-cache memory

Con: On some architectures, accessing chars is slower than accessing 
ints.

Does anyone have an idea what is better, here? Int or char?

The same argument could be made for boolean flags in various other 
structs. But I don't think those are worth the effort.
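
As a rough sketch of where the ~300 bytes in point 2 come from: with
100 elements and a 4-byte int, narrowing each element to one byte
saves 100 * (4 - 1) = 300 bytes. The names below (level_t, set_level,
N_MODULES) are made up for illustration and are not httpd identifiers.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define N_MODULES 100   /* assumption: the "~100" from the text */

/* signed char rather than plain char: whether plain char is signed is
 * implementation-defined, and a level may need to be negative. */
typedef signed char level_t;

/* Store a level in the narrow vector. The index stays an int; only
 * the element type is narrowed. */
void set_level(level_t *vec, int module_index, int level)
{
    assert(module_index >= 0 && module_index < N_MODULES);
    assert(level >= SCHAR_MIN && level <= SCHAR_MAX);
    vec[module_index] = (level_t)level;
}

/* Bytes saved per vector compared with an int vector of equal length. */
size_t bytes_saved_vs_int(void)
{
    return N_MODULES * (sizeof(int) - sizeof(level_t));
}
```

Whether the smaller footprint outweighs any per-access cost on a given
architecture would have to be measured; the sketch only shows the
size side of the trade-off.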


3) In server/config.c, many config vectors are created (allocated and 
zeroed) with length (total_modules + DYNAMIC_MODULE_LIMIT). But this 
is only necessary before dynamic modules are loaded. Afterwards, 
total_modules is set to the total number of modules. Therefore 
allocating a vec of length total_modules should be enough. This would 
save zeroing 128 pointers per request and connection.

It seems the attached patch works and I could not find any problems 
when adding/removing modules during graceful restart. Objections?
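
The saving in point 3 can be put in numbers with a small sketch. The
helper names are hypothetical; only DYNAMIC_MODULE_LIMIT and its value
of 128 come from this thread.

```c
#include <assert.h>
#include <stddef.h>

#define DYNAMIC_MODULE_LIMIT 128   /* value quoted in the thread */

/* Bytes allocated and zeroed for one config vector of n_slots
 * pointers. */
size_t vector_bytes(size_t n_slots)
{
    return n_slots * sizeof(void *);
}

/* Bytes no longer zeroed per vector once its length is total_modules
 * instead of total_modules + DYNAMIC_MODULE_LIMIT. */
size_t bytes_saved_per_vector(size_t total_modules)
{
    size_t old_len = total_modules + DYNAMIC_MODULE_LIMIT;
    assert(old_len > total_modules);   /* guard against wraparound */
    return vector_bytes(old_len) - vector_bytes(total_modules);
}
```

With 8-byte pointers this comes to 1024 bytes per vector, independent
of how many modules are actually loaded.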

Cheers,
Stefan

Re: Some memory usage optimizations

Posted by "William A. Rowe Jr." <wr...@rowe-clan.net>.
On 6/26/2010 5:27 PM, Stefan Fritsch wrote:
> But trunk has > 110 modules already. Maybe we should increase 
> DYNAMIC_MODULE_LIMIT from 128 to 192?

With your proposed optimizations, we can take it to 256 and still have
a significant net win.

>>> 3) In server/config.c, many config vectors are created (allocated
>>> and  zeroed) with length (total_modules + DYNAMIC_MODULE_LIMIT).
>>> But this is only necessary before dynamic modules are loaded.
>>> Afterwards, total_modules is set to the total number of modules.
>>> Therefore allocating a vec of length total_modules should be
>>> enough. This would save zeroing 128 pointers per request and
>>> connection.
>>>
>>> It seems the attached patch works and I could not find any
>>> problems  when adding/removing modules during graceful restart.
>>> Objections?
>>
>> What of a separate alloc_modules variable, that would be init to
>> the old (total_modules + DYNAMIC_MODULE_LIMIT), and could be
>> truncated when the pre-config was complete to simply
>> total_modules.  This might even save additional
>> processing/merging/copying on the stale/unused NULL pointers.
> 
> I have committed something like this.

Will look Monday eve with anticipation, thanks!

Re: Some memory usage optimizations

Posted by Stefan Fritsch <sf...@sfritsch.de>.
On Saturday 26 June 2010, William A. Rowe Jr. wrote:
> > 2) Use char instead of int for the module_levels vector in
> > struct  ap_logconfig. The length of the vector is ~100.
> >
> > Pro: This may save ~300 bytes per server conf which are read
> > often and  therefore occupy CPU-cache memory
> >
> > Con: On some architectures, accessing chars is slower than
> > accessing  ints.
> >
> > Does anyone have an idea what is better, here? Int or char?
> >
> > The same argument could be made for boolean flags in various
> > other structs. But I don't think those are worth the effort.
> 
> Anytime it's part of a per-dir, I'd suggest char is probably
> better, or even bitfields.  The shifting pain is equal to slicing
> a char.

Since bit-field ordering is compiler-dependent, I think we should not 
use bit-fields in public headers. In private data, it may be ok. But 
accessing bit-fields is even more expensive than accessing chars.
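
One layout-stable alternative to bit-fields, which nobody in this
thread proposed and which is shown only for comparison, is explicit
masks on an unsigned char. The flag names and helpers below are
hypothetical. Because the bit positions are fixed by the constants,
the layout does not depend on how a compiler orders bit-fields.

```c
#include <assert.h>

/* Hypothetical flags, not taken from any httpd header. */
#define FLAG_A 0x01u
#define FLAG_B 0x02u
#define FLAG_C 0x04u

typedef unsigned char flags_t;

/* Set the bits in mask; mask must fit in the 8 bits of flags_t. */
void flag_set(flags_t *f, unsigned mask)
{
    assert(mask <= 0xFFu);
    *f = (flags_t)(*f | mask);
}

/* Clear the bits in mask. */
void flag_clear(flags_t *f, unsigned mask)
{
    assert(mask <= 0xFFu);
    *f = (flags_t)(*f & ~mask);
}

/* Return 1 if any bit in mask is set, otherwise 0. */
int flag_test(flags_t f, unsigned mask)
{
    return (f & mask) != 0;
}
```

Each access still needs a mask operation, so this does not remove the
access cost discussed above; it only makes the layout predictable.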

> But we are almost to 128 modules.  So be careful here, let's
> presume that folks continue to propagate specific bits of
> functionality into a module.

To be clear: I only want to change the type of the elements of the 
module_levels vector, which only need to hold values from -2 to 15. The 
index used to address an element will of course stay an int.

But trunk has > 110 modules already. Maybe we should increase 
DYNAMIC_MODULE_LIMIT from 128 to 192?

> 
> > 3) In server/config.c, many config vectors are created (allocated
> > and  zeroed) with length (total_modules + DYNAMIC_MODULE_LIMIT).
> > But this is only necessary before dynamic modules are loaded.
> > Afterwards, total_modules is set to the total number of modules.
> > Therefore allocating a vec of length total_modules should be
> > enough. This would save zeroing 128 pointers per request and
> > connection.
> >
> > It seems the attached patch works and I could not find any
> > problems  when adding/removing modules during graceful restart.
> > Objections?
> 
> What of a separate alloc_modules variable, that would be init to
> the old (total_modules + DYNAMIC_MODULE_LIMIT), and could be
> truncated when the pre-config was complete to simply
> total_modules.  This might even save additional
> processing/merging/copying on the stale/unused NULL pointers.

I have committed something like this.

Cheers,
Stefan

Re: Some memory usage optimizations

Posted by "William A. Rowe Jr." <wr...@rowe-clan.net>.
On 6/26/2010 1:49 PM, Stefan Fritsch wrote:
> Hi,
> 
> there are some places where httpd uses more memory than necessary, 
> which can increase cache misses and reduce performance on current 
> hardware. Things I would like to change:
> 
> 1) reorder structs to have fewer holes on 64bit arches
> 
> Pro: Examples for space savings on 64bit:
> 
> Con: In some cases reordering the struct members makes the code harder 
> to read because related members may no longer be grouped together.

Feel free to reorder for more-optimal storage, but please don't completely
disassociate related fields!  It makes debugging a pain in the neck.  For
that matter, lots of those 'add these to the end of the struct' fields should
be reordered anyways now that we are going towards 2.4.

This still needs to be human-readable.

> 2) Use char instead of int for the module_levels vector in struct 
> ap_logconfig. The length of the vector is ~100.
> 
> Pro: This may save ~300 bytes per server conf which are read often and 
> therefore occupy CPU-cache memory
> 
> Con: On some architectures, accessing chars is slower than accessing 
> ints.
> 
> Does anyone have an idea what is better, here? Int or char?
> 
> The same argument could be made for boolean flags in various other 
> structs. But I don't think those are worth the effort.

Anytime it's part of a per-dir, I'd suggest char is probably better, or
even bitfields.  The shifting pain is equal to slicing a char.

But we are almost to 128 modules.  So be careful here, let's presume that
folks continue to propagate specific bits of functionality into a module.

> 3) In server/config.c, many config vectors are created (allocated and 
> zeroed) with length (total_modules + DYNAMIC_MODULE_LIMIT). But this 
> is only necessary before dynamic modules are loaded. Afterwards, 
> total_modules is set to the total number of modules. Therefore 
> allocating a vec of length total_modules should be enough. This would 
> save zeroing 128 pointers per request and connection.
> 
> It seems the attached patch works and I could not find any problems 
> when adding/removing modules during graceful restart. Objections?

What about a separate alloc_modules variable that would be initialized
to the old (total_modules + DYNAMIC_MODULE_LIMIT) and could be truncated
to simply total_modules once the pre-config is complete?  This might even
save additional processing/merging/copying on the stale/unused NULL pointers.
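
A minimal sketch of the alloc_modules idea, assuming a two-phase
startup. This is not the change that was committed: calloc stands in
for httpd's pool allocation, and every function name here is
hypothetical. Only total_modules, alloc_modules and
DYNAMIC_MODULE_LIMIT are names from the thread.

```c
#include <assert.h>
#include <stdlib.h>

#define DYNAMIC_MODULE_LIMIT 128   /* value quoted in the thread */

static int total_modules;   /* modules known so far */
static int alloc_modules;   /* slots to allocate per config vector */

/* Before config: leave room for modules that may still be loaded. */
void init_module_counts(int static_modules)
{
    assert(static_modules >= 0);
    total_modules = static_modules;
    alloc_modules = static_modules + DYNAMIC_MODULE_LIMIT;
}

/* Called once per dynamically loaded module during pre-config. */
void register_dynamic_module(void)
{
    assert(total_modules < alloc_modules);
    total_modules++;
}

/* After pre-config no more modules arrive, so drop the headroom. */
void finish_pre_config(void)
{
    alloc_modules = total_modules;
}

/* Allocate a zeroed config vector with the current slot count.
 * The caller owns the result and releases it with free(). */
void **create_config_vector(void)
{
    return calloc((size_t)alloc_modules, sizeof(void *));
}
```

Code that merges or copies vectors could then loop up to
alloc_modules as well, which is where the additional saving on unused
NULL pointers would come from.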