Posted to dev@httpd.apache.org by Brian Behlendorf <br...@hyperreal.org> on 1998/05/22 20:27:11 UTC

Re: cvs commit: apache-1.3/src/modules/standard mod_log_referer.c

At 07:00 PM 5/22/98 +0100, Ben Laurie wrote:
>Err? Surely alloca only hits the kernel if you have to grow the stack as
>a result? And, of course, it gives the memory back much sooner than
>pstrdup does.

Like I said, I don't know.  Others have pointed out man pages that say it's
highly machine dependent and should be avoided in cross-platform code.  On
the other hand we use it in a couple places in our core code, so it must
work well enough.  None of the other modules use it however.  It'd be
interesting to see benchmarks...

	Brian

--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
pure chewing satisfaction                                  brian@apache.org
                                                        brian@hyperreal.org

Re: cvs commit: apache-1.3/src/modules/standard mod_log_referer.c

Posted by Ben Laurie <be...@algroup.co.uk>.

Alexei Kosut wrote:
> And thanks to the nice Sun people, from the Solaris alloca(3C) man page:
> 
> "alloca() is machine-, compiler-, and most of all, system-dependent. Its
> use is strongly discouraged."

Seeing as Sun are invariably wrong about everything, does this mean we
should use it at all costs? :-)

Cheers,

Ben.

-- 
Ben Laurie            |Phone: +44 (181) 735 0686|  Apache Group member
Freelance Consultant  |Fax:   +44 (181) 735 0689|http://www.apache.org
and Technical Director|Email: ben@algroup.co.uk |
A.L. Digital Ltd,     |Apache-SSL author    http://www.apache-ssl.org/
London, England.      |"Apache: TDG" http://www.ora.com/catalog/apache

Re: alloca (was Re: cvs commit: apache-1.3/src/modules/standard mod_log_referer.c)

Posted by Dean Gaudet <dg...@arctic.org>.

On Sun, 24 May 1998, Alexei Kosut wrote:

> On Sun, 24 May 1998, Dean Gaudet wrote:
> 
> > > people to buy computers that make sense. Whose bright idea was it to give
> > > the x86 only 16 registers anyway? SPARC has 40. PA-RISC chips have 72. And
> > > those are only the integer ones.
> > 
> > x86 has 6 that are of general use, and 2 that typically have hardwired
> > uses (%esp, %ebp).  It would be heavenly to have 16 registers... maybe
> > you're thinking of 680x0.
> 
> When I type "info registers" into gdb, it lists 16 entries. That's what I
> was counting. Is that wrong?

Yeah, half of those are useless in a general sense.  All the segment
registers (cs, ss, ds, es, fs, gs) are legacy crud.  Nobody uses segmented
32-bit architectures... and even if they did, these registers are just
extensions of the 8 general purpose registers to form full pointers.

From the point of view of a compiler it has 6 registers (or 7 depending on
the stack frame convention).

Dean

Re: alloca (was Re: cvs commit: apache-1.3/src/modules/standard mod_log_referer.c)

Posted by Alexei Kosut <ak...@leland.Stanford.EDU>.

On Sun, 24 May 1998, Dean Gaudet wrote:

> > people to buy computers that make sense. Whose bright idea was it to give
> > the x86 only 16 registers anyway? SPARC has 40. PA-RISC chips have 72. And
> > those are only the integer ones.
> 
> x86 has 6 that are of general use, and 2 that typically have hardwired
> uses (%esp, %ebp).  It would be heavenly to have 16 registers... maybe
> you're thinking of 680x0.

When I type "info registers" into gdb, it lists 16 entries. That's what I
was counting. Is that wrong?

-- Alexei Kosut <ak...@stanford.edu> <http://www.stanford.edu/~akosut/>
   Stanford University, Class of 2001 * Apache <http://www.apache.org> *

Re: alloca (was Re: cvs commit: apache-1.3/src/modules/standard mod_log_referer.c)

Posted by Dean Gaudet <dg...@arctic.org>.

That's the implementation I meant when I said "we can implement something
like alloca everywhere" (or however I said it)... it's a hack, and it's
required because the gcc source code uses alloca(), so it needs an
alloca() implementation to bootstrap itself.  It doesn't require any
assembly to work, my guess is they use it for a few optimizations. 

As far as how gcc implements alloca() for stuff it compiles -- it's
probably done in terms of internal abstract register names like "frame
pointer" and "stack pointer".  It's all stuff the front-end can do, the
back-end doesn't need any knowledge of it.  In fact alloca() is
essentially how variable sized arrays are implemented (which you need for
pascal, modula-3, and which are an extension in GNU C). 
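
To illustrate the variable-sized array point, here is a minimal C sketch,
assuming a compiler that accepts variable-length arrays (GNU C as an
extension, or C99).  The function name reverse_copy is made up for the
example.  The scratch array is sized at run time and lives in the
function's own stack frame, much as an alloca() block would.

```c
#include <assert.h>
#include <string.h>

/* Write the reverse of s into out (out must hold strlen(s) + 1 bytes).
   reverse_copy is a hypothetical name used only for this illustration. */
static void reverse_copy(const char *s, char *out)
{
    assert(s != NULL && out != NULL);

    size_t n = strlen(s);
    char tmp[n + 1];            /* size known only at run time */

    for (size_t i = 0; i < n; i++)
        tmp[i] = s[n - 1 - i];
    tmp[n] = '\0';

    memcpy(out, tmp, n + 1);    /* tmp vanishes when we return */
}
```

Nothing has to be freed: the array goes away with the frame on return.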

Dean

On Sun, 24 May 1998, Marc Slemko wrote:

> On Sun, 24 May 1998, Dean Gaudet wrote:
> 
> > alloca essentially has to use compiler-dependent features... it's either
> > implemented as a builtin (as it is in gcc), or using extensions (like
> > inline assembly, which is how it's done in WATCOM C). 
> 
> /usr/src/contrib/gcc/alloca.c:
> 
> /* alloca.c -- allocate automatically reclaimed memory
>    (Mostly) portable public-domain implementation -- D A Gwyn
> 
>    This implementation of the PWB library alloca function,
>    which is used to allocate space off the run-time stack so
>    that it is automatically reclaimed upon procedure exit,
>    was inspired by discussions with J. Q. Johnson of Cornell.
>    J.Otto Tennant <jo...@cray.com> contributed the Cray support.
> 
>    There are some preprocessor constants that can
>    be defined when compiling for your specific system, for
>    improved efficiency; however, the defaults should be okay.
> 
>    The general concept of this implementation is to keep
>    track of all alloca-allocated blocks, and reclaim any
>    that are found to be deeper in the stack than the current
>    invocation.  This heuristic does not reclaim storage as
>    soon as it becomes invalid, but it will do so eventually.
> 
>    As a special case, alloca(0) reclaims storage without
>    allocating any.  It is a good idea to use alloca(0) in
>    your main control loop, etc. to force garbage collection.  */
> 
> This is from the FreeBSD gcc 2.7.2.1 source.  It is using malloc and free.
> There are processor specific assembly implementations, but does gcc have
> them for every processor or does it use the malloc/free one for some?  
> 
> 
>

Re: alloca (was Re: cvs commit: apache-1.3/src/modules/standard mod_log_referer.c)

Posted by Marc Slemko <ma...@worldgate.com>.

On Sun, 24 May 1998, Dean Gaudet wrote:

> alloca essentially has to use compiler-dependent features... it's either
> implemented as a builtin (as it is in gcc), or using extensions (like
> inline assembly, which is how it's done in WATCOM C). 

/usr/src/contrib/gcc/alloca.c:

/* alloca.c -- allocate automatically reclaimed memory
   (Mostly) portable public-domain implementation -- D A Gwyn

   This implementation of the PWB library alloca function,
   which is used to allocate space off the run-time stack so
   that it is automatically reclaimed upon procedure exit,
   was inspired by discussions with J. Q. Johnson of Cornell.
   J.Otto Tennant <jo...@cray.com> contributed the Cray support.

   There are some preprocessor constants that can
   be defined when compiling for your specific system, for
   improved efficiency; however, the defaults should be okay.

   The general concept of this implementation is to keep
   track of all alloca-allocated blocks, and reclaim any
   that are found to be deeper in the stack than the current
   invocation.  This heuristic does not reclaim storage as
   soon as it becomes invalid, but it will do so eventually.

   As a special case, alloca(0) reclaims storage without
   allocating any.  It is a good idea to use alloca(0) in
   your main control loop, etc. to force garbage collection.  */

This is from the FreeBSD gcc 2.7.2.1 source.  It is using malloc and free.
There are processor specific assembly implementations, but does gcc have
them for every processor or does it use the malloc/free one for some?
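
The general concept described in that header comment can be sketched in
a few lines of C.  This is not the gcc source, only an illustration under
stated assumptions: the stack grows downward, comparing the addresses of
locals in different frames behaves sensibly (the C standard does not
strictly guarantee that), and the names emu_alloca and struct blk are
invented here.

```c
#include <stdlib.h>

/* Hypothetical bookkeeping record, placed in front of each block. */
struct blk {
    struct blk *next;
    char *depth;        /* stack address recorded at allocation time */
};

static struct blk *live;    /* most recent allocation first */

/* malloc-backed stand-in for alloca(); assumes a downward-growing stack. */
void *emu_alloca(size_t size)
{
    char probe;             /* its address marks the current stack depth */
    char *depth = &probe;

    /* Reclaim blocks that were allocated deeper in the stack than the
       current invocation (lower address when the stack grows down). */
    while (live != NULL && live->depth < depth) {
        struct blk *dead = live;
        live = live->next;
        free(dead);
    }

    if (size == 0)
        return NULL;        /* alloca(0): collect garbage only */

    struct blk *b = malloc(sizeof *b + size);
    if (b == NULL)
        return NULL;
    b->depth = depth;
    b->next = live;
    live = b;
    return b + 1;           /* usable memory starts after the record */
}
```

As the comment says, storage is not reclaimed the moment it becomes
invalid, only on a later call made from a shallower frame.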

Re: alloca (was Re: cvs commit: apache-1.3/src/modules/standard mod_log_referer.c)

Posted by Dean Gaudet <dg...@arctic.org>.

On Sun, 24 May 1998, Alexei Kosut wrote:

> Sounds good. But if the compiler's generated code references stack data
> using positive offsets from the stack pointer (%esp? My x86 assembly
> knowledge is non-existent; the only assembly I've ever done is SPARC, and
> only a few lines of that) instead of negative offsets from the frame
> pointer (%ebp, I presume), then alloca() will mess up basically all the
> code that follows it, correct? 

On x86 code can be generated both ways.  If you use gcc
-fomit-frame-pointer for example, you'll get %esp relative stack frame
addressing, and get another free general register (%ebp)...  until you do
something which has an "unknown"  effect on %esp, such as alloca().  At
that point gcc will set up an %ebp frame and use %ebp instead. 

There are other reasons to use %esp even when you've got %ebp set up, one
case (which none of the gcc family generates that I know of) is when
you're doing floating point code -- the x86 ABI doesn't require the stack
to be 8-byte aligned, it only requires 4-byte alignment.  So when you
spill floating-point temporaries onto the integer stack it's 50% likely
they'll be unaligned (on an address == 4 mod 8).  High end x86 compilers
can do stuff like this to avoid this unaligned performance hit: 

    pushl %ebp
    movl %esp,%ebp
    andl $-8,%esp
    subl $frame_size,%esp

which gets an 8-byte aligned %esp... and then temporaries are referenced
off of %esp and %ebp is only used to restore the stack at the end of the
function.  (There's an alternative that gives you %ebp as a general
register too...) 
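
The andl with -8 in the sequence above is just "round down to a multiple
of 8".  A minimal C sketch of the same arithmetic, with align_down as a
made-up helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Round addr down to a multiple of align, which must be a power of two.
   align_down is a hypothetical helper for illustration only. */
static uintptr_t align_down(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return addr & ~(align - 1);     /* for align == 8 this is addr & -8 */
}
```

An address that is 4 mod 8, such as 0x1004, comes out as 0x1000, while an
already aligned 0x1008 is left alone.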

> That sounds right. Assuming that alloca() is implemented inline. If it's a
> library function call, it might have to do all sorts of odd things to be
> able to manipulate the activation record of the previous stack frame. And
> that seems a bit iffy to me, especially if you're mixing code generated by
> a multiplicity of compilers with various options on different systems.
> 
> Of course, if you're only concerned with gcc on recent revisions of major
> systems (as you've said you are), then I guess that isn't a problem.

alloca essentially has to use compiler-dependent features... it's either
implemented as a builtin (as it is in gcc), or using extensions (like
inline assembly, which is how it's done in WATCOM C). 

There is no multi-compiler problem -- because alloca() does not violate the
ABI in any way.  Compilers can only interoperate if they use the same
conventions...

> It seems to me that if function call overhead (which is, after all, just a
> few machine instructions) is something we need to worry about on a given
> architecture, then we should take all the CPUs and squish them, and force
> people to buy computers that make sense. Whose bright idea was it to give
> the x86 only 16 registers anyway? SPARC has 40. PA-RISC chips have 72. And
> those are only the integer ones.

x86 has 6 that are of general use, and 2 that typically have hardwired
uses (%esp, %ebp).  It would be heavenly to have 16 registers... maybe
you're thinking of 680x0.

Dean

Re: alloca (was Re: cvs commit: apache-1.3/src/modules/standard mod_log_referer.c)

Posted by Marc Slemko <ma...@worldgate.com>.

On Sun, 24 May 1998, Alexei Kosut wrote:

> It seems to me that if function call overhead (which is, after all, just a
> few machine instructions) is something we need to worry about on a given
> architecture, then we should take all the CPUs and squish them, and force
> people to buy computers that make sense. Whose bright idea was it to give
> the x86 only 16 registers anyway? SPARC has 40. PA-RISC chips have 72. And
> those are only the integer ones.

Yea, but they _need_ more because you need to use registers more.

Trying to figure out why the x86 does certain things certain ways can be a
quite entertaining history lesson.

Re: alloca (was Re: cvs commit: apache-1.3/src/modules/standard mod_log_referer.c)

Posted by Alexei Kosut <ak...@leland.Stanford.EDU>.

On Sun, 24 May 1998, Dean Gaudet wrote:

> > My understanding was that one should never use alloca. Not if one expects
> > code to work correctly, or at all.
> 
> Nah, it's safe, it's just not portable.

You're right, of course. I realized this about five seconds after I sent
the email, when I stopped to think about how alloca() would be
implemented. Or at least how I might implement it.

> On traditional systems on which the stack just grows down (or up) from
> some fixed point alloca() is trivial.  The main reason the compiler needs
> to know about it is that it must set up a stack frame to use it.  In
> x86 parlance, the stack frame setup/destruction looks like this:
> 
>     pushl %ebp
>     movl %esp,%ebp
>     subl $20,%esp	# create a 20 byte stack frame
>     ... rest of the function goes here, and references negative offsets
>     ... from %ebp to access elements of the stack frame

Sounds good. But if the compiler's generated code references stack data
using positive offsets from the stack pointer (%esp? My x86 assembly
knowledge is non-existent; the only assembly I've ever done is SPARC, and
only a few lines of that) instead of negative offsets from the frame
pointer (%ebp, I presume), then alloca() will mess up basically all the
code that follows it, correct? 

>     movl %ebp,%esp
>     popl %ebp

> In this context you can (almost) implement alloca just like this:
> 
>     subl $nnnn,%esp
>     movl %esp,%eax
> 
> Where nnnn is a 4-byte aligned size, and the resulting pointer is %eax.

That sounds right. Assuming that alloca() is implemented inline. If it's a
library function call, it might have to do all sorts of odd things to be
able to manipulate the activation record of the previous stack frame. And
that seems a bit iffy to me, especially if you're mixing code generated by
a multiplicity of compilers with various options on different systems.

Of course, if you're only concerned with gcc on recent revisions of major
systems (as you've said you are), then I guess that isn't a problem.

[description of stack pages on x86-based systems excised]

> palloc() is actually not much more than the above normally... except that
> it incurs a function call overhead.  On non-x86 architectures, it's
> possible that alloca() will cause a function to end up with a stack
> frame it wouldn't otherwise need -- but on x86 almost all functions have
> stack frames because the x86 has so few registers the compiler needs
> somewhere to spill temporaries.

It seems to me that if function call overhead (which is, after all, just a
few machine instructions) is something we need to worry about on a given
architecture, then we should take all the CPUs and squish them, and force
people to buy computers that make sense. Whose bright idea was it to give
the x86 only 16 registers anyway? SPARC has 40. PA-RISC chips have 72. And
those are only the integer ones.

But that has nothing to do with Apache, I guess. Or anything, really.

> Using a large buffer on the stack is about the same performance-wise as
> alloca, except that it gives you fixed lengths for things, and tends to
> make stacks that are far larger than they need to be.  I still haven't
> done the analysis, but I need to figure out what stack depth we achieve
> regularly in apache, because plopping 8k buffers on the stack left and
> right the way we do can chew up a bunch of pages... and there'll be little
> active data on the stack.  This is fine if you've got one process, but
> when every process (thread) needs a dozen extra pages because of this
> sloppiness... well, it adds up.

Good point. Hmm. *shrug*

-- Alexei Kosut <ak...@stanford.edu> <http://www.stanford.edu/~akosut/>
   Stanford University, Class of 2001 * Apache <http://www.apache.org> *

alloca (was Re: cvs commit: apache-1.3/src/modules/standard mod_log_referer.c)

Posted by Dean Gaudet <dg...@arctic.org>.

On Fri, 22 May 1998, Alexei Kosut wrote:

> My understanding was that one should never use alloca. Not if one expects
> code to work correctly, or at all.

Nah, it's safe, it's just not portable.

> One thing to note is that successful use of alloca() depends on the
> compiler intervening on alloca()'s behalf. I personally would not suspect
> that all compilers correctly support it. gcc does (actually, gcc supports
> alloca as a builtin, rather than calling the system library, so it
> probably works all right), but I wouldn't guarantee that every compiler
> does. And I'd be wary of Apache timeouts' use of longjmps and that
> interaction with alloca.

On traditional systems on which the stack just grows down (or up) from
some fixed point alloca() is trivial.  The main reason the compiler needs
to know about it is that it must set up a stack frame to use it.  In
x86 parlance, the stack frame setup/destruction looks like this:

    pushl %ebp
    movl %esp,%ebp
    subl $20,%esp	# create a 20 byte stack frame
    ... rest of the function goes here, and references negative offsets
    ... from %ebp to access elements of the stack frame
    movl %ebp,%esp
    popl %ebp

In this context you can (almost) implement alloca just like this:

    subl $nnnn,%esp
    movl %esp,%eax

Where nnnn is a 4-byte aligned size, and the resulting pointer is %eax.
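
Seen from C, the caller's side of this looks roughly like the sketch
below.  It assumes a platform that declares alloca() in <alloca.h> (glibc
does; other systems put it elsewhere or lack it), and contains_nocase is
a made-up function, not anything from the Apache source.

```c
#include <alloca.h>     /* assumption: glibc-style header for alloca() */
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Return 1 if needle_lower (already lower case) occurs in haystack,
   ignoring the case of haystack.  Hypothetical example function. */
static int contains_nocase(const char *haystack, const char *needle_lower)
{
    assert(haystack != NULL && needle_lower != NULL);

    size_t n = strlen(haystack);
    char *tmp = alloca(n + 1);  /* carved out of this function's frame */

    /* Copy including the terminating NUL, lowercasing as we go. */
    for (size_t i = 0; i <= n; i++)
        tmp[i] = (char)tolower((unsigned char)haystack[i]);

    return strstr(tmp, needle_lower) != NULL;
    /* no free(): tmp is released when the frame is torn down */
}
```

The scratch copy costs a stack pointer adjustment and is gone as soon as
the function returns.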

But this doesn't deal with growing the stack, which you have to do in
page-sized increments (4k on x86).  So if the allocation is larger than
4095 bytes you need to "touch" each 4k page once to ensure the kernel
grows the stack.  (The stack has a "guard" page at the bottom, which
is not-present, so any read or write to it will fault, and the kernel
will then add one more page to the stack and move the guard page one
further down.)
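
The "touch each page" step can be written out as a loop.  This is a
sketch of the idea only, not what any particular compiler emits: it
assumes 4k pages and a region that is extended from the top down, and
probe_pages and PAGE_BYTES are names invented for the example.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_BYTES 4096u    /* assumption: 4k pages, as on x86 */

/* Touch one byte every PAGE_BYTES bytes of [base, base + len), walking
   from the top of the region down to its base, and return the number of
   touches.  On a real stack each touch that lands on the guard page
   makes the kernel add one more page. */
static size_t probe_pages(volatile char *base, size_t len)
{
    assert(base != NULL || len == 0);

    size_t touched = 0;
    size_t off = len;

    while (off > 0) {
        off = (off > PAGE_BYTES) ? off - PAGE_BYTES : 0;
        base[off] = 0;      /* the write is what triggers any fault */
        touched++;
    }
    return touched;
}
```

A 10000-byte allocation needs three touches; anything up to one page
needs a single touch.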

gcc's use of the builtin allows it to handle a bunch of things easily.
It can figure out if the size arg is a constant, and if so it'll do the
rounding at compile time, and if it's smaller than a page-size allocation
it can skip the stack growing loop.  You could actually implement it with
an inline assembly function in gcc, but you wouldn't get those features.

> If we want temporary memory, why not just use malloc() and free()?
> Or a large defined buffer on the stack? That seems to work well for the
> rest of the world... And besides, using palloc() for a couple of bytes of
> memory that won't be freed for a bit is probably all right. Requests go
> quickly.

palloc() is actually not much more than the above normally... except that
it incurs a function call overhead.  On non-x86 architectures, it's
possible that alloca() will cause a function to end up with a stack
frame it wouldn't otherwise need -- but on x86 almost all functions have
stack frames because the x86 has so few registers the compiler needs
somewhere to spill temporaries.

Using a large buffer on the stack is about the same performance-wise as
alloca, except that it gives you fixed lengths for things, and tends to
make stacks that are far larger than they need to be.  I still haven't
done the analysis, but I need to figure out what stack depth we achieve
regularly in apache, because plopping 8k buffers on the stack left and
right the way we do can chew up a bunch of pages... and there'll be little
active data on the stack.  This is fine if you've got one process, but
when every process (thread) needs a dozen extra pages because of this
sloppiness... well, it adds up.
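
To put rough numbers on that: the sketch below counts the pages spanned
by nested frames that each hold one 8k buffer, with 4k pages.  The call
depth of six is purely an assumed figure for illustration, not a measured
Apache stack depth, and buffer_pages is a made-up name.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_BYTES 4096u    /* x86 page size */
#define BUF_BYTES  8192u    /* one "8k buffer" per frame */

/* Pages of stack spanned by `depth` nested frames that each hold one
   BUF_BYTES buffer.  Ignores everything else in the frames. */
static size_t buffer_pages(size_t depth)
{
    assert(depth <= (size_t)-1 / BUF_BYTES);    /* avoid overflow */
    return (depth * BUF_BYTES + PAGE_BYTES - 1) / PAGE_BYTES;
}
```

With an assumed depth of six, buffer_pages(6) comes to 12 pages, which is
how a dozen extra pages per process can appear.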

> And thanks to the nice Sun people, from the Solaris alloca(3C) man page:
> 
> "alloca() is machine-, compiler-, and most of all, system-dependent. Its
> use is strongly discouraged."

feh.  alloca() falls into a category where I say "if a system doesn't
support it, then folks shouldn't be building high-end webservers on it."
You can "emulate" alloca() to a high degree of accuracy for systems
that don't support it.  But, it's not something we should start using
now... for 2.0 I wouldn't be averse to us starting to use it.

Dean

Re: cvs commit: apache-1.3/src/modules/standard mod_log_referer.c

Posted by Alexei Kosut <ak...@leland.Stanford.EDU>.

On Fri, 22 May 1998, Brian Behlendorf wrote:

> At 07:00 PM 5/22/98 +0100, Ben Laurie wrote:
> >Err? Surely alloca only hits the kernel if you have to grow the stack as
> >a result? And, of course, it gives the memory back much sooner than
> >pstrdup does.
> 
> Like I said, I don't know.  Others have pointed out man pages that say it's
> highly machine dependent and should be avoided in cross-platform code.  On
> the other hand we use it in a couple places in our core code, so it must
> work well enough.  None of the other modules use it however.  It'd be
> interesting to see benchmarks...

My understanding was that one should never use alloca. Not if one expects
code to work correctly, or at all.

There are a few uses of alloca() in http_main, but they are all in
Win32-only sections. Presumably, the alloca() in Windows has well-defined
and correct behavior. Or at least it doesn't break anything. I wouldn't
trust it normally (I don't trust it on Windows either, but it seems to
work).

One thing to note is that successful use of alloca() depends on the
compiler intervening on alloca()'s behalf. I personally would not suspect
that all compilers correctly support it. gcc does (actually, gcc supports
alloca as a builtin, rather than calling the system library, so it
probably works all right), but I wouldn't guarantee that every compiler
does. And I'd be wary of Apache timeouts' use of longjmps and that
interaction with alloca.

If we want temporary memory, why not just use malloc() and free()?
Or a large defined buffer on the stack? That seems to work well for the
rest of the world... And besides, using palloc() for a couple of bytes of
memory that won't be freed for a bit is probably all right. Requests go
quickly.

And thanks to the nice Sun people, from the Solaris alloca(3C) man page:

"alloca() is machine-, compiler-, and most of all, system-dependent. Its
use is strongly discouraged."

-- Alexei Kosut <ak...@stanford.edu> <http://www.stanford.edu/~akosut/>
   Stanford University, Class of 2001 * Apache <http://www.apache.org> *