Posted to dev@httpd.apache.org by "Jason A. Dour" <ja...@bcc.louisville.edu> on 1997/10/13 16:22:18 UTC

Apache Process Model (or Dean Makes My Brain Hurt!)

-----BEGIN PGP SIGNED MESSAGE-----

URGH!  Dean make Mongo brain hurt!  Mongo not want to think this much so
early on Monday morning.  ;)

On Sun, 12 Oct 1997, Dean Gaudet dropped these nuggets of wisdom:
> I don't know if anyone else realised it, but this is the basis of a much
> more secure apache.  i.e. an apache that needs root for almost nothing.
> Tell me if I've forgotten anything here:

	Wow.  Ummm, I think all this is a good step towards a revision of
the server process, but I do have one question...  Did you mean this to be
1.X or 2.X?  I'm hoping 2.X, since there were going to be major changes to
the architecture anyway (or there is likely to be, right?), and we could
work on issues such as the ones you've raised.

	The work looks interesting, and if we're doing major revisions to
the server process, I'm going to be extremely interested in such because
of suEXEC and how to *get rid* of it.  So I'm going to be noisy for a change
and toss in my two bits worth even if I am less of an Apache Guru(tm) than
most of you other guys.  8)

	As you probably know, suEXEC is *not* the panacea of CGI security,
and I never intended it to be such from the first day I started coding on
its predecessors.  I still believe that such UID swapping should occur in
such a way that it is closely integrated into the server at some level.
Running the current server process model as root is scary, so suEXEC was a
necessary evil.  However, if we're breaking the server into more secure
sections, CGI security can be brought into the fold and suEXEC can be
deprecated.

	Also, if we're careful with our new server process model, we can
provide *private logging* for those who want it.  I can't tell you how
many users (me included) have been frustrated by having to grep private
log entries out of dozens of megs of logs.  It is not an unachievable
goal, IMHO, to provide the following types of logging: System, VHost,
and User.

	Thus, I like your "Apache Supervisor" idea, with a few changes to
suit my Special Interests.  8)  Just make sure I don't forget anything
necessary...OK?  8)


	ApacheSuper (root:root)
		* open Listen sockets
		* fork()
			- priv to ApacheLogger UID/GID
			- exec ApacheLogger
		* fork()
			- priv to UID/GID of serviced request
			- exec ApacheServer with proper args
		* ApacheSuper loop:
			- monitor for requests
			- monitor ApacheLogger
			- monitor ApacheServer children


	ApacheLogger (wwwlog:wwwlog)
		* open logging pipes
		* write log entries to SYSTEM log files


	ApacheServer (UID and GID varies with request)
		* do Request Magic(tm)  (GEE, I'M SO TECHNICAL!)
		* log to ApacheLogger pipe
		* log to VHOST/USER log if desired


	Now, problems I can see right off despite my lack of experience
with our current process model:

	* ApacheServer is still going to be a large binary, so repeatedly
		spawning and exiting it is a Bad Thing.  Some kind of
		communication would need to be set up between ApacheSuper
		and long-lived ApacheServer processes.  Is our current
		method of process communication viable?
	* ApacheLogger has verification problems similar to the ones
		we're presently having with suEXEC.
	* Are we still heading into a Threaded Future?  If so, then is
		all of this conjecture for naught?

I'm sure there are more, but my brain hurts.  I need to go do something
mindless, like installing a new NT server...

TTFN,
Jason
# Jason A. Dour <ja...@bcc.louisville.edu>                            1101
# Programmer Analyst II; Department of Radiation Oncology; Univ. of Lou.
# Finger for URLs, PGP public key, geek code, PJ Harvey info, et cetera.




-----BEGIN PGP SIGNATURE-----
Version: 2.6.2

iQCVAwUBNEIunJo1JaC71RLxAQFI6wP9EUpppINl+0kETpRf2rGzF+kbnurvNLPO
1ucWN172+GrJfDIMd4XBmI8GU8zMRURVf28ivzN5vOMkoIZVzGqm/PTmhHEzPv64
fOVfba3kw8JyLxKDLhri20v0ro1fiOhS4He3yc8mbbcJJB1tmKYbn/cnlcEIOUdB
1cpZCG8Xak8=
=JYM4
-----END PGP SIGNATURE-----


Re: Apache Process Model (or Dean Makes My Brain Hurt!)

Posted by Dean Gaudet <dg...@arctic.org>.

On Thu, 16 Oct 1997, Jason A. Dour wrote:

> > My solution doesn't require the user to "jump through UNIX hoops" ... a
> > log of their own stuff appears somewhere that they can read it.  Now,
> > rotation is controlled by the server, is that a problem?  I can't see it
> > any other way though, since the server has to be told when to close its
> > filehandles.
> 
> 	I was also talking about ~user logs...but no big deal.

Oh ... this could be done too.  The logger just needs to know what fields
to split the log on.

Dean



Re: Apache Process Model (or Dean Makes My Brain Hurt!)

Posted by "Jason A. Dour" <ja...@bcc.louisville.edu>.
-----BEGIN PGP SIGNED MESSAGE-----

On Wed, 15 Oct 1997, Dean Gaudet wrote:
> No I understand what you're saying, but I probably haven't come out right
> and said that what you want is impossible ... unless you do what I
> suggested at the very bottom of my last post -- run a separate httpd
> parent/child-set bound to each ip address and use ip-vhosting.

	Ahhhh, OK.  If it's impossible, just say so.  8)

> My solution doesn't require the user to "jump through UNIX hoops" ... a
> log of their own stuff appears somewhere that they can read it.  Now,
> rotation is controlled by the server, is that a problem?  I can't see it
> any other way though, since the server has to be told when to close its
> filehandles.

	I was also talking about ~user logs...but no big deal.

Jason
# Jason A. Dour <ja...@bcc.louisville.edu>                            1101
# Programmer Analyst II; Department of Radiation Oncology; Univ. of Lou.
# Finger for URLs, PGP public key, geek code, PJ Harvey info, et cetera.

-----BEGIN PGP SIGNATURE-----
Version: 2.6.2

iQCVAwUBNEZTWpo1JaC71RLxAQETmwP9G+sQJBRxXmII/sxYtrJJnRTPKL5+942w
1lK6TSXAj5CsHZhmemrCWxsaRhsj2crjTExJY33vjjmb2XNV/gdYfqIDYmyK3k0G
hdQQr+SCy/K/1/bkcK9TDQ30Vd8OrGREP5esMy2vBfUOoWoYHa40XfIk01YCBrvG
PTBtyXyFPOI=
=tetB
-----END PGP SIGNATURE-----


Re: Apache Process Model (or Dean Makes My Brain Hurt!)

Posted by Dean Gaudet <dg...@arctic.org>.

On Wed, 15 Oct 1997, Jason A. Dour wrote:

> > I don't consider a thousand file handles per httpd child, or a thousand
> . . .
> > to aplogger(uid)... instead of the one extra that pipes require currently.
> 
> 	I don't think I've made my point on either the suEXEC or the
> logging issue...they're both related due to my UID/GID switching concept.

No I understand what you're saying, but I probably haven't come out right
and said that what you want is impossible ... unless you do what I
suggested at the very bottom of my last post -- run a separate httpd
parent/child-set bound to each ip address and use ip-vhosting.

(Hey Brian, if you're listening, this is another reason I dislike
name-vhosts :)

> Logging:
> 	A single logging daemon sounds fine for SYSTEM level logging. 
> This logging can be trusted, and can be used by admins for archiving,
> tracking, reporting, billing, etc.  It is solid, secure, and a Good Thing. 
> 
> 	I'm talking about adding support for *optional*
> non-centrally-controlled logging...such as a user wanting a private log of
> their served requests without having to jump through UNIX hoops to get it.

My solution doesn't require the user to "jump through UNIX hoops" ... a
log of their own stuff appears somewhere that they can read it.  Now,
rotation is controlled by the server, is that a problem?  I can't see it
any other way though, since the server has to be told when to close its
filehandles.

The hypothetical aplogger program reads lines that look like:

    www.foo.com|whatever-else

and writes those to:

    $ROOT/logs/www.foo.com/access_log

It keeps a cache of open log handles to www.foo.com/access_log files for
various foos.  It can open and close logs as it needs, because httplog
and only httplog has write permission in this hierarchy.  The foo user
has read-only access to $ROOT/logs/www.foo.com.

Such a logger is trivial... here's a slow one that doesn't do any
intelligent file handle caching:

    while(<>) {
        # each line is "vhostname|the rest of the log entry"
        next unless ($vhost,$entry) = /^([a-zA-Z0-9.-]+)\|(.*)/;
        next unless open (LOG, ">>$vhost/access_log");
        print LOG "$entry\n";
        close LOG;
    }

Note that it needs essentially no configuration ... as long as the parent
chdir()s to the right directory to begin with.  It doesn't need to know
what vhosts exist, because it assumes that an external script has already
set up all the pieces it needs.  Remember that you control the vhost name
in the log by the ServerName settings.
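
A version with a small handle cache isn't much bigger -- still only a
sketch, and the cache size, eviction policy, and lack of any flushing
are hand-waving:

    # same input format as above, but keep up to $MAX handles open;
    # the cache size and eviction policy are only illustrative, and a
    # real one would also flush or unbuffer the cached handles
    my $MAX = 64;
    my (%fh, @lru);

    while (<>) {
        my ($vhost, $entry) = /^([a-zA-Z0-9.-]+)\|(.*)/ or next;
        unless ($fh{$vhost}) {
            if (@lru >= $MAX) {                 # evict the oldest handle
                my $old = shift @lru;
                close $fh{$old};
                delete $fh{$old};
            }
            open($fh{$vhost}, '>>', "$vhost/access_log") or next;
            push @lru, $vhost;
        }
        print { $fh{$vhost} } "$entry\n";
    }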

> This could be supported with an optional method in the httpd program that
> logged it AFTER it logged to the logging pipe.  Yes it would add overhead,
> but how much?  If the httpd program is running as the user and group of
> the target request, then there are no system accounts at risk, and it
> should be fairly easy to implement.  Generally, use of this option would
> be low, but I still have heard requests for it often...

Of course it adds no overhead if you've already got an apache
running as the target user.  But that alone has huge overhead, as I
mentioned... what I'm suggesting has the least overhead and I think
it has all the functionality you want, at least you haven't shown me
something it can't do yet.

> suEXEC:
> 	This is related to the above issue of private logging.  If the
> httpd server is running as the user and group of the target request, then
> all spawned children will already be the proper user and group.  There
> would be no need for suEXEC at the CGI/SSI level, if the user/group
> switching occurs before the request handling level.

I think the fundamental confusion here is that you can't have an httpd
running as the target user unless you plan to run a thousand copies
of apache, which share nothing with each other, and you plan to use
ip-vhosting for those thousand customers.

A TCP/IP socket can be bound to all addresses on a particular port, or
a single address:port, nothing in between.  A socket has to be served
by a homogeneous set of processes (i.e. all running as the same uid),
because after an accept() there is no *efficient/portable* way to pass
the socket off to another process.  The process doing the accept() must
be either the target uid for that address, or it must be a uid that can
serve any address.

Apache can already implement these two extremes.  You can build an
entire box with a few hundred individual instantiations of apache
running with BindAddress on a specific ip address:port, and under
a specific user.  That gives the world you're looking for, but your
machine better be BIG.
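
For concreteness, each of those instantiations gets its own config file
and its own parent/child set, started separately with httpd -f.  Per
customer it would look something like this (the address, names, and
paths are examples only):

    # httpd-customer1.conf -- one of many independent servers, each
    # started with its own "httpd -f".  Everything here is an example.
    ServerName   www.customer1.com
    # this instance answers only on this one address
    BindAddress  192.0.2.10
    Port         80
    # its children run as this customer, not as a shared "nobody"
    User         customer1
    Group        customer1
    ServerRoot   /www/customer1
    DocumentRoot /www/customer1/htdocs
    TransferLog  logs/access_log
    ErrorLog     logs/error_log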

Each socket and each log file consumes a file descriptor.  This is a
precious resource -- even on systems allowing you 16000 file descriptors
you are paying for them.  You pay on every fork(), you pay on select()s
(although ap_slack generally eliminates the select() cost), you pay all
over the place.  It's just not a good idea to have that many file
descriptors open.  It's much more efficient to have a dozen or so
open, and that's it ...

Or, you can buy a much smaller machine and you can run static requests
all through the same uid, and pay a little extra cpu when trying to run
dynamic requests as another uid.

Dean


Re: Apache Process Model (or Dean Makes My Brain Hurt!)

Posted by "Jason A. Dour" <ja...@bcc.louisville.edu>.
-----BEGIN PGP SIGNED MESSAGE-----

On Wed, 15 Oct 1997, Dean Gaudet wrote:
> The reason I suggested another mailing list is that we might attract
> non-apache developers who have an interest in the topic... because at this
> stage it's all design talk.

	Agreed.

> I don't consider a thousand file handles per httpd child, or a thousand
. . .
> to aplogger(uid)... instead of the one extra that pipes require currently.

	I don't think I've made my point on either the suEXEC or the
logging issue...they're both related due to my UID/GID switching concept.



Logging:
	A single logging daemon sounds fine for SYSTEM level logging. 
This logging can be trusted, and can be used by admins for archiving,
tracking, reporting, billing, etc.  It is solid, secure, and a Good Thing. 

	I'm talking about adding support for *optional*
non-centrally-controlled logging...such as a user wanting a private log of
their served requests without having to jump through UNIX hoops to get it.
This could be supported with an optional method in the httpd program that
logged it AFTER it logged to the logging pipe.  Yes it would add overhead,
but how much?  If the httpd program is running as the user and group of
the target request, then there are no system accounts at risk, and it
should be fairly easy to implement.  Generally, use of this option would
be low, but I still have heard requests for it often...

suEXEC:
	This is related to the above issue of private logging.  If the
httpd server is running as the user and group of the target request, then
all spawned children will already be the proper user and group.  There
would be no need for suEXEC at the CGI/SSI level, if the user/group
switching occurs before the request handling level.



	Am I being any clearer?  I'm so frazzled from overwork and
undersleep that I'm not sure if I'm presenting myself clearly...


> Remember there are two distinct things that require superuser privs:
> 
>     opening port 80
> 
>     spawning CGIs as a specific user

	See above...the spawning doesn't necessarily have to happen
alongside the UID/GID switch...the UID/GID switching I'm proposing is more
pervasive than just CGI/SSI handling... 

> One way of dealing with the performance issue is to make suexec a
. . .
> (Yup, this is like fastcgi.)

	I somewhat like this approach actually.  It's still not ideal,
IMHO, but I do like it.  I'll think some more on this...

Jason
# Jason A. Dour <ja...@bcc.louisville.edu>                            1101
# Programmer Analyst II; Department of Radiation Oncology; Univ. of Lou.
# Finger for URLs, PGP public key, geek code, PJ Harvey info, et cetera.

-----BEGIN PGP SIGNATURE-----
Version: 2.6.2

iQCVAwUBNEUzKZo1JaC71RLxAQGeJAP/Ybe/uYICcZeKNt2HJ63m5D+XfuKZAmiI
YI3+0f8CGNJPmBNp4Akpb1MdD8rnwwHmfQke715HKvyTADKrHW+qyerM1alScGDf
3IPz4haVPiuYwM9pMjg/H8KlWUFfvldfrpvDXCSuQG/i8Y1ptsEmG54blDH3omVV
HOAXTU1sQko=
=A3RR
-----END PGP SIGNATURE-----


Re: Apache Process Model (or Dean Makes My Brain Hurt!)

Posted by Dean Gaudet <dg...@arctic.org>.

On Tue, 14 Oct 1997, Jason A. Dour wrote:

> On Mon, 13 Oct 1997, Dean Gaudet wrote:
> > Oh it doesn't matter to me.  In fact it could be a separate development
> > effort which we merge in later.  I wouldn't mind a mailing list to discuss
> > high performance web server security issues, they're not as trivial as,
> > say, a secure mail system.  Or maybe they are, and I'm just too close to
> > the problem to see the solution.
> 
> 	I'd like to see this sort of thing discussed as well...even if it
> does have to be a separate mailing list...

The reason I suggested another mailing list is that we might attract
non-apache developers who have an interest in the topic... because at this
stage it's all design talk.

> 	Internally, with the UID/GID model of what I described, no one
> needs to have world access turned on.  Despite what many lay people think,
> the on-disk source for web content often differs from in-browser content,
> and sometimes that information could be sensitive -- say a corporation
> trying to protect its development investment by hiding its exact
> publishing gearworks.  With the current security model, this often means
> world read, meaning anyone on the local machine can read the files
> directly.  I don't find this preferable, and I find it hard to believe
> that it cannot be fixed...that's all.  As I admitted, this is one of my
> Special Interests.

Well, given that unix has exactly one group per file you're kind of
screwed trying to do this.  i.e. if a company is that concerned about
their data then they should buy a dedicated server.  You have the
following requirements:

    - no global access on the local machine
    - read-only access by the webserver
    - read-write access to a group of people responsible for the site

The last is my particular take on it.  If you're willing to say that
exactly one account can have read-write then the solution is trivial, and
the current apache does an OK job at it... at least on reasonable unixes.
Create a group httpd, and do this:

    mkdir /www/docroot
    chgrp httpd /www/docroot
    chmod g+s /www/docroot      # so new files/subdirs keep the httpd group

Then tell the users to use umask 027.

I define "reasonable unixes" as those which do the following with g+s
directories:

    - all files underneath that directory are created with the group of
	the directory regardless of what the user's default group is,
	and regardless of what group the user is in
    - all newly created subdirectories inherit g+s

Linux has those properties, so do the various BSDs I think.  I'm not
sure about the SysVR4s.  These are the unixes on which you can hope to
use group permissions to do group work without permissions hassles.

> > We can already provide this.  A directory which is owned by httplog,
> > and group readable by a private group can contain a log file which is
> > readable only by httplog and users in the private group (i.e. the users
> > who want a private log).  Then the logger process, of which I intend
> > there to be only one, can log to a file in this directory.
> 
> 	And can this be reliably implemented across a thousand-user
> system?  Can you ensure that something (i.e.  FUU Error #1) won't happen
> to that logfile?  Supporting a system based upon very specific permissions
> would be a nightmare in my experience...somehow those special settings
> always get changed.  Instead, if the process logging the transaction was
> run as the target user, there's no need for special groups and
> permissions.  And such functionality would be *optional* not default,
> since the main logging would be sent through the single pipe to the single
> logger process. 

I don't consider a thousand file handles per httpd child, or a thousand
logging processes a workable solution.  I suppose you could have only a
single pipe to aplogger and aplogger could spawn the thousand children...
that'd require aplogger to be root.  I suppose it could also maintain a
"cache" of spawned children so that it'd only need 200 or 300 open at
a time.  But this means that log writing goes through two extra copies --
one copy from httpd to aplogger(root), and one copy from aplogger(root)
to aplogger(uid)... instead of the one extra that pipes require currently.

My solution with a single logger is far more feasible.  It needs a
rotation program which is clued enough to not screw up permissions.
Combine it with a script that creates a vhost properly (i.e. sets up
all the permissions and adds the templates to the various config files
and restarts the various daemons).  Then I don't really see the permission
problem being an issue.  Only root and httplog can write in those
directories.
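
That creation script doesn't have to be fancy.  A sketch, with invented
paths and group names, and with the config-template and restart steps
left out:

    #!/bin/sh
    # new-vhost.sh vhostname customer-group
    # Sketch only: paths and names are invented, and the "append the
    # template to the config files and restart" part is left out.
    VHOST=$1
    GROUP=$2
    LOGDIR=/usr/local/apache/logs/$VHOST

    mkdir -p $LOGDIR
    chown httplog $LOGDIR
    chgrp $GROUP  $LOGDIR
    chmod 750     $LOGDIR       # httplog writes, the group only reads

    touch $LOGDIR/access_log $LOGDIR/error_log
    chown httplog $LOGDIR/access_log $LOGDIR/error_log
    chgrp $GROUP  $LOGDIR/access_log $LOGDIR/error_log
    chmod 640     $LOGDIR/access_log $LOGDIR/error_log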

Again I'll say that if folks want complete "security" from other folks
then they need to buy another machine.  It's all about cost -- and you
should make it clear to your customers that if they really need the
privacy then they'll have to pay for it.  I wouldn't pretend otherwise.

Oh yeah, and running the logger as the target user means the target
user can futz with their logs.  In my experience websites don't want
the target user to be able to futz with their logs, which is why I'm
going to extremes to make sure there's a protected set of logs which
the target users have read-only access to.

> >     apsuper (run as root)
> <big snippage...>
> 
> 	This is all good...except we still have two superuser programs --

Remember there are two distinct things that require superuser privs:

    opening port 80

    spawning CGIs as a specific user

Combining the two is asking for trouble... they're distinct.  It's trivial
to do the first, I wouldn't want to complicate it with the second -- and
the second is non-trivial.

The second requires the process doing it to have full root privileges,
which is why we can't do it within httpd children.  We don't want
httpd children to have full root privileges... because they're the most
complicated part of this entire picture.

One way of dealing with the performance issue is to make suexec a
service which httpd talks to via a unix domain socket.[1] You write
a suexec-server which opens a unix domain socket in stream mode, and
listens.  Then when it's time to do a suexec thing, open that socket,
write a brief preamble which tells suexec-server what to do, and what
environment to pass, and then proceed as you normally would with a
regular CGI request.  The suexec-server will fork/setuid and exec the CGI.
(Yup, this is like fastcgi.)

This doesn't introduce any extra byte copies ... since the CGI is
eventually talking directly with the httpd via the stream socket, rather
than the pipe() it'd normally use.

The end result is that you replace one exec and a handful of other calls
with a socket open and a context switch.  It's probably faster though, and
should be as secure as suexec is currently.

[1] To portably place access restrictions on a unix domain socket you
have to hide it in a subdirectory which is mode 700 or 770.  This is
because traditional BSD and SysV networking code ignores the privs on
a unix domain socket inode.
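
To make the shape concrete, a toy apsuexecd might look like the
following.  The preamble format, socket path, and names are invented
for illustration, and a real one would do all the paranoid checks the
current suexec does before the exec:

    # toy apsuexecd: listen on a unix domain socket hidden in a mode-700
    # directory (see [1]), fork per connection, drop to the requested
    # uid/gid, and exec the CGI with the socket as its stdin/stdout so
    # it talks to httpd directly
    use IO::Socket::UNIX;
    use Socket;
    use POSIX ();

    my $dir = "/usr/local/apache/suexec.d";     # invented path
    mkdir($dir, 0700) unless -d $dir;
    unlink "$dir/socket";
    my $listener = IO::Socket::UNIX->new(
        Type   => SOCK_STREAM,
        Local  => "$dir/socket",
        Listen => 5,
    ) or die "listen: $!";
    $SIG{CHLD} = 'IGNORE';                      # don't collect zombies by hand

    while (my $conn = $listener->accept) {
        next if fork;                           # parent goes back to accept()
        # child: one preamble line of "uid gid /path/to/cgi" (invented format)
        my ($uid, $gid, $cgi) = split ' ', scalar <$conn>;
        open(STDIN,  '<&', $conn) or die "dup: $!";
        open(STDOUT, '>&', $conn) or die "dup: $!";
        POSIX::setgid($gid) or die "setgid: $!";
        POSIX::setuid($uid) or die "setuid: $!";    # now the target user
        exec $cgi or die "exec: $!";
    }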

So the picture now becomes:

    apsuper (root)
      |
      +-- aplogger (httplog)    # handling request log
      |
      +-- aplogger (httplog)    # handling error log
      |
      +-- apsuexecd (root)	# handling suexec requests
      |
      +-- aphttpd (httpd)       # monitoring aphttpd children
            |
            +-- aphttpd         # serving requests
            |
            +-- aphttpd         # serving requests
            |
            :
            |
            +-- aphttpd         # serving requests
            |
            +-- aphttpd         # serving requests


> 	I once started looking at Qmail, but between work, apache, and
> Real Life, I couldn't find the time and energy to devote to it.  I know
> that at one point it involved like five or more UIDs, a seemingly complex
> process path, and a rabid debate over whether or not it was good.  I've
> since decided not to think one way or the other about the product until I
> can devote time to it...

It's good in my opinion.  There are 7 uids each handling a specific task,
some of which are in groups that don't own any files so that they have
the minimum privileges possible.  There are multiple executables, with
the privileged ones being very small/easy to verify.  All the privileged
interfaces are well defined.  qmail's model won't exactly work for a
webserver though because it is a forking model not unlike the first
webservers... which is way too slow.  But that's fine for email which
is typically done at rates less than 10 messages per second, rather than
the 100 requests per second that some of us see.

There is one more solution for a single machine with privacy between
users, and to cut down on the cost of suexec and so on.  And that is
to bind an httpd to each ip address individually and run a non-suexec
server on it as the specific userid.  I believe uunet does this, but
they probably don't try to put a thousand customers on a box using
this technique.  It's probably good for 50 or 60 customers.

Dean


Re: Apache Process Model (or Dean Makes My Brain Hurt!)

Posted by "Jason A. Dour" <ja...@bcc.louisville.edu>.
-----BEGIN PGP SIGNED MESSAGE-----

On Mon, 13 Oct 1997, Dean Gaudet wrote:
> Oh it doesn't matter to me.  In fact it could be a separate development
> effort which we merge in later.  I wouldn't mind a mailing list to discuss
> high performance web server security issues, they're not as trivial as,
> say, a secure mail system.  Or maybe they are, and I'm just too close to
> the problem to see the solution.

	I'd like to see this sort of thing discussed as well...even if it
does have to be a separate mailing list...

> I wasn't planning on getting rid of suexec ... in fact I was embracing the
> technique of multiple executables.

	I was trying to limit the number of root processes, as well as to
provide for a more secure system both internally and externally.

	Internally, with the UID/GID model of what I described, no one
needs to have world access turned on.  Despite what many lay people think,
the on-disk source for web content often differs from in-browser content,
and sometimes that information could be sensitive -- say a corporation
trying to protect its development investment by hiding its exact
publishing gearworks.  With the current security model, this often means
world read, meaning anyone on the local machine can read the files
directly.  I don't find this preferable, and I find it hard to believe
that it cannot be fixed...that's all.  As I admitted, this is one of my
Special Interests.

	Internally and Externally, there would be *one* superuser process
instead of two...and I would think two is not preferable to one in this
instance.

> We can already provide this.  A directory which is owned by httplog,
> and group readable by a private group can contain a log file which is
> readable only by httplog and users in the private group (i.e. the users
> who want a private log).  Then the logger process, of which I intend
> there to be only one, can log to a file in this directory.

	And can this be reliably implemented across a thousand-user
system?  Can you ensure that something (i.e.  FUU Error #1) won't happen
to that logfile?  Supporting a system based upon very specific permissions
would be a nightmare in my experience...somehow those special settings
always get changed.  Instead, if the process logging the transaction was
run as the target user, there's no need for special groups and
permissions.  And such functionality would be *optional* not default,
since the main logging would be sent through the single pipe to the single
logger process. 

> If you follow where I've been going with reliable piped logs you'll
> note that I'm advocating exactly one logger process, with exactly one
> pipe going to it.  The first field should be the vhost ServerName, the
> rest can be whatever you find interesting.  The logger process should
> open/close files as it needs them according to its file limit.

	I agree completely...  I just also believe that optional
user/vhost logging could be done aside from that...

> Your ApacheSuper now does more than what the httpd parent does ... it
> can't monitor children, because the process monitoring children has
> to be ready to spawn children.  And a process which we trust as secure
> can't be ready to spawn httpd children, because said process has to have
> already read the (httpd) config file... and doing that is insecure.

 	OK.  Understandable.  My mistake.

>     apsuper (run as root)
<big snippage...>

	This is all good...except we still have two superuser programs --
one of which is added *in-line* into most if not all CGI/SSI executions,
thus impacting the overall performance.  If the UID/GID switch were more
internal, not only would we get rid of this efficiency hog, but it would
also provide other security benefits as described above.  Is this goal of
mine impossible to achieve?  Admittedly, I'm not adept at some of the
stuff you're doing...I just want to make certain it is impossible before
giving up on the idea.

> BTW, it's worth looking at qmail's security model.  There's a lot that
> djb does which is good to at least be familiar with.  Not all of it
> works for a web server, but most of it does.

	I once started looking at Qmail, but between work, apache, and
Real Life, I couldn't find the time and energy to devote to it.  I know
that at one point it involved like five or more UIDs, a seemingly complex
process path, and a rabid debate over whether or not it was good.  I've
since decided not to think one way or the other about the product until I
can devote time to it...

jason
# Jason A. Dour <ja...@bcc.louisville.edu>                            1101
# Programmer Analyst II; Department of Radiation Oncology; Univ. of Lou.
# Finger for URLs, PGP public key, geek code, PJ Harvey info, et cetera.


-----BEGIN PGP SIGNATURE-----
Version: 2.6.2

iQCVAwUBNEOF2po1JaC71RLxAQHqrQP9F6FWs2LpoZzt0EeZ6ruWk/rARoJmfWYe
Qj4OqmppEdtFDhWmnjsD/JMYaWT30VymS7IbFQUWGXAZU/mVT2aU2e7QltJdRso+
C8xCSYiurnRnnPuKtAq4m28kFXZxclkn1zAuFUXrhEaauz/z9TKXa0f3nx/0y5sF
qnSdapfny2Y=
=ggRE
-----END PGP SIGNATURE-----


Re: Apache Process Model (or Dean Makes My Brain Hurt!)

Posted by Dean Gaudet <dg...@arctic.org>.
On Mon, 13 Oct 1997, Jason A. Dour wrote:

> On Sun, 12 Oct 1997, Dean Gaudet dropped these nuggets of wisdom:
> > I don't know if anyone else realised it, but this is the basis of a much
> > more secure apache.  i.e. an apache that needs root for almost nothing.
> > Tell me if I've forgotten anything here:
> 
> 	Wow.  Ummm, I think all this is a good step towards a revision of
> the server process, but I do have one question...  Did you mean this to be
> 1.X or 2.X?

Oh it doesn't matter to me.  In fact it could be a separate development
effort which we merge in later.  I wouldn't mind a mailing list to discuss
high performance web server security issues, they're not as trivial as,
say, a secure mail system.  Or maybe they are, and I'm just too close to
the problem to see the solution.

> 	The work looks interesting, and if we're doing major revisions to
> the server process, I'm going to be extremely interested in such because
> of suEXEC and how to *get rid* of it.  So I'm going to be noisy for a change
> and toss in my two bits worth even if I am less of an Apache Guru(tm) than
> most of you other guys.  8)

I wasn't planning on getting rid of suexec ... in fact I was embracing the
technique of multiple executables.

> 	Also, if we're careful with our new server process model, we can
> provide *private logging* for those who want it.  I can't tell you how
> many users (me included) have been frustrated by having to grep private
> log entries out of dozens of megs of logs.  It is not an unachievable
> goal, IMHO, to provide the following types of logging: System, VHost,
> and User.

We can already provide this.  A directory which is owned by httplog,
and group readable by a private group can contain a log file which is
readable only by httplog and users in the private group (i.e. the users
who want a private log).  Then the logger process, of which I intend
there to be only one, can log to a file in this directory.

If you follow where I've been going with reliable piped logs you'll
note that I'm advocating exactly one logger process, with exactly one
pipe going to it.  The first field should be the vhost ServerName, the
rest can be whatever you find interesting.  The logger process should
open/close files as it needs them according to its file limit.

> 	Thus, I like your "Apache Supervisor" idea, with a few changes to
> suit my Special Interests.  8)  Just make sure I don't forget anything
> necessary...OK?  8)
> 
> 
> 	ApacheSuper (root:root)
> 		* open Listen sockets
> 		* fork()
> 			- priv to ApacheLogger UID/GID
> 			- exec ApacheLogger
> 		* fork()
> 			- priv to UID/GID of serviced request
> 			- exec ApacheServer with proper args
> 		* ApacheSuper loop:
> 			- monitor for requests
> 			- monitor ApacheLogger
> 			- monitor ApacheServer children

Your ApacheSuper now does more than what the httpd parent does ... it
can't monitor children, because the process monitoring children has
to be ready to spawn children.  And a process which we trust as secure
can't be ready to spawn httpd children, because said process has to have
already read the (httpd) config file... and doing that is insecure.

The parent httpd doesn't monitor requests either, that's the children's
job.  Doing it in the parent is how NCSA works ... i.e. slow.

Here is how my system works:

    apsuper (run as root)
	* detaches from shell
	* open Listen sockets
	* open request logging pipe
	* fork()
	    - change priv to httplog:httplog
	    - exec aplogger
	* write request aplogger child pid to a pid file
	* open error logging pipe
	* fork()
	    - change priv to httplog:httplog
	    - exec aplogger
	* write error aplogger child pid to a pid file
	* fork()
	    - change priv to httpd:httpd
	    - exec aphttpd
	* write aphttpd child pid to a pid file
	* loop:
	    - monitor logging children, replace as needed
	    - if aphttpd child exits then kill logging children
		and exit
		(an alternate scheme would have it respawn)

    aplogger (run as httplog:httplog)
	* read logging configuration
	* read logging pipe
	* write log entries to appropriate files
	* die cleanly whenever it receives SIGHUP/TERM
	    (this is how logs are rotated, it is respawned by
	    the apsuper anyhow)

    aphttpd (run as httpd:httpd)
	* read httpd configuration
	* spawn httpd children, which serve requests
	* monitor httpd children
	* handle HUP restarts which will actually behave like
	    USR1 do now... these are not used for log rotation,
	    they're only used for re-reading the configuration

Note that the only complicated program is aphttpd.  Both apsuper
and aplogger are trivial.  aphttpd uses suexec to spawn CGIs.
We have a process tree somewhat like this:

    apsuper
      |
      +-- aplogger		# handling request log
      |
      +-- aplogger		# handling error log
      |
      +-- aphttpd		# monitoring aphttpd children
	    |
	    +-- aphttpd		# serving requests
	    |
	    +-- aphttpd		# serving requests
	    |
	    :
	    |
	    +-- aphttpd		# serving requests
	    |
	    +-- aphttpd		# serving requests
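
For the curious, apsuper itself stays small.  Here's a sketch -- the
install paths are invented, and the plumbing that hands the Listen
sockets and pipe ends to the right children is only hinted at in
comments:

    # sketch of apsuper, following the outline above
    use POSIX ();

    sub spawn_as {
        my ($user, $prog) = @_;
        my ($uid, $gid) = (getpwnam($user))[2, 3];
        defined(my $pid = fork) or die "fork: $!";
        return $pid if $pid;                    # parent just records the pid
        POSIX::setgid($gid) or die "setgid: $!";
        POSIX::setuid($uid) or die "setuid: $!";    # give up root before exec
        exec $prog or die "exec $prog: $!";
    }

    exit if fork;                               # detach from the shell
    POSIX::setsid();

    # (still root here: open the Listen sockets and the two logging
    #  pipes, write the pid files, and arrange for each child spawned
    #  below to inherit the descriptors it needs)

    my %role;
    $role{ spawn_as('httplog', '/usr/local/apache/sbin/aplogger') } = 'request logger';
    $role{ spawn_as('httplog', '/usr/local/apache/sbin/aplogger') } = 'error logger';
    $role{ spawn_as('httpd',   '/usr/local/apache/sbin/aphttpd')  } = 'aphttpd';

    # monitor loop: replace dead loggers; if aphttpd exits, take the
    # loggers down and exit (respawning it would be the alternate scheme)
    while ((my $pid = wait) > 0) {
        my $what = delete $role{$pid} or next;
        if ($what eq 'aphttpd') {
            kill 'TERM', keys %role;
            exit 0;
        }
        $role{ spawn_as('httplog', '/usr/local/apache/sbin/aplogger') } = $what;
    }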

BTW, it's worth looking at qmail's security model.  There's a lot that
djb does which is good to at least be familiar with.  Not all of it
works for a web server, but most of it does.

Dean