Posted to modperl@perl.apache.org by Philip Mak <pm...@aaanime.net> on 2001/11/11 11:48:19 UTC

RFC: Security/Performance Best Practices (long)

Recently, I've been using Apache::ASP to program a new version of an
existing website that gets over 5 million page views per month. This
website will have to fit on a RaQ4i (450MHz) server, so I'm pretty
conscious about performance. Security is also important due to the
popularity of the site.

I've read various pieces of documentation and combined them into the
following strategy for security and performance on a mod_perl driven
website. I haven't seen these combined strategies formally written up
anywhere, so I thought I would try to do that and ask you guys for
suggestions. This is a bit unorganized right now, but all the general
concepts should be there. The goal is to produce a document that
explains all the principles, and shows all the configuration
directives required to accomplish this.

This website runs off a MySQL database. Although all the webpages are
generated dynamically, they don't change often (unless the webmaster
explicitly updates them).

I set up a lightweight frontend httpd (port 80) that proxies to a
heavyweight mod_perl backend httpd (port 8001). mod_gzip is installed
on the frontend to deliver compressed HTML pages for faster download
time. mod_proxy_add_forward is also installed so that the backend logs
the true IP address of the request in its logs.

In my account, I have these directories:

httpd: apachectl, httpd.conf, logs for the mod_perl httpd
perl: DocumentRoot for backend httpd
web: DocumentRoot for frontend httpd
global: contains *.pm, startup.pl, global.asa (for Apache::ASP)

The proxying is configured in the frontend httpd.conf as follows:

1: RewriteEngine On
2: RewriteRule ^/(.+)\.asp$ http://127.0.0.1:8001/$1.asp [L,P]
3: RewriteRule ^/(.+)\.pl$ http://127.0.0.1:8001/$1.pl [L,P]
4: RewriteCond /home/aw/perl%{REQUEST_URI}index.asp -f
5: RewriteRule ^(.*)/$ http://127.0.0.1:8001$1/ [L,P]

Line 2 passes any URL with a .asp extension to the backend.
Line 3 passes any URL with a .pl extension to the backend.
Lines 4 and 5 pass any request for a directory to the backend, if there
is an index.asp file in that directory.
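
To make the matching above concrete, here is a rough shell model of the same decision. It is only a sketch: the function name "route" and the scratch DocumentRoot passed to it are made up for illustration, and it imitates the patterns rather than showing how mod_rewrite itself works.

```shell
# Hypothetical helper: prints "backend" or "frontend" for a request path.
#   $1 = request path (no query string)
#   $2 = backend DocumentRoot (an assumed scratch path in this sketch)
route() {
  uri=$1
  docroot=$2
  case $uri in
    /?*.asp|/?*.pl)
      echo backend ;;                      # rules 2 and 3: extension match
    */)
      # rules 4 and 5: directory request, proxied only if index.asp exists
      if [ -f "$docroot${uri}index.asp" ]; then
        echo backend
      else
        echo frontend
      fi ;;
    *)
      echo frontend ;;                     # everything else stays on port 80
  esac
}

# Usage: build a tiny scratch tree and try a few paths.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/news" "$demo_root/images"
: > "$demo_root/news/index.asp"
route /news/ "$demo_root"
route /images/ "$demo_root"
route /logo.gif "$demo_root"
```

With that scratch tree, only the /news/ request should be handed to the backend.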

Notice that to the outside world, the hostname/port of the website is
exactly the same whether it's being served by the frontend or
backend. I prefer this approach since it lets my <img src> tags refer
to images in the same directory, for example. It also doesn't require
an extra DNS lookup on the client end (which it would if the mod_perl
server and non-mod_perl server were on different hostnames).

I don't have a ProxyPassReverse directive since I haven't thought
about it; I wouldn't need it anyway since I don't do any redirecting
(at least not right now), but I'll probably end up adding it just in
case.

The following users were created on the system:

aw: I log in as this user. Groups = aw, httpd
aw_guest: mod_perl httpd runs as this user. Group = aw
httpd: lightweight httpd runs as this user. Group = httpd

aw owns all of the files except httpd/logs.

The "web" directory is world readable. It only contains images that
everyone can get from the web server anyway.

The "httpd" and "global" directories are group readable, so only aw
and aw_guest can read them. "perl" is world readable, but the files
inside are only group readable (this allows the httpd user to tell
what files exist, but nothing more). This protects my source code
(and the database passwords they contain!) from being browsed by
others.
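
As a sketch of the modes described above, the following builds the four directories in a scratch location and applies them. The helper names (setup_layout, mode_of) and the stand-in file index.asp are assumptions for illustration. Ownership changes (chown/chgrp to aw and the aw group) are left out because they need root.

```shell
# Hypothetical helper: create the directory layout under $1 and set modes.
setup_layout() {
  root=$1
  mkdir -p "$root/httpd" "$root/global" "$root/perl" "$root/web"
  chmod 750 "$root/httpd" "$root/global"   # owner and group only
  chmod 755 "$root/perl" "$root/web"       # anyone may list the directory
  : > "$root/perl/index.asp"               # stand-in for a source file
  chmod 640 "$root/perl/index.asp"         # contents readable by group only
}

# Print the 10-character mode string for a path.
mode_of() {
  ls -ld "$1" | cut -c1-10
}

layout_demo=$(mktemp -d)
setup_layout "$layout_demo"
mode_of "$layout_demo/global"           # drwxr-x---
mode_of "$layout_demo/perl"             # drwxr-xr-x
mode_of "$layout_demo/perl/index.asp"   # -rw-r-----
```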

So that I won't accidentally create world readable files, I have this
line in ~/.profile for "aw":

umask 027

This creates files as rw-r----- by default. Files I upload by FTP
still default to mode rw-r--r--, but I only upload image files that
way (I use vi through ssh to edit the code) so that's perfect.
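
For anyone who wants to check the effect of the umask, a small sketch follows. The function name mode_under_umask is made up for this example. It creates a scratch file under a given umask and prints the resulting mode string.

```shell
# Hypothetical helper: print the mode of a file created under umask $1.
mode_under_umask() {
  dir=$(mktemp -d)
  ( umask "$1"; : > "$dir/file" )   # subshell keeps the umask change local
  ls -l "$dir/file" | cut -c1-10
  rm -rf "$dir"
}

mode_under_umask 027   # the setting from ~/.profile: -rw-r-----
mode_under_umask 022   # a common default, for comparison: -rw-r--r--
```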

There is a level of isolation here; in case I write an insecure script
that gets hacked, the hacker will only gain access to the aw_guest
account. The aw_guest account can read all my site's files, but it
can't write to any of them. Also, the MySQL username/password used by
the website has read-only access to the database.

Apache::ASP is set so that every page has headers indicating that it
can be cached for up to one hour:

  use HTTP::Date;  # provides time2str
  $Response->AddHeader('Last-Modified', time2str(time));
  $Response->{CacheControl} = 'public';
  $Response->{Expires} = 3600;  # seconds from now

I could have set the expiry time higher, but I decided to put it at
3600 so that in case I change content on the website and forget to
manually clear the cache, it won't be out of date by more than 1
hour. In terms of performance issues, 1 hour should be long enough
such that the backend httpd server doesn't have to do too much work.

In my frontend httpd server, I have a basic cache configuration:

ProxyRequests on
CacheRoot /home/httpd/cache
# cache size of 10 MB (CacheSize is given in KB)
CacheSize 10000
# clean up the cache every hour
CacheGcInterval 1
# nothing lives in the cache for > 24 hours
CacheMaxExpire 24
# default expiry time is 1 hour
CacheDefaultExpire 1

I can force the frontend httpd server to reload a specific page from
the backend by viewing it in my browser and clicking Reload (when
reloading, Opera and Netscape will send a cache-control header
specifying that fresh data is to be pulled); this is useful when I'm
tweaking a page and have to keep reloading it to see how it comes
out.

I also wrote a quick suid script (owned by httpd, mode -rwsr-xr-- so
that only users in group "httpd" can execute it) that does "rm -rf
/home/httpd/cache/*" to allow the "aw" user to clear the frontend
cache manually (which I might want to do if I change a bunch of
stuff).

[aw aw]$ cat /usr/local/bin/clear_cache.sh
#!/bin/bash

IFS=' '
PATH='/bin'
/bin/echo Clearing httpd cache...
/bin/rm -rf /home/httpd/cache/*
/bin/echo Cache cleared.

Since suid shell scripts are unsafe due to race conditions (and Linux
ignores the suid bit on scripts for that reason), I needed to write a
wrapper C program:

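#include <stdio.h>
#include <unistd.h>

#define SCRIPT "/usr/local/bin/clear_cache.sh"

/* Setuid wrapper: replaces this process with the cache-clearing script. */
int main(int argc, char **argv)
{
  (void)argc;
  execv(SCRIPT, argv);
  perror("execv");  /* reached only if execv fails */
  return 1;
}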

That C program is the one that is set suid.
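
The mode string -rwsr-xr-- mentioned earlier corresponds to octal 4754. A quick way to confirm that on a scratch file is sketched below. The function name suid_mode_string is hypothetical, and a real install of the wrapper would also need its owner changed to httpd, which requires root and is not shown.

```shell
# Hypothetical helper: apply mode 4754 to a scratch file and print the
# resulting mode string (suid bit, group may execute, others read only).
suid_mode_string() {
  f=$(mktemp)
  chmod 4754 "$f"
  ls -l "$f" | cut -c1-10
  rm -f "$f"
}

suid_mode_string   # -rwsr-xr--
```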

According to ab (ApacheBench) benchmarks: Without frontend caching,
the server can do about 20 requests per second. With frontend caching,
it goes as high as 400 (network bandwidth permitting).

In summary, the strategies I have employed here are:

- principle of minimal permissions and isolation
- using mod_proxy to proxy to mod_perl httpd, with caching
- mod_gzip to speed download times

The performance aspect of this site is still uncertain, as it hasn't
gone live yet (I think there's another 1-2 weeks before it goes
live). I think it's kind of a challenge to get it all to work on a
RaQ4i with this much traffic, but the bandwidth is cheapest on this
webhost and they only offer RaQ4is. ab (ApacheBench) tests suggest
that the RaQ will be able to handle the load, though. The guys on the
RaQ forums often say that the RaQs can't handle this sort of load, but
I don't think they have concrete evidence to go on.
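
As a rough sanity check on those numbers, the monthly page view figure can be turned into an average rate. This is only a back-of-the-envelope sketch: it assumes a 30-day month, the function name avg_per_second is made up, page views are not the same as HTTP requests, and real traffic peaks well above its average.

```shell
# Hypothetical helper: average page views per second for a monthly total,
# assuming a 30-day month. LC_ALL=C keeps the decimal point as ".".
avg_per_second() {
  LC_ALL=C awk -v views="$1" \
    'BEGIN { printf "%.2f\n", views / (30 * 24 * 60 * 60) }'
}

avg_per_second 5000000   # prints 1.93
```

That average is about a tenth of the roughly 20 requests per second measured without the frontend cache, which leaves some room for peaks.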

From working on this website, I've also learned some nice Apache::ASP
coding techniques that make things easier (namely Script_OnStart and
XMLSubs), but that's beyond the scope of this article.



Re: RFC: Security/Performance Best Practices (long)

Posted by Alessio Bragadini <al...@sevenseas.org>.
Philip Mak:

> This website runs off a MySQL database. Although all the webpages
> are generated dynamically, they don't change often (unless the
> webmaster explicitly updates them).

Do you generate reliable Last-Modified and Expires headers? This could
help bandwidth usage for you and your users.

-- 
Alessio F. Bragadini            alessio@sevenseas.org

Re: RFC: Security/Performance Best Practices (long)

Posted by Stas Bekman <st...@stason.org>.
Philip Mak wrote:

> Recently, I've been using Apache::ASP to program a new version of an
> existing website that gets over 5 million page views per month. This
> website will have to fit on a RaQ4i (450MHz) server, so I'm pretty
> conscious about performance. Security is also important due to the
> popularity of the site.
> 
> I've read various documentation and combined them together into the
> following strategy for security and performance on a mod_perl driven
> website. I haven't seen these combined strategies formally written up
> anywhere, so I thought I would try to do that and ask you guys for
> suggestions. This is a bit unorganized right now, but all the general
> concepts should be there. The goal is to produce a document that
> explains all the principles, and shows all the configuration
> directives required to accomplish this.

I'm afraid there is no such thing as a single detailed performance 
scenario fitting all users of mod_perl, or I'd say even a big chunk of 
users. Each user has different hw, different requirements, a different 
amount of $$, in-house knowledge and what not. So IMHO it'd be a mistake 
to even try to write an ultimate performance document. Hmm, maybe such 
a thing could exist in the winXX world, I don't know.

You're heartily welcome to help improve the existing documentation, 
which already sports I think about 200 printed pages of various 
scenarios, tips and tricks which comprise a "Lego"-like set for 
everybody to learn from and build their own bridge, their own monster 
and their own crane, using common components.

Please don't create another documentation fork, unless you really have 
to, since it's so much easier for people to read all the docs from one 
place, rather than jumping between many scattered docs, and of course 
it avoids duplication of effort.

Quite a few people have complained to me that while the mod_perl guide 
includes a lot of performance improvement techniques, it doesn't help 
newbies to make the right choice in the first place. So they have to try 
a few of them first, before making the right choice. So if that's what 
you think is missing you are welcome to try to fill the gap, but 
personally I doubt it's possible for the reasons I've mentioned above 
and many other reasons.

I just want to repeat that any attempt to improve the existing 
documentation and add new docs is very welcome, and the 2.0 documentation 
project is going to be the next cool thing if enough people get 
involved and help. You can check the modperl-docs cvs repository to 
see what's there so far.

_____________________________________________________________________
Stas Bekman             JAm_pH      --   Just Another mod_perl Hacker
http://stason.org/      mod_perl Guide   http://perl.apache.org/guide
mailto:stas@stason.org  http://ticketmaster.com http://apacheweek.com
http://singlesheaven.com http://perl.apache.org http://perlmonth.com/