Posted to asp@perl.apache.org by Joshua Chamas <jo...@chamas.com> on 2001/11/11 23:50:48 UTC
[Fwd: RFC: Security/Performance Best Practices (long)]

Here's a posting from Philip to the mod_perl list that can be 
an inspirational read for many of us Apache::ASP users!

Philip, if you have a polished form of this doc sometime, I 
could bundle it in the Apache::ASP docs, perhaps under 
a section such as BEST PRACTICES or SCENARIOS.  Actually,
I think the text below is pretty good as is, so let me know
if/when you want me to post it.

--Josh

-------- Original Message --------
Subject: RFC: Security/Performance Best Practices (long)
Date: Sun, 11 Nov 2001 05:48:19 -0500 (EST)
From: Philip Mak <pm...@aaanime.net>
To: <mo...@apache.org>

Recently, I've been using Apache::ASP to program a new version of an
existing website that gets over 5 million page views per month. This
website will have to fit on a RaQ4i (450MHz) server, so I'm pretty
conscious about performance. Security is also important due to the
popularity of the site.

I've read various pieces of documentation and combined them into the
following strategy for security and performance on a mod_perl driven
website. I haven't seen these combined strategies formally written up
anywhere, so I thought I would try to do that and ask you guys for
suggestions. This is a bit unorganized right now, but all the general
concepts should be there. The goal is to produce a document that
explains all the principles, and shows all the configuration
directives required to accomplish this.

This website runs off a MySQL database. Although all the webpages are
generated dynamically, they don't change often (unless the webmaster
explicitly updates them).

I set up a lightweight frontend httpd (port 80) that proxies to a
heavyweight mod_perl backend httpd (port 8001). mod_gzip is installed
on the frontend to deliver compressed HTML pages for faster download
time. mod_proxy_add_forward is also installed so that the backend logs
the true IP address of the request in its logs.

In my account, I have these directories:

httpd: apachectl, httpd.conf, logs for the mod_perl httpd
perl: DocumentRoot for backend httpd
web: DocumentRoot for frontend httpd
global: contains *.pm, startup.pl, global.asa (for Apache::ASP)

The proxying is configured in the frontend httpd.conf as follows:

1: RewriteEngine On
2: RewriteRule ^/(.+)\.asp$ http://127.0.0.1:8001/$1.asp [L,P]
3: RewriteRule ^/(.+)\.pl$ http://127.0.0.1:8001/$1.pl [L,P]
4: RewriteCond /home/aw/perl%{REQUEST_URI}index.asp -f
5: RewriteRule ^(.*)/$ http://127.0.0.1:8001$1/ [L,P]

Line 2 passes any URL with a .asp extension to the backend.
Line 3 passes any URL with a .pl extension to the backend.
Lines 4 and 5 pass any request for a directory to the backend, if there
is an index.asp file in that directory.
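
To make the effect of these rules concrete, here is a rough sketch in
shell that mimics the routing decision. It only approximates the rules
above and is not mod_rewrite itself; the function name "route" and its
docroot argument are made up for illustration.

```shell
#!/bin/sh
# Rough approximation of the frontend rewrite rules. Not mod_rewrite itself.
# route URI DOCROOT prints "backend" or "frontend".
# DOCROOT stands in for the backend DocumentRoot (/home/aw/perl above).
route() {
  uri=$1
  docroot=$2
  case $uri in
    /?*.asp|/?*.pl)
      # Lines 2 and 3: at least one character, then .asp or .pl
      echo backend ;;
    */)
      # Lines 4 and 5: directory request, proxied only if index.asp exists
      if [ -f "$docroot${uri}index.asp" ]; then
        echo backend
      else
        echo frontend
      fi ;;
    *)
      # Everything else (images and so on) stays on the frontend
      echo frontend ;;
  esac
}

# Example calls (the URLs are illustrative):
route /forum/view.asp /home/aw/perl
route /images/logo.gif /home/aw/perl
```

The first call is routed to the backend and the second stays on the
frontend. Query strings and other mod_rewrite details are ignored here.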

Notice that to the outside world, the hostname/port of the website is
exactly the same whether it's being served by the frontend or
backend. I prefer this approach since it lets my <img src> tags refer
to images in the same directory, for example. It also doesn't require
an extra DNS lookup on the client end (which it would if the mod_perl
server and non-mod_perl server were on different hostnames).

I don't have a ProxyPassReverse directive since I haven't thought
about it; I wouldn't need it anyway since I don't do any redirecting
(at least not right now), but I'll probably end up adding it just in
case.

The following users were created on the system:

aw: I log in as this user. Group = aw, httpd
aw_guest: mod_perl httpd runs as this user. Group = aw
httpd: lightweight httpd runs as this user. Group = httpd

aw owns all of the files except httpd/logs.

The "web" directory is world readable. It only contains images that
everyone can get from the web server anyway.

The "httpd" and "global" directories are group readable, so only aw
and aw_guest can read them. "perl" is world readable, but the files
inside are only group readable (this allows the httpd user to tell
what files exist, but nothing more). This protects my source code
(and the database passwords they contain!) from being browsed by
others.
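
As a sketch, the layout above could be applied with commands along these
lines. The octal modes are my reading of the description, and the
function names "apply_perms" and "mode_of" are hypothetical. Ownership
changes (chown, chgrp) need root and are left out.

```shell
#!/bin/sh
# Sketch of the permission layout described above, applied under BASE.
# The exact octal modes are an assumption derived from the description.
apply_perms() {
  base=$1
  chmod 755 "$base/web"                   # world readable: only public images
  chmod 750 "$base/httpd" "$base/global"  # owner and group only
  chmod 755 "$base/perl"                  # others may see file names...
  find "$base/perl" -type f -exec chmod 640 {} +  # ...but not file contents
}

# Print the mode string of a path, for example -rw-r-----
mode_of() {
  ls -ld "$1" | cut -c1-10
}

# Try it on a scratch tree rather than the live one:
scratch=$(mktemp -d)
mkdir "$scratch/web" "$scratch/httpd" "$scratch/global" "$scratch/perl"
: > "$scratch/perl/index.asp"
apply_perms "$scratch"
mode_of "$scratch/perl/index.asp"
rm -rf "$scratch"
```

With this layout the frontend's "-f" test in the RewriteCond can still
see that index.asp exists, while only the owner and group can read it.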

So that I won't accidentally create world readable files, I have this
line in ~/.profile for "aw":

umask 027

This creates files as rw-r----- by default. Files I upload by FTP
still default to mode rw-r--r--, but I only upload image files that
way (I use vi through ssh to edit the code) so that's perfect.
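
The effect of that umask can be checked with a short script like the
one below. It is only a sketch; the scratch directory and file names are
illustrative.

```shell
#!/bin/sh
# Sketch: what modes new files and directories get under umask 027.
umask 027
scratch=$(mktemp -d)      # scratch directory, illustrative only
: > "$scratch/page.asp"   # new file: 666 masked by 027 gives 640
mkdir "$scratch/lib"      # new directory: 777 masked by 027 gives 750
file_mode=$(ls -ld "$scratch/page.asp" | cut -c1-10)
dir_mode=$(ls -ld "$scratch/lib" | cut -c1-10)
echo "file: $file_mode"
echo "dir:  $dir_mode"
rm -rf "$scratch"
```

New directories come out as rwxr-x---, so group members can still
traverse them while other users are shut out.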

There is a level of isolation here; in case I write an insecure script
that gets hacked, the hacker will only gain access to the aw_guest
account. The aw_guest account can read all my site's files, but it
can't write to any of them. Also, the MySQL username/password used by
the website has read-only access to the database.

Apache::ASP is set so that every page has headers indicating that it
can be cached for up to one hour:

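  use HTTP::Date qw(time2str);            # provides time2str()

  $Response->AddHeader('Last-Modified', time2str(time));
  $Response->{CacheControl} = 'public';   # allow shared (proxy) caches
  $Response->{Expires} = 3600;            # expire 3600 seconds from now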

I could have set the expiry time higher, but I decided to put it at
3600 so that in case I change content on the website and forget to
manually clear the cache, it won't be out of date by more than 1
hour. In terms of performance issues, 1 hour should be long enough
such that the backend httpd server doesn't have to do too much work.

In my frontend httpd server, I have a basic cache configuration:

ProxyRequests on
CacheRoot /home/httpd/cache
# cache size of 10 MB (the value is in KB)
CacheSize 10000
# clean up the cache every hour
CacheGcInterval 1
# nothing lives in the cache for > 24 hours
CacheMaxExpire 24
# default expiry time is 1 hour
CacheDefaultExpire 1

I can force the frontend httpd server to reload a specific page from
the backend by viewing it in my browser and clicking Reload (when
reloading, Opera and Netscape will send a cache-control header
specifying that fresh data is to be pulled); this is useful when I'm
tweaking a page and have to keep reloading it to see how it comes
out.

I also wrote a quick suid script (owned by httpd, mode -rwsr-xr-- so
that only users in group "httpd" can execute it) that does "rm -rf
/home/httpd/cache/*" to allow the "aw" user to clear the frontend
cache manually (which I might want to do if I change a bunch of
stuff).

[aw aw]$ cat /usr/local/bin/clear_cache.sh
#!/bin/bash

IFS=' '
PATH='/bin'
/bin/echo Clearing httpd cache...
/bin/rm -rf /home/httpd/cache/*
/bin/echo Cache cleared.

Since suid shell scripts are unsafe due to race conditions (and Linux
doesn't even allow them for that reason), I needed to write a
wrapper C program:

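#include <stdio.h>
#include <unistd.h>

#define SCRIPT "/usr/local/bin/clear_cache.sh"

/* Setuid wrapper: exec the fixed script path, passing arguments through. */
int main(int argc, char **argv)
{
  (void)argc;
  execv(SCRIPT, argv);
  perror("execv");  /* reached only if execv fails */
  return 1;
}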

That C program is the one that is set suid.
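
For reference, the mode string -rwsr-xr-- mentioned earlier corresponds
to octal 4754. The sketch below applies it to an empty stand-in file;
the file name and the install path in the comments are hypothetical.

```shell
#!/bin/sh
# Sketch: the mode string -rwsr-xr-- corresponds to octal 4754
# (setuid bit, rwx for owner, r-x for group, r-- for others).
# "wrapper" is an empty stand-in file, not the real compiled binary.
scratch=$(mktemp -d)
: > "$scratch/wrapper"
chmod 4754 "$scratch/wrapper"
wrapper_mode=$(ls -ld "$scratch/wrapper" | cut -c1-10)
echo "$wrapper_mode"
rm -rf "$scratch"
# On the real system, root would also set the owner and group, e.g.:
#   chown httpd:httpd /usr/local/bin/clear_cache   (hypothetical path)
```

Set the ownership before the mode, since changing the owner of a file
typically clears its setuid bit.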

According to ab (ApacheBench) benchmarks: Without frontend caching,
the server can do about 20 requests per second. With frontend caching,
it goes as high as 400 (network bandwidth permitting).
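
To put those numbers next to the traffic figure from the start of this
post, here is a back-of-the-envelope calculation. It is a sketch that
assumes a 30-day month and perfectly even traffic; real traffic has
peaks well above the average. The function names are made up.

```shell
#!/bin/sh
# Back-of-the-envelope: average requests per second from monthly page views.
# Assumes a 30-day month and evenly spread traffic; both are simplifications.
avg_rps() {
  awk -v views="$1" -v days="$2" \
    'BEGIN { printf "%.2f\n", views / (days * 86400) }'
}

# Ratio of a benchmarked rate (requests/second) to the average rate.
headroom() {
  awk -v rate="$1" -v views="$2" -v days="$3" \
    'BEGIN { printf "%.1f\n", rate * days * 86400 / views }'
}

avg_rps 5000000 30        # average load for 5 million views per month
headroom 20 5000000 30    # uncached backend versus average load
headroom 400 5000000 30   # cached frontend versus average load
```

Under those assumptions, 5 million page views per month averages out to
about 1.93 requests per second, so 20 requests per second is roughly 10
times the average and 400 is roughly 207 times the average.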

In summary, the strategies I have employed here are:

- principle of minimal permissions and isolation
- using mod_proxy to proxy to mod_perl httpd, with caching
- mod_gzip to speed download times

The performance aspect of this site is still uncertain, as it hasn't
gone live yet (I think there's another 1-2 weeks before it goes
live). I think it's kind of a challenge to get it all to work on a
RaQ4i with this much traffic, but the bandwidth is cheapest on this
webhost and they only offer RaQ4is. ab (ApacheBench) tests suggest
that the RaQ will be able to handle the load, though. The guys on the
RaQ forums often say that the RaQs can't handle this sort of load, but
I don't think they have concrete evidence to go on.

From working on this website, I've also learned some nice Apache::ASP
coding techniques that make things easier (namely Script_OnStart and
XMLSubs), but that's beyond the scope of this article.
