Posted to modperl@perl.apache.org by Will Fould <wi...@gmail.com> on 2007/04/16 21:21:30 UTC

Growing Up

Hi,

I have a service that is currently running a basic LAMP stack with mod_perl
and life has been good!

The site has been getting very busy, and I've ordered a second machine
with the intention of moving the database off the current machine and
starting the growing-up process.

I am looking for next steps in growing up from this machine.  Can somebody
recommend a good article, presentation or document that advocates various
strategies for growing the current architecture (e.g. basic load
balancing, network topology, switches, etc.)?

I realize that mileage will vary based on the particular service and demands.
Currently, the site does not deliver a lot of static content that can be
cached or cause huge I/O issues (e.g. images, media, huge pages, etc). Our
database is probably 95% read-only.

Thanks a lot

w

Re: Growing Up

Posted by Perrin Harkins <pe...@elem.com>.
On 4/16/07, Will Fould <wi...@gmail.com> wrote:
> I am looking for next steps to growing up from this machine.  Can somebody
> recommend a good article, presentation or document that advocates various
> strategies to growing up the current architecture (i.e. basic load
> balancing, network topology, switches, etc. )?

Have you read the book "Practical mod_perl"?  That's often a good
starting point.  http://modperlbook.org/

For an extreme case, you can read the LiveJournal story:
http://danga.com/words/2005_oscon/

You can also read my story about eToys:
http://perl.apache.org/docs/tutorials/apps/scale_etoys/etoys.html

It sounds like you're mostly looking for database scaling advice, so
you may want to check for presentations and papers related to your
database.  I know that the MySQL conference publishes lots of good
stuff every year.

- Perrin

Re: Growing Up

Posted by Jonathan Vanasco <mo...@2xlp.com>.
On Apr 17, 2007, at 3:55 AM, Clinton Gormley wrote:
>
> Must disagree with you about pound http://www.apsis.ch/pound/index_html
> being a PITA to configure and maintain.
>
> Pound is really easy to configure, fast as all hell, and just never goes
> down.  I've been using it for about 3 years now and I've never ever had
> a problem with it.

If it's working for you, great ;)
I had some issues when I first tried it, then leaned toward nginx, which
can handle proxying + load balancing and serve static content as well.


> Just a point of clarification, with reference to this email:
> http://marc.info/?l=apache-modperl&m=117595808501296&w=2
> (File Uploads using MP2 best practises):
>
> is it reasonable to serve your static files from a mod_perl server, as
> long as you have a proxy/pound/squid in front?
>
> My understanding is that the cost of using your mod_perl server to serve
> static files is the amount of time that a slow request would tie them
> up.  However, if your requests are all fast, because your proxy handles
> the slow part, then this ceases to be an issue.  Am I correct in this
> assumption?
>
> I have a bunch of mod_perl servers behind a single pound proxy (plus
> failover), and they share the uploaded images via NFS currently,
> although I'm considering moving to iSCSI with OCFS2 when I am convinced
> of its stability.
>
> Any views on this?

That assumption sounds right -- so long as you have a caching proxy
like squid.  Not all proxies cache (I'm pretty sure that pound
doesn't).  Any content you can offload from mod_perl should give your
app a big boost -- the thing that 'kills' mod_perl performance is tying
up the same Apache children used for content generation with 45
.gif/.jpg/.png files and a handful of css/js files.

If you're sharing uploaded images over NFS, though, chances are you
have a lot of images -- which can make caching a bit of a nightmare as
you try to balance the cache params.  So I'd strongly suggest using a
lightweight server for them (even vanilla Apache would be an
improvement).  Alternatively, you could consider using Amazon's S3 for
mass storage with a CDN for distribution.  (I'm constantly told that S3
has uptime/access issues -- your data is safe, but it might not be
accessible for an hour.)  Using a combo of the two gives you reliable
storage and distribution, both cheaply.


// Jonathan Vanasco

| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -  
- - - - - - - - - - - - - - - -
| FindMeOn.com - The cure for Multiple Web Personality Disorder
| Web Identity Management and 3D Social Networking
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -  
- - - - - - - - - - - - - - - -
| RoadSound.com - Tools For Bands, Stuff For Fans
| Collaborative Online Management And Syndication Tools
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -  
- - - - - - - - - - - - - - - -



Re: Growing Up

Posted by Jonathan Vanasco <jv...@2xlp.com>.
On Apr 18, 2007, at 12:36 PM, Perrin Harkins wrote:

> On 4/18/07, Denis Banovic <de...@ncm.at> wrote:
>> Is it possible to configure Perlbal so there is no single point of  
>> failure?
>
> That sort of high-availability setup is beyond the scope of an
> application-level load balancer like Perlbal.  You need to use
> something that allows for IP takeover.  The hardware solutions work,
> and there are many HA software solutions for Linux as well.  Some of
> them are noted in the mod_perl docs:
> http://perl.apache.org/download/third_party.html#High_Availability_and_Load_Balancing_Projects

To add to that software list: on BSD there's CARP
	http://en.wikipedia.org/wiki/Common_Address_Redundancy_Protocol

FWIW, a common setup I see is something along the lines of

	1. internet
	2. LAN gateway: 2+ nodes running CARP or a virtual server
	3. load balancer: 2+ nodes (sometimes running on the gateways)
	4. LAN: db + application servers

Often I see people using Soekris boxes on the gateways (ever notice
that if you subscribe to BSD user groups, that's all people ever talk
about?)
http://www.soekris.com -- they're little embedded boxes that run off
CompactFlash cards.  It ends up costing ~$700 for 2 independent boxes.
They're not terribly fast, and don't support gigabit, but they hold
up well if you're only doing basic firewalling/routing.  Unless your
traffic requires gigabit connectivity, they're a decent way to put off
buying a couple of dedicated gigabit firewall boxes to run as your
gateway.


// Jonathan Vanasco




Re: Growing Up

Posted by Perrin Harkins <pe...@elem.com>.
On 4/18/07, Denis Banovic <de...@ncm.at> wrote:
> Is it possible to configure Perlbal so there is no single point of failure?

That sort of high-availability setup is beyond the scope of an
application-level load balancer like Perlbal.  You need to use
something that allows for IP takeover.  The hardware solutions work,
and there are many HA software solutions for Linux as well.  Some of
them are noted in the mod_perl docs:
http://perl.apache.org/download/third_party.html#High_Availability_and_Load_Balancing_Projects

- Perrin

Re: Growing Up

Posted by Denis Banovic <de...@ncm.at>.
Hi!

Is it possible to configure Perlbal so there is no single point of failure?
I wanted to use Perlbal as the LB in front of a few machines, but decided to go with 2 LBs because they can work in redundant mode.

Denis

-----Original Message-----
From: Frank Wiles [mailto:frank@wiles.org]
Sent: Tuesday, April 17, 2007 18:46
To: Perrin Harkins
Cc: Clinton Gormley; modperl@perl.apache.org
Subject: Re: Growing Up

On Tue, 17 Apr 2007 10:48:57 -0400
"Perrin Harkins" <pe...@elem.com> wrote:

> On 4/17/07, Clinton Gormley <cl...@traveljury.com> wrote:
> > is it reasonable to serve your static files from a mod_perl server, 
> > as long as you have a proxy/pound/squid in front?
> 
> Yes, but spending no time in mod_perl for a static file is better than 
> spending a little time, and the files will be served faster if there's 
> no extra proxying step.  If you aren't having scaling problems, then 
> don't worry about it.

   Personally, I've fallen in love with Perlbal, and it can serve up
   static files from disk, so that is probably what I would do
   in this situation.

 ---------------------------------
   Frank Wiles <fr...@wiles.org>
   http://www.wiles.org
 ---------------------------------


Re: Growing Up

Posted by Frank Wiles <fr...@wiles.org>.
On Tue, 17 Apr 2007 10:48:57 -0400
"Perrin Harkins" <pe...@elem.com> wrote:

> On 4/17/07, Clinton Gormley <cl...@traveljury.com> wrote:
> > is it reasonable to serve your static files from a mod_perl server,
> > as long as you have a proxy/pound/squid in front?
> 
> Yes, but spending no time in mod_perl for a static file is better than
> spending a little time, and the files will be served faster if there's
> no extra proxying step.  If you aren't having scaling problems, then
> don't worry about it.

   Personally, I've fallen in love with Perlbal, and it can serve up
   static files from disk, so that is probably what I would do
   in this situation.

 ---------------------------------
   Frank Wiles <fr...@wiles.org>
   http://www.wiles.org
 ---------------------------------


Re: Growing Up

Posted by Perrin Harkins <pe...@elem.com>.
On 4/17/07, Clinton Gormley <cl...@traveljury.com> wrote:
> is it reasonable to serve your static files from a mod_perl server, as
> long as you have a proxy/pound/squid in front?

Yes, but spending no time in mod_perl for a static file is better than
spending a little time, and the files will be served faster if there's
no extra proxying step.  If you aren't having scaling problems, then
don't worry about it.

- Perrin

Re: Growing Up

Posted by Clinton Gormley <cl...@traveljury.com>.
> 	switch to a lightweight proxy + httpd on port 80.  i like nginx  
> because its had much fewer critical bugs than lighttpd.  others like  
> lighty.  either will be fine - they'll free up apache to deal with  
> content generation and you'll see a ginormous performance boost off  
> that .  you could use squid or pound for similar tasks, but they're a  
> PITA to configure and maintain

Must disagree with you about pound http://www.apsis.ch/pound/index_html
being a PITA to configure and maintain.

Pound is really easy to configure, fast as all hell, and just never goes
down.  I've been using it for about 3 years now and I've never ever had
a problem with it.

Just a point of clarification, with reference to this email:
http://marc.info/?l=apache-modperl&m=117595808501296&w=2
(File Uploads using MP2 best practises):

is it reasonable to serve your static files from a mod_perl server, as
long as you have a proxy/pound/squid in front?

My understanding is that the cost of using your mod_perl server to serve
static files is the amount of time that a slow request would tie them
up.  However, if your requests are all fast, because your proxy handles
the slow part, then this ceases to be an issue.  Am I correct in this
assumption?

I have a bunch of mod_perl servers behind a single pound proxy (plus
failover), and they share the uploaded images via NFS currently,
although I'm considering moving to iSCSI with OCFS2 when I am convinced
of its stability.

Any views on this?

thanks

Clint


Re: Growing Up

Posted by Jonathan Vanasco <jv...@2xlp.com>.
On Apr 16, 2007, at 3:21 PM, Will Fould wrote:

> Hi,
>
> I have a service that is currently running a basic LAMP stack with  
> mod_perl  and life has been good!
>
> The site running has been getting very busy and I've ordered a  
> second machine with intention to move the database off that machine  
> and start the growing up process.
>
> I am looking for next steps to growing up from this machine.  Can  
> somebody recommend a good article, presentation or document that  
> advocates various strategies to growing up the current architecture  
> (i.e. basic load balancing, network topology, switches, etc. )?
>
> I realize that milage will vary based on the particular service and  
> demands. Currently, the site does not deliver a lot of static  
> content that can be cached or cause huge I/O issues (i.e. images,  
> media, huge pages, etc). Our database is probably 95% read-only.
>
> Thanks a lot
>
> w


I don't have any articles or papers offhand, but I can share what I
have been discussing with friends lately -- it seems like everyone is
clustering their apps this month.

I'm in the process right now too -- scaling one of my apps from 1
server to a 2-node cluster, with a 1TB mirrored RAID + 4GB RAM Postgres
store on the back and mod_perl/nginx on the front.  I'm only running 8
Apache children on the front, and bumped memcached up to 700MB.

	Switch to a lightweight proxy + httpd on port 80.  I like nginx
because it's had far fewer critical bugs than lighttpd; others like
lighty.  Either will be fine -- they'll free up Apache to deal with
content generation, and you'll see a ginormous performance boost from
that.  You could use squid or pound for similar tasks, but they're a
PITA to configure and maintain.
	The #1 slowdown I've seen with Apache comes from using the same
server that hosts the perl/php/python interpreter to transfer static
files.  Every request not served by Apache means more resources /
memory for your app.
	
	Check your db memory usage.  If it's 95% read-only, is it full of
complex joins?  Blocking operations?  If so, don't just consider
offloading to a dedicated db machine, but also consider running a
read-only slave of it on the local machine.

	Apache sucks up a ton of memory, but compared to what a db needs
it's nothing.  When you migrate to a clustered setup you'll have at
least 1GB of 'extra' memory to use.  Half of that can easily go to
Apache, but you'll see the law of diminishing returns kick in after
N httpd instances -- that's when you give memory to memcached or a
local replica.

	Profile your db requests: if you have a lot of repetitive queries,
you could avoid a bunch of them by caching the results in memcached.
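
A minimal sketch of that "check the cache first, then the db" pattern, in
Python rather than Perl.  TinyCache is a hypothetical in-process stand-in
for a memcached client, and cached_query / load_front_page are made-up
names for illustration; a real setup would talk to memcached over the
network instead.

```python
import time


class TinyCache:
    """In-process stand-in for a memcached client (get/set with a TTL)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Expired entries behave like misses.
            del self._data[key]
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, time.monotonic() + ttl)


def cached_query(cache, key, ttl, run_query):
    """Return the cached result for key, running the query only on a miss.

    Note: a query that returns None is never cached in this sketch.
    """
    value = cache.get(key)
    if value is None:
        value = run_query()
        cache.set(key, value, ttl)
    return value


# Hypothetical "expensive" query; calls records how often it really runs.
calls = []


def load_front_page():
    calls.append(1)
    return ["story 1", "story 2"]


cache = TinyCache()
first = cached_query(cache, "front_page", 60, load_front_page)
second = cached_query(cache, "front_page", 60, load_front_page)
```

The second call is answered from the cache, so the db only sees the query
once per TTL window.  Picking the TTL (and what to do on writes) is the
part that depends on the app.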

	Profile your app design for db handle / connection suitability.  A
lot of people program for a system with 1 db connection that handles
all reads and writes.  That's usually fine on 1 box, but it doesn't work
in a clustered system, where reads and writes may need to go to
different servers.
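
One way to picture the handle split, as a rough Python sketch.  The names
(HandleRouter, RecordingHandle) are hypothetical, and the handles here only
record statements instead of talking to a database; the point is just that
the app asks a router for a handle rather than assuming a single connection.

```python
class RecordingHandle:
    """Stand-in for a real database handle; it only records statements."""

    def __init__(self, name):
        self.name = name
        self.statements = []

    def execute(self, sql, params=()):
        self.statements.append((sql, tuple(params)))
        return self.name


class HandleRouter:
    """Send plain SELECTs to a read-only replica, everything else to the primary."""

    def __init__(self, primary, replica):
        self.primary = primary
        self.replica = replica

    def handle_for(self, sql):
        text = sql.strip().upper()
        # Locking reads must see (and lock) live rows, so keep them on the primary.
        locking = "FOR UPDATE" in text or "FOR SHARE" in text
        if text.startswith("SELECT") and not locking:
            return self.replica
        return self.primary

    def execute(self, sql, params=()):
        return self.handle_for(sql).execute(sql, params)


router = HandleRouter(RecordingHandle("primary"), RecordingHandle("replica"))
read_target = router.execute("SELECT name FROM users WHERE id = ?", (1,))
write_target = router.execute("UPDATE users SET name = ? WHERE id = ?", ("w", 1))
lock_target = router.execute("SELECT * FROM users WHERE id = ? FOR UPDATE", (1,))
```

This classification is deliberately crude.  It ignores replication lag, so
anything that must read its own writes would also need to go to the primary.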

	Question your db / schema.  If you're using a 'new' feature in
MySQL, you might take a giant performance hit.  If you have a badly
planned/indexed query on Postgres, you're looking at the difference
between 100ms and 10 minutes on a select.  Also check for blocking
operations.  If your db does 'select for share', you might get a
performance boost.
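
To make the indexing point concrete, here is a small Python sketch that
asks the database for its query plan before and after adding an index.  It
uses SQLite (from the Python standard library) purely as a stand-in; the
table and index names are made up, and on Postgres or MySQL you would run
that database's own EXPLAIN on your real queries instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)


def plan(sql, params=()):
    """Return SQLite's query plan for sql as one line of text."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


query = "SELECT name FROM users WHERE email = ?"
args = ("user500@example.com",)

before = plan(query, args)  # no index on email: expect a full table scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(query, args)   # with the index: expect an index search

print(before)
print(after)
```

The exact plan wording varies between SQLite versions, but the first plan
should mention a scan of the table and the second should mention the index.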

	Question your OS.  Some distros run apps better than others --
memory management / kqueue vs. select vs. poll / I/O / etc.  I have
friends who have been swapping distros like crazy over the past few
months trying to squeeze a little more performance out.  It's easier to
change distros than to add new servers to a cluster.

	You can hold off on any real networking until you're at a 3+
machine cluster.  If you've got 2-3 machines, you can just handle
everything with an extra NIC.  At 3-4 you'll want a LAN.

This is generic info -- I use it with all my projects (60% mp/pg,
20% php/pg, 20% python/pg), and I have friends using similar stuff
in php/mysql, erlang/pg, python/pg, and rails/mysql.


// Jonathan Vanasco




Re: Growing Up

Posted by Frank Wiles <fr...@wiles.org>.
On Mon, 16 Apr 2007 12:21:30 -0700
"Will Fould" <wi...@gmail.com> wrote:
 
> I have a service that is currently running a basic LAMP stack with
> mod_perl and life has been good!
> 
> The site running has been getting very busy and I've ordered a second
> machine with intention to move the database off that machine and
> start the growing up process.
> 
> I am looking for next steps to growing up from this machine.  Can
> somebody recommend a good article, presentation or document that
> advocates various strategies to growing up the current architecture
> (i.e. basic load balancing, network topology, switches, etc. )?
> 
> I realize that milage will vary based on the particular service and
> demands. Currently, the site does not deliver a lot of static content
> that can be cached or cause huge I/O issues (i.e. images, media, huge
> pages, etc). Our database is probably 95% read-only.

   You say your DB is 95% read-only, but that you can't cache.  I assume
   you mean you can't cache entire rendered HTML pages with something
   like squid.  But you *can* and should cache the data from the
   database with something like Cache::Memcached.  I'd normally
   suggest Cache::FastMmap, but you've already indicated you're growing
   enough to need to be able to scale across multiple machines.

   I think Ask Bjørn Hansen's presentation covers all of the
   recent generally accepted wisdom: 

   http://develooper.com/talks/Real-World-Scalability-Web-Builder-2006.pdf
   
 ---------------------------------
   Frank Wiles <fr...@wiles.org>
   http://www.wiles.org
 ---------------------------------


Re: Growing Up

Posted by Carl Johnstone <mo...@fadetoblack.me.uk>.
> There is a consideration, regarding using a proxy or a different server,
> that has not been brought up: If there is mod_perl based access control
> for the static files, then it's basically impossible not to go through a
> mod_perl server to serve them.

If your access control is in mod_perl, you have to at least hit the
mod_perl server to check whether access is allowed.

I've not used it myself, but Perlbal has a neat feature where it can
"internally" redirect. So mod_perl can return a redirect to Perlbal, which
will then go and retrieve the real file from your static server and send
that to the client. Otherwise I'm not sure how complete a proxy solution
Perlbal is, but LiveJournal is supposed to be using it.
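
A rough sketch of what the backend side of that could look like, written
as a Python WSGI app standing in for a mod_perl handler.  As far as I
understand it, Perlbal's feature is driven by an X-REPROXY-URL response
header and has to be enabled on the Perlbal service, so verify both against
the Perlbal docs.  STATIC_BACKEND, is_allowed and the cookie check are
hypothetical placeholders.

```python
STATIC_BACKEND = "http://static.internal.example"  # hypothetical static file server


def is_allowed(environ):
    # Hypothetical access check; a real app would validate a session here.
    return "session=ok" in environ.get("HTTP_COOKIE", "")


def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if not is_allowed(environ):
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"forbidden\n"]
    # No file body is sent from here; the balancer is asked to fetch it itself.
    start_response("200 OK", [("X-REPROXY-URL", STATIC_BACKEND + path)])
    return [b""]


def call(wsgi_app, environ):
    """Run the app without a server and capture status, headers and body."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(wsgi_app(environ, start_response))
    return captured["status"], captured["headers"], body


allowed = call(app, {"PATH_INFO": "/img/42.png", "HTTP_COOKIE": "session=ok"})
denied = call(app, {"PATH_INFO": "/img/42.png"})
```

The application server only spends time on the access decision; the bytes
of the file never pass through it.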

> In fact, I'm not sure what the effect would be in that scenario if a
> proxy was used: would it serve the static file regardless of the access
> control?, does it depend on the expiration data on the headers sent
> through the proxy when the acess controled static file was sent?

Proxies should inspect the Vary: header to see under what conditions they
can serve the same content. So if you're using cookies for authentication,
you should have 'Cookie' in your Vary header. The proxy will then only
re-serve the same content when a request arrives with the same Cookie
header as the one that produced the cached response.

Compared with setting the content to be no-cache or immediately expired, this
has the advantage that if the client re-requests the same resource, it can be
served from the proxy cache rather than hitting the end servers again.
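
To illustrate how a cache can use Vary, here is a toy Python sketch.  It is
not how any particular proxy is implemented: VaryCache is a made-up class,
header dicts are assumed to use lower-case names, and special cases such as
"Vary: *" are ignored.  It only shows that the request headers named in
Vary become part of the cache key.

```python
class VaryCache:
    """Toy cache keyed on the URL plus the request headers named in Vary."""

    def __init__(self):
        self._vary = {}   # url -> tuple of header names taken from Vary
        self._store = {}  # (url, header values) -> cached body

    @staticmethod
    def _key(url, request_headers, names):
        values = tuple(request_headers.get(name, "") for name in names)
        return (url, values)

    def store(self, url, request_headers, response_headers, body):
        vary = response_headers.get("vary", "")
        names = tuple(sorted(n.strip().lower() for n in vary.split(",") if n.strip()))
        self._vary[url] = names
        self._store[self._key(url, request_headers, names)] = body

    def lookup(self, url, request_headers):
        names = self._vary.get(url)
        if names is None:
            return None  # nothing cached for this URL yet
        return self._store.get(self._key(url, request_headers, names))


cache = VaryCache()
cache.store(
    "/img/1.png",
    {"cookie": "sid=alice"},
    {"vary": "Cookie"},
    b"alice-only image",
)

hit = cache.lookup("/img/1.png", {"cookie": "sid=alice"})
miss_other_user = cache.lookup("/img/1.png", {"cookie": "sid=bob"})
miss_anonymous = cache.lookup("/img/1.png", {})
```

Only the request carrying the same cookie gets the cached copy; everyone
else falls through to the backend, where the access check runs as usual.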

Carl


Re: Growing Up

Posted by Perrin Harkins <pe...@elem.com>.
On 4/17/07, Rafael Caceres <rc...@aasa.com.pe> wrote:
> There is a consideration, regarding using a proxy or a different server,
> that has not been brought up: If there is mod_perl based access control
> for the static files, then it's basically impossible not to go through a
> mod_perl server to serve them.

I use mod_auth_tkt.  You issue a cookie with credentials, and the C
module can use it to check access rights on static files from the
proxy server.  You have to run apache as your proxy server, but I
prefer that anyway.
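
The general idea behind a ticket cookie can be sketched like this in
Python.  This is NOT mod_auth_tkt's actual cookie format or algorithm; the
field layout, the HMAC-SHA256 signature, the secret and the function names
are all assumptions chosen for illustration.  The point is that the front
end can verify the cookie with a shared secret, without calling the app.

```python
import hashlib
import hmac
import time

SECRET = b"change-me"  # hypothetical secret shared by the app and the proxy


def _sign(payload):
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def issue_ticket(user, now=None):
    """Build a 'user|timestamp|signature' ticket (issued by the app at login)."""
    ts = int(time.time() if now is None else now)
    payload = f"{user}|{ts}"
    return f"{payload}|{_sign(payload)}"


def check_ticket(ticket, max_age=3600, now=None):
    """Return the user name if the ticket is authentic and fresh, else None."""
    try:
        user, ts_text, sig = ticket.rsplit("|", 2)
        ts = int(ts_text)
    except ValueError:
        return None  # malformed ticket
    expected = _sign(f"{user}|{ts_text}")
    # Constant-time comparison to avoid leaking signature prefixes.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None
    current = time.time() if now is None else now
    if current - ts > max_age:
        return None  # expired
    return user


ticket = issue_ticket("alice", now=1000)
ok = check_ticket(ticket, now=1500)
tampered = check_ticket(ticket.replace("alice", "admin"), now=1500)
expired = check_ticket(ticket, now=1000 + 4000)
```

Because verification needs only the secret and the cookie, the check can
run in the server that serves the static files.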

> In fact, I'm not sure what the effect would be in that scenario if a
> proxy was used: would it serve the static file regardless of the access
> control?

No, it would talk to mod_perl every time and not do any caching,
unless you have a mis-configured proxy.

- Perrin

Re: Growing Up

Posted by Rafael Caceres <rc...@aasa.com.pe>.
On Mon, 2007-04-16 at 12:21 -0700, Will Fould wrote:
> Hi,
> 
> I have a service that is currently running a basic LAMP stack with
> mod_perl  and life has been good!
> 
> The site running has been getting very busy and I've ordered a second
> machine with intention to move the database off that machine and start
> the growing up process. 
> 
> I am looking for next steps to growing up from this machine.  Can
> somebody recommend a good article, presentation or document that
> advocates various strategies to growing up the current architecture
> (i.e. basic load balancing, network topology, switches, etc. )?  
> 
> I realize that milage will vary based on the particular service and
> demands. Currently, the site does not deliver a lot of static content
> that can be cached or cause huge I/O issues (i.e. images, media, huge
> pages, etc). Our database is probably 95% read-only. 
> 
> Thanks a lot

There is a consideration, regarding using a proxy or a different server,
that has not been brought up: If there is mod_perl based access control
for the static files, then it's basically impossible not to go through a
mod_perl server to serve them.
In fact, I'm not sure what the effect would be in that scenario if a
proxy was used: would it serve the static file regardless of the access
control?  Does it depend on the expiration data in the headers sent
through the proxy when the access-controlled static file was sent?

Rafael Caceres 

