Posted to modperl@perl.apache.org by md <my...@yahoo.com> on 2002/06/25 04:22:06 UTC

when to mod_perl?

Hello,

I'm working on a dynamic site that I originally
thought I would do with mod_perl. Now after reviewing
the requirements and available hardware, I wonder if
mod_perl will be my best solution.

The machine will not be a huge box (though I wasn't
provided much in the way of specs) and will only have
256M of RAM. There will be static content, including
5-10 images per page. The client has only given me
sparse information, but claimed that he currently had
4,000 unique visitors a day and wanted to move to
10,000-15,000 unique visitors per day (he didn't give
me page view stats). I may or may not be able to set
up multiple instances of Apache.

Given limited hardware (esp. RAM), am I better off to
go with mod_perl (larger Apache processes with limited
RAM) or CGI (smaller apache processes but the usual
cons)?

Thanks...

__________________________________________________
Do You Yahoo!?
Yahoo! - Official partner of 2002 FIFA World Cup
http://fifaworldcup.yahoo.com

Re: when to mod_perl?

Posted by md <my...@yahoo.com>.
--- Cees Hek <ce...@sitesuite.com.au> wrote:
 
> I would build your application using plain old CGI,
> following the guidlines that
> mod_perl provides for running CGI applications under
> the Apache::Registry
> module.  If you properly analyse your application,
> and build small tight CGI
> scripts, then when the load goes up, you can pick
> and choose the heaviest hit
> scripts and run them under Apache::Registry for the
> performance boost.

Thanks...that sounds reasonable. I doubt that all the
dynamic pieces of this site would really require
mod_perl. To answer another's reply, this will run on
either Linux or BSD.
 
> Also, if the load goes really high, you can ask for
> more hardware, and run the
> entire site under Apache::Registry without any code
> changes.

Upgrading hardware once the load gets high was
discussed...This would make the migration easier, as I
have told the client that we may start with CGI then
move to mod_perl later. I've never used
Apache::Registry before, but this sounds like a good
solution.
 
> I would recommend taking a look at CGI::Application.
>  It provides a very clean
> framework for building CGI programs, and by using
> it, you will avoid most if not
> all of the pitfalls that most CGI programs have that
> require them to be recoded,
> or cleaned up for use with Apache::Registry.

I normally use CGI::Application, but in this case I'll
also need something like CGI::Session as well, not to
mention either Template-Toolkit or HTML::Template. 

Are there any "gotchas" with CGI::Session and
Apache::Registry? And yes, I'll read The Guide :)

> Good luck...

Thanks for the help!




Re: when to mod_perl?

Posted by Cees Hek <ce...@sitesuite.com.au>.
Quoting md <my...@yahoo.com>:

> Hello,
> 
> I'm working on a dynamic site that I originally
> thought I would do with mod_perl. Now after reviewing
> the requirements and available hardware, I wonder if
> mod_perl will be my best solution.
> 
> The machine will not be a huge box (though I wasn't
> provided much in the way of specs) and will only have
> 256M of RAM. There will be static content, incuding
> 5-10 images per page. The client has only given me
> sparse information, but claimed that he currently had
> 4,000 unique visitors a day and wanted to move to
> 10,000-15,000 unique visitors per day (he didn't give
> me page view stats). I may or may not be able to set
> up multiple instances of Apache.
> 
> Given limited hardware (esp. RAM), am I better off to
> go with mod_perl (larger Apache processes with limited
> RAM) or CGI (smaller apache processes but the usual
> cons)?

You can easily build an application that uses the best of both worlds.  The
biggest benefit of mod_perl is speed, but you don't have to tie yourself tightly
to mod_perl to get that benefit.  

I would build your application using plain old CGI, following the guidelines that
mod_perl provides for running CGI applications under the Apache::Registry
module.  If you properly analyse your application, and build small tight CGI
scripts, then when the load goes up, you can pick and choose the heaviest hit
scripts and run them under Apache::Registry for the performance boost.

Also, if the load goes really high, you can ask for more hardware, and run the
entire site under Apache::Registry without any code changes.

I would recommend taking a look at CGI::Application.  It provides a very clean
framework for building CGI programs, and by using it, you will avoid most if not
all of the pitfalls that most CGI programs have that require them to be recoded,
or cleaned up for use with Apache::Registry.
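
A minimal sketch of what a Registry-safe script can look like, assuming
nothing beyond core Perl. Apache::Registry compiles the script into a
subroutine, so a named sub that reads a file-scoped "my" variable keeps
the value from the first request. Passing everything in as arguments
avoids that. The helper names parse_name and render_greeting are
hypothetical and only illustrate the pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pull a "name" parameter out of a raw query string.
# A real script would normally leave this to CGI.pm or CGI::Application.
sub parse_name {
    my ($query_string) = @_;
    my ($name) = ($query_string // '') =~ /(?:^|&)name=([A-Za-z0-9_]+)/;
    return defined $name ? $name : 'guest';
}

# Hypothetical page logic: everything it needs arrives as arguments, so it
# never captures a file-scoped lexical left over from an earlier request.
sub render_greeting {
    my ($name) = @_;
    return "Content-Type: text/plain\n\nHello, $name\n";
}

print render_greeting(parse_name($ENV{QUERY_STRING}));
```

The same file should run unchanged as plain CGI and under
Apache::Registry, which is the property the migration plan relies on.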

Good luck...

Cees

Re: when to mod_perl?

Posted by Perrin Harkins <pe...@elem.com>.
Peter Bi wrote:
> I have a question regarding to the cached files. Although the maximal period
> is set to be 24 hours in httpd.conf's proxy settings, many of the files,
> which were cached from the backend mod_perl dynamical program, are strangely
> modified every a few minutes. For all the files I checked so far, they do
> look to be modified because the hex strings on top of the files (such as
> 000000003D189FC2) are different after each modifications.
> 
> Forgive me if this is off-topic: it is more likely a mod_proxy question. I
> searched, but could not find related information pages to read.

I'm afraid it is off-topic.  You probably don't really need to worry 
about it; it's just part of how mod_proxy is implemented, most likely 
related to garbage collection.  You can look at the mod_proxy source or 
ask about it on the general httpd list.

- Perrin


Re: when to mod_perl?

Posted by Peter Bi <mo...@att.net>.
----- Original Message -----
From: "Randal L. Schwartz" <me...@stonehenge.com>
To: "Peter Bi" <mo...@att.net>
Cc: "Perrin Harkins" <pe...@elem.com>; "md" <my...@yahoo.com>;
"Stas Bekman" <st...@stason.org>; <mo...@perl.apache.org>
Sent: Tuesday, June 25, 2002 10:18 AM
Subject: Re: when to mod_perl?


> >>>>> "Peter" == Peter Bi <mo...@att.net> writes:
>
> Peter> I have a question regarding to the cached files. Although the
> Peter> maximal period is set to be 24 hours in httpd.conf's proxy
> Peter> settings, many of the files, which were cached from the backend
> Peter> mod_perl dynamical program, are strangely modified every a few
> Peter> minutes. For all the files I checked so far, they do look to be
> Peter> modified because the hex strings on top of the files (such as
> Peter> 000000003D189FC2) are different after each modifications.
>
> If you're talking about www.stonehenge.com, I don't provide
> last-modified for any of the HTML pages: they're all dynamic.  If the
> proxy server is caching them, it's going to still punch through to the
> back for each hit.

That is one of our sites.

>
> Similarly, if you are talking about your own site, and you *do*
> provide a mostly useless "last modified" time, then the front end is
> still going to go to the back end and say "I've got a version from
> time $x, is that current?" and if you're not handling
> "if-modified-since", then every hit will be cached, uselessly.
>

I used:
use HTTP::Date ();   # provides time2str

# $id is earlier than the current time and does not change for a given page
$r->update_mtime($id);
$r->set_last_modified;
if ($r->protocol =~ /(\d\.\d)/ && $1 >= 1.1) {
      # HTTP/1.1 clients: relative lifetime of 100 days
      $r->header_out('Cache-Control' => "max-age=" . 100*24*3600);
} else {
      # older clients: absolute expiry date
      $r->header_out('Expires' => HTTP::Date::time2str($id + 100*24*3600));
}

It would not be surprising if none of the dynamic pages were cached; that
would mean I had improper headers in mod_perl. In fact, the cached copies do
serve a number of views (maybe several tens) before being modified in the
proxy directory again. For example, I checked one file's status:
$last access time: Tue Jun 25 11:44:12 2002
$last modify time: Tue Jun 25 11:40:52 2002
and for the same file later:
$last access time: Tue Jun 25 11:51:14 2002
$last modify time: Tue Jun 25 11:44:54 2002
so the files were modified, but not on every hit.

> I avoid that on stonehenge by not providing last-modified for any of
> my HTML pages.  mod_proxy thus has no idea about caching, so it's all
> dynamic.  My images automatically have last-modified, and thus the
> cache can check for updates with if-modified-since, using the cache
> when needed.  If I was really smart, I'd use mod_expires to say "this
> image is good for $N hours", and then the front end wouldn't even
> touch the back end at all.
>

But if one sends proper headers, the proxy does not distinguish whether a page
is static or dynamic. It should deliver or cache all the backend pages the
same way, provided the headers are right.

Here is another strange clue. The cached files have three extra
request headers, "X-Forwarded-For:", "X-Host:" and "X-Server-Hostname:" (from
mod_proxy_forward). While the files are modified continuously, the
"X-Forwarded-For" header, which records a browser's IP, does NOT change
even though the later hits come from completely different IPs.


> As I said, as long as my loadav is low enough for my current hits, I've
> got better things to work on. :)
>
> --
> Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
> <me...@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
> Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
> See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
>


Peter Bi


Re: when to mod_perl?

Posted by "Randal L. Schwartz" <me...@stonehenge.com>.
>>>>> "Peter" == Peter Bi <mo...@att.net> writes:

Peter> I have a question regarding to the cached files. Although the
Peter> maximal period is set to be 24 hours in httpd.conf's proxy
Peter> settings, many of the files, which were cached from the backend
Peter> mod_perl dynamical program, are strangely modified every a few
Peter> minutes. For all the files I checked so far, they do look to be
Peter> modified because the hex strings on top of the files (such as
Peter> 000000003D189FC2) are different after each modifications.

If you're talking about www.stonehenge.com, I don't provide
last-modified for any of the HTML pages: they're all dynamic.  If the
proxy server is caching them, it's going to still punch through to the
back for each hit.

Similarly, if you are talking about your own site, and you *do*
provide a mostly useless "last modified" time, then the front end is
still going to go to the back end and say "I've got a version from
time $x, is that current?" and if you're not handling
"if-modified-since", then every hit will be cached, uselessly.

I avoid that on stonehenge by not providing last-modified for any of
my HTML pages.  mod_proxy thus has no idea about caching, so it's all
dynamic.  My images automatically have last-modified, and thus the
cache can check for updates with if-modified-since, using the cache
when needed.  If I was really smart, I'd use mod_expires to say "this
image is good for $N hours", and then the front end wouldn't even
touch the back end at all.
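
The if-modified-since exchange described above can be sketched in plain
Perl. This is an illustration only, assuming the If-Modified-Since
header has already been parsed into epoch seconds (for example with
HTTP::Date::str2time). The function names http_date and
conditional_status are hypothetical. Under mod_perl 1.x the Apache::File
extension is understood to offer $r->meets_conditions for the same
check, so real handlers would probably use that instead.

```perl
use strict;
use warnings;

my @DAYS   = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MONTHS = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Format epoch seconds as an HTTP-date suitable for a Last-Modified header.
sub http_date {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime $epoch;
    return sprintf '%s, %02d %s %04d %02d:%02d:%02d GMT',
        $DAYS[$wday], $mday, $MONTHS[$mon], $year + 1900, $hour, $min, $sec;
}

# Decide the status for a conditional GET. $ims_epoch is the parsed
# If-Modified-Since value, or undef when the client sent no such header.
sub conditional_status {
    my ($mtime, $ims_epoch) = @_;
    return 304 if defined $ims_epoch && $mtime <= $ims_epoch;
    return 200;
}
```

A back end that answers 304 here lets the front-end proxy keep serving
its cached copy without the page being regenerated.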

As I said, as long as my loadav is low enough for my current hits, I've
got better things to work on. :)

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<me...@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!

Re: Apache::Session - What goes in session?

Posted by Tony Bowden <to...@kasei.com>.
On Thu, Aug 22, 2002 at 01:20:37PM -0700, md wrote:
> > Don't worry about whether it *seems* efficient. Do it right, and then
> > worry about how to speed that up - if, and only if, it's too slow.
> > Premature optimisation is the root of all evil, and
> > all that ..

> Thanks for that tidbit.
> One question though...the only thing in the cookie is
> the sesison id. I'm getting the user id when the user
> logs in and putting that in the session. Would you
> pull the user id from the db everytime too instead of
> putting it in the session? I'm leaning towards taking
> it out.

Personally I'd hash the user id into the session id and then extract it
programmatically. But pulling it from the database is fine too.
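
One possible reading of "hash the user id into the session id", as a
sketch only: the id carries the user id in the clear plus an HMAC so it
cannot be altered unnoticed. The layout "user_id:nonce:mac", the secret
and both function names are assumptions made for illustration, not
something described in this thread. The sketch also assumes the user id
contains no colon.

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);

# Assumption: a server-side secret that never leaves the application.
my $SECRET = 'replace-with-a-real-secret';

# Build "user_id:nonce:mac" so the user id travels inside the session id.
sub make_session_id {
    my ($user_id, $nonce) = @_;
    my $payload = "$user_id:$nonce";
    return "$payload:" . hmac_sha256_hex($payload, $SECRET);
}

# Return the user id if the MAC matches the payload, otherwise undef.
sub user_id_from_session {
    my ($session_id) = @_;
    my ($user_id, $nonce, $mac) = split /:/, ($session_id // ''), 3;
    return undef unless defined $mac;
    my $expected = hmac_sha256_hex("$user_id:$nonce", $SECRET);
    return $expected eq $mac ? $user_id : undef;
}
```

A production version would want a random nonce per session and a
constant-time comparison. Both are left out here to keep the idea
visible.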

Tony

Re: Apache::Session - What goes in session?

Posted by md <my...@yahoo.com>.
--- Tony Bowden <to...@kasei.com> wrote:

> Don't worry about whether it *seems* efficient. Do
> it right, and then
> worry about how to speed that up - if, and only if,
> it's too slow.
> 
> Premature optimisation is the root of all evil, and
> all that ..


Thanks for that tidbit.

I removed almost everything from the session and I'm
now pulling that info from the DB with no noticeable
difference.

I think I can eliminate a few db calls by placing a
few things in hidden fields or the query string. 

One question though...the only thing in the cookie is
the session id. I'm getting the user id when the user
logs in and putting that in the session. Would you
pull the user id from the db every time too instead of
putting it in the session? I'm leaning towards taking
it out.



__________________________________________________
Do You Yahoo!?
HotJobs - Search Thousands of New Jobs
http://www.hotjobs.com

Re: Apache::Session - What goes in session?

Posted by Perrin Harkins <pe...@elem.com>.
Peter J. Schoenster wrote:
> If I'm using Apache::DBI so I have a persistent connection to MySQL, 
> would it not be faster to simply use a table in MySQL?

Probably not, if the MySQL server is on a separate machine.  If it's on 
the same machine, it would be close.  Remember, MySQL has more work to 
do (parse SQL statement, make query plan, etc.) than a simple hash-based 
system like BerkeleyDB does.  Best thing would be to benchmark it though.

- Perrin


RE: Apache::Session - What goes in session?

Posted by Jesse Erlbaum <je...@erlbaum.net>.
Hi Peter --

> > The morale of the story:  Flat files rock!  ;-)
>
> If I'm using Apache::DBI so I have a persistent connection to MySQL,
> would it not be faster to simply use a table in MySQL?


Unlikely.  Even with cached database connections you are probably not going
to beat the performance of going to a flat text file.  Accessing files is
something the OS is optimized to do.  The process of issuing a SQL query,
having it parsed and retrieving results is probably more time-consuming than
you think.

One way to think about it is this:  MySQL stores its data in files.  There
are many layers of code between DBI and those files, each of which adds
processing time.  Going directly to files is far less code, and less code is
most often faster code.

The best way to be sure is to benchmark the difference yourself.  Try out
the Benchmark module.  Quantitative data trumps anecdotal data every time.
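
A minimal sketch of such a benchmark, using only core modules. It
compares two flat-file formats (Storable and a pack("N*") file) rather
than a file against MySQL, because the DBI side needs a database that
cannot be assumed here. To compare against MySQL, a third entry calling
a prepared DBI statement could be added to the hash. The file names and
sub names are hypothetical.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Temp qw(tempdir);
use Storable qw(nstore retrieve);

my $dir  = tempdir(CLEANUP => 1);
my @data = (1 .. 1000);

# Write the same integers in two on-disk formats.
nstore(\@data, "$dir/data.sto");
open my $out, '>:raw', "$dir/data.bin" or die "open: $!";
print {$out} pack('N*', @data);
close $out or die "close: $!";

# Reader 1: let Storable deserialize the array reference.
sub read_storable { return retrieve("$dir/data.sto") }

# Reader 2: slurp the raw bytes and unpack them as 32-bit integers.
sub read_packed {
    open my $in, '<:raw', "$dir/data.bin" or die "open: $!";
    local $/;    # slurp mode
    my $bytes = <$in>;
    close $in;
    return [ unpack 'N*', $bytes ];
}

# Run each reader for at least one CPU second and print a comparison table.
cmpthese(-1, {
    storable => \&read_storable,
    packed   => \&read_packed,
});
```

The numbers depend heavily on the machine and on whether the files are
already in the OS cache, so they are worth measuring on the target box.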


Warmest regards,

-Jesse-


--

  Jesse Erlbaum
  The Erlbaum Group
  jesse@erlbaum.net
  Phone: 212-684-6161
  Fax: 212-684-6226




Re: Apache::Session - What goes in session?

Posted by "Peter J. Schoenster" <pe...@schoenster.com>.
On 21 Aug 2002 at 2:09, Ask Bjoern Hansen wrote:

> Now using good old Fcntl to control access to simple "flat files".
> (Data serialized with pack("N*", ...); I don't think anything beats
> "pack" and "unpack" for serializing data).
> 
> The expiration went into the data and purging the cache was a simple
> cronjob to find files older than a few minutes and deleting them.
> 
> The performance?  I don't remember the exact figure, but it was at
> least several times faster than the BerkeleyDB system.  And *much*
> simpler.
> 
> 
> The morale of the story:  Flat files rock!  ;-)

If I'm using Apache::DBI so I have a persistent connection to MySQL, 
would it not be faster to simply use a table in MySQL?


Peter



---------------------------
"Reality is that which, when you stop believing in it, doesn't go
away".
                -- Philip K. Dick


Re: Apache::Session - What goes in session?

Posted by si...@siberian.org.
Thanks, you just saved us a ton of time.

Off to change course ;)

J

On Tue, 20 Aug 2002 13:12:29 -0400
  Perrin Harkins <pe...@elem.com> wrote:
>siberian@siberian.org wrote:
>>We are investigating using IPC rather then a file based 
>>structure but 
>>its purely investigation at this point.
>>
>>What are the speed diffs between an IPC cache and a 
>>Berkely DB cache. My 
>>gut instinct always screams 'Stay Off The Disk' but my 
>>gut is not always 
>>right.. Ok, rarely right.. ;)
>
>Most of the shared memory modules are much slower than 
>Berkeley DB.  The fastest option around is IPC::MM, but 
>data you store in that does not persist if you restart 
>the server which is a problem for some. BerkeleyDB (the 
>new one, not DB_File) is also very fast, and other 
>options like Cache::Mmap and Cache::FileCache are much 
>faster than anything based on IPC::Sharelite and the 
>like.
>
>I have charts and numbers in my TPC presentation, which I 
>will be putting up soon.
>
>- Perrin
>


Re: Apache::Session - What goes in session?

Posted by Perrin Harkins <pe...@elem.com>.
Jie Gao wrote:
> There are cases in which it is desirable to expire an entry which
> hasn't been used for a certain period of time; authenticated sessions
> data, for example.

Okay, so you're looking for a session module rather than a cache. 
Apache::Session doesn't handle expiration, but you could add it, as many 
here have.  You could also just use one of the general-purpose storage 
modules like MLDBM::Sync, BerkeleyDB, or the storage modules in 
Cache::Cache (like Cache::FileBackend) and then add expiration.  Those 
are all generic storage modules with no cache-specific stuff in their APIs.
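
A sketch of adding expiration on top of a generic store, assuming a
plain in-memory hash as a stand-in for BerkeleyDB, MLDBM::Sync or
similar. It covers both cases raised in this thread: idle expiry, where
a successful read "touches" the entry, and an absolute maximum age. The
function names and the explicit $now argument are assumptions for
illustration. Real code would normally call time().

```perl
use strict;
use warnings;

# In-memory stand-in for a generic store such as a tied BerkeleyDB hash.
my %store;

# Save a value with its creation time; last access starts at creation.
sub put_entry {
    my ($key, $value, $now) = @_;
    $store{$key} = { value => $value, created => $now, accessed => $now };
    return;
}

# Fetch a value unless it has been idle for more than $max_idle seconds or
# is older than $max_age seconds in total. A successful read updates the
# last-access time, which is the "touch" behaviour.
sub get_entry {
    my ($key, $now, $max_idle, $max_age) = @_;
    my $entry = $store{$key} or return undef;
    if (   $now - $entry->{accessed} > $max_idle
        || $now - $entry->{created}  > $max_age) {
        delete $store{$key};
        return undef;
    }
    $entry->{accessed} = $now;
    return $entry->{value};
}
```

With a disk-backed store, the touch on read turns every read into a
write, which is a cost to weigh before using it for anything but
sessions.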

- Perrin



Re: Apache::Session - What goes in session?

Posted by Jie Gao <J....@isu.usyd.edu.au>.
On Tue, 20 Aug 2002, Perrin Harkins wrote:

> Jie Gao wrote:
>  > I wish some of these modules would be able to "touch" cached data so that
>  > it would expire cache entries on "last-accessed" rather than on the time
>  > the entries were created.
>
> Why?  People used to do that with cached because they had limited space
> and wanted to purge the cache with an LRU algorithm to keep size down,
> but disk space is too cheap to worry about now.
>
> If an item in the cache is okay to stay there as long as people are
> accessing it, you are essentially saying that cached items never become
> invalid.  In that case, why bother ever deleting any of them?

There are cases in which it is desirable to expire an entry which
hasn't been used for a certain period of time; authenticated sessions
data, for example. Absolute expiration is indeed needed, as well.

Regards,



Jie


Re: Apache::Session - What goes in session?

Posted by Perrin Harkins <pe...@elem.com>.
Jie Gao wrote:
 > I wish some of these modules would be able to "touch" cached data so that
 > it would expire cache entries on "last-accessed" rather than on the time
 > the entries were created.

Why?  People used to do that with caches because they had limited space
and wanted to purge the cache with an LRU algorithm to keep size down,
but disk space is too cheap to worry about now.

If an item in the cache is okay to stay there as long as people are
accessing it, you are essentially saying that cached items never become
invalid.  In that case, why bother ever deleting any of them?

- Perrin



Re: Apache::Session - What goes in session?

Posted by Jie Gao <J....@isu.usyd.edu.au>.
On Tue, 20 Aug 2002, Perrin Harkins wrote:

> Date: Tue, 20 Aug 2002 13:12:29 -0400
> From: Perrin Harkins <pe...@elem.com>
> To: siberian@siberian.org
> Cc: Dave Rolsky <au...@urth.org>, modperl@perl.apache.org
> Subject: Re: Apache::Session - What goes in session?
>
> siberian@siberian.org wrote:
> > We are investigating using IPC rather then a file based structure but
> > its purely investigation at this point.
> >
> > What are the speed diffs between an IPC cache and a Berkely DB cache. My
> > gut instinct always screams 'Stay Off The Disk' but my gut is not always
> > right.. Ok, rarely right.. ;)
>
> Most of the shared memory modules are much slower than Berkeley DB.  The
> fastest option around is IPC::MM, but data you store in that does not
> persist if you restart the server which is a problem for some.
> BerkeleyDB (the new one, not DB_File) is also very fast, and other
> options like Cache::Mmap and Cache::FileCache are much faster than
> anything based on IPC::Sharelite and the like.

I wish some of these modules would be able to "touch" cached data so that
it would expire cache entries on "last-accessed" rather than on the time
the entries were created.

Regards,



Jie


Re: Apache::Session - What goes in session?

Posted by Perrin Harkins <pe...@elem.com>.
siberian@siberian.org wrote:
> We are investigating using IPC rather then a file based structure but 
> its purely investigation at this point.
> 
> What are the speed diffs between an IPC cache and a Berkely DB cache. My 
> gut instinct always screams 'Stay Off The Disk' but my gut is not always 
> right.. Ok, rarely right.. ;)

Most of the shared memory modules are much slower than Berkeley DB.  The 
fastest option around is IPC::MM, but data you store in that does not 
persist if you restart the server which is a problem for some. 
BerkeleyDB (the new one, not DB_File) is also very fast, and other 
options like Cache::Mmap and Cache::FileCache are much faster than 
anything based on IPC::ShareLite and the like.

I have charts and numbers in my TPC presentation, which I will be 
putting up soon.

- Perrin


Re: Apache::Session - What goes in session?

Posted by Perrin Harkins <pe...@elem.com>.
Ask Bjoern Hansen wrote:
> The performance?  I don't remember the exact figure, but it was at
> least several times faster than the BerkeleyDB system.  And *much*
> simpler.

In my benchmarks, recent versions of BerkeleyDB, used with the 
BerkeleyDB module and allowed to manage their own locking, beat all 
available flat-file modules.  It may be possible to improve the 
flat-file ones, but it even beat Tie::TextDir which is about as simple 
(and therefore fast) as they come.  The only thing that did better was 
IPC::MM.

- Perrin


Re: Apache::Session - What goes in session?

Posted by Ask Bjoern Hansen <as...@develooper.com>.
On Tue, 20 Aug 2002 siberian@siberian.org wrote:

> We are investigating using IPC rather then a file based
> structure but its purely investigation at this point.
>
> What are the speed diffs between an IPC cache and a
> Berkely DB cache. My gut instinct always screams 'Stay Off
> The Disk' but my gut is not always right.. Ok, rarely
> right.. ;)

IPC (for many definitions of that) has all sorts of odd limitations
and isn't that fast.  Don't go there.

The disk is usually much faster than you think.  Often overlooked
for caching is a simple file based cache.

Here's a story about that:

A while ago Graham Barr and I spent some time going through a number
of iterations for a "self cleaning" cache system.  It would take
lots of writes and fewer reads.  In each cache entry a number of
integers would be stored.  Just storing the last thousand entries
would be enough.

We tried quite a few different approaches; the most noteworthy was a
system of semaphores to control access to a number of slots in a
BerkeleyDB.  That should be pretty fast, right?

It got a bit complicated as our systems didn't support that many
semaphores, so we had to come up with a system for sharing the
semaphores across multiple "slots" in the database.

Designing and writing this implementation took a few days.  It was
really cool.

Anyway, after fixing that and a few deadlocks we were benchmarking
away.  The system was so clever.  We thought it was simple and neat.
Okay, neat at least.  And it was really slow. Slow. (~200 writes a
second on a 400MHz Pentium II if I recall correctly).

First we suspected we did something wrong with the semaphores, but
further benchmarking showed that the BerkeleyDB just wasn't that
fast for writing.

30 minutes thinking and 30 minutes typing code later we had a
prototype for a simple file-based system.

Now using good old Fcntl to control access to simple "flat files".
(Data serialized with pack("N*", ...); I don't think anything beats
"pack" and "unpack" for serializing data).

The expiration went into the data and purging the cache was a simple
cronjob to find files older than a few minutes and deleting them.
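
A rough sketch of what such a flat-file entry might look like, assuming
flock via the Fcntl constants and one file per cache key. The actual
code from this story is not shown in the thread, so the function names
and the file layout here are guesses made for illustration.

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_WRONLY O_CREAT);

# Write a list of unsigned 32-bit integers under an exclusive lock.
# The file is opened without truncation so that the lock is held before
# any existing content is thrown away.
sub write_entry {
    my ($path, @ints) = @_;
    sysopen(my $fh, $path, O_WRONLY | O_CREAT) or die "open $path: $!";
    binmode $fh;
    flock($fh, LOCK_EX) or die "lock $path: $!";
    truncate($fh, 0)    or die "truncate $path: $!";
    print {$fh} pack('N*', @ints);
    close $fh or die "close $path: $!";    # closing releases the lock
    return;
}

# Read the integers back under a shared lock.
sub read_entry {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "open $path: $!";
    flock($fh, LOCK_SH) or die "lock $path: $!";
    local $/;    # slurp mode
    my $bytes = <$fh>;
    close $fh;
    return unpack 'N*', $bytes // '';
}
```

Purging is then a matter of deleting files by modification time, which
is why a cron job is enough.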

The performance?  I don't remember the exact figure, but it was at
least several times faster than the BerkeleyDB system.  And *much*
simpler.


The moral of the story:  Flat files rock!  ;-)


  - ask

-- 
ask bjoern hansen, http://www.askbjoernhansen.com/ !try; do();


Re: Apache::Session - What goes in session?

Posted by si...@siberian.org.
We are investigating using IPC rather than a file-based
structure, but it's purely investigation at this point.

What are the speed diffs between an IPC cache and a
Berkeley DB cache? My gut instinct always screams 'Stay Off
The Disk' but my gut is not always right.. Ok, rarely
right.. ;)

John-

On Tue, 20 Aug 2002 11:49:52 -0500 (CDT)
  Dave Rolsky <au...@urth.org> wrote:
>On Tue, 20 Aug 2002 siberian@siberian.org wrote:
>
>> Currently we are working on a 'per machine' cache so all
>> children can benefit for each childs initial database 
>>read
>> of the translated string, the differential between
>> children is annoying in the 'per child cache' strategy.
>
>Sounds like you want BerkeleyDB.pm (not DB_File), which 
>is quite fast and
>handles locking/concurrent access internally (when set up 
>properly).
>
>See the Alzabo::ObjectCache::{Store,Sync}::BerkeleyDB 
>modules for
>examples.
>
>For Alzabo, I also have a caching system that caches data 
>in a database,
>for cross-machine caching/syncing.  I haven't really 
>benchmarked it yet
>but I imagine it could be a win in some situations.  For 
>example, you
>could set up the cache as a separate machine running 
>MySQL and still pull
>your data from another machine, possibly running a 
>different RDBMS.
>
>
>-dave
>
>/*==================
>www.urth.org
>we await the New Sun
>==================*/
>


Re: Apache::Session - What goes in session?

Posted by Dave Rolsky <au...@urth.org>.
On Tue, 20 Aug 2002 siberian@siberian.org wrote:

> Currently we are working on a 'per machine' cache so all
> children can benefit for each childs initial database read
> of the translated string, the differential between
> children is annoying in the 'per child cache' strategy.

Sounds like you want BerkeleyDB.pm (not DB_File), which is quite fast and
handles locking/concurrent access internally (when set up properly).

See the Alzabo::ObjectCache::{Store,Sync}::BerkeleyDB modules for
examples.

For Alzabo, I also have a caching system that caches data in a database,
for cross-machine caching/syncing.  I haven't really benchmarked it yet
but I imagine it could be a win in some situations.  For example, you
could set up the cache as a separate machine running MySQL and still pull
your data from another machine, possibly running a different RDBMS.


-dave

/*==================
www.urth.org
we await the New Sun
==================*/


Re: Apache::Session - What goes in session?

Posted by si...@siberian.org.
We do see some slowdown on our language translation db
calls since they are so intensive. Moving to a 'per child'
cache for each string as it came out of the db sped page
loads up from 4.5 seconds to 0.6-1.0 seconds per page, which
is significant.

Currently we are working on a 'per machine' cache so all
children can benefit from each child's initial database read
of the translated string; the differential between
children is annoying in the 'per child cache' strategy.
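
A sketch of the 'per child' strategy as it is commonly written: a hash
that lives as long as the Apache child and is filled on first use. The
sub fetch_translation_from_db is a hypothetical stand-in for the real
query, and the $db_reads counter exists only to show how often the
database is hit.

```perl
use strict;
use warnings;

my %translation_cache;    # lives as long as this process (one Apache child)
my $db_reads = 0;         # counts trips to the "database" for illustration

# Hypothetical stand-in for the expensive database lookup.
sub fetch_translation_from_db {
    my ($lang, $key) = @_;
    $db_reads++;
    return "[$lang] $key";
}

# Return the cached string, going to the database only on the first request
# for a given language and key within this child.
sub translate {
    my ($lang, $key) = @_;
    return $translation_cache{$lang}{$key}
        //= fetch_translation_from_db($lang, $key);
}
```

Each child pays for its own first lookup of a string, which is the
differential between children that a per-machine cache removes.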

John-

On Tue, 20 Aug 2002 16:33:07 +0100
  Tony Bowden <to...@kasei.com> wrote:
>On Mon, Aug 19, 2002 at 06:54:01PM -0700, md wrote:
>> I can definitely get it all from the db, but that 
>>doesn't
>> seem very efficient.
>
>Don't worry about whether it *seems* efficient. Do it 
>right, and then
>worry about how to speed that up - if, and only if, it's 
>too slow.
>
>Premature optimisation is the root of all evil, and all 
>that ..
>
>At BlackStar the session was just a single hashed ID and 
>all other info
>was loaded from the database every time. We thought about 
>caching some
>info a few times, but always ran into problems with 
>replication.  In the
>end we discovered that fetching everything from the 
>database on every
>request wasn't noticeably slower than anything else we 
>could up with,
>and was a lot more flexible. Throwing more memory at the 
>database servers
>was usually quicker, cheaper and more effective than 
>micro-optimising
>our session vs caching strategy...
>
>Tony


Re: Apache::Session - What goes in session?

Posted by Tony Bowden <to...@kasei.com>.
On Mon, Aug 19, 2002 at 06:54:01PM -0700, md wrote:
> I can definitely get it all from the db, but that doesn't
> seem very efficient.

Don't worry about whether it *seems* efficient. Do it right, and then
worry about how to speed that up - if, and only if, it's too slow.

Premature optimisation is the root of all evil, and all that ..

At BlackStar the session was just a single hashed ID and all other info
was loaded from the database every time. We thought about caching some
info a few times, but always ran into problems with replication.  In the
end we discovered that fetching everything from the database on every
request wasn't noticeably slower than anything else we could come up with,
and was a lot more flexible. Throwing more memory at the database servers
was usually quicker, cheaper and more effective than micro-optimising
our session vs caching strategy...

Tony

Re: Apache::Session - What goes in session?

Posted by md <my...@yahoo.com>.
Thanks...you've given me plenty to work with. Great
explanation. This is good pragmatic stuff to know!



Re: Apache::Session - What goes in session?

Posted by Perrin Harkins <pe...@elem.com>.
md wrote:
> I haven't looked at the cache modules docs yet...would
> it be possible to build cache on the separate
> load-balanced machines as we go along...as we do with
> template caching?

Of course.  However, if a user is sent to a random machine each time you 
won't be able to cache anything that a user is allowed to change during 
their time on the site, because they could end up on a machine that has 
an old cached value for it.  Sticky load-balancing or a cluster-wide 
cache (which you can update when data changes) deals with this problem.

> everything seems so user specific...

That doesn't mean you can't cache it.  You can do basically the same 
thing you were doing with the session: stuff a hash of user-specific 
stuff into the cache.  The next time that user sends a request, you 
check the cache for data on that user ID (you get the user ID from the 
session) and if you don't find any you just fetch it from the db.

Pseudo-code:

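sub fetch_user_data {
   my $user_id = shift;
   my $user_data;
   unless ($user_data = fetch_from_cache($user_id)) {
     # cache miss: go to the db, then remember the result for next time
     $user_data = fetch_from_db($user_id);
     store_in_cache($user_id, $user_data);
   }
   return $user_data;
}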

> I would be curious though that if my choice is simply
> that the data is stored in the session or comes from
> the database with each request, would it still be best
> to essentially only store the session id in the
> session and pull everything else from the db? It still
> seems that something trivial like a greeting name (a
> preference) could go in the session.

Your decision about what to put in the session is not connected to your 
decision about what to pull from the db each time.  You can cache all 
the data if you want to, and still have very little in the session.

This might sound like an academic distinction, but I think it's 
important to keep the concepts separate: a session is a place to store 
transient state information that is irrelevant as soon as the user logs 
out, and a cache is a way of speeding up access to a slow resource like 
a database, and the two things should not be confused.  You can actually 
cache the session data if you need to (with a write-through cache that 
updates the backing database as well).  A cache will typically be faster 
than session storage because it doesn't need to be very reliable and 
because you can store and retrieve individual chunks of data (user's 
name, page names) when you need them instead of storing and retrieving 
everything on every request.  Separating these concepts allows you to do 
things like migrate the session storage to a transactional database some 
day, and move your cache storage to a distributed multicast cache when 
someone comes out with a module for that.

> The only
> gotcha would be that the calendar would need to update
> every day, at least on the current month's pages.

The cache modules I mentioned have a concept of "timeout" so that you 
can say "cache this for 12 hours" and then when it expires you fetch it 
again and update the cache for another 12 hours.
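
A minimal sketch of the timeout idea, assuming a plain in-process hash as
the store. The names %CACHE, cache_set, cache_get and cached are made up
for illustration; a real cache module would track expiry for you and
would not lose its contents when the process exits.

```perl
use strict;
use warnings;

# Hypothetical in-process cache: key => [ expires_at, value ].
my %CACHE;

# Store a value for $ttl seconds. $now can be passed in so the
# expiry logic can be exercised without sleeping.
sub cache_set {
    my ($key, $value, $ttl, $now) = @_;
    $now = time unless defined $now;
    $CACHE{$key} = [ $now + $ttl, $value ];
    return $value;
}

# Return the value, or nothing if it is missing or has expired.
sub cache_get {
    my ($key, $now) = @_;
    $now = time unless defined $now;
    my $entry = $CACHE{$key} or return;
    if ($now >= $entry->[0]) {
        delete $CACHE{$key};
        return;
    }
    return $entry->[1];
}

# Fetch-through helper: on a miss, call $fetch and cache the result.
sub cached {
    my ($key, $ttl, $fetch, $now) = @_;
    my $value = cache_get($key, $now);
    return $value if defined $value;
    return cache_set($key, $fetch->(), $ttl, $now);
}
```

For the calendar case this would be called as something like
cached('calendar', 12 * 3600, sub { ... }), where the code ref does the
expensive work of building the calendar data.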

> Even though there are some "preset" pages, the user
> can change the names and the user can also create a
> custom page with its own name.

No problem, you can cache data that's only useful for a single user, as 
I explained above.

> Not
> to mention that between the fact that the users' daily
> pages can have any number of user selected features
> per page and features themselves can have archive
> depths of anywhere from 3 to 20 years, there's a lot
> of info.

No problem, disks are cheap.  400MB of disk space will cost you about as 
much as a movie in New York these days.

- Perrin


Re: Apache::Session - What goes in session?

Posted by md <my...@yahoo.com>.
--- Perrin Harkins <pe...@elem.com> wrote:

> There are a few ways to deal with this.  The
> simplest is to use the 
> "sticky" load-balancing feature that many
> load-balancers have.  Failing 
> that, you can store to a network file system like
> NFS or CIFS, or use a 
> database.  (There are also fancier options with
> things like Spread, but 
> that's getting a little ahead of the game.)  You can
> use MySQL for 
> caching, and it will probably have similar
> performance to a networked 
> file system.  Unfortunately, the Apache::Session
> code isn't all that 
> easy to use for this, since it assumes you want to
> generate IDs for the 
> objects you store rather than passing them in.  You
> could adapt the code 
> from it to suit your needs though.  The important
> thing is to leave out 
> all of the mutually exclusive locking it implements,
> since a cache is 
> all about "get the latest as quick as you can" and
> lost updates are not 
> a problem ("last save wins" is good enough for a
> cache).

I haven't looked at the cache modules docs yet...would
it be possible to build cache on the separate
load-balanced machines as we go along...as we do with
template caching? By that I mean if an item has cached
on machine one then further requests on machine one
will come from cache where if on machine two the same
item hasn't cached, it will be pulled from the db the
first time and then cached?

If this isn't possible, I'm not sure if I'll be able
to implement any caching or not (some of the site
configuration is out of my hands) and everything seems
so user specific...I'll definitely reread your posts
and go through my app for things that should be
cached.

I would be curious though that if my choice is simply
that the data is stored in the session or comes from
the database with each request, would it still be best
to essentially only store the session id in the
session and pull everything else from the db? It still
seems that something trivial like a greeting name (a
preference) could go in the session.

> The relationships to the features and pages differ
> by user, but there 
> might be general information about the features
> themselves that is 
> stored in the database and is not user-specific. 
> That could be cached 
> separately, to save some trips to the db for each
> user.

The only thing I can think of right now is a
calendar...that should probably be cached. The only
gotcha would be that the calendar would need to update
every day, at least on the current month's pages. But
this is only on a "feature" page, not a user-created
page (that is, a user can click a link on their daily
page that takes them to a feature page where they can
go through archives).
 

> You can cache the names too if you want to, but
> keeping them out of the 
> session means that you won't be slowed down by
> fetching that extra data 
> and de-serializing it with Storable unless the page
> you're on actually 
> needs it.  

Even though there are some "preset" pages, the user
can change the names and the user can also create a
custom page with its own name. So there could be
thousands of unique page names, many (most) specific
to unique users (like "Jim's Sports Page", etc.). Not
to mention that between the fact that the users' daily
pages can have any number of user selected features
per page and features themselves can have archive
depths of anywhere from 3 to 20 years, there's a lot
of info.

> It's also good to separate things that
> have to be reliable 
> (like the ID of the current user, since without that
> you have to send 
> them back to log in again) from things that don't
> need to be (you could 
> always fetch the list of pages from the db if your
> cache went down).

Very good advice. I've found that occasionally
something happens to my session where the session id
is ok but some of the other data disappears (like
current page id) which really screws things up until
you log out and log back in again. This leads me to
suspect that I've answered my own question from above.
It's just whether I can cache or not.

Thanks for all your time and help.




Re: Apache::Session - What goes in session?

Posted by Perrin Harkins <pe...@elem.com>.
md wrote:

>We are using a load-balanced
>system; I should have mentioned that earlier. Won't
>that be an issue with caching to disk? Is it possible
>to cache to the db?
>

There are a few ways to deal with this.  The simplest is to use the 
"sticky" load-balancing feature that many load-balancers have.  Failing 
that, you can store to a network file system like NFS or CIFS, or use a 
database.  (There are also fancier options with things like Spread, but 
that's getting a little ahead of the game.)  You can use MySQL for 
caching, and it will probably have similar performance to a networked 
file system.  Unfortunately, the Apache::Session code isn't all that 
easy to use for this, since it assumes you want to generate IDs for the 
objects you store rather than passing them in.  You could adapt the code 
from it to suit your needs though.  The important thing is to leave out 
all of the mutually exclusive locking it implements, since a cache is 
all about "get the latest as quick as you can" and lost updates are not 
a problem ("last save wins" is good enough for a cache).
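
A rough sketch of a lock-free cache where the caller passes in the ID,
assuming a shared directory as the store. The names $CACHE_DIR,
cache_path, cache_store and cache_fetch are hypothetical, and the temp
directory used here is only a placeholder for a mount that all the web
servers can see. Each save writes a temporary file and renames it into
place, so readers see either the old copy or the new one and the last
save wins. How rename behaves on a particular network file system is
worth checking before relying on this.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempfile);
use File::Spec;

# Assumption: in production this would be a directory shared by
# every web server, not the local temp directory.
my $CACHE_DIR = File::Spec->tmpdir;

# Map a caller-supplied ID to a file name with only safe characters.
sub cache_path {
    my ($id) = @_;
    (my $safe = $id) =~ s/[^A-Za-z0-9_.-]/_/g;
    return File::Spec->catfile($CACHE_DIR, "cache-$safe.sto");
}

# Save a data structure (must be a reference). No locking:
# whoever renames last wins.
sub cache_store {
    my ($id, $data) = @_;
    my ($fh, $tmp) = tempfile(DIR => $CACHE_DIR);
    close $fh;
    nstore($data, $tmp);
    rename $tmp, cache_path($id) or die "rename failed: $!";
    return $data;
}

# Return the cached structure, or nothing on a miss or a bad file.
sub cache_fetch {
    my ($id) = @_;
    my $path = cache_path($id);
    return unless -e $path;
    return eval { retrieve($path) };
}
```

A caller would use cache_store("user:$user_id", \%user_data) after a db
fetch and cache_fetch("user:$user_id") before one.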

>The "modules" will consist of a "pages" module with
>the names of all the pages the user has created (with
>links) and a "emails" module which will display all
>the features that the user is getting via email. 
>These modules will be displayed on every page. 
>
>You can see that almost everything is user-specific.
>

The relationships to the features and pages differ by user, but there 
might be general information about the features themselves that is 
stored in the database and is not user-specific.  That could be cached 
separately, to save some trips to the db for each user.

>Right now I'm storing the page names/ids in a hash ref
>in the session (the emails module isn't live yet), but
>I thought that I would change that and only store the
>module id and pull the names from the db (if the user
>hasn't turned off the module) with each page call.
>

You can cache the names too if you want to, but keeping them out of the 
session means that you won't be slowed down by fetching that extra data 
and de-serializing it with Storable unless the page you're on actually 
needs it.  It's also good to separate things that have to be reliable 
(like the ID of the current user, since without that you have to send 
them back to log in again) from things that don't need to be (you could 
always fetch the list of pages from the db if your cache went down).

- Perrin


Re: Apache::Session - What goes in session?

Posted by md <my...@yahoo.com>.
--- Perrin Harkins <pe...@elem.com> wrote:

> >Current page name and id are never stored in db, so
> >different browser windows can be on different
> >pages...
> >
> 
> I thought your session was all stored in MySQL.  Why
> are you putting 
> these in the session exactly?  If these things are
> not relevant to more 
> than one request (page), they don't belong in the
> session.  They should 
> just be in ordinary variables.

You are correct, these items are in the session in the
db. I meant that they weren't kept in long term
storage in the db after the session ended like the
default page id and user name are. The current page
id/name is only relevant for an active session. Once a
session is started current page is set to whatever the
default page id is and will change as the user changes
pages. The only reason I did this (as I recall) is
that way I can get the page name once. 
 
> You should use a cache for that, rather than the
> session.  This is 
> long-term data that you just want quicker access to.

Yes, that's exactly what I want to do. My main concern
is long-term data that I want quicker access to. I can
definitely get it all from the db, but that doesn't
seem very efficient.
 
> Template Toolkit caches the compiled template code,
> but it doesn't cache 
> your data or the output of the templates.  What you
> should do is grab a 
> module like Cache::Cache or Cache::Mmap and take a
> look at the examples 
> there.  You use it in a way that's very similar to
> what you're doing 
> with Apache::Session for the things you referred to
> as global.  There 
> are also good examples in the documentation for the
> Memoize module.

Great...exactly the kind of info I was looking for.
I'll look at those. We are using a load-balanced
system; I should have mentioned that earlier. Won't
that be an issue with caching to disk? Is it possible
to cache to the db?

> There are various reasons to use a cache rather than
> treating the 
> session like a cache.  If you put a lot of data in
> the session, it will 
> slow down every hit loading and saving that data. 
> In a cache, you can 
> just keep multiple cached items separately and only
> grab them if you 
> need them for this page.  With a cache you can store
> things that come 
> from the database but are not user-specific, like
> today's weather.

Thank you for all the excellent advice and
explanation (in this and other posts).

Most of the info I'll be pulling is *very*
user-specific...user name, which features to display
on which page, what features the user gets by email,
etc.

What happens is the user logs in and then the username
(greeting), the default page id (the user can create
many pages with different features per page) and what
features go on the default page are pulled from the
database and the default page is displayed, as well as
any "module" info.

The "modules" will consist of a "pages" module with
the names of all the pages the user has created (with
links) and a "emails" module which will display all
the features that the user is getting via email. 
These modules will be displayed on every page. 

You can see that almost everything is user-specific.

Right now I'm storing the page names/ids in a hash ref
in the session (the emails module isn't live yet), but
I thought that I would change that and only store the
module id and pull the names from the db (if the user
hasn't turned off the module) with each page call.

Thanks again for all the info!


Re: Apache::Session - What goes in session?

Posted by Perrin Harkins <pe...@elem.com>.
md wrote:

>I don't think "global" was the term I should have
>used. What I mean is data that will be seen on all or
>most pages by the same user...like "Hello Jim"
>

Okay, don't put that in the session.  It belongs in a cache.  The 
session is for transient state information, that you don't want to keep 
after the user logs out.

>Current page name and id are never stored in db, so
>different browser windows can be on different
>pages...
>

I thought your session was all stored in MySQL.  Why are you putting 
these in the session exactly?  If these things are not relevant to more 
than one request (page), they don't belong in the session.  They should 
just be in ordinary variables.

>>That sounds like a "user" or "subscriptions" object
>>to me, not session data.
>>    
>>
>
>Once again, I shouldn't have used the term "global".
>This is the "subscriptions" info for a single
>user...that's why I had thought to put this in the
>session instead of pulling from the db each page call
>since the data will rarely change.
>

You should use a cache for that, rather than the session.  This is 
long-term data that you just want quicker access to.

>I am using TT caching
>for the templates, but I'm not sure how to cache the
>non-session data.
>

Template Toolkit caches the compiled template code, but it doesn't cache 
your data or the output of the templates.  What you should do is grab a 
module like Cache::Cache or Cache::Mmap and take a look at the examples 
there.  You use it in a way that's very similar to what you're doing 
with Apache::Session for the things you referred to as global.  There 
are also good examples in the documentation for the Memoize module.

There are various reasons to use a cache rather than treating the 
session like a cache.  If you put a lot of data in the session, it will 
slow down every hit loading and saving that data.  In a cache, you can 
just keep multiple cached items separately and only grab them if you 
need them for this page.  With a cache you can store things that come 
from the database but are not user-specific, like today's weather.

>What about something like "default page id", which is
>the page that is considered your home page? This id is
>stored permanently in the db ("lasts more than the
>current browsing session") but I keep it in
>the session since this also rarely changes so I don't
>want 
>to keep hitting the db to get it.
>

I would have some kind of user object which has a property of 
default_page_id.  The first time the user logs in I would fetch that 
from the database, and then I would cache it so that I wouldn't need to 
go back to the database for it on future requests.
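
One way such a user object might look, as a sketch only. The class name
My::User, the %CACHE hash and the loader argument are all assumptions
made for illustration: the hash stands in for a real cache module and
the loader code ref stands in for the database query.

```perl
use strict;
use warnings;

package My::User;    # hypothetical class name

my %CACHE;           # stand-in for a real cache module

sub new {
    my ($class, %args) = @_;
    # 'loader' is a code ref that stands in for the db query.
    return bless { id => $args{id}, loader => $args{loader} }, $class;
}

sub default_page_id {
    my $self = shift;
    my $key  = "user:$self->{id}:default_page_id";
    # Go to the db only on the first request for this user.
    $CACHE{$key} = $self->{loader}->($self->{id})
        unless exists $CACHE{$key};
    return $CACHE{$key};
}

package main;
```

The session then only needs to hold the user ID; each request builds a
My::User from it and asks for default_page_id, which normally comes out
of the cache.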

- Perrin


Re: Apache::Session - What goes in session?

Posted by md <my...@yahoo.com>.
--- Perrin Harkins <pe...@elem.com> wrote:
> md wrote:

> That doesn't sound very global to me.  What happens
> when users open 
> multiple browser windows on your site?  Doesn't it
> screw up the "current 
> page" data?

I don't think "global" was the term I should have
used. What I mean is data that will be seen on all or
most pages by the same user...like "Hello Jim", where
"Jim" is pulled from the database when the session is
created and passed around in the session after that
(and updated in the db and session if user changes
their greeting name). 

Current page name and id are never stored in db, so
different browser windows can be on different
pages...I'm not sure if that's good or bad. However,
changes to the user name will be seen in both browser
windows since that's updated both in the session and
db.
 

> Optimizing database fetches or caching data is
> independent of the 
> session issue.  Nothing that is relevant to more
> than one user should 
> ever go in the session.

Correct. That little info I am putting in the session
corresponds directly to a single user.
 

> That sounds like a "user" or "subscriptions" object
> to me, not session data.

Once again, I shouldn't have used the term "global".
This is the "subscriptions" info for a single
user...that's why I had thought to put this in the
session instead of pulling from the db each page call
since the data will rarely change. This info will be
displayed on every page the user visits (unless they
"turn off" this module).

 
> No, that's caching.  Don't use the session for
> caching, use a cache for 
> it.  They're not the same.  A session is often
> stored in a database so 
> that it can be reliable.  A cache is usually stored
> on the file system 
> so it can be fast.

The session is stored in a database
(Apache::Session::MySQL), and I am using TT caching
for the templates, but I'm not sure how to cache the
non-session data. I've seen this discussed but I
definitely need more info on this. As it stands I see
two options: get data from the session or get it from
the db...how do I bring  caching into play?
 
> Things like the login status of this session, and
> the user ID that is 
> associated with it go in the session.  Status of a
> particular page has 
> to be passed in query args or hidden fields, to
> avoid problems with 
> multiple browser windows.  Data that applies to
> multiple users or lasts 
> more than the current browsing session never goes in
> the session.

What about something like "default page id", which is
the page that is considered your home page? This id is
stored permanently in the db ("lasts more than the
current browsing session") but I keep it in
the session since this also rarely changes so I don't
want 
to keep hitting the db to get it.

Thanks again...




Re: Apache::Session - What goes in session?

Posted by Perrin Harkins <pe...@elem.com>.
md wrote:
> Currently I'm putting very little in the session

Good.  You should put in as little as possible.

> what I am putting in the session is more "global" in
> nature...greeting, current page number, current page
> name...

That doesn't sound very global to me.  What happens when users open 
multiple browser windows on your site?  Doesn't it screw up the "current 
page" data?

> I'm
> pulling a lot of info from the database and I wonder
> if my design is sound.

Optimizing database fetches or caching data is independent of the 
session issue.  Nothing that is relevant to more than one user should 
ever go in the session.

> Now I need to add global "modules" to the page which
> will show user info like which pages they have created
> and which features are being emailed to the user.
> These modules will display on every page unless the
> user turns them off.

That sounds like a "user" or "subscriptions" object to me, not session data.

> It seems that since this info
> wouldn't change very often that I should put the data
> in the session...

No, that's caching.  Don't use the session for caching, use a cache for 
it.  They're not the same.  A session is often stored in a database so 
that it can be reliable.  A cache is usually stored on the file system 
so it can be fast.

Things like the login status of this session, and the user ID that is 
associated with it go in the session.  Status of a particular page has 
to be passed in query args or hidden fields, to avoid problems with 
multiple browser windows.  Data that applies to multiple users or lasts 
more than the current browsing session never goes in the session.

- Perrin


RE: Apache::Session - What goes in session?

Posted by Jesse Erlbaum <je...@erlbaum.net>.
Hello md --

> I'm using mod_perl and Apache::Session on an app that
> is similar to MyYahoo. I found a few bits of info from
> a previous thread, but I'm curious as to what type of
> information should go in the session and what should
> come from the database.

One thing to watch out for is the trap of using session data as a dumping
ground for global variables.  Since you are asking "what belongs in a
session", it seems you are already thinking along those lines.  I have found
that many people who are fond of sessions often use them to store data which
I would be personally inclined to store in hidden form data, in a simple
cookie, or retrieve from a database when needed.

In my systems I usually only store a single "session ID" in a cookie -- a
key which references a database row.  This allows me to have as much data as
I like but keep it all in the database.  There is one case where it might
make sense to put data into a "session" of some sort -- to cache information
which is very time-consuming to retrieve.  Minimizing time-consuming
database operations is an important thing to think about in large systems,
and a place where session data might come in handy.

Warmest regards,

-Jesse-


--

  Jesse Erlbaum
  The Erlbaum Group
  jesse@erlbaum.net
  Phone: 212-684-6161
  Fax: 212-684-6226




Apache::Session - What goes in session?

Posted by md <my...@yahoo.com>.
I'm using mod_perl and Apache::Session on an app that
is similar to MyYahoo. I found a few bits of info from
a previous thread, but I'm curious as to what type of
information should go in the session and what should
come from the database.

Currently I'm putting very little in the session, but
what I am putting in the session is more "global" in
nature...greeting, current page number, current page
name...data that doesn't change very often. I'm
pulling a lot of info from the database and I wonder
if my design is sound. Most of the info being pulled
from the database is features for the page. 

Now I need to add global "modules" to the page which
will show user info like which pages they have created
and which features are being emailed to the user.
These modules will display on every page unless the
user turns them off. It seems that since this info
wouldn't change very often that I should put the data
in the session...

Anyone have any general tips on session design?

Thanks.


Re: when to mod_perl?

Posted by Peter Bi <mo...@att.net>.
>....
> Thanks to the caching, any of my images or other static content gets
> pushed once a day to the front, and then doesn't tie up the back ever
> again. .....

I have a question regarding the cached files. Although the maximum period
is set to 24 hours in httpd.conf's proxy settings, many of the files
cached from the backend mod_perl dynamic program are strangely
modified every few minutes. All the files I have checked so far do
look modified, because the hex strings at the top of the files (such as
000000003D189FC2) are different after each modification.

Forgive me if this is off-topic: it is more likely a mod_proxy question. I
searched, but could not find related information pages to read.

Thanks.


Peter Bi

----- Original Message -----
From: "Randal L. Schwartz" <me...@stonehenge.com>
To: "Perrin Harkins" <pe...@elem.com>
Cc: "md" <my...@yahoo.com>; "Stas Bekman" <st...@stason.org>;
<mo...@perl.apache.org>
Sent: Tuesday, June 25, 2002 8:38 AM
Subject: Re: when to mod_perl?


> >>>>> "Perrin" == Perrin Harkins <pe...@elem.com> writes:
>
> Perrin> Static content is easy; just don't serve it from mod_perl.  The proxy
> Perrin> approach is good, and so is a separate image server (which you can
> Perrin> host on the same machine).  I've found thttpd to be an amazingly
> Perrin> efficient server for images, but a slimmed-down apache does very well
> Perrin> too.
>
> On the new www.stonehenge.com, I'm using a stripped down Apache (just
> mod_proxy and mod_rewrite) for a reverse caching proxy, and it's about
> 1.5M RSS per process.  I divert requests for TT's /splash/images and
> Apache's /icons, but otherwise, all content requests (including for
> /merlyn/Pictures/ images) go to my heavyweight mod_perl backends,
> which are running about 10M RSS.
>
> Thanks to the caching, any of my images or other static content gets
> pushed once a day to the front, and then doesn't tie up the back ever
> again.  On a 500MHz 256M box, I'm easily serving 50K requests a day
> (about 10K of those are fully uncached dynamic pages touching about 20
> to 50 TT includes), with load averages staying below 0.5.  If it ever
> starts getting higher, I can cache the expensive menubar creation
> (which is nearly completely static) using Perrin's device, but I've
> not bothered yet.
>
> It's been amazingly carefree.  I'm planning to move
> www.geekcruises.com to be served on the same box, although they get
> only about 1/10th the traffic.
>
> --
> Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
> <me...@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
> Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
> See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
>


Re: when to mod_perl?

Posted by "Randal L. Schwartz" <me...@stonehenge.com>.
>>>>> "Perrin" == Perrin Harkins <pe...@elem.com> writes:

Perrin> Static content is easy; just don't serve it from mod_perl.  The proxy
Perrin> approach is good, and so is a separate image server (which you can
Perrin> host on the same machine).  I've found thttpd to be an amazingly
Perrin> efficient server for images, but a slimmed-down apache does very well
Perrin> too.

On the new www.stonehenge.com, I'm using a stripped down Apache (just
mod_proxy and mod_rewrite) for a reverse caching proxy, and it's about
1.5M RSS per process.  I divert requests for TT's /splash/images and
Apache's /icons, but otherwise, all content requests (including for
/merlyn/Pictures/ images) go to my heavyweight mod_perl backends,
which are running about 10M RSS.

Thanks to the caching, any of my images or other static content gets
pushed once a day to the front, and then doesn't tie up the back ever
again.  On a 500MHz 256M box, I'm easily serving 50K requests a day
(about 10K of those are fully uncached dynamic pages touching about 20
to 50 TT includes), with load averages staying below 0.5.  If it ever
starts getting higher, I can cache the expensive menubar creation
(which is nearly completely static) using Perrin's device, but I've
not bothered yet.
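
To put those figures in perspective, here is a back-of-the-envelope
calculation. The request counts and per-process sizes come from the
paragraphs above; the process counts (20 front-end, 10 back-end) are
purely hypothetical, and the memory estimate ignores pages shared
between processes as well as whatever the OS itself needs.

```perl
use strict;
use warnings;

my $SECONDS_PER_DAY = 24 * 60 * 60;

# Average request rate for a given daily total.
sub per_second {
    my ($per_day) = @_;
    return $per_day / $SECONDS_PER_DAY;
}

# Rough resident memory in MB: 1.5M per front-end proxy process
# and 10M per mod_perl back-end process.
sub rss_mb {
    my ($front, $back) = @_;
    return $front * 1.5 + $back * 10;
}

printf "all requests: %.2f/s\n", per_second(50_000);
printf "dynamic only: %.2f/s\n", per_second(10_000);
printf "20 front + 10 back: %d MB\n", rss_mb(20, 10);
```

50K requests a day averages out to well under one request per second,
although real traffic arrives in bursts, so peak rates will be
considerably higher than the average.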

It's been amazingly carefree.  I'm planning to move
www.geekcruises.com to be served on the same box, although they get
only about 1/10th the traffic.

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<me...@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!

Re: when to mod_perl?

Posted by Perrin Harkins <pe...@elem.com>.
md wrote:
> I was just a bit worried about the amount of static
> content. In the past I've had a lot more hardware to
> work with and I never had to worry about it much.

Static content is easy; just don't serve it from mod_perl.  The proxy 
approach is good, and so is a separate image server (which you can host 
on the same machine).  I've found thttpd to be an amazingly efficient 
server for images, but a slimmed-down apache does very well too.

- Perrin


Re: when to mod_perl?

Posted by Stas Bekman <st...@stason.org>.
Peter Bi wrote:
> wait a second ...
> 
> don't forget to use a proxy: it saves you a lot of dynamic calls, especially
> if you also have a database.

Good point, Peter. And there are many others. It's best if you can 
take some time and read the guide before you start coding. It includes a 
big chunk of the wisdom that has passed through this list in the last 5 years.

In your case I'd suggest reading at least:

http://perl.apache.org/release/docs/1.0/guide/strategy.html
http://perl.apache.org/release/docs/1.0/guide/scenario.html
http://perl.apache.org/release/docs/1.0/guide/performance.html

and probably these two:

http://perl.apache.org/release/docs/general/perl_reference.html
http://perl.apache.org/release/docs/1.0/guide/porting.html


__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com


Re: when to mod_perl?

Posted by Peter Bi <mo...@att.net>.
wait a second ...

don't forget to use a proxy: it saves you a lot of dynamic calls, especially
if you also have a database.


Peter Bi


----- Original Message -----
From: "md" <my...@yahoo.com>
To: "Stas Bekman" <st...@stason.org>
Cc: <mo...@perl.apache.org>
Sent: Monday, June 24, 2002 9:36 PM
Subject: Re: when to mod_perl?


>
> --- Stas Bekman <st...@stason.org> wrote:
> > In any case we are talking about registry scripts,
> > aren't we? In that
> > case it takes very little time to turn it on and off
> > and test what is
> > better. Unless you are talking about writing full
> > fledged mod_perl API
> > handlers, which is only when you should
> > plan/analyze before you code.
>
> Actually at first I was planning to do full fledged
> mod_perl handlers, so that's why I wanted to plan
> before I coded.
>
> I was just a bit worried about the amount of static
> content. In the past I've had a lot more hardware to
> work with and I never had to worry about it much.
>
> I think you all have answered my question well enough
> that I feel comfortable sticking with straight
> mod_perl.
>
> Thanks...
>


Re: when to mod_perl?

Posted by md <my...@yahoo.com>.
--- Stas Bekman <st...@stason.org> wrote:
> In any case we are talking about registry scripts,
> aren't we? In that 
> case it takes very little time to turn it on and off
> and test what is 
> better. Unless you are talking about writing full
> fledged mod_perl API 
> handlers, which is only when you should
> plan/analyze before you code.

Actually at first I was planning to do full fledged
mod_perl handlers, so that's why I wanted to plan
before I coded. 

I was just a bit worried about the amount of static
content. In the past I've had a lot more hardware to
work with and I never had to worry about it much.

I think you all have answered my question well enough
that I feel comfortable sticking with straight
mod_perl.

Thanks...


Re: when to mod_perl?

Posted by Stas Bekman <st...@stason.org>.
md wrote:
> Hello,
> 
> I'm working on a dynamic site that I originally
> thought I would do with mod_perl. Now after reviewing
> the requirements and available hardware, I wonder if
> mod_perl will be my best solution.
> 
> The machine will not be a huge box (though I wasn't
> provided much in the way of specs) and will only have
> 256M of RAM. There will be static content, including
> 5-10 images per page. The client has only given me
> sparse information, but claimed that he currently had
> 4,000 unique visitors a day and wanted to move to
> 10,000-15,000 unique visitors per day (he didn't give
> me page view stats). I may or may not be able to set
> up multiple instances of Apache.
> 
> Given limited hardware (esp. RAM), am I better off to
> go with mod_perl (larger Apache processes with limited
> RAM) or CGI (smaller apache processes but the usual
> cons)?

Don't be misled by the memory requirements. If your code runs 10 
times faster, you will need *at least* 10 times fewer server processes 
to do the job, and it's not uncommon to get even better speedups. So 
chances are that mod_perl will be a win in any case. Read the guide on 
restricting memory usage, sharing memory between processes, etc., and 
you are all set. It includes some numbers showing how much memory you 
really need if you follow the guidelines.
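
As a rough sketch of the kind of limits the guide discusses, the process 
settings in httpd.conf might look like the fragment below. It assumes 
Apache 1.3 with the usual preforking model, and the numbers are 
hypothetical (they are not figures from this thread or from the guide): 
roughly 10 MB of unshared memory per mod_perl child and about 200 MB of 
the 256 MB box left over for Apache.

```apache
# Hypothetical sizing, to be replaced with measured process sizes:
# about 200 MB available / about 10 MB unshared per child = 20 children.
MaxClients           20
StartServers          5
MinSpareServers       5
MaxSpareServers      10
# Recycle each child after a fixed number of requests so that slow
# memory growth in long-running Perl code stays bounded.
MaxRequestsPerChild 500
```

The point of capping MaxClients is that the server never starts more 
children than fit in RAM, so it queues requests instead of swapping.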

The only situation where mod_cgi could be a win over mod_perl is when 
you have almost zero code loaded and most of your operations are CPU- 
or I/O-bound, so mod_perl's precompilation/caching won't help much. But 
that's a very rare situation.

In any case we are talking about registry scripts, aren't we? In that 
case it takes very little time to turn mod_perl on and off and test 
which is better. Unless you are talking about writing full-fledged 
mod_perl API handlers, which is the only case where you should 
plan/analyze before you code.
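
For illustration, here is a minimal sketch of what "turning it on and 
off" can look like under mod_perl 1.x: the same directory of scripts is 
exposed once through mod_cgi and once through Apache::Registry. The 
filesystem path and URL prefixes are hypothetical.

```apache
# Plain CGI: every request forks a fresh perl and recompiles the script.
ScriptAlias /cgi-bin/ /home/httpd/perl/

# The same scripts under mod_perl: compiled once per child, then cached.
Alias /perl/ /home/httpd/perl/
<Location /perl>
    SetHandler     perl-script
    PerlHandler    Apache::Registry
    Options        +ExecCGI
    PerlSendHeader On
</Location>
```

Requesting the same script through /cgi-bin/ and through /perl/ under 
load (for example with ApacheBench) gives a quick comparison of the two 
setups without changing the script itself.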


__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com


RE: when to mod_perl?

Posted by David LeBlanc <wh...@oz.net>.
You don't mention what OS you're using, but on Linux, 256MB of RAM just for
running httpd seems quite generous whether you're using mod_perl or not. From
what I know, mod_perl is going to give you more performance on any given box.

And now, I can't resist: When should you? Why, when you're in the mod of
course ;)

David LeBlanc
Seattle, WA USA

> -----Original Message-----
> From: md [mailto:my_pretty_perl@yahoo.com]
> Sent: Monday, June 24, 2002 19:22
> To: modperl@perl.apache.org
> Subject: when to mod_perl?
>
>
> Hello,
>
> I'm working on a dynamic site that I originally
> thought I would do with mod_perl. Now after reviewing
> the requirements and available hardware, I wonder if
> mod_perl will be my best solution.
>
> The machine will not be a huge box (though I wasn't
> provided much in the way of specs) and will only have
> 256M of RAM. There will be static content, including
> 5-10 images per page. The client has only given me
> sparse information, but claimed that he currently had
> 4,000 unique visitors a day and wanted to move to
> 10,000-15,000 unique visitors per day (he didn't give
> me page view stats). I may or may not be able to set
> up multiple instances of Apache.
>
> Given limited hardware (esp. RAM), am I better off to
> go with mod_perl (larger Apache processes with limited
> RAM) or CGI (smaller apache processes but the usual
> cons)?
>
> Thanks...