Posted to modperl@perl.apache.org by sh...@isupportlive.com on 2000/04/15 09:03:30 UTC

Modperl/Apache deficiencies... Memory usage.

Modperlers...,

I'd like to start a discussion about the deficiencies in Apache/mod_perl
and get your feedback on this issue.  The problem as I see it is that
the process model Apache uses is very hard on mod_perl: it is, basically,
very memory-inefficient.  Each Apache process has its own registry
holding the compiled perl scripts, a copy for each process.  This has
become an issue for one of the companies that I work for, and I noted
from monitoring the list that some people have apache processes that are
upwards of 25 megs, which is frankly ridiculous.

This is not meant to be a flame, and I'd really like to get down to
the nitty gritty of how we can solve this problem.  Zach Brown wrote
phhttpd, a threaded server that can handle a lot more load than
apache, but the problem is it doesn't have the features Apache has,
and it's not going to catch up any time soon, so I don't think it's
going to be the cure-all.  I wrote a very small perl engine for
phhttpd that worked within its threaded paradigm, used a very basic
version of Apache's registry, and sucked up a negligible amount of
memory.  Once again, though, it didn't have the feature set that
Apache/mod_perl has.  Now to address the issue: I think we have a lot
of code in mod_perl that is basically awesome, not to mention the fact
that Apache itself has a lot of modules and other things which are
quite useful.  However, I think if it were possible to divorce the
actual perl engine from the Apache process we could solve this memory
usage problem.

Basically, here's what I'm thinking might be possible; if it's not,
just let me know.  (Well, I know it's possible, but I mean: how much
work would it take, has someone else worked on this, or at least
looked at how much work we'd be talking about?)  What I'm thinking is
essentially that we take the perl engine, which has the apache
registry and all the perl symbols etc., and separate it into its own
process, which could be multithreaded (via pthreads) for
multiprocessor boxes (above 2 processors this would probably be
beneficial).  On the front side, the apache module API would just
connect to this other process via shared memory pages (shmget et
al.), Unix pipes, or something like that.  The mod_perl process would
have a work queue that the Apache processes could add work to via our
front-end API.  The worker threads inside that mod_perl process would
take work "orders" off the queue, process them, and send the result
back to the waiting apache process.  (Maybe just something as simple
as a blocking read on a pipe coming out of the mod_perl process...
this would keep down context switching issues and other nasty bits.)
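The queue-and-blocking-read flow described above can be mocked up in a few lines.  This is a rough sketch only: it substitutes a Unix domain socket for shmget/pipes, forks a single worker instead of a pthreads pool, and the socket path and request line are invented for illustration.

```perl
use strict;
use warnings;
use Socket qw(SOCK_STREAM);
use IO::Socket::UNIX;

my $path = "/tmp/perl-engine.sock";   # hypothetical rendezvous point
unlink $path;

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: the stand-alone "perl engine" that would hold the registry.
    my $server = IO::Socket::UNIX->new(
        Type => SOCK_STREAM, Local => $path, Listen => 5,
    ) or die "listen: $!";
    while (my $conn = $server->accept) {
        chomp(my $order = <$conn>);        # pull a work "order" off the queue
        print $conn "handled: $order\n";   # run it, ship the result back
        close $conn;
    }
    exit 0;
}

# Parent: plays the thin Apache process; it queues one order, then does a
# blocking read on the socket until the engine answers.
my $client;
for (1 .. 50) {                            # wait for the engine to listen
    $client = IO::Socket::UNIX->new(Type => SOCK_STREAM, Peer => $path);
    last if $client;
    select undef, undef, undef, 0.1;
}
die "connect: $!" unless $client;
print $client "GET /perl/foo.pl\n";
my $reply = <$client>;
print $reply;                              # prints "handled: GET /perl/foo.pl"

kill 'TERM', $pid;
waitpid $pid, 0;
unlink $path;
```

The blocking `<$client>` read is the "blocking read on a pipe" idea: the front-end process simply sleeps until the engine's answer arrives, with no polling.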

One of my concerns is that maybe the apache module API is simply too
complex to pull something like this off.  I don't know, but it seems
like it should be able to handle something like this.

Does anyone know of any program which has been developed like this?
Basically we'd be turning the "module of apache" portion of mod_perl
into a front end to the "application server" portion of mod_perl that
would do the actual processing.  It seems quite logical that something
like this would have been developed, but possibly not.  The separation
of the two components seems like it should be done, but there must be
a reason why no one has done it yet... I'm afraid that reason is that
the apache module API doesn't lend itself to this.

Well, thanks to everyone in advance for their thoughts/comments...
Shane Nay.

Re: Modperl/Apache deficiencies... Memory usage.

Posted by Gunther Birznieks <gu...@extropia.com>.
I think I may be a bit dense on this list so forgive me if I try to clarify 
(at least for myself to make sure I have this right)...

I think what you are proposing is not that much different from the proxy 
front-end model. The mod_proxy is added overhead, but that solves your 
memory problem. You can have 50 apache processes on the front-end dealing 
with images and the like and then have only 2 or 5 or however many 
Apache/Perl processes on the backend.
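In Apache 1.3 configuration terms, the split described above looks roughly like this (a sketch only; the port and `/perl/` path are hypothetical):

```apache
# Front-end httpd.conf: lightweight children serve images/static files
# themselves and proxy only the Perl URIs to the heavy back-end.
ProxyPass        /perl/ http://localhost:8080/perl/
ProxyPassReverse /perl/ http://localhost:8080/perl/

# Back-end mod_perl httpd.conf: a handful of big processes, reachable
# only from the local machine.
Listen 127.0.0.1:8080
```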

The only inefficiency with this is that HTTP is the protocol used by
the front-end HTTPD daemon to communicate with Perl, instead of a
direct socket speaking a binary/compressed data protocol.

By the way, if you really prefer this out-of-process (yet still
pooled) Perl interpreter model, you could always consider purchasing
Binary Evolution's Velocigen product for Netscape on UNIX.  I believe
they have a mode that allows the Perl engine to run out-of-process,
with a lightweight NSAPI wrapper talking to Perl.

It turns out that this is probably the best way to deal with a buggy 
product like Netscape anyway... NSAPI is such a flaky beast that it's no 
wonder a company would want to separate the application processes out 
(but now I am getting off topic).

It's likely that this is a faster solution than the mod_proxy approach 
mod_perl uses, because mod_proxy and HTTP are both relatively complex and 
designed to do more than provide back-end application server communications.

Here's the relevant Velocigen URL:

http://www.binaryevolution.com/velocigen/arch.vet

However, I would caution that mod_perl speeds things up SO much as it 
is that this architectural improvement over front-end/back-end apache 
servers is probably not going to make that big a difference unless you 
are writing something that will be under really, really heavy stress.  
And, of course, you should do your own benchmarking to see if this is 
the case.

While you are at it, you might consider PerlEx from ActiveState, which 
provides in-process, thread-pooled Perl engines that run in the IIS 
memory space.

But again, I would stress that speed isn't the only thing.  Think about 
reliability.  I think the mod_perl model (in the front/back-end 
scenario) tends to be more reliable, because the apache servers can be 
monitored and killed off independently when they spin out of control... 
and they can't pollute each other's memory space.  Using some 
mod_rewrite rules, you can also very easily control which applications 
are partitioned from each other into which back-end servers.

I don't know how easily you can specify what I would term 
application-affinities in the Velocigen or PerlEx model based on URL alone.

Anyway, good luck with your search for information...

Thanks,
     Gunther

At 10:46 PM 4/15/00 +0000, shane@isupportlive.com wrote:
>Perrin-
>On Sat, Apr 15, 2000 at 11:33:15AM -0700, Perrin Harkins wrote:
> > > Each Apache process has
> > > its own registry which holds the compiled perl scripts, a copy of
> > > each for each process.  This has become an issue for one of the
> > > companies that I work for, and I noted from monitoring the list that
> > > some people have apache processes that are upwards of 25Megs, which is
> > > frankly ridiculous.
> >
> > I have processes that large, but more than 50% of that is shared through
> > copy-on-write.
> >
> > > I wrote a very small perl engine
> > > for phhttpd that worked within its threaded paradigm that sucked up a
> > > negligible amount of memory which used a very basic version of
> > > Apache's registry.
> >
> > Can you explain how this uses less memory than mod_perl doing the same
> > thing?  Was it just that you were using fewer perl interpreters?  If 
> so, you
> > need to improve your use of apache with a multi-server setup.  The only way
> > I could see phhttpd really using less memory to do the same work is if you
> > somehow managed to get perl to share more of its internals in memory.  Did
> > you?
>
>Yep, very handily I might add ;-).  Basically phhttpd is not process
>based, it's thread based, which means that everything is running
>inside of the same address space.  Which means 100% sharing except for
>the current local stack of variables... which is very minimal.  In
>terms of the perl thing: when you look at your processes and see all
>that non-shared memory, most of that is stack variables.  Now, most
>webservers are running on single-processor machines, so they get no
>benefit from having 10s or even 100s of copies of these perl stack
>variables.  It's much more efficient to have a single process handle
>all the perl requests.  On a multiprocessor box that single process
>could have multiple threads in order to take advantage of the
>processors.  See, mod_perl stores the stack state of every script
>it runs in the apache process... for every script... copies of it,
>many, many copies of it.  This is not efficient.  What would be
>efficient is to have as many threads/processes as you have processors
>for the mod_perl engine.  In other words, separate the engine from the
>apache process so that unnecessary stack variables are never being
>tracked.
>
>Hmm... can I explain this better?  Let me try.  Okay: for every apache
>process there is an entire perl engine, with all the stack variables
>for every script you run recorded there.  What I'm proposing is a
>system whereby there would be a separate process that would have only
>a perl engine in it... you would make as many of these processes as
>you have processors.  (Or multithread them... it doesn't really
>matter.)  Now your apache processes would not have a bunch of junk
>memory in them.  Your apache processes would be the size of a stock
>apache process, like 4-6MB or so, and you would have one process of
>25MB or so that would have all your registry in it.  For a
>high-capacity box this would be an incredible boon to increasing
>capacity.  (I'm trying to explain clearly, but I'd be the first to
>admit this isn't one of my strong points.)
>
>As to how the multithreaded phhttpd can handle tons of load, well...
>that's a separate issue and frankly a question much better handled by
>Zach.  I understand it very well, but I don't feel that I could
>adequately explain it.  It's based on real-time signal queue
>(sigqueue) technology... for a "decent" reference on this you can take
>a look at the O'Reilly book "POSIX.4: Programming for the Real World".
>I should say that this book doesn't go into enough depth... but it's
>the only book that goes into any depth that I could find.
>
> >
> > > What I'm
> > > thinking is essentially we take the perl engine which has the apache
> > > registry and all the perl symbols etc., and separate it into its own
> > > process which could be multithreaded (via pthreads) for multiple
> > > processor boxes.  (above 2 this would be beneficial probably)  On the
> > > front side the apache module API would just connect into this other
> > > process via shared memory pages (shmget et al.), or Unix pipes or
> > > something like that.
> >
> > This is how FastCGI, and all the Java servlet runners (JServ, Resin, etc.)
> > work.  The thing is, even if you run the perl interpreters in a
> > multi-threaded process, it still needs one interpreter per perl thread 
> and I
> > don't know how much you'd be able to share between them.  It might not be
> > any smaller at all.
>
>But there is no need to have more than one perl thread per processor.
>Right now we have a perl "thread" (er... engine is a better term) per
>process.  Since most boxes start up 10 or so Apache processes, we'd
>be talking about a memory savings something like this:
>6MB stock apache process.
>25MB (we'll say that's average) mod_perl apache process, 50% shared,
>leaving 12.5MB non-shared.
>The way it works now: 12.5 * 10 = 125MB, + 12.5 (the shared bit, one
>instance) = 137.5MB total.
>Suggested way:
>6MB stock process with about 3MB shared or so.  3MB * 10 = 30MB, + the
>25MB mod_perl process = 55MB total.
>
>That would be an overall difference of 137.5 - 55... over 80MB of
>memory.  I have no idea how accurate this is, but I'd put my money on
>it being not too far from the expected result in a high-load
>environment with lots of apache scripts.
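For what it's worth, the estimate above can be checked mechanically (the MB figures are the email's own rough assumptions, not measurements):

```perl
use strict;
use warnings;

# Back-of-the-envelope memory comparison from the estimates above.
my $procs        = 10;    # typical number of Apache children
my $unshared_now = 12.5;  # MB unshared per mod_perl child (25 MB, 50% shared)
my $shared_now   = 12.5;  # MB shared text/data, counted once
my $current      = $procs * $unshared_now + $shared_now;  # 137.5 MB

my $proposed = $procs * 3 + 25;  # ~3 MB unshared per thin child, plus one
                                 # 25 MB stand-alone perl engine = 55 MB
printf "current %.1f MB, proposed %d MB, saved %.1f MB\n",
       $current, $proposed, $current - $proposed;
```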
>
> >
> > My suggestion would be to look at the two-server approach for mod_perl, and
> > if that doesn't work for you look at FastCGI, and if that doesn't work for
> > you join the effort to get mod_perl working on Apache 2.0 with a
> > multi-threaded model.  Or just skip the preliminaries and go straight for
> > the hack value...
>
>Well... the second option certainly has a lot of merit.  Maybe I
>should get involved in that... actually, that has a lot of appeal to
>me.  Hmm... I guess it's time to pick up the apache 2.0 stuff and do
>some tinkering! :)  As far as the present problem goes... I'm not all
>that concerned about it.  It actually falls outside the area of my
>responsibilities at our site; I'm mostly thinking of the other people
>in the community.
>
>Thanks!
>Shane
> >
> > - Perrin
> >

__________________________________________________
Gunther Birznieks (gunther.birznieks@extropia.com)
Extropia - The Web Technology Company
http://www.extropia.com/


SUMMARY Re: mod_perl/apache and db/content server

Posted by "Angel R. Rivera" <an...@wolf.com>.
Would like to thank everyone who has answered publicly and privately.
Special thanks to Stas Bekman for 'splaining it so I would understand.

>Apache::DBI uses DBI; it doesn't matter where the db is located, as long as
>you can connect to it through DBI. If I understand your description correctly,
>you shouldn't have any problem. But it has nothing to do with mod_perl,
>other than my original reply.


Angel R. Rivera, angel@wolf.com
--------------------------------------------------------------------------
Website:  http://www.wolf.com
Lists:	     http://www.wolf.com/lists/
Our Life:   http://www.wolf.com/Ookami.jpg
--------------------------------------------------------------------------
"The Quality of a person's life is in direct proportion to their commitment
   to excellence, regardless of their chosen field of endeavor."

                                             Vincent T. Lombardi


Re: mod_perl/apache and db/content server

Posted by Stas Bekman <sb...@stason.org>.
On Tue, 18 Apr 2000, Angel R. Rivera wrote:

> Thanks a lot, that does help.
> 
> We will not be using apache/mod_perl or DBI on the DS20.  It will
> only be a content server (containing the actual HTML pages) and
> database server (PostgreSQL and Oracle); everything else will be
> handled by unix and linux boxes running apache/mod_perl/mod-
> a-bunch-of-stuff as the frontend.  Can Apache::DBI still control
> the db reconnection issue?

Apache::DBI works only within the mod_perl server. 

>   At 11:46 AM 4/18/00 -0500, Rafael Caceres wrote:
> >We have a Digital 4100 Alpha box currently running Apache 1.3.3 with
> >mod_perl 1.21 and MS Frontpage extensions.
> >The main use for mod_perl is a script that manages a bug tracking
> >application and connects, using DBI, to an Oracle database.
> >The performance advantage is evident, as re-connection to the database is
> >not necessary. I would do it again without any second thoughts.
> 
> 
> Angel R. Rivera, angel@wolf.com
> --------------------------------------------------------------------------
> Website:  http://www.wolf.com
> Lists:	     http://www.wolf.com/lists/
> Our Life:   http://www.wolf.com/Ookami.jpg
> --------------------------------------------------------------------------
> "The Quality of a person's life is in direct proportion to their commitment
>    to excellence, regardless of their chosen field of endeavor."
> 
>                                              Vincent T. Lombardi
> 
> 



______________________________________________________________________
Stas Bekman             | JAm_pH    --    Just Another mod_perl Hacker
http://stason.org/      | mod_perl Guide  http://perl.apache.org/guide 
mailto:stas@stason.org  | http://perl.org    http://stason.org/TULARC/
http://singlesheaven.com| http://perlmonth.com http://sourcegarden.org
----------------------------------------------------------------------


Re: mod_perl/apache and db/content server

Posted by "Angel R. Rivera" <an...@wolf.com>.
Thanks a lot, that does help.

We will not be using apache/mod_perl or DBI on the DS20.  It will
only be a content server (containing the actual HTML pages) and
database server (PostgreSQL and Oracle); everything else will be
handled by unix and linux boxes running apache/mod_perl/mod-
a-bunch-of-stuff as the frontend.  Can Apache::DBI still control
the db reconnection issue?

  At 11:46 AM 4/18/00 -0500, Rafael Caceres wrote:
>We have a Digital 4100 Alpha box currently running Apache 1.3.3 with
>mod_perl 1.21 and MS Frontpage extensions.
>The main use for mod_perl is a script that manages a bug tracking
>application and connects, using DBI, to an Oracle database.
>The performance advantage is evident, as re-connection to the database is
>not necessary. I would do it again without any second thoughts.


Angel R. Rivera, angel@wolf.com
--------------------------------------------------------------------------
Website:  http://www.wolf.com
Lists:	     http://www.wolf.com/lists/
Our Life:   http://www.wolf.com/Ookami.jpg
--------------------------------------------------------------------------
"The Quality of a person's life is in direct proportion to their commitment
   to excellence, regardless of their chosen field of endeavor."

                                             Vincent T. Lombardi


mod_perl/apache and db/content server

Posted by "Angel R. Rivera" <an...@wolf.com>.
We have decided it is time to set up a DS20-class Alpha box as a
content/db server.  Database and content will reside on the alpha,
with web server boxes on the front end.  My question is: will mod_perl
still provide performance improvements, especially with Apache::DBI?
TIA., -ar


Angel R. Rivera, angel@wolf.com
--------------------------------------------------------------------------
Website:  http://www.wolf.com
Lists:	     http://www.wolf.com/lists/
Our Life:   http://www.wolf.com/Ookami.jpg
--------------------------------------------------------------------------
"The Quality of a person's life is in direct proportion to their commitment
   to excellence, regardless of their chosen field of endeavor."

                                             Vincent T. Lombardi


Re: Modperl/Apache deficiencies... Memory usage.

Posted by Gunther Birznieks <gu...@extropia.com>.
Actually, it is a bit more complicated.  I was wrong in my assumptions.
To be more blunt, I was wrong, period. :)

I had been testing some stuff using global variables and experienced this 
behavior of non-releasing RAM.  Since then a few people privately showed me 
otherwise, and Stas posted a link to a discussion of this issue from a 
past mod_perl list thread.

It is still the case that one needs to be careful with how variables grow 
in mod_perl, but it is not the case that the RAM is never freed.

Here is the URL that Stas posted which makes some of these persistent 
server memory issues more clear (perhaps even for a growing Java server).

http://forum.swarthmore.edu/epigone/modperl/zarwhegerd

Later,
     Gunther

At 11:19 AM 4/19/00 +0000, shane@isupportlive.com wrote:
>On Tue, Apr 18, 2000 at 01:24:16PM +0800, Gunther Birznieks wrote:
> > If you aren't careful with your programming, an apache HTTPD can always
> > grow pretty quickly because Perl never releases the RAM it allocates
> > previously. While it does that reference count garbage collection, that is
> > internal to the RAM that was allocated.
> >
> > Let's say you need to sort a record set returned from a DBI call in an
> > unusual perl-like way. If you do this "in memory", you need an array to
> > hold the entire recordset in memory at once. If you do this, though, you
> > will allocate the RAM for that one request that sorted the array and then
> > the HTTPD will remain that size forever.
> >
> > Keeping the higher RAM allocation is good for performance if you have the
> > RAM of course. So this is one of those design tradeoffs. And Perl was not
> > really written to be a persistent language, so again, the tradeoff of
> > operational speed seems to make sense versus persistent memory usage.
> >
> > Later,
> >    Gunther
> >
>
>Gunther,
>
>Curiosity leads me to the following question...:
>
>So what you're talking about is: let's say a variable becomes 40k
>large, or bigger.  Since we're talking about a pretty big operation we
>could even be talking in terms of several hundred k, but anyway:
>
>That variable would retain its size throughout the persistence of the
>perl interpreter, correct?  And that memory would be specific to that
>variable?  Hmm... okay, that's where I was getting mixed up.  The
>variable's value is lost after the block ends, but its size is never
>realloc'd down to something more appropriate?  That's an interesting
>problem in and of itself.  So if you were to do something like this:
>
>$i = 20;
>$bigvar = "something that's 40k long";
>somememoryhog($bigvar);
>sub somememoryhog {
>         my $var = shift;
>         somememoryhog($var) if ($i-- >= 0);
>}
>
>It would call somememoryhog about 20 times; each time it would copy
>the value of $bigvar onto the next level down of the recursive stack
>of somememoryhog.  The total memory usage would be 20 * 40k = 800k,
>and it would never reallocate that variable down to a reasonable
>size?  That's the behaviour I thought would happen, but I was
>thinking the value would be retained through the stack (clearly my
>error).  (Okay, so sue me, it would call somememoryhog more than 20
>times; I'm just trying to clear something up :->)
>
>Thanks,
>Shane.

__________________________________________________
Gunther Birznieks (gunther.birznieks@extropia.com)
Extropia - The Web Technology Company
http://www.extropia.com/


Re: Modperl/Apache deficiencies... Memory usage.

Posted by sh...@isupportlive.com.
On Tue, Apr 18, 2000 at 02:07:24AM -0400, Jeff Stuart wrote:
> I understand that.  :)  And that was something that I had to learn myself.
> :)  It's a BAD thing when suddenly your httpd process takes up 100 MB.  :)
> It's just that it sounded like Shane was saying that his httpds were
> starting OUT at 4 to 6 MB.  That sounded a little unusual to me but then
> again, I've pared down my httpd config so that I don't have things in that I
> don't need.
> 
> I'm just curious as to what he has in there.
> 
> --
> Jeff Stuart
> jstuart@ohio.risci2.net

Well, the machine I took the estimates from was a dev machine, which
means the key is *capability*, not so much the ability to serve up
lots of requests.  You're right: if I were to re-work my configs to be
something other than a dev box, I could serve a lot more hits and have
a smaller apache memory footprint.  The other box I was talking about
is pared down, but I obviously can't restart a client's machine at
will just to do a calculation for the mod_perl list :-).

In direct response to your question though :)... I have about 20
modules compiled in, not to mention the modules that I'm loading...;
here's the compiled-in list for your curiosity:

Compiled-in modules:
  http_core.c
  mod_env.c
  mod_log_config.c
  mod_mime.c
  mod_negotiation.c
  mod_status.c
  mod_include.c
  mod_autoindex.c
  mod_dir.c
  mod_cgi.c
  mod_asis.c
  mod_imap.c
  mod_actions.c
  mod_userdir.c
  mod_alias.c
  mod_access.c
  mod_auth.c
  mod_setenvif.c
  mod_jserv.c

Thanks,
Shane.

Re: Modperl/Apache deficiencies... Memory usage.

Posted by sh...@isupportlive.com.
On Tue, Apr 18, 2000 at 01:24:16PM +0800, Gunther Birznieks wrote:
> If you aren't careful with your programming, an apache HTTPD can always 
> grow pretty quickly because Perl never releases the RAM it allocates 
> previously. While it does that reference count garbage collection, that is 
> internal to the RAM that was allocated.
> 
> Let's say you need to sort a record set returned from a DBI call in an 
> unusual perl-like way. If you do this "in memory", you need an array to 
> hold the entire recordset in memory at once. If you do this, though, you 
> will allocate the RAM for that one request that sorted the array and then 
> the HTTPD will remain that size forever.
> 
> Keeping the higher RAM allocation is good for performance if you have the 
> RAM of course. So this is one of those design tradeoffs. And Perl was not 
> really written to be a persistent language, so again, the tradeoff of 
> operational speed seems to make sense versus persistent memory usage.
> 
> Later,
>    Gunther
> 

Gunther,

Curiosity leads me to the following question...:

So what you're talking about is: let's say a variable becomes 40k
large, or bigger.  Since we're talking about a pretty big operation we
could even be talking in terms of several hundred k, but anyway:

That variable would retain its size throughout the persistence of the
perl interpreter, correct?  And that memory would be specific to that
variable?  Hmm... okay, that's where I was getting mixed up.  The
variable's value is lost after the block ends, but its size is never
realloc'd down to something more appropriate?  That's an interesting
problem in and of itself.  So if you were to do something like this:

$i = 20;
$bigvar = "something that's 40k long";
somememoryhog($bigvar);
sub somememoryhog {
        my $var = shift;
        somememoryhog($var) if ($i-- >= 0);
}

It would call somememoryhog about 20 times; each time it would copy
the value of $bigvar onto the next level down of the recursive stack
of somememoryhog.  The total memory usage would be 20 * 40k = 800k,
and it would never reallocate that variable down to a reasonable size?
That's the behaviour I thought would happen, but I was thinking the
value would be retained through the stack (clearly my error).  (Okay,
so sue me, it would call somememoryhog more than 20 times; I'm just
trying to clear something up :->)

Thanks,
Shane.

RE: Modperl/Apache deficiencies... Memory usage.

Posted by Jeff Stuart <js...@ohio.risci2.net>.
I understand that.  :)  And that was something that I had to learn myself.
:)  It's a BAD thing when suddenly your httpd process takes up 100 MB.  :)
It's just that it sounded like Shane was saying that his httpds were
starting OUT at 4 to 6 MB.  That sounded a little unusual to me but then
again, I've pared down my httpd config so that I don't have things in that I
don't need.

I'm just curious as to what he has in there.

--
Jeff Stuart
jstuart@ohio.risci2.net

-----Original Message-----
From: Gunther Birznieks [mailto:gunther@extropia.com]
Sent: Tuesday, April 18, 2000 1:24 AM
To: modperl@apache.org
Subject: RE: Modperl/Apache deficiencies... Memory usage.

If you aren't careful with your programming, an apache HTTPD can always
grow pretty quickly because Perl never releases the RAM it allocates
previously. While it does that reference count garbage collection, that is
internal to the RAM that was allocated.

Let's say you need to sort a record set returned from a DBI call in an
unusual perl-like way. If you do this "in memory", you need an array to
hold the entire recordset in memory at once. If you do this, though, you
will allocate the RAM for that one request that sorted the array and then
the HTTPD will remain that size forever.

Keeping the higher RAM allocation is good for performance if you have the
RAM of course. So this is one of those design tradeoffs. And Perl was not
really written to be a persistent language, so again, the tradeoff of
operational speed seems to make sense versus persistent memory usage.

Later,
   Gunther

At 12:25 AM 4/18/00 -0400, Jeff Stuart wrote:
>Shane, question for you.  No offense intended here at all but what do you
>have in your apache servers (other than mod_perl) that use 4 to 6 MB?  I've
>got one server that I'm working on that handles close to 1 million hits
>per day and runs WITH mod_perl, using 4 to 6 MB.  ;-)  Without mod_perl, it
>takes up around 500 to 800 KB.   Now on another server my mod_perl server
>uses about 13 MB per process, but it's my devel machine so I've got a lot of stuff
>loaded that I wouldn't have in a production server.
>
>--
>Jeff Stuart
>jstuart@ohio.risci2.net
>
>-----Original Message-----
>From: shane@isupportlive.com [mailto:shane@isupportlive.com]
>Sent: Saturday, April 15, 2000 6:46 PM
>To: Perrin Harkins
>Cc: modperl@apache.org
>Subject: Re: Modperl/Apache deficiencies... Memory usage.
>
>Your apache processes would be the size of a stock
>apache process, like 4-6M or so, and you would have 1 process that
>would be 25MB or so that would have all your registry in it.

__________________________________________________
Gunther Birznieks (gunther.birznieks@extropia.com)
Extropia - The Web Technology Company
http://www.extropia.com/


RE: Modperl/Apache deficiencies... Memory usage.

Posted by Gunther Birznieks <gu...@extropia.com>.
If you aren't careful with your programming, an apache HTTPD can always 
grow pretty quickly because Perl never releases the RAM it allocates 
previously. While it does that reference count garbage collection, that is 
internal to the RAM that was allocated.

Let's say you need to sort a record set returned from a DBI call in an 
unusual perl-like way. If you do this "in memory", you need an array to 
hold the entire recordset in memory at once. If you do this, though, you 
will allocate the RAM for that one request that sorted the array and then 
the HTTPD will remain that size forever.

Keeping the higher RAM allocation is good for performance if you have the 
RAM of course. So this is one of those design tradeoffs. And Perl was not 
really written to be a persistent language, so again, the tradeoff of 
operational speed seems to make sense versus persistent memory usage.
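A small illustration of the tradeoff described above (a sketch only; the 40k string and the sub name are invented for the example).  Memory Perl frees goes back to perl's own allocator for reuse rather than to the operating system, so large temporaries are best confined to a small scope:

```perl
use strict;
use warnings;

sub summarize {
    my $big = 'x' x 40_000;     # ~40k temporary, like the recordset above
    my $len = length $big;      # derive the small result you actually need
    undef $big;                 # release the string buffer back to perl's
                                # allocator; the process itself won't shrink
    return $len;
}

print summarize(), "\n";        # prints 40000
```

Because only the small derived value escapes the sub, repeated calls can reuse the same arena instead of pinning the process at an ever-higher high-water mark.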

Later,
   Gunther

At 12:25 AM 4/18/00 -0400, Jeff Stuart wrote:
>Shane, question for you.  No offense intended here at all but what do you
>have in your apache servers (other than mod_perl) that use 4 to 6 MB?  I've
>got one server that I'm working on that handles close to 1 million hits
>per day and runs WITH mod_perl, using 4 to 6 MB.  ;-)  Without mod_perl, it
>takes up around 500 to 800 KB.   Now on another server my mod_perl server
>uses about 13 MB per process, but it's my devel machine so I've got a lot of stuff
>loaded that I wouldn't have in a production server.
>
>--
>Jeff Stuart
>jstuart@ohio.risci2.net
>
>-----Original Message-----
>From: shane@isupportlive.com [mailto:shane@isupportlive.com]
>Sent: Saturday, April 15, 2000 6:46 PM
>To: Perrin Harkins
>Cc: modperl@apache.org
>Subject: Re: Modperl/Apache deficiencies... Memory usage.
>
>Your apache processes would be the size of a stock
>apache process, like 4-6M or so, and you would have 1 process that
>would be 25MB or so that would have all your registry in it.

__________________________________________________
Gunther Birznieks (gunther.birznieks@extropia.com)
Extropia - The Web Technology Company
http://www.extropia.com/


Re: Apache::AuthCookie or Apache::AuthDBI or Apache::???

Posted by Vivek Khera <kh...@kciLink.com>.
>>>>> "KF" == Kenneth Frankel <ke...@atsbank.com> writes:

KF> What's the best way to authenticate users?  I have a site where the entire 
KF> site is to be protected. I want to log users in at the front of the web 

For a site whose contents are entirely protected, I'd use basic auth
with a cookie override.  That's what I've done in the past.  Neither
of these require perl or mod_perl, though.

See the apache module registry at www.apache.org for references to my
mod_auth_cookie which tricks Apache into converting a cookie into a
basic auth header.  How you set the cookie is up to you...
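For reference, the basic-auth header such a module has to reconstruct from the cookie is just the base64 encoding of "user:password" (the credentials here are made up for illustration):

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Build the header a cookie-to-basic-auth module would hand to Apache.
my $cred = encode_base64('alice:secret', '');   # '' = no trailing newline
print "Authorization: Basic $cred\n";
```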

How you authenticate depends mostly on your needs for maintaining the
database.  I've used flat files with htpasswd, dbm files with htpasswd
and my own home-brew scripts, and MySQL tables with my own scripts.

None of these require mod_perl, either, but you can use mod_perl based
versions of the necessary authentication modules.

Apache::AuthCookie or Apache::AuthDBI or Apache::???

Posted by Kenneth Frankel <ke...@atsbank.com>.
What's the best way to authenticate users?  I have a site where the entire 
site is to be protected. I want to log users in at the front of the web 
site, and keep them logged in as they travel around.   I was trying to get 
AuthCookie to work but haven't been successful so far.  Should I continue 
down this route?  Is Apache::Session + AuthBasic better?  Or is AuthDBI?  I 
have a mysql database handy.  What's the most popular Auth method 
nowadays?  Is there a popularity/usage chart compiled anywhere?

Thanks in advance!

Kenneth


Re: Modperl/Apache deficiencies... Memory usage.

Posted by sh...@isupportlive.com.
You're right.  I am mistaken :-(.  Just tested it, and it was
something silly in an old script I had lying around that I thought was
a bug... my mistake.  (Note to self: Test all examples before
posting... or you look like an idiot :-) )

Sorry,
Shane.

> I think you're mistaken. Try the following:
> 
> package My::Test;
> 
> sub new {
>   return bless {}, shift;
> }
> sub DESTROY {
>   warn "destroyed";
> }
> sub test {
>   my $object = new My::Test;
>   print ref $object, "\n";
>   # object will get destroyed when it goes out of scope (now)
> }
> 
> for (1..10) {
>   warn "t $_\n";
>   test();
> }
> 
> __END__
> 
> Your second example doesn't do what I think you were expecting.
> 
> Jim

Re: Modperl/Apache deficiencies... Memory usage.

Posted by Jim Winstead <ji...@trainedmonkey.com>.
On Apr 18, shane@isupportlive.com wrote:
> On Mon, Apr 17, 2000 at 11:12:24AM -0700, Perrin Harkins wrote:
> > shane@isupportlive.com wrote:
> > > Now with modperl the Perl garbage collector is
> > > NEVER used.  Because the reference count of those variables is never
> > > decremented... it's because it's all in the registry, and it's hard to
> > > tell... hmm... what should I throw away, and what should I keep? ;-).
> > 
> > What I know about Perl internals could fit on the head of a pin, but
> > this strikes me as a very odd statement.  If the garbage collector is
> > never used, why do my lexical variables go out of scope and get
> > destroyed?  There are mod_perl packages like Apache::Session that
> > absolutely depend on garbage collection of lexical variables to work. 
> > Are you saying that destroying the variables and actually reclaiming the
> > memory are separate, and only the first is happening?
> 
> Go out of scope, yes.  Destroyed, no.  Want to test?  No problem.  Do
> the following in a perl script.

I think you're mistaken. Try the following:

package My::Test;

sub new {
  return bless {}, shift;
}
sub DESTROY {
  warn "destroyed";
}
sub test {
  my $object = new My::Test;
  print ref $object, "\n";
  # object will get destroyed when it goes out of scope (now)
}

for (1..10) {
  warn "t $_\n";
  test();
}

__END__

Your second example doesn't do what I think you were expecting.

Jim

Re: Modperl/Apache deficiencies... Memory usage.

Posted by Perrin Harkins <pe...@primenet.com>.
shane@isupportlive.com wrote:
> my $i=0;
> dosomefunnierstuff();
> sub dosomefunnierstuff {
>         my $funnierstuff;
>         if($funnierstuff=~/funnier/) {
>                 dosomefunnierstuff();
>         } else {
>                 $funnierstuff="funnier".$i++;
>         }
>         print "Funnierstuff is $funnierstuff\n";
> }
> 
> That proves the point a bit more clearly.  It will show that each
> layer of the "stack" keeps its own discreet copy of the variable.

Oh, I see what you're talking about.  That's a closure.  It's a language
feature, so changing that behavior would be significant.  This shouldn't
be a problem if you simply avoid using closures in a recursive
algorithm.  In your example, I believe only the value of $i will be
saved each time, since $funnierstuff will go out of scope at the end of
the block and get garbage collected.
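For contrast, an intentional closure looks like this: the anonymous sub holds its own reference to $count, which is precisely why the variable is not garbage collected between calls (a minimal sketch):

```perl
use strict;
use warnings;

# make_counter() returns an anonymous sub that closes over $count.
# $count's refcount stays above zero as long as the sub exists,
# so the value persists across calls.
sub make_counter {
    my $count = 0;
    return sub { return ++$count };
}

my $c = make_counter();
print $c->(), $c->(), $c->(), "\n";   # prints 123
```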

- Perrin

Re: Modperl/Apache deficiencies... Memory usage.

Posted by Jim Winstead <ji...@trainedmonkey.com>.
I get it. You're talking about Apache::Registry scripts.

http://perl.apache.org/guide/perl.html#my_Scoped_Variable_in_Nested_S

Jim

Re: Modperl/Apache deficiencies... Memory usage.

Posted by Matt Sergeant <ma...@sergeant.org>.
On Tue, 18 Apr 2000 shane@isupportlive.com wrote:

> On Mon, Apr 17, 2000 at 01:33:08PM -0600, Jason Terry wrote:
> > This is the first i have seen "delete" referenced.  What does it do?  How is it used?
> > 
> > Thank you
> >       -Jason
> 
> It's the stack cleaner... but it only works for scalars, and maybe
> arrays, but its better for arrays and hashes to do the following at
> the bottom of each code block in registry scripts.  (True good note)
> 
> @somearray = ();
> %somehash =();
> delete $somescalar;
> (and don't forget untie :-> )
> 
> Basically it will clean up *most* of the memory taken up by these
> variables.

Ouch - maybe you should read a perl book before trying to re-implement
mod_perl more efficiently ;-)

Re: Modperl/Apache deficiencies... Memory usage.

Posted by Ken Williams <ke...@forum.swarthmore.edu>.
shane@isupportlive.com wrote:
>On Mon, Apr 17, 2000 at 01:33:08PM -0600, Jason Terry wrote:
>> This is the first i have seen "delete" referenced.  What does it do? 
>> How is it used?
>> 
>> Thank you
>>       -Jason
>
>It's the stack cleaner... but it only works for scalars, and maybe
>arrays, but its better for arrays and hashes to do the following at
>the bottom of each code block in registry scripts.  (True good note)
>
>@somearray = ();
>%somehash =();
>delete $somescalar;
>(and don't forget untie :-> )
>
>Basically it will clean up *most* of the memory taken up by these
>variables.

There are some serious misconceptions at work here.  First of all, delete()
can't be used on a scalar.  I think you're thinking of undef().  And second,
the memory de-allocation is very good, and very precise, and doesn't work any
differently under mod_perl than it does under regular Perl.  Why?  Because
mod_perl *IS* regular Perl.  It's simply a Perl interpreter that lasts a long
time.
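To make the distinction concrete: delete() removes hash (and array) elements, while undef() is the operator that releases a scalar; delete on a plain scalar is a compile-time error. A small demonstration:

```perl
use strict;
use warnings;

my %h = ( a => 1, b => 2 );
delete $h{a};              # delete: removes a hash element
my $s = "x" x 100_000;
undef $s;                  # undef: releases the scalar's string buffer
# delete $s;               # would not even compile

print scalar( keys %h ), "\n";                  # prints 1
print defined $s ? "defined" : "undef", "\n";   # prints undef
```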

Code run under Apache::Registry has some additional things it needs to
consider, since the entire "CGI" will be wrapped up inside a subroutine,
creating trapped lexicals (the guide is actually incorrect in calling them
closures).  But the guide suggests several workarounds, all of which will get
the job done.

[ By the way, Stas - is there a CVS version of the guide that I can make
patches against?  I found a few inaccuracies. ]

Well-written code will get its variables cleaned up exactly when and how it
wants them cleaned up.  This is true under mod_perl or standalone Perl. 
Sometimes you have to be a little more careful under mod_perl, especially using
Registry, because of the persistence & subroutine-wrapping issues.  But try
this [simplified] version of one of your examples:

     my $i=0;
     dosomefunnierstuff($i);
     sub dosomefunnierstuff {
       print "Funnierstuff is ", $_[0]++, "\n";
     }

Most good coders would argue that functions that access variables that
aren't explicitly passed to them are best avoided anyway.  This is just
another reason that's true (or another version of the same reasons).
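The reason the rewritten example works is that the elements of @_ are aliases to the caller's arguments, not copies, so $_[0]++ increments the caller's $i in place. A tiny demonstration:

```perl
use strict;
use warnings;

# @_ aliases the caller's variables: modifying $_[0] inside
# the sub changes $n itself, with no copy of the value made.
sub bump { $_[0]++ }

my $n = 5;
bump($n);
print "$n\n";   # prints 6
```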

Of course, since the strange behavior you observed is fully expected
once you understand the problem, slick (and disliked =) coders can feel
free to take advantage of it for their own evil purposes.


  -------------------                            -------------------
  Ken Williams                             Last Bastion of Euclidity
  ken@forum.swarthmore.edu                            The Math Forum



Re: Modperl/Apache deficiencies... Memory usage.

Posted by sh...@isupportlive.com.
On Mon, Apr 17, 2000 at 01:33:08PM -0600, Jason Terry wrote:
> This is the first i have seen "delete" referenced.  What does it do?  How is it used?
> 
> Thank you
>       -Jason

It's the stack cleaner... but it only works for scalars, and maybe
arrays, but it's better for arrays and hashes to do the following at
the bottom of each code block in registry scripts.  (True good note)

@somearray = ();
%somehash =();
delete $somescalar;
(and don't forget untie :-> )

Basically it will clean up *most* of the memory taken up by these
variables.

Thanks,
Shane.

Re: Modperl/Apache deficiencies... Memory usage.

Posted by Jason Terry <jt...@g-web.net>.
This is the first I have seen "delete" referenced.  What does it do?  How is it used?

Thank you
      -Jason
----- Original Message ----- 
From: <sh...@isupportlive.com>
To: <mo...@apache.org>
Sent: Monday, April 17, 2000 9:32 PM
Subject: Re: Modperl/Apache deficiencies... Memory usage.


> > 
> > Go out of scope, yes.  Destroyed, no.  Want to test?  No problem.  Do
> > the following in a perl script.
> > 
> > my($funnything);
> > print"Value of funnything is $funnything";
> > $funnything="Uh oh... check this out";
> > 
> > You'll find some interesting results on your second interation :-).
> > Even funnier might be the folowing...
> > 
> > dosomefunnystuff();
> > sub dosomefunnystuff {
> > my($funnystuff);
> > if($funnystuff eq "funny") {
> > dosomefunnystuff();
> > }
> > print "Funnystuff is $funnystuff";
> > $funnystuff="funny";
> > }
> > 
> > Try that, and you will truely find out how memory inefficient modperl
> > is :-).  I haven't tested that second one, but based on what I know
> > about how perl works... it should prove... interesting.
> > 
> > Thanks,
> > Shane.
> 
> Quick note about this:  You'll have to hit the same process, so you
> might have to reload a couple times for the effect to hit.  Also
> something maybe even funner...
> my $i=0;
> dosomefunnierstuff();
> sub dosomefunnierstuff {
> my $funnierstuff;
> if($funnierstuff=~/funnier/) {
> dosomefunnierstuff();
> } else {
> $funnierstuff="funnier".$i++;
> }
> print "Funnierstuff is $funnierstuff\n";
> }
> 
> That proves the point a bit more clearly.  It will show that each
> layer of the "stack" keeps its own discreet copy of the variable.
> That's why I've said before recursion!=good for modperl.
> Personally... I LOVE recursive algorithms... but it just doesn't make
> sense in a mod_perl enviro.  If you do use recursion and have large
> variables of strings for instance... you should pass by reference when
> possible.., and you use "delete" at the bottom of the code block when
> possible.
> 
> Thanks,
> Shane.
> 
> > 
> > > 
> > > - Perrin


Re: Modperl/Apache deficiencies... Memory usage.

Posted by sh...@isupportlive.com.
> 
> Go out of scope, yes.  Destroyed, no.  Want to test?  No problem.  Do
> the following in a perl script.
> 
> my($funnything);
> print"Value of funnything is $funnything";
> $funnything="Uh oh... check this out";
> 
> You'll find some interesting results on your second interation :-).
> Even funnier might be the folowing...
> 
> dosomefunnystuff();
> sub dosomefunnystuff {
> 	my($funnystuff);
> 	if($funnystuff eq "funny") {
> 		dosomefunnystuff();
> 	}
> 	print "Funnystuff is $funnystuff";
> 	$funnystuff="funny";
> }
> 
> Try that, and you will truely find out how memory inefficient modperl
> is :-).  I haven't tested that second one, but based on what I know
> about how perl works... it should prove... interesting.
> 
> Thanks,
> Shane.

Quick note about this:  You'll have to hit the same process, so you
might have to reload a couple times for the effect to hit.  Also
something maybe even funner...
my $i=0;
dosomefunnierstuff();
sub dosomefunnierstuff {
	my $funnierstuff;
	if($funnierstuff=~/funnier/) {
		dosomefunnierstuff();
	} else {
		$funnierstuff="funnier".$i++;
	}
	print "Funnierstuff is $funnierstuff\n";
}

That proves the point a bit more clearly.  It will show that each
layer of the "stack" keeps its own discrete copy of the variable.
That's why I've said before recursion!=good for modperl.
Personally... I LOVE recursive algorithms... but it just doesn't make
sense in a mod_perl environment.  If you do use recursion and have large
string variables, for instance... you should pass by reference when
possible, and use "delete" at the bottom of the code block when
possible.

Thanks,
Shane.

> 
> > 
> > - Perrin

Re: Modperl/Apache deficiencies... Memory usage.

Posted by Autarch <au...@urth.org>.
On Tue, 18 Apr 2000 shane@isupportlive.com wrote:

> Go out of scope, yes.  Destroyed, no.  Want to test?  No problem.  Do
> the following in a perl script.
> 
> my($funnything);
> print"Value of funnything is $funnything";
> $funnything="Uh oh... check this out";

This only happens with Apache::Registry and is documented in the
guide.  If you were to do that inside a handler it would work just
fine.  The Apache::Registry problem is a side effect of the magic that
goes into making it work, but is not surprising and is the expected
behavior.

If performance and memory are a concern you should probably prefer
handlers anyway.
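The Apache::Registry surprise can be reproduced in plain Perl with a named sub nested inside another named sub, which is essentially what Registry's wrapping creates. The inner sub binds the first instance of the outer lexical and keeps it; this is Perl's documented "will not stay shared" behavior, not a mod_perl bug:

```perl
use strict;
use warnings;
no warnings 'closure';   # silence "variable will not stay shared"

sub outer {
    my $x = shift;
    sub inner { return $x }   # named sub: captures the FIRST $x only
}

outer("first");
print inner(), "\n";   # prints first
outer("second");
print inner(), "\n";   # still prints first
```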


-dave

/*==================
www.urth.org
We await the New Sun
==================*/


Re: Modperl/Apache deficiencies... Memory usage.

Posted by sh...@isupportlive.com.
On Mon, Apr 17, 2000 at 11:12:24AM -0700, Perrin Harkins wrote:
> shane@isupportlive.com wrote:
> > Now with modperl the Perl garbage collector is
> > NEVER used.  Because the reference count of those variables is never
> > decremented... it's because it's all in the registry, and it's hard to
> > tell... hmm... what should I throw away, and what should I keep? ;-).
> 
> What I know about Perl internals could fit on the head of a pin, but
> this strikes me as a very odd statement.  If the garbage collector is
> never used, why do my lexical variables go out of scope and get
> destroyed?  There are mod_perl packages like Apache::Session that
> absolutely depend on garbage collection of lexical variables to work. 
> Are you saying that destroying the variables and actually reclaiming the
> memory are separate, and only the first is happening?

Go out of scope, yes.  Destroyed, no.  Want to test?  No problem.  Do
the following in a perl script.

my($funnything);
print"Value of funnything is $funnything";
$funnything="Uh oh... check this out";

You'll find some interesting results on your second iteration :-).
Even funnier might be the following...

dosomefunnystuff();
sub dosomefunnystuff {
	my($funnystuff);
	if($funnystuff eq "funny") {
		dosomefunnystuff();
	}
	print "Funnystuff is $funnystuff";
	$funnystuff="funny";
}

Try that, and you will truly find out how memory inefficient modperl
is :-).  I haven't tested that second one, but based on what I know
about how perl works... it should prove... interesting.

Thanks,
Shane.

> 
> - Perrin

Re: Modperl/Apache deficiencies... Memory usage.

Posted by Doug MacEachern <do...@covalent.net>.
On Mon, 17 Apr 2000, Perrin Harkins wrote:
 
> What I know about Perl internals could fit on the head of a pin, but
> this strikes me as a very odd statement.  If the garbage collector is
> never used, why do my lexical variables go out of scope and get
> destroyed?  There are mod_perl packages like Apache::Session that
> absolutely depend on garbage collection of lexical variables to work. 
> Are you saying that destroying the variables and actually reclaiming the
> memory are separate, and only the first is happening?

exactly, lexicals "go out-of-scope", but most of the memory allocations
are not freed unless you explicitly undef.  e.g.:

sub foo {
    my $string = shift;
    print $string;
}

foo("hello world");

after the subroutine is called, the SvPVX field of $string hangs onto the
allocated length("hello world") + 1 bytes, unless you undef $string;

this is an optimization, Perl assumes if you've allocated for a variable
once, you'll need to do it again, making the second assignment+ much
cheaper.  the elements of lexical arrays and hashes are released (well,
the refcnt is decremented, which triggers free if the refcnt was 1) when
one goes out-of-scope, but the size of the array/hash remains for the same 
reasons.  this is another reason it's always best to pass a reference when
possible, for $strings, @arrays and %hashes, to avoid these copies.
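Doug's pass-by-reference advice, sketched: the sub reads the caller's array through a reference, so no element copies are made into the sub's own lexicals:

```perl
use strict;
use warnings;

# Taking \@big instead of @big means the sub walks the caller's
# storage directly; nothing is copied into a lexical array.
sub total_by_ref {
    my ($aref) = @_;
    my $sum = 0;
    $sum += $_ for @$aref;
    return $sum;
}

my @big = ( 1 .. 1000 );
print total_by_ref( \@big ), "\n";   # prints 500500
```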

the B::Size and B::LexInfo packages on CPAN were written to illustrate this,
though one or both needs fixing for 5.6.0, which i'll get to soon.


Re: Modperl/Apache deficiencies... Memory usage.

Posted by Perrin Harkins <pe...@primenet.com>.
shane@isupportlive.com wrote:
> Now with modperl the Perl garbage collector is
> NEVER used.  Because the reference count of those variables is never
> decremented... it's because it's all in the registry, and it's hard to
> tell... hmm... what should I throw away, and what should I keep? ;-).

What I know about Perl internals could fit on the head of a pin, but
this strikes me as a very odd statement.  If the garbage collector is
never used, why do my lexical variables go out of scope and get
destroyed?  There are mod_perl packages like Apache::Session that
absolutely depend on garbage collection of lexical variables to work. 
Are you saying that destroying the variables and actually reclaiming the
memory are separate, and only the first is happening?

- Perrin

Re: Modperl/Apache deficiencies... Memory usage.

Posted by Gunther Birznieks <gu...@extropia.com>.
While I agree that a true garbage collector would be cool, I wonder what 
the utility would really be when you would primarily want it for mod_perl 
type stuff. Yet, mod_perl is also great because of speed. One of the nice 
things about Perl right now is that it is fast, and that is partially due to 
the reference counting instead of running garbage collection algorithms.

In addition, generally for those where speed is a huge concern, RAM 
and hardware are cheap relative to those concerns. So I am not sure how 
worthwhile it would really be in the end. It would likely also require 
quite a bit of mucking about in the Perl internals, so a garbage-collection 
routine would take a long time to debug, and the bugs themselves would be 
subtle and extremely difficult to track down. So I would not see a lot of 
people being interested in testing something that is hard to debug for a 
benefit that no one has really cried out for (until now?).

Anyway, I'm sorry to naysay this because I think a new garbage collection 
model would be interesting, but I am just wondering about the utility of it 
in the end (versus improving other parts of Perl/mod_perl).

eg the idea that I think you proposed last week to have a detached Perl 
process with multiple Perl objects (with threaded access to them) attached 
to apache servers via socket or IPC communication would have more practical 
uses that I could see and would be easier to get out than a new garbage 
collection algorithm.

Later,
   Gunther

Re: Modperl/Apache deficiencies... Memory usage.

Posted by sh...@isupportlive.com.
Gunther-  (What follows is some servlet talk... and anyone interested
in a mod_perl garbage collector?)
> If you want the ultimate in clean models, you may want to consider coding 
> in Java Servlets. It tends to be longer to write Java than Perl, but it's 
> much cleaner as all memory is shared and thread-pooling libraries do exist 
> to restrict 1-thread (or few threads) per CPU (or the request is blocked) 
> type of situation.

This is sorta OT in this present thread... however.  I'm not so sure
about your statement that Java apps take longer.  My experience is the
opposite.  (And I have quite a lot of experience in writing servlets)
However, your next comment hits the nail on the head... buggy, and one
script blown kills the entire engine.  Not only that, but if memory
usage spikes over the memory setting you used when starting the engine
the whole thing blows up in your face.  (Basically new instances of
requests for servlets get messed up)  The whole memory issue with java
is almost worse... although with competent admining it can be "not too
bad".  Then of course there's the speed issue with java... though
TowerJ would fix this up nicely :-).

> 
> However, I would stress that speed/threading is not the only name in the 
> game. Reliability is another concern. Frankly, I have seen a lot of buggy 
> servlets crash the entire servlet engine making a web site useless. 
> Generally if there is a core dump in an Apache/Mod_perl process, the worst 
> is that one request got hosed.

Gotta love that! :).  Well... I've seen a little worse, but not much.
Doesn't touch the problems you can have with JServ and kin.  Once I
had a really mission critical JServ engine die, and went on a
vacation... it didn't get back up until I got back.  This put JServ on
the do-not-use list in my book.  You really need 24hr
staffing/notification to use a servlet engine... major bummer.

> 
> I am resigned to the fact that all languages are buggy, and so I like 
> engineering architectures that support buggy languages (as well as buggy 
> app code).

Hehe... the essential problem with modperl comes down to this...
Garbage collector?...???  Nope... it's not there.  I've written a real
basic garbage collector that might be able to be adapted to modperl.
Okay... now here goes:

The problem...: The way the perl garbage collector works is there
is a reference count for each variable on the stack.  There are
different types of variables, but basically two types: "my'd"
lexicals, and those that are not :-).  (There is the local() stuff
too, but that's just stack manipulation of non-my'd type)

Basically within the structure of mod_perl almost every variable you
actually use is of the lexical variety.  Lexical variables are stored
within the CV of whatever package you're working with.  There is an AV
(Array value) of AV's within that CV that stores all the lexicals.
The first AV within the variable AV stores the name of the variable.
The rest of the AV's under that store the variable value for various
levels of recursion.  Now with modperl the Perl garbage collector is
NEVER used.  Because the reference count of those variables is never
decremented... it's because it's all in the registry, and it's hard to
tell... hmm... what should I throw away, and what should I keep? ;-).

This contributes to a good deal of speed... because you hardly EVER
have to create more instances of variables after running the program
for the first time.  It's all stored in memory... wow, that's
GREAT!... not.  Well, it is, but basically it sucks up tons and tons
of memory.  This is why requiring the scripts before forking can
be okay, but not great, because each process still has to store all
these stack variables again for every "package/script" within
mod_perl.  Now... I could see a couple solutions.

1) "Smartly" route script requests to the appropriate set of apache
instances.  So basically not every apache instance would have to store
the variable and code info for every script that has to be run.

2) Make a garbage collector for modperl.  This would be pretty cool...
but programmatically it's really really hard.  And there is no
guarantee that the whole variable design won't change in future
releases of perl.  I could see a garbage collector that killed the
values within the stack, and totally wiped out the existence of
recursive layers.  I'm fully onboard with submitting my code and
helping out if someone besides me is interested in this.  (I don't
have time to do it solo... besides I need someone to bounce ideas off
of.  This is a really hard thing to do, and it would be pretty easy to
break a lot of stuff in the process of building this.)  The use of a
garbage collector would have several benefits... it would get rid of
the "security hole" that I perceive in mod_perl.  It would also lower
the process overhead... and generally be really cool.

(BTW: If anyone that knows the internals of mod_perl is seriously
interested in working on a garbage collector, drop me an email and
we'll talk about it.  I'm pretty interested in doing this.  You don't
have to know a lot of perl internals, but you would have to understand
the c code in mod_perl fairly well.)

Or... do both! :).

Thanks,
Shane.

> 
> Later,
>     Gunther
> 
> __________________________________________________
> Gunther Birznieks (gunther.birznieks@extropia.com)
> Extropia - The Web Technology Company
> http://www.extropia.com/
> 

Re: shrinking memory (was Re: Modperl/Apache deficiencies... Memory usage.)

Posted by Stas Bekman <sb...@stason.org>.
> On Tue, 18 Apr 2000, Stas Bekman wrote:
>  
> > What do you say? 1003520 bytes are returned to OS when @x goes out of
> > scope. Note that this doesn't happen if you use a global @x instead.
> 
> because under linux Perl defaults to system malloc:
> % perl -V:usemymalloc
> usemymalloc='n';
> 
> if usemymalloc='y', then Perl uses Perl's malloc, in which case memory is
> not returned to the os.

Does that mean there is a reason to use usemymalloc='n' to make
the memory footprint smaller? Or will the realloc on the next invocation
just waste time, because the memory was freed?


______________________________________________________________________
Stas Bekman             | JAm_pH    --    Just Another mod_perl Hacker
http://stason.org/      | mod_perl Guide  http://perl.apache.org/guide 
mailto:stas@stason.org  | http://perl.org    http://stason.org/TULARC/
http://singlesheaven.com| http://perlmonth.com http://sourcegarden.org
----------------------------------------------------------------------


Re: shrinking memory (was Re: Modperl/Apache deficiencies... Memory usage.)

Posted by Doug MacEachern <do...@covalent.net>.
On Tue, 18 Apr 2000, Stas Bekman wrote:
 
> What do you say? 1003520 bytes are returned to OS when @x goes out of
> scope. Note that this doesn't happen if you use a global @x instead.

because under linux Perl defaults to system malloc:
% perl -V:usemymalloc
usemymalloc='n';

if usemymalloc='y', then Perl uses Perl's malloc, in which case memory is
not returned to the os.


shrinking memory (was Re: Modperl/Apache deficiencies... Memory usage.)

Posted by Stas Bekman <sb...@stason.org>.
Since you are talking about garbage collection (memory shrinking) you
might want to re-read the thread I've started back in Aug, 99:
http://forum.swarthmore.edu/epigone/modperl/zarwhegerd

It includes the real show case of memory shrinking (at least on Linux). 
Consider this code: 

-------------------------------------------------------
use GTop;
my $gtop = GTop->new;
print "Content-type: text/plain\n\n";

print "before       :", $gtop->proc_mem($$)->size,"\n";
{ push my @x, "A" x 1000000;
  print "in scope     :", $gtop->proc_mem($$)->size,"\n";
}
print "out of scope :", $gtop->proc_mem($$)->size,"\n";
-------------------------------------------------------

prints on a freshly started server:
before       :6111232
in scope     :8118272
out of scope :7114752

and on a second invocation:

before       :7118848
in scope     :8122368
out of scope :7118848

What do you say? 1003520 bytes are returned to OS when @x goes out of
scope. Note that this doesn't happen if you use a global @x instead.

But hey why did I need mod_perl for the test, stupid me :) Running the
above code as a Perl script from the command line gives:

before       :1527808
in scope     :3543040
out of scope :2539520

The machine is running linux x86 (RH6.1) kernel 2.2.12-20smp, perl5.005_03

______________________________________________________________________
Stas Bekman             | JAm_pH    --    Just Another mod_perl Hacker
http://stason.org/      | mod_perl Guide  http://perl.apache.org/guide 
mailto:stas@stason.org  | http://perl.org    http://stason.org/TULARC/
http://singlesheaven.com| http://perlmonth.com http://sourcegarden.org
----------------------------------------------------------------------


Re: Modperl/Apache deficiencies... Memory usage.

Posted by Gunther Birznieks <gu...@extropia.com>.
No. I have not worked with XSL processors. I tend to use XML SAX parsers as 
I know they are faster and deterministic, especially Sun's XML SAX Parser.

It could be that the XSL processors are making use of a DOM based structure 
which is causing all sorts of references to other data structures, so Java 
gets more confused about garbage collecting not to mention the extra 
objects that are probably created as it tries to turn every node into a DOM 
object, like one huge tree.

However, as with any server (including Apache) I would recommend coding in 
automatic restarts every so often as makes sense under real world load.

I have to say, I've had a lot better success with writing stable 
proxy/servers in Java than in Perl, even using Perl 5.005. With 
Apache, it's nice there is a built-in server restart model. But if 
you code a pure server with PlRPC, you end up having to code that stuff 
yourself.... I have two sets of generic scripts that I *had* to use or my 
PlRPC server would die. One always makes sure that there is one PlRPC 
server attempting to start after the next one, and another that monitors 
whether the original PlRPC server accepts the connections.

Anyway, I am not saying that Apache/mod_perl or Perl is bad, but I do think 
as a programming language for multithreaded work, it really doesn't make a 
lot of sense (for myself) to force Perl or Java to do what it was not 
intended to do. eg I hate writing web apps in Java and I hate writing 
middleware servers in Perl. But that is my only architectural bias.

Later,
    Gunther

At 11:51 PM 4/16/00 -0500, Leslie Mikesell wrote:
>According to Gunther Birznieks:
>
> > If you want the ultimate in clean models, you may want to consider coding
> > in Java Servlets. It tends to be longer to write Java than Perl, but it's
> > much cleaner as all memory is shared and thread-pooling libraries do exist
> > to restrict 1-thread (or few threads) per CPU (or the request is blocked)
> > type of situation.
>
>Do you happen to know of anyone doing xml/xsl processing in
>servlets?  A programmer here has written some nice looking stuff
>but it appears that the JVM is never garbage-collecting and
>will just grow and get slower until someone restarts it.  I
>don't know enough java to tell if it is his code or the xslt
>classes that are causing it.
>
>Yes, I know this is off-topic for mod_perl except to point out
>that the clean java model isn't necessarily trouble free either.
>
>   Les Mikesell
>    les@mcs.com

__________________________________________________
Gunther Birznieks (gunther.birznieks@extropia.com)
Extropia - The Web Technology Company
http://www.extropia.com/


Re: Modperl/Apache deficiencies... Memory usage.

Posted by Gunther Birznieks <gu...@extropia.com>.
At 11:52 PM 4/15/00 +0000, shane@isupportlive.com wrote:
>On Sun, Apr 16, 2000 at 09:28:56AM +0300, Stas Bekman wrote:
> > On Sat, 15 Apr 2000 shane@isupportlive.com wrote:
> >
> > > > > I wrote a very small perl engine
> > > > > for phhttpd that worked within it's threaded paradigm that sucked 
> up a
> > > > > neglibible amount of memory which used a very basic version of
> > > > > Apache's registry.
> > > >
> > > > Can you explain how this uses less memory than mod_perl doing the same
> > > > thing?  Was it just that you were using fewer perl 
> interpreters?  If so, you
> > > > need to improve your use of apache with a multi-server setup.  The 
> only way
> > > > I could see phttpd really using less memory to do the same work is 
> if you
> > > > somehow managed to get perl to share more of its internals in 
> memory.  Did
> > > > you?
> > >
> > > Yep very handily I might add ;-).  Basically phhttpd is not process
> > > based, it's threaded based.  Which means that everything is running
> > > inside of the same address space.  Which means 100% sharing except for
> > > the present local stack of variables... which is very minimal.  In
> > > terms of the perl thing... when you look at your processes and see all
> > > that non-shared memory, most of that is stack variables.  Now most
> > > webservers are running on single processor machines, so they get no
> > > benefit from having 10s or even 100s of copies of these perl stack
> > > variables.  It's much more efficient to have a single process handle
> > > all the perl requests.  On a multiprocessor box that single process
> > > could have multiple threads in order to take advantage of the
> > > processors.  See..., mod_perl stores the stack state of every script
> > > it runs in the apache process... for every script... copies of it,
> > > many many copies of it.  This is not efficient.  What would be
> > > efficient is to have as many threads/processes as you have processors
> > > for the mod_perl engine.  In other words seperate the engine from the
> > > apache process so that there is never unneccesary stack variables
> > > being tracked.
> >
> > I'm not sure you are right by claiming that the best performance will be
> > achieved when you have a single process/thread per given processor. This
> > would be true *only* if the nature of your code would be CPU bound.
> > Unfortunately there are various IO operations and communications with
> > other components like RDBMS engines, which in turn have their IO as well.
> > Given that your CPU is idle while the IO operation is under process, you
> > could use the CPU for processing another request at this time.
> >
> > Hmm, that's the whole point of the multi-process OS. Unless I
> > misunderstood your suggestion, what you are offering is a kinda DOS-like
> > OS where there is only one process that occupies CPU at any given time.
> > (well, assuming that the rest of the OS essential processes are running
> > somewhere too in a multi-processes environment.)
>
>That is an excellent point Stas.  One that I considered a while ago
>but sort of forgot about when I started this thread.  Hrmm... that
>brings up much more complex issues.  Yes, your right that is my
>assumption, and that's because that's the case I'm working under 90%
>of the time.  It's a horrible assumption though for the "community at
>large".  Hmm... well, you've stumped me... that's a very very clear
>problem with the design I had in mind.  There are ways around this I
>can see in my brain, but they are far from eloquent.  If something
>were blocking on a network read it would stop the WHOLE perl engine...
>TERRIBLE!!!!, not usefull at all for anyone that's going to be doing
>something like that.  Well, there must be a way around it... if anyone
>has any ideas please shoot them my way... this is a paradox of the nth
>order.  Actually it's a problem for mod_perl too..., but it's not
>nearly as large of a problem than for the design I had in mind.
>
>Congrats Stas... good thinking.
>Thanks,
>Shane.
>(DOSlike isn't fair though! :>.  Though I see your point... efficiency
>was the key element to what I was thinking, but I had mostly
>considered the CPU bound case... the network bound case hadn't really
>entered my mind.  The way around this is horribly yucky from a
>programatic point of view... e-gads!)

I guess that's why you would want to try different mixes of numbers of 
back-end servers to see which gives you the greatest performance. 
Jeffrey Baker also brings up IO issues in his Apache::DBI posts, about 
why a single pooled connection of DBI handles is not so hot in the real 
world when compared against a single handle cached per apache process.

If you are CPU bound, then it may be just as well to have a few servers 
chugging away, and to limit the number that can be forked off on the 
back-end. If you are IO bound, then you would launch many more. However, 
the Apache model of restarts and forking is not entirely shabby. The 
fact is, if you find some apache processes cannot fulfill the task, then 
another one can always be created, even if you have set a general upper 
limit on the mod_perl processes that are allowed to stick around. In 
other words, Apache can be set up to adapt to the load placed on it. And 
that should be "OK".

If you want the ultimate in clean models, you may want to consider coding 
in Java Servlets. It tends to take longer to write Java than Perl, but 
it's much cleaner: all memory is shared, and thread-pooling libraries do 
exist to enforce a one-thread (or few-threads) per CPU limit, with 
further requests blocked.

However, I would stress that speed/threading is not the only game in 
town. Reliability is another concern. Frankly, I have seen a lot of buggy 
servlets crash the entire servlet engine making a web site useless. 
Generally if there is a core dump in an Apache/Mod_perl process, the worst 
is that one request got hosed.

I am resigned to the fact that all languages are buggy, and so I like 
engineering architectures that support buggy languages (as well as buggy 
app code).

Later,
    Gunther

__________________________________________________
Gunther Birznieks (gunther.birznieks@extropia.com)
Extropia - The Web Technology Company
http://www.extropia.com/


Re: Modperl/Apache deficiencies... Memory usage.

Posted by sh...@isupportlive.com.
On Sun, Apr 16, 2000 at 09:28:56AM +0300, Stas Bekman wrote:
> On Sat, 15 Apr 2000 shane@isupportlive.com wrote:
> 
> > > > I wrote a very small perl engine
> > > > for phhttpd that worked within it's threaded paradigm that sucked up a
> > > > neglibible amount of memory which used a very basic version of
> > > > Apache's registry.
> > > 
> > > Can you explain how this uses less memory than mod_perl doing the same
> > > thing?  Was it just that you were using fewer perl interpreters?  If so, you
> > > need to improve your use of apache with a multi-server setup.  The only way
> > > I could see phttpd really using less memory to do the same work is if you
> > > somehow managed to get perl to share more of its internals in memory.  Did
> > > you?
> > 
> > Yep very handily I might add ;-).  Basically phhttpd is not process
> > based, it's threaded based.  Which means that everything is running
> > inside of the same address space.  Which means 100% sharing except for
> > the present local stack of variables... which is very minimal.  In
> > terms of the perl thing... when you look at your processes and see all
> > that non-shared memory, most of that is stack variables.  Now most
> > webservers are running on single processor machines, so they get no
> > benefit from having 10s or even 100s of copies of these perl stack
> > variables.  Its much more efficient to have a single process handle
> > all the perl requests.  On a multiprocessor box that single process
> > could have multiple threads in order to take advantage of the
> > processors.  See..., mod_perl stores the stack state of every script
> > it runs in the apache process... for every script... copies of it,
> > many many copies of it.  This is not efficient.  What would be
> > efficient is to have as many threads/processes as you have processors
> > for the mod_perl engine.  In other words seperate the engine from the
> > apache process so that there is never unneccesary stack variables
> > being tracked.
> 
> I'm not sure you are right by claiming that the best performance will be
> achieved when you have a single process/thread per given processor. This
> would be true *only* if the nature of your code would be CPU bound. 
> Unfortunately there are various IO operations and communications with
> other components like RDBMS engines, which in turn have their IO as well. 
> Given that your CPU is idle while the IO operation is under process, you
> could use the CPU for processing another request at this time. 
> 
> Hmm, that's the whole point of the multi-process OS. Unless I
> misunderstood your suggestion, what you are offering is a kinda DOS-like
> OS where there is only one process that occupies CPU at any given time.
> (well, assuming that the rest of the OS essential processes are running
> somewhere too in a multi-processes environment.)

That is an excellent point Stas.  One that I considered a while ago
but sort of forgot about when I started this thread.  Hrmm... that
brings up much more complex issues.  Yes, you're right, that is my
assumption, and that's because it's the case I'm working under 90%
of the time.  It's a horrible assumption though for the "community at
large".  Hmm... well, you've stumped me... that's a very, very clear
problem with the design I had in mind.  There are ways around this I
can see in my head, but they are far from elegant.  If something
were blocking on a network read it would stop the WHOLE perl engine...
TERRIBLE!!!!, not useful at all for anyone that's going to be doing
something like that.  Well, there must be a way around it... if anyone
has any ideas please shoot them my way... this is a paradox of the nth
order.  Actually it's a problem for mod_perl too..., but it's not
nearly as large a problem as for the design I had in mind.

Congrats Stas... good thinking.
Thanks,
Shane.
(DOS-like isn't fair though! :>.  Though I see your point... efficiency
was the key element of what I was thinking, but I had mostly
considered the CPU-bound case... the network-bound case hadn't really
entered my mind.  The way around this is horribly yucky from a
programmatic point of view... e-gads!)

> 
> ______________________________________________________________________
> Stas Bekman             | JAm_pH    --    Just Another mod_perl Hacker
> http://stason.org/      | mod_perl Guide  http://perl.apache.org/guide 
> mailto:stas@stason.org  | http://perl.org    http://stason.org/TULARC/
> http://singlesheaven.com| http://perlmonth.com http://sourcegarden.org
> ----------------------------------------------------------------------
> 

Re: Modperl/Apache deficiencies... Memory usage.

Posted by Stas Bekman <sb...@stason.org>.
On Sat, 15 Apr 2000 shane@isupportlive.com wrote:

> > > I wrote a very small perl engine
> > > for phhttpd that worked within it's threaded paradigm that sucked up a
> > > neglibible amount of memory which used a very basic version of
> > > Apache's registry.
> > 
> > Can you explain how this uses less memory than mod_perl doing the same
> > thing?  Was it just that you were using fewer perl interpreters?  If so, you
> > need to improve your use of apache with a multi-server setup.  The only way
> > I could see phttpd really using less memory to do the same work is if you
> > somehow managed to get perl to share more of its internals in memory.  Did
> > you?
> 
> Yep very handily I might add ;-).  Basically phhttpd is not process
> based, it's threaded based.  Which means that everything is running
> inside of the same address space.  Which means 100% sharing except for
> the present local stack of variables... which is very minimal.  In
> terms of the perl thing... when you look at your processes and see all
> that non-shared memory, most of that is stack variables.  Now most
> webservers are running on single processor machines, so they get no
> benefit from having 10s or even 100s of copies of these perl stack
> variables.  Its much more efficient to have a single process handle
> all the perl requests.  On a multiprocessor box that single process
> could have multiple threads in order to take advantage of the
> processors.  See..., mod_perl stores the stack state of every script
> it runs in the apache process... for every script... copies of it,
> many many copies of it.  This is not efficient.  What would be
> efficient is to have as many threads/processes as you have processors
> for the mod_perl engine.  In other words seperate the engine from the
> apache process so that there is never unneccesary stack variables
> being tracked.

I'm not sure you are right in claiming that the best performance will be
achieved when you have a single process/thread per processor. This
would be true *only* if your code were CPU bound. 
Unfortunately there are various IO operations and communications with
other components, like RDBMS engines, which in turn have their own IO as
well. Given that your CPU is idle while the IO operation is in progress,
you could use the CPU for processing another request at that time. 

Hmm, that's the whole point of a multi-process OS. Unless I
misunderstood your suggestion, what you are offering is a kinda DOS-like
OS where there is only one process that occupies the CPU at any given
time. (well, assuming that the rest of the essential OS processes are
running somewhere too in a multi-process environment.)

______________________________________________________________________
Stas Bekman             | JAm_pH    --    Just Another mod_perl Hacker
http://stason.org/      | mod_perl Guide  http://perl.apache.org/guide 
mailto:stas@stason.org  | http://perl.org    http://stason.org/TULARC/
http://singlesheaven.com| http://perlmonth.com http://sourcegarden.org
----------------------------------------------------------------------


RE: Modperl/Apache deficiencies... Memory usage.

Posted by Jeff Stuart <js...@ohio.risci2.net>.
Shane, question for you.  No offense intended here at all, but what do you
have in your apache servers (other than mod_perl) that uses 4 to 6 MB?  I've
got one server that I'm working on that handles close to 1 million hits per
day and runs WITH mod_perl, and it uses 4 to 6 MB.  ;-)  Without mod_perl, it
takes up around 500 to 800 KB.   Now on another server my mod_perl server
uses about 13 MB per process, but it's my devel machine so I've got a lot of
stuff loaded that I wouldn't have in a production server.

--
Jeff Stuart
jstuart@ohio.risci2.net

-----Original Message-----
From: shane@isupportlive.com [mailto:shane@isupportlive.com]
Sent: Saturday, April 15, 2000 6:46 PM
To: Perrin Harkins
Cc: modperl@apache.org
Subject: Re: Modperl/Apache deficiencies... Memory usage.

Your apache processes would be the size of a stock
apache process, like 4-6M or so, and you would have 1 process that
would be 25MB or so that would have all your registry in it.


Re: Modperl/Apache deficiencies... Memory usage.

Posted by sh...@isupportlive.com.
Perrin-
On Sat, Apr 15, 2000 at 11:33:15AM -0700, Perrin Harkins wrote:
> > Each process of apache has
> > it's registry which holds the compiled perl scripts in..., a copy of
> > each for each process.  This has become an issue for one of the
> > companies that I work for, and I noted from monitoring the list that
> > some people have apache processes that are upwards of 25Megs, which is
> > frankly ridiculous.
> 
> I have processes that large, but more than 50% of that is shared through
> copy-on-write.
> 
> > I wrote a very small perl engine
> > for phhttpd that worked within it's threaded paradigm that sucked up a
> > neglibible amount of memory which used a very basic version of
> > Apache's registry.
> 
> Can you explain how this uses less memory than mod_perl doing the same
> thing?  Was it just that you were using fewer perl interpreters?  If so, you
> need to improve your use of apache with a multi-server setup.  The only way
> I could see phttpd really using less memory to do the same work is if you
> somehow managed to get perl to share more of its internals in memory.  Did
> you?

Yep, very handily I might add ;-).  Basically phhttpd is not process
based, it's thread based.  Which means that everything is running
inside of the same address space.  Which means 100% sharing except for
the present local stack of variables... which is very minimal.  In
terms of the perl thing... when you look at your processes and see all
that non-shared memory, most of that is stack variables.  Now most
webservers are running on single-processor machines, so they get no
benefit from having 10s or even 100s of copies of these perl stack
variables.  It's much more efficient to have a single process handle
all the perl requests.  On a multiprocessor box that single process
could have multiple threads in order to take advantage of the
processors.  See..., mod_perl stores the stack state of every script
it runs in the apache process... for every script... copies of it,
many many copies of it.  This is not efficient.  What would be
efficient is to have as many threads/processes as you have processors
for the mod_perl engine.  In other words, separate the engine from the
apache process so that unnecessary stack variables are never being
tracked.

Hmm... can I explain this better?  Let me try.  Okay, for every apache
process there is an entire perl engine, with all the stack variables
for every script you run recorded there.  What I'm proposing is a
system whereby there would be a separate process that would have only
a perl engine in it... you would make as many of these processes as
you have processors.  (Or multithread them... it doesn't really
matter.)  Now your apache processes would not have a bunch of junk
memory in them.  Your apache processes would be the size of a stock
apache process, like 4-6MB or so, and you would have 1 process that
would be 25MB or so that would have all your registry in it.  For a
high-capacity box this would be an incredible boon to increasing
capacity.  (I'm trying to explain clearly, but I'd be the first to
admit this isn't one of my strong points.)

As to how the multithreaded phhttpd can handle tons of load, well...
that's a separate issue and frankly a question much better handled by
Zach.  I understand it very well, but I don't feel that I could
adequately explain it.  It's based on real-time queued-signal
(sigqueue) technology... for a "decent" reference on this you can take
a look at a book from O'Reilly called "POSIX.4: Programming for the
Real World".  I should say that this book doesn't go into enough
depth... but it's the only book that goes into any depth that I could
find.

> 
> > What I'm
> > thinking is essentially we take the perl engine which has the apache
> > registry and all the perl symbols etc., and seperate it into it's own
> > process which would could be multithreaded (via pthreads) for multiple
> > processor boxes.  (above 2 this would be beneficial probably)  On the
> > front side the apache module API would just connect into this other
> > process via shared memory pages (shmget et. al), or Unix pipes or
> > something like that.
> 
> This is how FastCGI, and all the Java servlet runners (JServ, Resin, etc.)
> work.  The thing is, even if you run the perl interpreters in a
> multi-threaded process, it still needs one interpreter per perl thread and I
> don't know how much you'd be able to share between them.  It might not be
> any smaller at all.

But there is no need to have more than one perl thread per processor.
Right now we have a perl "thread" (er... engine is a better term) per
process.  Since most boxes start up 10 processes or so of Apache, we'd
be talking about a memory savings something like this:
6MB stock apache process
25MB (we'll say that's average) mod_perl apache process, 50% shared,
leaving 12.5MB non-shared
The way it works now: 12.5 * 10 = 125MB + 12.5MB (shared bit, one
instance) = 137.5MB total.
Suggested way:
6MB stock with about 3MB shared or so.  3MB * 10 = 30MB + 3MB (shared,
one instance) + 25MB mod_perl process = 58MB total.

That would be an overall difference of 137.5 - 58... roughly 80MB of
memory.  I have no idea how accurate this is, but I'd put my money on
it not being too far from the expected result in a high-load
environment with lots of apache scripts.

> 
> My suggestion would be to look at the two-server approach for mod_perl, and
> if that doesn't work for you look at FastCGI, and if that doesn't work for
> you join the effort to get mod_perl working on Apache 2.0 with a
> multi-threaded model.  Or just skip the preliminaries and go straight for
> the hack value...

Well... the second option certainly has a lot of merit.  Maybe I
should get involved in that... actually that has a lot of appeal to
me.  Hmm... I guess it's time to pick up the Apache 2.0 stuff and do
some tinkering! :)  As far as the present problem... I'm not all that
concerned about it.  It actually falls outside the area of my
responsibilities at our site..., I'm thinking of the other people in
the community mostly.

Thanks!
Shane
> 
> - Perrin
> 

Re: Modperl/Apache deficiencies... Memory usage.

Posted by Perrin Harkins <pe...@primenet.com>.
> Each process of apache has
> it's registry which holds the compiled perl scripts in..., a copy of
> each for each process.  This has become an issue for one of the
> companies that I work for, and I noted from monitoring the list that
> some people have apache processes that are upwards of 25Megs, which is
> frankly ridiculous.

I have processes that large, but more than 50% of that is shared through
copy-on-write.

> I wrote a very small perl engine
> for phhttpd that worked within it's threaded paradigm that sucked up a
> neglibible amount of memory which used a very basic version of
> Apache's registry.

Can you explain how this uses less memory than mod_perl doing the same
thing?  Was it just that you were using fewer perl interpreters?  If so, you
need to improve your use of apache with a multi-server setup.  The only way
I could see phttpd really using less memory to do the same work is if you
somehow managed to get perl to share more of its internals in memory.  Did
you?

> What I'm
> thinking is essentially we take the perl engine which has the apache
> registry and all the perl symbols etc., and seperate it into it's own
> process which would could be multithreaded (via pthreads) for multiple
> processor boxes.  (above 2 this would be beneficial probably)  On the
> front side the apache module API would just connect into this other
> process via shared memory pages (shmget et. al), or Unix pipes or
> something like that.

This is how FastCGI, and all the Java servlet runners (JServ, Resin, etc.)
work.  The thing is, even if you run the perl interpreters in a
multi-threaded process, it still needs one interpreter per perl thread and I
don't know how much you'd be able to share between them.  It might not be
any smaller at all.

My suggestion would be to look at the two-server approach for mod_perl, and
if that doesn't work for you look at FastCGI, and if that doesn't work for
you join the effort to get mod_perl working on Apache 2.0 with a
multi-threaded model.  Or just skip the preliminaries and go straight for
the hack value...

- Perrin


Re: Modperl/Apache deficiencies... Memory usage.

Posted by sh...@isupportlive.com.
On Sat, Apr 15, 2000 at 01:39:38PM -0500, Leslie Mikesell wrote:
> According to shane@isupportlive.com:
> 
> > Does anyone know of any program which has been developed like this?
> > Basically we'd be turning the "module of apache" portion of mod_perl
> > into a front end to the "application server" portion of mod_perl that
> > would do the actual processing.
> 
> This is basically what you get with the 'two-apache' mode.

To be frank... it's not.  Not even close.  Especially in the case of
the present site I'm working on, where they have certain boxes for
dynamic, others for static.  This is useful when you have one box
running dynamic/static requests..., but it's not a solution, it's a
workaround.  (I should say we're moving to have some boxes static,
some dynamic... at present it's all jumbled up ;-()

> 
> > It seems quite logical that something
> > like this would have been developed, but possibly not.  The seperation
> > of the two components seems like it should be done, but there must be
> > a reason why no one has done it yet... I'm afraid this reason would be
> > the apache module API doesn't lend itself to this.
> 
> The reason it hasn't been done in a threaded model is that perl
> isn't stable running threaded yet, and based on the history
> of making programs thread-safe, I'd expect this to take at
> least a few more years.  But, using a non-mod-perl front
> end proxy with ProxyPass and RewriteRule directives to hand
> off to a mod_perl backend will likely get you a 10-1 reduction
> in backend processes and you already know the configuration
> syntax for the second instance.

Well, now you're discussing threaded perl... a whole separate bag of
tricks :).  That's not what I'm talking about... I'm talking about
running a standard perl inside of a threaded environment.  I've done
this, and thrown tens of thousands of requests at it with no problems.
I believe threaded perl is an attempt to allow multiple simultaneous
requests going into a single perl engine that is "multi-threaded".
There are problems with this... it's difficult to accomplish, and
altogether a slower approach than queuing because of the
context-switching overhead.  Not to mention the I/O issue of this...
yikes! makes my head spin.

Thanks,
Shane.
> 
>  Les Mikesell
>    les@mcs.com

Re: Modperl/Apache deficiencies... Memory usage.

Posted by Perrin Harkins <pe...@primenet.com>.
On Tue, 25 Apr 2000 shane@isupportlive.com wrote:
> With mod_proxy you really only need a few mod_perl processes because
> no longer is the mod_perl ("heavy") apache process i/o bound.  It's
> now CPU bound.  (or should be under heavy load)

I think for most of us this is usually not the case, since most web apps
involve using some kind of external data source like a database or search
engine.  They spend most of their time waiting on that resource rather
than using the CPU.

Isn't it common wisdom that parallel processing is better for servers than
sequential anyway, since it means most people don't have to wait as long
for a response?  The sequential model is great if you're the next in line,
but terrible if there are 50 big requests in front of you and yours is
very small.  Parallelism evens things out.

- Perrin


Re: Modperl/Apache deficiencies... Memory usage.

Posted by sh...@isupportlive.com.
Justin,
On Tue, Apr 25, 2000 at 11:26:00PM -0400, jb@dslreports.com wrote:
> On Sat, 15 Apr 2000 shane@isupportlive.com wrote:
> > 
> > > It is very memory inneffecient basically.  Each process of apache has
> > > it's registry which holds the compiled perl scripts in..., a copy of
> > > each for each process.  This has become an issue for one of the
> > > companies that I work for, and I noted from monitoring the list that
> > > some people have apache processes that are upwards of 25Megs, which is
> > > frankly ridiculous.
> >
> 
> Originally I thought this as well.. but, using mod_rewrite, the ratio
> of heavy to light processes can be very high.. some solid figures here:
> our site (dslreports) now handles 200,000 pages a day, every single one
> of them is dynamically generated.. at peak this is 10-20 modperl pages
> a second.. to handle this, we have httpd modperl with MIN, MAX
> and LIMIT of just 8(!) modperl processes, 2 php httpds (just for true type
> font rendering ;-) and as many front-end httpds (mod_rewrite and mod_proxy)
> as required (usually about 100 but can be 200-300 at times). There is
> almost never all 8 modperls running at one time.. even long pages fit
> in the buffers between back end, mod_proxy, and front end meaning its
> hard to catch mod perl in the act of actually servicing requests.

Yes... mod_proxy is the "right way" to really get any sort of serious
stuff happening in terms of speed.  With mod_proxy you really only need
a few mod_perl processes, because the mod_perl ("heavy") apache process
is no longer i/o bound.  It's now CPU bound.  (or should be, under
heavy load)  Sounds like your ratio is pretty good.  (modperl/
mod_proxy)  I'd be willing to gamble you could have even fewer
mod_perl processes :-).., because they *should* be completely CPU
bound... so really, on a "theoretical" machine the optimum number
would be as many as you had processors.  But since mod_proxy/mod_perl
communicate through a socket as opposed to shared memory, there is
still a little i/o action.  (Which is something I'm thinking about
working on too in mod_proxy..., in local mode, allowing for more
efficient data transfer methods than sockets)  The closer you get to
the number of processors, the less context switching is involved...
although it sounds like you have lots of other stuff running on this
box... so this becomes less of an issue.

> So I find this setup very stable and very flexible .. and would not swap it
> for a pool of interpreters, or a multi-threaded server for all the tea
> in china .. because I dont think either of those models (elegant though
> they may sound, and absolutely the right direction) will be as stable
> for some considerable time.

Hehe... and it is for this exact reason that the actual httpd serving
mechanism hasn't changed much since... well, they started preforking?
Geez... how long ago was that?..., I guess when they started patching
NCSA... so a LONG time ago.  I really think it's time... cross-platform
is great, but don't hold back other platforms' functionality
just to keep the number of macros in the source down... I mean really.

It's sort of a chicken-and-egg scenario..., POSIX.4 was put
together some time ago, but it was only implemented in Linux in
2.3.28 or later (or so); well, it was in some earlier version, but had
some sort of bug.  I think FreeBSD has it too.  But there's no pressure
on anyone else to do anything about it because there are no apps.  But
there are no apps because there's no OS support... :-)... I hate those
sorts of situations.  Who knows when windows will get it?  Maybe
Windows 3k.

(I should note, there have been lots of "tweaks" along the way, but
nothing essentially different in concept)

Thanks,
Shane.

> 
> -Justin

Re: Modperl/Apache deficiencies... Memory usage.

Posted by jb...@dslreports.com.
On Sat, 15 Apr 2000 shane@isupportlive.com wrote:
> 
> > It is very memory inneffecient basically.  Each process of apache has
> > it's registry which holds the compiled perl scripts in..., a copy of
> > each for each process.  This has become an issue for one of the
> > companies that I work for, and I noted from monitoring the list that
> > some people have apache processes that are upwards of 25Megs, which is
> > frankly ridiculous.
>

Originally I thought this as well.. but, using mod_rewrite, the ratio
of heavy to light processes can be very high.. some solid figures here:
our site (dslreports) now handles 200,000 pages a day, every single one
of them dynamically generated.. at peak this is 10-20 modperl pages
a second.. to handle this, we have httpd modperl with MIN, MAX
and LIMIT of just 8(!) modperl processes, 2 php httpds (just for TrueType
font rendering ;-) and as many front-end httpds (mod_rewrite and mod_proxy)
as required (usually about 100 but can be 200-300 at times). There is
almost never all 8 modperls running at one time.. even long pages fit
in the buffers between back end, mod_proxy, and front end, meaning it's
hard to catch mod_perl in the act of actually servicing requests.
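The front-end/back-end split described here is the standard mod_proxy
arrangement; a minimal sketch of the two configs might look like the
following (the port numbers, the `/perl/` location, and all tuning
values are made-up placeholders in Apache 1.3-era directive syntax,
not the actual dslreports configuration):

```apache
## Front-end httpd.conf (lightweight, no mod_perl compiled in).
## Static files are served directly; dynamic URLs are proxied
## through to the back-end ([P] needs mod_rewrite + mod_proxy).
Port 80
RewriteEngine On
RewriteRule ^/perl/(.*)$ http://127.0.0.1:8042/perl/$1 [P]
ProxyPassReverse /perl/ http://127.0.0.1:8042/perl/

## Back-end httpd.conf (mod_perl enabled, a few fat processes),
## run from a separate server root and pinned to loopback:
# Port 8042
# BindAddress 127.0.0.1
# MinSpareServers 8
# MaxSpareServers 8
# MaxClients 8
# MaxRequestsPerChild 1000
```

A MaxRequestsPerChild of 1000 would produce the "reborn every 1000
requests" behaviour mentioned later in the post; the other numbers are
illustrative only.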

On the same box is mysql (40-60 daemons), constant mail handling, various
other custom perl daemons and utilities, MRTG monitoring of 100s of IP
addresses, a busy ultimate bulletin board (plain cgi - ugh), development
and testing as well sometimes.

This is all handled by two 450MHz processors and 1GB of memory, no
swapping; half of the memory ends up being used by linux for disk
caching.. modperl uses up about 256MB I suppose. With that much traffic,
memory leaks, and the odd SQL query that reads too much, the modperl
processes grow slowly to about 40MB each, but get reborn, every 1000
requests, at 28MB again. The load average hovers around 5.0.
I read today drkoop (while going broke) now does 600k pages per day..
I bet they have a couple of million bucks worth of solaris, oracle and
IIS, serving out mainly static content? what's the price/performance
problem with modperl again?

So I find this setup very stable and very flexible.. and would not swap
it for a pool of interpreters, or a multi-threaded server, for all the
tea in china.. because I don't think either of those models (elegant
though they may sound, and absolutely the right direction) will be as
stable for some considerable time.

-Justin

Re: Modperl/Apache deficiencies... Memory usage.

Posted by Doug MacEachern <do...@covalent.net>.
On Thu, 4 May 2000 shane@isupportlive.com wrote:
 
> Sounds like a good plan.  The first piece to put together is the
> script that can register callbacks, and iterate through the perl
> threads.  Do we have a devel version that's got the mip->avail type
> stuff together, or is this something that will be coming together in
> the next few weeks?

it's in there now.

>  Okay so this is a module that would be loaded via
> httpd.conf so that the thread can be spun off, and it can begin
> analyzing.  It should have some parameters so that it doesn't suck up
> too much CPU time, like sleep n seconds between jumping to the next
> apache embedded interpreter thread, blah, blah.  Are we going to dump
> this info to perlmemorylogs?  (Configurable to some location, etc)  Or
> integrate it with some sort of online status program?
> <mod_perl_status :)>  Or both..., hehe, well, of course that's all
> later, first piece is to get the iterator built that registers
> callbacks.  As an aside, I think the callback thing was a really good
> idea on your part, that way you can analyze how much memory your
> programs are using and whether you need to re-think your design
> strategy or implement a cleaner.  Any cleaner, a really aggressive one,
> or a really kick back one.  In any event, I just wanted to mention
> that this was a really good idea of yours (the callbacks).

assuming you're moving forward with this (making you a hero :), you should
be able to put the majority of it together without looking at the
mod_perl-2.0 source first.  the only thing you need to keep in mind is
that the C functions need to accept a PerlInterpreter argument with
-Dusethreads Perls.  Perl has some macros to deal with this "implicit
context", which would look something like so:

void symbol_table_walk(pTHX_ char *package, int recurse, ...)

testing with a vanilla Perl, you would call that like so:

symbol_table_walk(aTHX_ "main", TRUE, ...);

once it's integrated into mod_perl, something like so:

{
#ifdef USE_ITHREADS
    pTHX;
    modperl_interp_t *interp = modperl_interp_get(...);
    aTHX = interp->perl;
#endif

    symbol_table_walk(aTHX_ "main", TRUE, ...);
    ...
}

that PerlInterpreter structure comes into play because calls such as:

gv_fetchpv("main", FALSE);

translate into:

Perl_gv_fetchpv(my_perl, "main", FALSE);
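
For readers without a -Dusethreads Perl build handy, the pattern can be seen in a self-contained mock.  None of the structs or macro bodies below are the real perl.h definitions; they are simplified stand-ins that only show how the hidden PerlInterpreter parameter gets threaded through every call:

```c
/* Mock of Perl's implicit-context macros. NOT the real perl.h --
 * everything here is a simplified stand-in that only illustrates
 * how -Dusethreads passes a PerlInterpreter pointer through every
 * call as a hidden first argument. */
#include <assert.h>
#include <string.h>

typedef struct { int fetches; } PerlInterpreter;

#define pTHX  PerlInterpreter *my_perl   /* parameter declaration */
#define pTHX_ pTHX,
#define aTHX  my_perl                    /* matching argument */
#define aTHX_ aTHX,

/* gv_fetchpv(name, create) is (in real Perl) a macro expanding to
 * Perl_gv_fetchpv(my_perl, name, create): */
static const char *Perl_gv_fetchpv(pTHX_ const char *name, int create)
{
    (void)create;
    my_perl->fetches++;   /* "work" done inside this interpreter */
    return name;
}
#define gv_fetchpv(name, create) Perl_gv_fetchpv(aTHX_ name, create)

/* A function written the way Doug describes: */
static const char *symbol_table_walk(pTHX_ const char *package, int recurse)
{
    (void)recurse;
    return gv_fetchpv(package, 0);   /* aTHX_ supplied implicitly */
}

/* The mod_perl-side call, with my_perl pointed at a chosen
 * interpreter (what the #ifdef USE_ITHREADS block above does): */
static const char *walk_interp(PerlInterpreter *interp, const char *pkg)
{
    PerlInterpreter *my_perl = interp;
    return symbol_table_walk(aTHX_ pkg, 1);
}
```

With the mock, `walk_interp` plays the role of the code that pulls an interpreter from the pool and aims `my_perl` at it before calling in.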




Re: Modperl/Apache deficiencies... Memory usage.

Posted by sh...@isupportlive.com.
[chopping down the digital forest because the light has turned on :)]

> just a separate thread that plucks a mip->avail interpreter, puts it in
> the mip->busy list, analyzes, puts back, plucks the next mip->avail, over
> and over.

Sounds like a good plan.  The first piece to put together is the
script that can register callbacks, and iterate through the perl
threads.  Do we have a devel version that's got the mip->avail type
stuff together, or is this something that will be coming together in
the next few weeks?  Okay so this is a module that would be loaded via
httpd.conf so that the thread can be spun off, and it can begin
analyzing.  It should have some parameters so that it doesn't suck up
too much CPU time, like sleep n seconds between jumping to the next
apache embedded interpreter thread, blah, blah.  Are we going to dump
this info to perlmemorylogs?  (Configurable to some location, etc)  Or
integrate it with some sort of online status program?
<mod_perl_status :)>  Or both..., hehe, well, of course that's all
later, first piece is to get the iterator built that registers
callbacks.  As an aside, I think the callback thing was a really good
idea on your part, that way you can analyze how much memory your
programs are using and whether you need to re-think your design
strategy or implement a cleaner.  Any cleaner, a really aggressive one,
or a really kick back one.  In any event, I just wanted to mention
that this was a really good idea of yours (the callbacks).

Thanks,
Shane.


Re: Modperl/Apache deficiencies... Memory usage.

Posted by Doug MacEachern <do...@covalent.net>.
On Wed, 3 May 2000 shane@isupportlive.com wrote:
 
> So what you want is something more general, climbs through the symbol
> table and can register callbacks for various things, right?  One of

right.

> which, the area I'm most interested in, is the PADLIST?  Well, that's

same here!

> certainly something that could be put together.  What I had worked on
> and have right now is something that is adapted out of Peek::Devel,
> (something like that..., I can't remember, by Ilya?) originally it was
> designed to trace through and report on the various variables in the
> PAD, but not too specific on data on that.  So what we're talking

yeah, Devel::Peek by ilya.

> about now is hooking up a module that you could register callbacks for
> PADs into secondary modules, reporters, cleaners, that sort of thing.
> Sounds cool to me :-).  Now... how are we going to give this thing
> execution time? Is it something that is spun out as a thread
> every once in a while, or ever-present, going through all the
> interpreters' data?  (Obviously this would only be possible in a fully
> threaded version of perl), or will it actually be part of the threads
> that are running perl scripts? (So that we won't have to deal with
> locking issues)

the mod_perl-2.0 interpreter pool requires a -Dusethreads flavor of Perl,
which is not 5.005 threads.  it wraps the Perl runtime into the
PerlInterpreter structure, which is thread-safe, assuming only one thread
is calling back into it at any given time.  so if mod_perl were to spawn a
thread at startup, this thread could examine the mip->avail interpreters
without any locking.
 
> Okay..., so in 2.x we're actually having threads that are dedicated to
> perl?  I.e. separate and apart from direct serving actions?  Or did I
> misunderstand what you're saying here.  So the cleaner would be an

no, we have a pool of (thread-safe) PerlInterpreters created at startup
(and at runtime as needed).  at request time, inside an apache thread, we
lock the interpreter pool, pop an interpreter from the avail list, push it
into the busy list, unlock, and stash a pointer to the interpreter
inside r->pool (or c->pool for connection handlers).  that interpreter is
used for any callbacks that happen in the request thread and is put back in
the avail list at the end of the request (unless PerlInterpMaxRequests is
reached or similar).

> internally triggered event, and run in the context of the running
> thread.  It would say: "Hey I'm busy" and not get any more requests,
> then it would start analyzing its own data structures.  Or are you
> speaking of a separate thread, running in the same variable space as the
> other threads, which tells that thread to stop serving requests and
> analyzes its cloned version of the registry?

just a separate thread that plucks a mip->avail interpreter, puts it in
the mip->busy list, analyzes, puts back, plucks the next mip->avail, over
and over.
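
The avail/busy scheme described above can be sketched in C.  This is NOT the mod_perl-2.0 source; the structure and function names are invented, and the "analysis" is a stub.  The point is that a single mutex guards the two lists, and whoever holds an interpreter (request thread or scanner) needs no further locking:

```c
/* Sketch of the mip ("mod_perl interpreter pool") idea. One mutex
 * guards the avail and busy lists; an interpreter that has been
 * popped belongs exclusively to its holder. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef struct interp {
    struct interp *next;
    int scanned;                  /* bumped by each analysis pass */
} interp_t;

typedef struct {
    pthread_mutex_t lock;         /* Doug's mip->mip_lock */
    interp_t *avail;              /* idle interpreters */
    interp_t *busy;               /* interpreters currently held */
} interp_pool_t;

/* Request path: lock, pop from avail, push onto busy, unlock. */
static interp_t *pool_get(interp_pool_t *mip)
{
    pthread_mutex_lock(&mip->lock);
    interp_t *i = mip->avail;
    if (i) {
        mip->avail = i->next;
        i->next = mip->busy;
        mip->busy = i;
    }
    pthread_mutex_unlock(&mip->lock);
    return i;
}

/* Unlink i from busy without returning it to avail. */
static void pool_detach(interp_pool_t *mip, interp_t *i)
{
    pthread_mutex_lock(&mip->lock);
    interp_t **p = &mip->busy;
    while (*p && *p != i)
        p = &(*p)->next;
    if (*p)
        *p = i->next;
    pthread_mutex_unlock(&mip->lock);
}

/* End of request (or of analysis): back onto avail. */
static void pool_putback(interp_pool_t *mip, interp_t *i)
{
    pool_detach(mip, i);
    pthread_mutex_lock(&mip->lock);
    i->next = mip->avail;
    mip->avail = i;
    pthread_mutex_unlock(&mip->lock);
}

/* The scanner thread's body: pluck each idle interpreter, analyze
 * it while nobody else can touch it, then put everything back. */
static int pool_scan_once(interp_pool_t *mip)
{
    interp_t *i, *held = NULL;
    int n = 0;
    while ((i = pool_get(mip)) != NULL) {
        i->scanned++;             /* real code: walk its padlists here */
        n++;
        pool_detach(mip, i);      /* park it so this pass terminates */
        i->next = held;
        held = i;
    }
    while (held) {
        i = held;
        held = held->next;
        pool_putback(mip, i);
    }
    return n;
}
```

A request thread and the scanner contend only for the brief avail/busy shifts, which matches Doug's "no locking needed other than mip->mip_lock".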


Re: Modperl/Apache deficiencies... Memory usage.

Posted by sh...@isupportlive.com.
> what i'd really like to see is a generic symbol table walker (written in
> c), where you can register callbacks foreach symbol type (SVt_*).  then
> once you hit an SVt_PVCV, there can be any number of registered callbacks
> that fiddle with the padlist (cleaner, reporter, etc.)

So what you want is something more general, climbs through the symbol
table and can register callbacks for various things, right?  One of
which, the area I'm most interested in, is the PADLIST?  Well, that's
certainly something that could be put together.  What I had worked on
and have right now is something that is adapted out of Peek::Devel,
(something like that..., I can't remember, by Ilya?) originally it was
designed to trace through and report on the various variables in the
PAD, but not too specific on data on that.  So what we're talking
about now is hooking up a module that you could register callbacks for
PADs into secondary modules, reporters, cleaners, that sort of thing.
Sounds cool to me :-).  Now... how are we going to give this thing
execution time? Is it something that is spun out as a thread
every once in a while, or ever-present, going through all the
interpreters' data?  (Obviously this would only be possible in a fully
threaded version of perl), or will it actually be part of the threads
that are running perl scripts? (So that we won't have to deal with
locking issues)

> > (Maybe like Apache::Memory::Cleaner, and Apache::Memory::Reporter?...
> > how does that sound?  We'd need some mutexes on the Registry so that
> > the cleaner doesn't end up cleaning a running Registry Script <well,
> > duh!>... should this extend to Handlers?)
> 
> each interpreter clone has its own copy of the symbol table and padlists
> (the syntax tree is shared).
> so, in a threaded apache, the cleaner could pop a mip->avail
> interpreter and put it in the mip->busy list, in which case, nobody will
> try to use it, no locking needed (other than mip->mip_lock for the avail
> -> busy shift and back again).
> mip == 'm'od_perl 'i'nterpreter 'p'ool, which is in the modperl-2.0 cvs
> tree, not quite nailed down, but getting close.

Okay..., so in 2.x we're actually having threads that are dedicated to
perl?  I.e. separate and apart from direct serving actions?  Or did I
misunderstand what you're saying here.  So the cleaner would be an
internally triggered event, and run in the context of the running
thread.  It would say: "Hey I'm busy" and not get any more requests,
then it would start analyzing its own data structures.  Or are you
speaking of a separate thread, running in the same variable space as the
other threads, which tells that thread to stop serving requests and
analyzes its cloned version of the registry?

> 
> in a 1.3-ish multi-process model, i suppose the cleaner could run as a
> cleanup.
> 
> > However, if it were possible to override the "read" and "write"
> > functions that would sort of "freeze" execution, and put a lock on
> > this Apache registry entry (and make an unlocked copy BTW), and
> > transfer it to another thread whose only job was read/write through a
> > sigqueue interface... that would be REALLY cool for performance/memory
> > consumption.  Much less context switching overhead, and drastically
> > reduced memory overhead.  The problem is that who in their right mind
> > has time for this sort of thing? (:->)  I was thinking of implementing
> > the writing of mod_proxy like this..., after considering it fully
> > though, I think it would be even better to write a generalized module
> > that could stream bits to clients, and use it as a plug in for any
> > module that doesn't want to waste time streaming out to a 28.8k
> > connection.
> 
> sounds like a piece of cake ;)
> 

hehe, cake, as in the 6 months it takes to make phefernoose cookies
(that is a total mis-spelling).  It was just a crazy idea, but it
would take a lot more people than one to implement.  I'm thinking of
laying the foundation that others could build off of later, like
putting the mod_async thing together with void* pointers so it could
hold a pointer to anything and pass it back to whoever called it
later.

Anyhow..., that's later, I'm working on sending first... it's kind of
more universally applicable anyway.., the reading is a "specific
case".

Thanks,
Shane.


Re: Modperl/Apache deficiencies... Memory usage.

Posted by Doug MacEachern <do...@covalent.net>.
On Tue, 25 Apr 2000 shane@isupportlive.com wrote:
 
> Let me know when you want the garbage collector.  I'll re-write it in
> apache style, and add some debugging stuff.  I figure there should be
> two pieces.  One that analyzes the packages that are running, the
> other that actually kills off variables.  This could be very useful
> for admins that want to analyze what those huge processes really are.

what i'd really like to see is a generic symbol table walker (written in
c), where you can register callbacks for each symbol type (SVt_*).  then
once you hit an SVt_PVCV, there can be any number of registered callbacks
that fiddle with the padlist (cleaner, reporter, etc.)
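
The callback-registry shape of that walker can be sketched without touching Perl internals at all.  In this sketch the SymType enum and Sym struct are invented stand-ins for Perl's SVt_* types and glob entries, and the walk is over a plain array rather than a real stash:

```c
/* Sketch of a generic symbol table walker with per-type callback
 * registration. Stand-in types only -- SymType mimics SVt_*, and
 * Sym mimics a symbol table entry (a stash entry may hold a
 * sub-package to recurse into). */
#include <assert.h>
#include <stddef.h>

typedef enum { SYM_SCALAR, SYM_ARRAY, SYM_HASH, SYM_CODE, SYM_NTYPES } SymType;

typedef struct Sym {
    const char *name;
    SymType     type;
    struct Sym *children;   /* sub-package entries, if any */
    size_t      nchildren;
} Sym;

typedef void (*sym_cb)(const Sym *s, void *data);

#define MAX_CBS 8
static sym_cb callbacks[SYM_NTYPES][MAX_CBS];
static int    ncbs[SYM_NTYPES];

/* Any number of handlers (cleaner, reporter, ...) per type. */
static int register_callback(SymType t, sym_cb cb)
{
    if (ncbs[t] >= MAX_CBS)
        return -1;
    callbacks[t][ncbs[t]++] = cb;
    return 0;
}

/* Walk a symbol list, firing every callback registered for each
 * entry's type; recurse into sub-packages when asked to. */
static void symbol_table_walk(const Sym *syms, size_t n, int recurse, void *data)
{
    for (size_t i = 0; i < n; i++) {
        for (int c = 0; c < ncbs[syms[i].type]; c++)
            callbacks[syms[i].type][c](&syms[i], data);
        if (recurse && syms[i].children)
            symbol_table_walk(syms[i].children, syms[i].nchildren, recurse, data);
    }
}

/* Example "reporter" callback: count code symbols seen (in real
 * life this is where you would fiddle with the CV's padlist). */
static void count_code(const Sym *s, void *data)
{
    (void)s;
    (*(int *)data)++;
}
```

Registering a second SYM_CODE callback (say, a cleaner) requires no change to the walker itself, which is the point of the design.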

> (Maybe like Apache::Memory::Cleaner, and Apache::Memory::Reporter?...
> how does that sound?  We'd need some mutexes on the Registry so that
> the cleaner doesn't end up cleaning a running Registry Script <well,
> duh!>... should this extend to Handlers?)

each interpreter clone has its own copy of the symbol table and padlists
(the syntax tree is shared).
so, in a threaded apache, the cleaner could pop a mip->avail
interpreter and put it in the mip->busy list, in which case, nobody will
try to use it, no locking needed (other than mip->mip_lock for the avail
-> busy shift and back again).
mip == 'm'od_perl 'i'nterpreter 'p'ool, which is in the modperl-2.0 cvs
tree, not quite nailed down, but getting close.

in a 1.3-ish multi-process model, i suppose the cleaner could run as a
cleanup.

> However, if it were possible to override the "read" and "write"
> functions that would sort of "freeze" execution, and put a lock on
> this Apache registry entry (and make an unlocked copy BTW), and
> transfer it to another thread whose only job was read/write through a
> sigqueue interface... that would be REALLY cool for performance/memory
> consumption.  Much less context switching overhead, and drastically
> reduced memory overhead.  The problem is that who in their right mind
> has time for this sort of thing? (:->)  I was thinking of implementing
> the writing of mod_proxy like this..., after considering it fully
> though, I think it would be even better to write a generalized module
> that could stream bits to clients, and use it as a plug in for any
> module that doesn't want to waste time streaming out to a 28.8k
> connection.

sounds like a piece of cake ;)


Re: Modperl/Apache deficiencies... Memory usage.

Posted by sh...@isupportlive.com.
Doug & modperlers...
  
> however, padlists are not shared.  as i mentioned, i'd like to look at
> using your "garbage collector" for 2.0.  if it could run in its own
> thread and examine the padlists of idle interpreters, it could be a big win.
> i wouldn't want it to release all allocations in the padlist by default.
> maybe be configurable to only release things of a certain size.  what i
> would personally like to see is one that just reports anything that's
> larger than X size, so i can fix the Perl code to not copy large chunks of
> data, and/or figure out how to make large chunks of data shared between
> interpreters.  i kinda started this with the B::Size hooks in
> Apache::Status, but you have to dig around the symbol table to find how
> big things are, there's no overall reporting mechanism.

Let me know when you want the garbage collector.  I'll re-write it in
apache style, and add some debugging stuff.  I figure there should be
two pieces.  One that analyzes the packages that are running, the
other that actually kills off variables.  This could be very useful
for admins that want to analyze what those huge processes really are.
Are they all code, or are they stack variables?  A reporting module
would be really cool, and it would answer a lot of questions that I
have.  So you're right..., I think that's the first piece that should
be implemented.  I just need to re-write it for multiple levels of
recursion (the present version only scans the first level).  Basically
re-hack it to exclude actual cleaning and just report, and maybe
as a secondary module a cleaner.  Something like use
Apache::Cleaner..., in a fully threaded apache only one thread would
be needed.  In a process enviro... arg!... I don't want to think about
that right now... I know the answer but I just hate to say it.

(Maybe like Apache::Memory::Cleaner, and Apache::Memory::Reporter?...
how does that sound?  We'd need some mutexes on the Registry so that
the cleaner doesn't end up cleaning a running Registry Script <well,
duh!>... should this extend to Handlers?)

>  
> > One of my concerns is that maybe the apache module API is simply too
> > complex to pull something like this off.  I don't know, but it seems
> > like it should be able to handle something like this.
> 
> if you need a model where the Perl engine is in a different process 
> than Apache, that should be implemented with FastCGI or something else.  
> the overhead of passing the request_rec (and everything it points to)
> between processes for all of the various phases (so mod_perl could
> still do everything it can today) would be a nightmare.

I really don't need such a model at this time.  However I was thinking
about it..., and aside from the point that Stas brought up, I think
this would be an interesting avenue to look through... that's
essentially how mato_perl works.  The crazy idea that I forwarded to
Stas was something like this:  The problem with having one thread is
network reads.  This is actually a really big issue all in all.
However, if it were possible to override the "read" and "write"
functions that would sort of "freeze" execution, and put a lock on
this Apache registry entry (and make an unlocked copy BTW), and
transfer it to another thread whose only job was read/write through a
sigqueue interface... that would be REALLY cool for performance/memory
consumption.  Much less context switching overhead, and drastically
reduced memory overhead.  The problem is that who in their right mind
has time for this sort of thing? (:->)  I was thinking of implementing
the writing of mod_proxy like this..., after considering it fully
though, I think it would be even better to write a generalized module
that could stream bits to clients, and use it as a plug in for any
module that doesn't want to waste time streaming out to a 28.8k
connection.  Anyhow... this is my present obsession, and I'm hoping to
have a beta patch to mod_proxy in about a month.  But if I were to
re-implement it as a generic apache module, and give it some state
context, and the ability to read and write, it could be very very
useful for nearly anything for apache.  (Obviously sigqueues aren't
implemented on every platform, so either poll() or select() would take
their place in these scenarios... still more efficient than having a
"heavy" mod_perl-enabled server sending data to a 28.8k client)
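
The poll()-based fallback mentioned above is easy to sketch.  This is not mod_proxy code, just a minimal illustration of the idea: a finished response buffer is drained to a non-blocking descriptor, and the writer parks in poll() whenever the client's buffer is full, so no heavyweight interpreter process sits blocked on a slow connection:

```c
/* Sketch: stream a prepared buffer to a slow client using
 * non-blocking writes plus poll(). No HTTP, no event loop --
 * just the drain-one-buffer core of the idea. */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <unistd.h>

/* Write `len` bytes of buf to fd, sleeping in poll() whenever the
 * descriptor's buffer is full. Returns bytes written, or -1. */
static ssize_t stream_to_client(int fd, const char *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n > 0) {
            done += n;
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            /* client can't keep up: wait until fd is writable */
            struct pollfd p = { .fd = fd, .events = POLLOUT };
            if (poll(&p, 1, 1000) < 0)
                return -1;
        } else {
            return -1;   /* real error (or write returned 0) */
        }
    }
    return (ssize_t)done;
}
```

In a real streaming module one writer thread would multiplex many such descriptors through a single poll() call instead of looping per client, but the per-client drain logic is the same.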

Anyhow... thanks Doug,
Shane.
(Man, you've got a good memory... that was over two months ago I
brought up the GC)

Re: Modperl/Apache deficiencies... Memory usage.

Posted by Ken Williams <ke...@forum.swarthmore.edu>.
rmonical@destinations.com (Robert Monical) wrote:
>I am not very knowledgeable but have been lurking on this list for a
>couple of months. The last week (or two) have seen a number of posts
>about having two Apaches. A lightweight front end and a mod_perl-enabled
>back end.  Since I do not fully understand what folks are
>talking about, these all go in an archive for me to read later. 

Start here:

 http://perl.apache.org/guide/strategy.html#Alternative_architectures_for_ru




Re: Modperl/Apache deficiencies... Memory usage.

Posted by Robert Monical <rm...@destinations.com>.
I am not very knowledgeable but have been lurking on this list for a couple 
of months.
The last week (or two) has seen a number of posts about having two
Apaches: a lightweight front end and a mod_perl-enabled back end.  Since I
do not fully understand what folks are talking about, these all go in an 
archive for me to read later. In the application that I inherited, a front
end feeds into a mod_perl engine for the database portion of the session.
All static downloads go to another instance, in our case, on separate hardware.

What are your thoughts about that approach as opposed to your idea?


At 12:03 AM 4/15/00, shane@isupportlive.com wrote:
>Modperlers...,
>
>I'd like to start a discussion about the deficiencies in Apache/modperl
>and get your feedback with regard to this issue.  The problem as I see
>it is that the process model that Apache uses is very hard on modperl.
>It is very memory inefficient, basically.  Each process of apache has
>its own registry which holds the compiled perl scripts..., a copy of
>each for each process.  This has become an issue for one of the
>companies that I work for, and I noted from monitoring the list that
>some people have apache processes that are upwards of 25Megs, which is
>frankly ridiculous.
>
>This is not meant to be a flame, and I'd really like to get down to
>the nitty gritty on how we can solve this problem.  Zach Brown wrote
>phhttpd which is a threaded server which can handle a lot more load
>than apache, but the problem is it doesn't have the features that
>Apache has, and it's not going to catch up any time soon so I think
>it's not going to be the cure-all.  I wrote a very small perl engine
>for phhttpd that worked within it's threaded paradigm that sucked up a
>negligible amount of memory which used a very basic version of
>Apache's registry.  Once again though it didn't have the feature set
>that Apache/modperl has.  Now to address the issue: I think we have a lot of
>code in Modperl that is basically awesome.  Not to mention the fact
>that Apache itself has a lot of modules and other things which are
>quite useful.  However I think if it were possible to divorce the
>actual perl engine from the Apache process, we could solve this memory
>usage problem.
>
>Basically here's what I'm thinking might be possible, but if it's not
>just let me know.  (Well, I know it's possible, but I mean how much
>work would it take to institute, and has someone else worked on this,
>or taken a look at how much work we'd be talking about)  What I'm
>thinking is essentially we take the perl engine which has the apache
>registry and all the perl symbols etc., and separate it into its own
>process which could be multithreaded (via pthreads) for multiple
>processor boxes.  (above 2 this would be beneficial probably)  On the
>front side the apache module API would just connect into this other
>process via shared memory pages (shmget et. al), or Unix pipes or
>something like that.  The mod_perl process would have a work queue
>that the Apache processes could add work to via our front end API.
>The work threads inside of that mod_perl process would take work
>"orders" out of the work queue and process them and send the result
>back to the waiting apache process.  (Maybe just something as simple
>as a blocking read on a pipe coming out of the mod_perl process...
>this would keep down context switching issues and other nasty bits)
>
>One of my concerns is that maybe the apache module API is simply too
>complex to pull something like this off.  I don't know, but it seems
>like it should be able to handle something like this.
>
>Does anyone know of any program which has been developed like this?
>Basically we'd be turning the "module of apache" portion of mod_perl
>into a front end to the "application server" portion of mod_perl that
>would do the actual processing.  It seems quite logical that something
>like this would have been developed, but possibly not.  The separation
>of the two components seems like it should be done, but there must be
>a reason why no one has done it yet... I'm afraid the reason would be
>that the apache module API doesn't lend itself to this.
>
>Well, thanks to everyone in advance for their thoughts/comments...
>Shane Nay.


Have a great day!

--Robert Monical
--Director of CRM Development
--rmonical@destinations.com


"The Truth is Out There"


Re: Modperl/Apache deficiencies... Memory usage.

Posted by Doug MacEachern <do...@covalent.net>.
On Sat, 15 Apr 2000 shane@isupportlive.com wrote:

> Modperlers...,
> 
> I'd like to start a discussion about the deficiencies in Apache/modperl
> and get your feedback with regard to this issue.  The problem as I see
> it is that the process model that Apache uses is very hard on modperl.

mod_perl-2.0/apache-2.0 should solve this (fingers crossed :)

> It is very memory inefficient, basically.  Each process of apache has
> its own registry which holds the compiled perl scripts..., a copy of
> each for each process.  This has become an issue for one of the
> companies that I work for, and I noted from monitoring the list that
> some people have apache processes that are upwards of 25Megs, which is
> frankly ridiculous.

with 2.0, the syntax tree is shared (at the Perl level)
 
however, padlists are not shared.  as i mentioned, i'd like to look at
using your "garbage collector" for 2.0.  if it could run in its own
thread and examine the padlists of idle interpreters, it could be a big win.
i wouldn't want it to release all allocations in the padlist by default.
maybe be configurable to only release things of a certain size.  what i
would personally like to see is one that just reports anything that's
larger than X size, so i can fix the Perl code to not copy large chunks of
data, and/or figure out how to make large chunks of data shared between
interpreters.  i kinda started this with the B::Size hooks in
Apache::Status, but you have to dig around the symbol table to find how
big things are, there's no overall reporting mechanism.
 
> One of my concerns is that maybe the apache module API is simply too
> complex to pull something like this off.  I don't know, but it seems
> like it should be able to handle something like this.

if you need a model where the Perl engine is in a different process 
than Apache, that should be implemented with FastCGI or something else.  
the overhead of passing the request_rec (and everything it points to)
between processes for all of the various phases (so mod_perl could
still do everything it can today) would be a nightmare.


Re: Modperl/Apache deficiencies... Memory usage.

Posted by Ken Williams <ke...@forum.swarthmore.edu>.
shane@isupportlive.com wrote:
>Modperlers...,
>
>I'd like to start a discussion about the deficiencies in Apache/modperl
>and get your feedback with regard to this issue.  The problem as I see
>it is that the process model that Apache uses is very hard on modperl.
>It is very memory inefficient, basically.  Each process of apache has
>its own registry which holds the compiled perl scripts..., a copy of
>each for each process.  This has become an issue for one of the
>companies that I work for, and I noted from monitoring the list that
>some people have apache processes that are upwards of 25Megs, which is
>frankly ridiculous.


1) Are you preloading the scripts with RegistryLoader?  That puts them in the
parent process, so the memory will be shared by the children.  Each child will
still [seem to] be 25 megs, but the children will have a lot of overlap.

2) Are you using the two-server model, backend & frontend?  It's a must if you
want to make efficient use of your memory.

3) If you're already doing these things and you still aren't satisfied, perhaps
mod_perl isn't for you and you want to look at FastCGI or a similar project. 
I've never had occasion to use it.  You'll no longer have access to the Apache
API, but it sounds like you're not using that anyway.  
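
For the archives, the two-server model from point 2 usually looks something like the following.  This is a hedged sketch only: the port number, paths, and URL layout are invented, the two snippets belong in two different httpd.conf files, and the directives assume an Apache 1.3-era build with mod_rewrite/mod_proxy compiled into the front end:

```
## Front-end httpd.conf (slim httpd, no mod_perl): serve static
## files directly, proxy the dynamic URLs to the back end.
Port 80
RewriteEngine On
RewriteRule ^/perl/(.*)$ http://127.0.0.1:8042/perl/$1 [P]
ProxyPassReverse /perl/ http://127.0.0.1:8042/perl/

## Back-end httpd.conf (mod_perl-enabled): few, fat processes,
## bound to loopback; preload scripts in the parent so the
## children share those pages (point 1 above).
Port 8042
BindAddress 127.0.0.1
MaxClients 8
MaxRequestsPerChild 1000
PerlRequire conf/startup.pl   # startup.pl can use Apache::RegistryLoader
```

The effect is that the 25MB interpreters never sit spoon-feeding a slow modem; the cheap front-end processes do that instead.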


  -------------------                            -------------------
  Ken Williams                             Last Bastion of Euclidity
  ken@forum.swarthmore.edu                            The Math Forum



Re: Modperl/Apache deficiencies... Memory usage.

Posted by Tom Mornini <tm...@infomania.com>.
On Sat, 15 Apr 2000 shane@isupportlive.com wrote:

> This has become an issue for one of the
> companies that I work for, and I noted from monitoring the list that
> some people have apache processes that are upwards of 25Megs, which is
> frankly ridiculous.

1) I've seen them bigger than 25 megs.

2) Do you know about the front-end proxy/back-end mod_perl configuration?

-- Tom Mornini
-- InfoMania Printing and Prepress