Posted to modperl@perl.apache.org by Simon Clewer <si...@superquote.com> on 2004/01/09 02:22:53 UTC

ithreads with modperl

Hi,

We're using ithreads with modperl to run some complicated robots
concurrently ( in a commercial environment ) - there is however an issue.

Huge memory usage ... each ithread uses about 10M of RAM ( image of Apache,
image of mod_perl and image of our deep-link robot ), and as we use 5
ithreads plus the original thread that means that each Apache is using 60M -
and because we trade on being the best and the fastest at what we do we need
to keep plenty of Apaches ready and waiting ( about 20 ), so we're using
heaps of memory.

Does anybody know how we can reduce the amount of memory we use? Is there
some smart way to actually share the images? Up to now the problem has been
OK because we simply paid to put 2 Gig on board, but we had nearly 1000
users yesterday and they spent about 10 minutes filling in the forms ( car
insurance quote stuff ), then the robots go out and do the deep-linking and
that can take up to 3 minutes ( and usually does - the sites we deep-link into
can be quite slow and we make 10 - 50 requests into each site ), during
which time Apache is handling a single request!

We are simply running out of memory, which is sad because we are nowhere
near running out of processor, and it grieves me to simply buy a bigger
server when it seems that something smarter could solve the problem. Other
than that, things are working very nicely and the site serves quickly and
reliably - see it at www.superquote.com ( despite the warnings that ithreads
is not yet safe for commercial use ).

We're on a 2.4.20 Linux with Apache 1.3.28  and mod_perl 1.28

Cheers
Simon


-- 
Reporting bugs: http://perl.apache.org/bugs/
Mail list info: http://perl.apache.org/maillist/modperl.html


Re: ithreads with modperl

Posted by Perrin Harkins <pe...@elem.com>.
On Fri, 2004-01-09 at 11:45, Perrin Harkins wrote:
> However, this is 5.6 with
> ithreads that we're talking about

Correction, Simon says they are actually using 5.8.

- Perrin




Re: ithreads with modperl

Posted by Stas Bekman <st...@stason.org>.
Perrin Harkins wrote:
> On Fri, 2004-01-09 at 14:52, Stas Bekman wrote:
> 
>>We really need more real world benchmarks to make a good judgement. It's 
>>probably quite certain that the performance is going to be worse if you spawn 
>>threads, but don't deploy the benefits available exclusively to threads 
>>(shared opcode tree, shared vars, etc).
> 
> 
> That reminds me, does anyone know what happened with the shared opcode
> tree?  Does it not work, or is it just dwarfed by the size of the
> non-shared stuff?  The size problems these guys are having seem to point
> to little or no sharing happening between threads.

AFAIK, nothing has happened to the shared opcode tree. perl_clone() clones all 
the mutable data, and shares the opcodes which were preloaded by the time it's 
run. If you load modules after perl_clone, those opcodes don't get shared. It 
should be easy to add gtop calls before and after the perl_clone call in mp2 to 
see how much memory is consumed by perl_clone, which can then be compared to 
the memory added by loading modules.

__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com




Re: ithreads with modperl

Posted by Perrin Harkins <pe...@elem.com>.
On Fri, 2004-01-09 at 16:02, Elizabeth Mattijsen wrote:
> You mean a rewrite of the article?  Or more a bullet list of things?

I was thinking of something that briefly makes these points:

- Threads have a higher startup cost.
- Perl is slower when built with threads.
- Threads tend to use more memory.

And then explain reasons you might want to use them anyway (e.g. you are
on Windows), and any known best practices with mod_perl (e.g. when to
load modules).

Then it could link to your article for more information.

- Perrin




Re: ithreads with modperl

Posted by Elizabeth Mattijsen <li...@dijkmat.nl>.
At 15:51 -0500 1/9/04, Perrin Harkins wrote:
>On Fri, 2004-01-09 at 15:34, Elizabeth Mattijsen wrote:
>>  So yes, in general I think you can say that the data copied for each
>>  thread quickly dwarfs whatever optrees are shared.
>
>Thanks Liz, this is useful data.  Maybe we should add something to the
>mod_perl 2 docs that summarizes the current knowledge of performance and
>memory issues with threads, and links to your article.  Something under
>this heading:
>http://perl.apache.org/docs/2.0/user/coding/coding.html#Threads_Coding_Issues_Under_mod_perl

You mean a rewrite of the article?  Or more a bullet list of things?


Liz



Re: ithreads with modperl

Posted by Perrin Harkins <pe...@elem.com>.
On Fri, 2004-01-09 at 15:34, Elizabeth Mattijsen wrote:
> So yes, in general I think you can say that the data copied for each 
> thread quickly dwarfs whatever optrees are shared.

Thanks Liz, this is useful data.  Maybe we should add something to the
mod_perl 2 docs that summarizes the current knowledge of performance and
memory issues with threads, and links to your article.  Something under
this heading:
http://perl.apache.org/docs/2.0/user/coding/coding.html#Threads_Coding_Issues_Under_mod_perl

- Perrin




Re: ithreads with modperl

Posted by Elizabeth Mattijsen <li...@dijkmat.nl>.
At 13:26 -0800 1/9/04, Stas Bekman wrote:
>Elizabeth Mattijsen wrote:
>>I'm sure you know my PerlMonks article "Things you need to know 
>>before programming Perl ithreads" ( 
>>http://www.perlmonks.org/index.pl?node_id=288022 ).
>>So yes, in general I think you can say that the data copied for 
>>each thread quickly dwarfs whatever optrees are shared.
>How is this different from fork? When you fork, the OS shares all memory 
>pages between the parent and the child. As variables are modified, 
>memory pages become dirty and unshared. With forking, mutable data (vars) 
>and non-mutable data (opcodes) share the same memory pages, so once a 
>mutable variable changes, the opcodes allocated from the same memory 
>page get unshared too. So you get more and more memory unshared as 
>you go. In the long run (unless you use size limiting tools) all the 
>memory gets unshared.

Well, yes.  But you forget that when you load module A, usually 
modules B..Z are loaded as well, hidden from your direct view.  And 
Perl has always taken the approach of using more memory rather than 
more CPU.  So most modules are actually optimized by their authors to 
store intermediate results in maybe-not-so-intermediate variables. 
Not to mention, many modules build up internal data structures that 
may never be altered.  Even compile-time constants need to have a CV 
in the stash where they exist, even though they're optimized away in 
the optree at compile time.  And a CV qualifies as "data" as far as 
threads are concerned.


>With ithreads, the opcode tree is always shared and mutable data is 
>copied at the very beginning. So your memory consumption should be 
>exactly the same after the first request and after the 1000th request 
>(assuming that you don't allocate any memory at run time). Here you 
>get more memory consumed at the beginning of the spawned thread, but 
>it stays the same.

Well, I see it this way: With threads, you're going to get the hit 
for everything possible at the beginning.  With fork, you get hit 
whenever anything _actually_ changes.  And spread out over time.  I 
would take fork() anytime over that.


>So let's say you have an 8MB opcode tree and 4MB of mutable data, the 
>process totalling 12MB. Using fork you will start off with all 
>12MB shared and get memory unshared as you go. With threads, you 
>will start off with 4MB upfront memory consumption and it'll stay 
>the same.

But if you start 100 threads, you'll use 400 MByte, whereas if you fork 100 
times, you'll start off with basically 12 MByte and a bit.  It's the 
_memory_ usage that is causing the problem.

On top of that, I think you will find quite the opposite ratio of 
optree to mutable data.  A typical case is more likely to be 
something like 4MB of optree and 8MB of mutable data.

To prove my point, I have taken my Benchmark::Thread::Size module 
(available from CPAN) and tested the behaviour of POSIX with and 
without anything exported.

Performing each test 5 times
  #   (ref)         none         all
  0    1724        +134 ± 6    +710 ± 6
  1    2080        +258 ± 6   +1468
  2    2368        +334 ± 6   +2060
  5    3232        +572       +3824
 10    4656        +980       +6788
 20    7512       +1796      +12704
 50   16087 ± 6   +4228      +30448
100   30380       +8284 ± 2  +60024

==== none ==================================================
use POSIX ();

==== all ====================================================
use POSIX;

==========================================================

Sizes are displayed in Kbytes.  The average of 5 runs is shown, with 
differences from the mean shown as ±N after that (if any difference 
was found).  Each line shows memory used for the number of threads in 
column 1.

The second column shows the reference memory usage: the "bare" 
threads case.  You can see that 100 bare threads take about 30 MByte 
of memory.  That's without _anything_ apart from "use threads".  No 
other modules (at least none visible: you'd be amazed what actually 
gets loaded when you do a "use threads").  You see that each thread 
takes about 300K extra (which more or less coincides with my 
Devel::Size benchmark the other day).

The third column shows the _extra_ memory needed when a "use POSIX()" is added.

The fourth column shows what happens when all constants and subs are 
exported with "use POSIX".

Now I realize that many of the exported subs would otherwise have 
been AUTOLOADed.  But still, then they would _only_ create a sub, an 
optree.  And here you clearly see the effect that the exported subs 
(which have a CV slot in the stash) have on the memory usage of a thread.

It is _very_ easy to use a _lot_ of memory this way.

Another example: one optimized away constant subroutine versus 2 
optimized away constant subroutines:

Performing each test 5 times
  #   (ref)         one        two
  0    1726 ± 6      -2         +0 ± 6
  1    2080          +4        +10 ± 6
  2    2368          +4        +10 ± 6
  5    3232          +4        +10 ± 6
 10    4656         +20        +24
 20    7512         +40        +48
 50   16087 ± 8    +100       +112
100   30380 ± 2    +199       +223

==== one ====================================================
sub foo () { 1 }

==== two ==================================================
sub foo () { 1 }
sub bar () { 1 }

==========================================================

You can see that 2 constant subroutines take up more thread memory 
than 1.  And that's just because of the CV that stays behind in the 
package stash.

In case you're not sure the subs get optimized away, look at these two optrees:

$ perl5.8.2-threaded -MO=Concise -e 'sub foo () { 1 }; foo'
3  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter ->2
2     <;> nextstate(main 2 -e:1) v ->3
-     <0> ex-const v ->3

$ perl5.8.2-threaded -MO=Concise -e 'sub foo { 1 }; foo'
6  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter ->2
2     <;> nextstate(main 2 -e:1) v ->3
5     <1> entersub[t2] vKS/TARG,1 ->6
-        <1> ex-list K ->5
3           <0> pushmark s ->4
-           <1> ex-rv2cv sK/129 ->-
4              <#> gv[*foo] s ->5


>Now if in your fork setup you bracket at 8MB with a size limiting 
>tool to restart, you will get the same 4MB overhead per process. 
>Besides equal memory usage you get better run-time performance with 
>threads, because it doesn't need to copy dirty pages as with forks 
>(everything was done at the perl_clone, which can be arranged long 
>before the request is served)

That may be so...  but you will find that along with huge memory 
usage you will also burn a lot of CPU because of all the stash 
walking that happens during cloning.


>(and you get a slowdown at the same time because of context management).
>So, as you can see it's quite possible that threads will perform 
>better than forks and consume equal or less amount of memory if the 
>opcode tree is bigger than the mutable data.

Probably.  But _only_ if you can get the desired number of threads 
into your physical RAM _and_ you can live with possibly _long_ 
startup times (getting into tens of seconds, depending on the number 
of modules loaded and the hardware you're running on).



Liz





Re: ithreads with modperl

Posted by Stas Bekman <st...@stason.org>.
Elizabeth Mattijsen wrote:
> At 15:17 -0500 1/9/04, Perrin Harkins wrote:
> 
>> On Fri, 2004-01-09 at 14:52, Stas Bekman wrote:
>>
>>> We really need more real world benchmarks to make a good judgement. It's
>>> probably quite certain that the performance is going to be worse if you
>>> spawn threads, but don't deploy the benefits available exclusively to
>>> threads (shared opcode tree, shared vars, etc).
>>
>> That reminds me, does anyone know what happened with the shared opcode
>> tree?  Does it not work, or is it just dwarfed by the size of the
>> non-shared stuff?  The size problems these guys are having seem to point
>> to little or no sharing happening between threads.
> 
> 
> I'm sure you know my PerlMonks article "Things you need to know before 
> programming Perl ithreads" ( 
> http://www.perlmonks.org/index.pl?node_id=288022 ).
> 
> I recently ran a little test that showed (at least to Devel::Size) that 
> you have _at least_ about 250Kbyte of "data" that needs to be copied 
> between threads if you _only_ do:
> 
>   use threads;
>   use threads::shared;
> 
> And I'm not sure whether this number isn't too low, because I don't know 
> for sure whether the CV's in the stash were counted correctly.  If they 
> were not, then you would come to about 400Kbyte of "data" for a 
> _bare_ thread.
> 
> Loading a few modules, each with their initializations, adds up _very_ 
> quickly to several Mbytes of "data" that needs to be cloned _every_ time 
> you start a thread.  And these are _not_ simple copies: all of the 
> stashes need to be walked to make sure that all the [SAHC]V's are 
> properly copied to the thread's copy.  So it's taking a _lot_ of CPU as 
> well...
> 
> So yes, in general I think you can say that the data copied for each 
> thread quickly dwarfs whatever optrees are shared.

How is this different from fork? When you fork, the OS shares all memory pages 
between the parent and the child. As variables are modified, memory pages 
become dirty and unshared. With forking, mutable data (vars) and non-mutable 
data (opcodes) share the same memory pages, so once a mutable variable changes, 
the opcodes allocated from the same memory page get unshared too. So you get 
more and more memory unshared as you go. In the long run (unless you use size 
limiting tools) all the memory gets unshared.

With ithreads, the opcode tree is always shared and mutable data is copied at 
the very beginning. So your memory consumption should be exactly the same 
after the first request and after the 1000th request (assuming that you don't 
allocate any memory at run time). Here you get more memory consumed at the 
beginning of the spawned thread, but it stays the same.

So let's say you have an 8MB opcode tree and 4MB of mutable data, the process 
totalling 12MB. Using fork you will start off with all 12MB shared and get 
memory unshared as you go. With threads, you will start off with 4MB upfront 
memory consumption and it'll stay the same. Now if in your fork setup you 
bracket at 8MB with a size limiting tool to restart, you will get the same 4MB 
overhead per process. Besides equal memory usage you get better run-time 
performance with threads, because it doesn't need to copy dirty pages as with 
forks (everything was done at the perl_clone, which can be arranged long 
before the request is served), though you get a slowdown at the same time 
because of context management.

So, as you can see it's quite possible that threads will perform better than 
forks and consume equal or less amount of memory if the opcode tree is bigger 
than the mutable data.





Re: ithreads with modperl

Posted by Elizabeth Mattijsen <li...@dijkmat.nl>.
At 15:17 -0500 1/9/04, Perrin Harkins wrote:
>On Fri, 2004-01-09 at 14:52, Stas Bekman wrote:
>> We really need more real world benchmarks to make a good judgement. It's
>> probably quite certain that the performance is going to be worse if you
>> spawn threads, but don't deploy the benefits available exclusively to
>> threads (shared opcode tree, shared vars, etc).
>That reminds me, does anyone know what happened with the shared opcode
>tree?  Does it not work, or is it just dwarfed by the size of the
>non-shared stuff?  The size problems these guys are having seem to point
>to little or no sharing happening between threads.

I'm sure you know my PerlMonks article "Things you need to know 
before programming Perl ithreads" ( 
http://www.perlmonks.org/index.pl?node_id=288022 ).

I recently ran a little test that showed (at least to Devel::Size) 
that you have _at least_ about 250Kbyte of "data" that needs to be 
copied between threads if you _only_ do:

   use threads;
   use threads::shared;

And I'm not sure whether this number isn't too low, because I don't 
know for sure whether the CV's in the stash were counted correctly.  
If they were not, then you would come to about 400Kbyte of "data" 
for a _bare_ thread.

Loading a few modules, each with their initializations, adds up _very_ 
quickly to several Mbytes of "data" that needs to be cloned _every_ 
time you start a thread.  And these are _not_ simple copies: all of 
the stashes need to be walked to make sure that all the [SAHC]V's are 
properly copied to the thread's copy.  So it's taking a _lot_ of CPU 
as well...

So yes, in general I think you can say that the data copied for each 
thread quickly dwarfs whatever optrees are shared.


Liz



Re: ithreads with modperl

Posted by Perrin Harkins <pe...@elem.com>.
On Fri, 2004-01-09 at 14:52, Stas Bekman wrote:
> We really need more real world benchmarks to make a good judgement. It's 
> probably quite certain that the performance is going to be worse if you spawn 
> threads, but don't deploy the benefits available exclusively to threads 
> (shared opcode tree, shared vars, etc).

That reminds me, does anyone know what happened with the shared opcode
tree?  Does it not work, or is it just dwarfed by the size of the
non-shared stuff?  The size problems these guys are having seem to point
to little or no sharing happening between threads.

- Perrin




Re: ithreads with modperl

Posted by Stas Bekman <st...@stason.org>.
Perrin Harkins wrote:
> On Fri, 2004-01-09 at 04:14, Stas Bekman wrote:
> 
>>Ah, sorry for chiming in again, it's true regarding the memory, but not that 
>>bad regarding performance. The only real performance overhead is to spawn a 
>>new perl interpreter (which is just terrible if you have many modules 
>>preloaded), which you can prespawn.
> 
> 
> I was actually thinking of how a 5.8 perl compiled with threads is about
> 15% slower than a non-threaded version. 

Yup, which is why, I did write:

"Once it's spawned, the run-time performance should be a bit worse than a 
normal perl, but not as bad as you made it sound ;)  On the other hand you get 
different benefits from using threads, and depending on your application your 
overall performance could be even better using threads. Of course if you are 
on Windows, you have no choice but to use threads."

We really need more real world benchmarks to make a good judgement. It's 
probably quite certain that the performance is going to be worse if you spawn 
threads but don't deploy the benefits available exclusively to threads 
(shared opcode tree, shared vars, etc). Once people start using threaded 
MPMs, we will have more real world data.





Re: ithreads with modperl

Posted by Perrin Harkins <pe...@elem.com>.
On Fri, 2004-01-09 at 04:14, Stas Bekman wrote:
> Ah, sorry for chiming in again, it's true regarding the memory, but not that 
> bad regarding performance. The only real performance overhead is to spawn a 
> new perl interpreter (which is just terrible if you have many modules 
> preloaded), which you can prespawn.

I was actually thinking of how a 5.8 perl compiled with threads is about
15% slower than a non-threaded version.  However, this is 5.6 with
ithreads that we're talking about, so I don't really know how much
difference it makes.

- Perrin




Re: ithreads with modperl

Posted by Stas Bekman <st...@stason.org>.
Perrin Harkins wrote:
> On Thu, 2004-01-08 at 20:22, Simon Clewer wrote:
> 
>>Huge memory usage ... each ithread uses about 10M  of ram ( image of Apache,
>>image of mod perl and image of our deep-link robot ), and as we use 5
>>ithreads plus the original thread that means that each Apache is using 60 M
>>and because we trade on being the best and the fastest at what we do we need
>>to keep plenty of Apaches ready and waiting ( about 20 ) - so we're using
>>heaps of memory.
> 
> 
> My question would be, why are you using Perl threads for this?  The talk
> about the 5.8 threads sounds pretty bad, both for memory and
> performance.  I can't imagine ithreads were a whole lot better on either
> front.  I think you'd be better off forking.

Ah, sorry for chiming in again. It's true regarding the memory, but not that 
bad regarding performance. The only real performance overhead is spawning a 
new perl interpreter (which is just terrible if you have many modules 
preloaded), which you can prespawn. Once it's spawned, the run-time 
performance should be a bit worse than a normal perl, but not as bad as you 
made it sound ;) On the other hand you get different benefits from using 
threads, and depending on your application your overall performance could be 
even better using threads. Of course if you are on Windows, you have no 
choice but to use threads.
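For mod_perl 2 on a threaded MPM, prespawning is what the interpreter pool 
directives control. A sketch of the relevant httpd.conf fragment (directive 
names from the mp2 documentation; the values here are purely illustrative):

```apache
# Interpreter pool tuning for a threaded MPM (mod_perl 2)
PerlInterpStart     5    # interpreters cloned at server startup
PerlInterpMax      10    # hard upper limit on cloned interpreters
PerlInterpMinSpare  2    # clone more when the idle pool drops below this
PerlInterpMaxSpare  5    # shut down clones when the idle pool exceeds this
```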





Re: ithreads with modperl

Posted by Tom Brettin <ts...@telomere.lanl.gov>.
I would second Perrin's comment: why use ithreads instead of forking and
perhaps some socket communication? I've tested both fork and threads in a
high-performance environment, and fork is simply the better choice as long
as lines of code are not your judgement criterion.

Best,
Tom


Thomas S. Brettin
Los Alamos National Laboratory

On 8 Jan 2004, Perrin Harkins wrote:

> On Thu, 2004-01-08 at 20:22, Simon Clewer wrote:
> > Huge memory usage ... each ithread uses about 10M  of ram ( image of Apache,
> > image of mod perl and image of our deep-link robot ), and as we use 5
> > ithreads plus the original thread that means that each Apache is using 60 M
> > and because we trade on being the best and the fastest at what we do we need
> > to keep plenty of Apaches ready and waiting ( about 20 ) - so we're using
> > heaps of memory.
> 
> My question would be, why are you using Perl threads for this?  The talk
> about the 5.8 threads sounds pretty bad, both for memory and
> performance.  I can't imagine ithreads were a whole lot better on either
> front.  I think you'd be better off forking.
> 
> > Does anybody know how we can reduce the amount of memory we use ?
> 
> If I were designing a system to do what you're doing, I would make it
> asynchronous.  Do the form interaction with mod_perl, then let another
> program do the data collection without tying up an apache process, and
> let your user wait on a "working..." page until it's done.
> 
> You can do this by forking, or by pushing a request onto a queue (like a
> database table or dbm file) that another process monitors.
> 
> The techniques for forking from mod_perl are described here:
> http://perl.apache.org/docs/1.0/guide/performance.html#Forking_and_Executing_Subprocesses_from_mod_perl
> 
> An example of a "working..." page is given in this column by Randal:
> http://www.stonehenge.com/merlyn/WebTechniques/col20.html
> 
> - Perrin
> 
> 
> 
> 



Re: ithreads with modperl

Posted by Perrin Harkins <pe...@elem.com>.
On Thu, 2004-01-08 at 20:22, Simon Clewer wrote:
> Huge memory usage ... each ithread uses about 10M  of ram ( image of Apache,
> image of mod perl and image of our deep-link robot ), and as we use 5
> ithreads plus the original thread that means that each Apache is using 60 M
> and because we trade on being the best and the fastest at what we do we need
> to keep plenty of Apaches ready and waiting ( about 20 ) - so we're using
> heaps of memory.

My question would be, why are you using Perl threads for this?  The talk
about the 5.8 threads sounds pretty bad, both for memory and
performance.  I can't imagine ithreads were a whole lot better on either
front.  I think you'd be better off forking.

> Does anybody know how we can reduce the amount of memory we use ?

If I were designing a system to do what you're doing, I would make it
asynchronous.  Do the form interaction with mod_perl, then let another
program do the data collection without tying up an apache process, and
let your user wait on a "working..." page until it's done.

You can do this by forking, or by pushing a request onto a queue (like a
database table or dbm file) that another process monitors.
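
For illustration, the dbm-file flavour of that queue might look like the
sketch below (not from the thread; the file path and job id are made up, and
SDBM_File from the core distribution stands in for whatever dbm you prefer).

```perl
#!/usr/bin/perl
# Sketch of the "push onto a queue" idea: the mod_perl side records the
# request in a dbm file and returns immediately; a separate daemon polls
# the same file and runs the robots.
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use SDBM_File;

my $queue_file = "/tmp/quote-queue";   # hypothetical path

# Producer (inside the request handler): mark the job pending.
{
    tie my %queue, 'SDBM_File', $queue_file, O_RDWR|O_CREAT, 0644
        or die "cannot tie $queue_file: $!";
    $queue{"job-42"} = "pending";      # key would be a real request id
    untie %queue;
}

# Consumer (in the monitor daemon): pick up pending jobs.
{
    tie my %queue, 'SDBM_File', $queue_file, O_RDWR|O_CREAT, 0644
        or die "cannot tie $queue_file: $!";
    for my $id (grep { $queue{$_} eq 'pending' } keys %queue) {
        $queue{$id} = 'done';          # run the robot here, then mark done
    }
    untie %queue;
}
```

A real version would wrap each tie in flock-based locking, since the handler
and the monitor write to the same file.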

The techniques for forking from mod_perl are described here:
http://perl.apache.org/docs/1.0/guide/performance.html#Forking_and_Executing_Subprocesses_from_mod_perl
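
The pattern on that page boils down to something like the following
simplified sketch (the guide covers the important details I'm glossing over,
such as cleaning up inherited file descriptors and database handles):

```perl
use strict;
use warnings;
use POSIX qw(setsid);

defined(my $pid = fork) or die "cannot fork: $!";
if ($pid) {
    # Parent: the apache child returns to serving requests at once;
    # reap the short-lived first child so it doesn't become a zombie.
    waitpid($pid, 0);
}
else {
    # First child: fork again and exit immediately, so the grandchild
    # is inherited by init and the parent's waitpid returns quickly.
    defined(my $kid = fork) or die "cannot fork: $!";
    exit 0 if $kid;

    # Grandchild: detach from apache completely, then do the slow work.
    setsid();
    open STDIN,  '<', '/dev/null';
    open STDOUT, '>', '/dev/null';
    open STDERR, '>', '/dev/null';
    # ... run the robots here ...
    CORE::exit(0);
}
```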

An example of a "working..." page is given in this column by Randal:
http://www.stonehenge.com/merlyn/WebTechniques/col20.html

- Perrin




Re: ithreads with modperl

Posted by Elizabeth Mattijsen <li...@dijkmat.nl>.
At 01:22 +0000 1/9/04, Simon Clewer wrote:
>We are simply running out of memory, which is sad because we are nowhere
>near running out of processor and it grieves me to simply use a bigger
>server when it seems that smarter could solve the problem. Other than that
>things are working very nicely and the site serves quickly and reliably -
>see at www.superquote.com ( despite the warnings that ithreads is not yet
>safe for commercial use ).

You might want to have a look at my forks.pm module if you have CPU
to spare and can tolerate some extra latency in communication
between threads.  You also wouldn't need a threaded Perl, which would
save you some % of CPU usage.
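
For the curious: forks.pm is designed as a drop-in for the threads API, so
switching is mostly a matter of the use lines. A sketch (assuming forks is
installed from CPAN; under the hood each "thread" is a forked process and
shared data travels over sockets):

```perl
use strict;
use warnings;
use forks;           # instead of "use threads"
use forks::shared;   # instead of "use threads::shared"

my $done : shared = 0;

my $thr = threads->create(sub {
    # Runs in a separate process, but with the familiar threads API.
    $done = 1;
    return "finished";
});

my $result = $thr->join;
print "$result ($done)\n";
```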


Liz



Re: ithreads with modperl

Posted by Aleksandr Guidrevitch <pi...@tut.by>.
Hi Simon,

Simon Clewer wrote

>Hi,
>
>We're using ithreads with modperl to run some complicated robots
>concurrently ( in a commercial environment ) - there is however an issue.
>
>Huge memory usage ... each ithread uses about 10M  of ram ( image of Apache,
>image of mod perl and image of our deep-link robot ), and as we use 5
>ithreads plus the original thread that means that each Apache is using 60 M
>and because we trade on being the best and the fastest at what we do we need
>to keep plenty of Apaches ready and waiting ( about 20 ) - so we're using
>heaps of memory.
>  
>
I can suggest creating a frontend proxy with mod_proxy or mod_accel (a
better approach): you can keep 2-3 mod_perl processes on the backend
server and 20+ on the frontend; each apache+mod_proxy process will be
around 1.5 MB. With 2 GB on board you can run 20+ mod_perls and 200+
frontend server processes.
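
A minimal httpd.conf sketch of that two-server split (the port, paths and
process counts are made up; mod_accel uses its own directives):

```
# Frontend (lightweight apache + mod_proxy), listening on port 80:
ProxyPass        /quote/ http://127.0.0.1:8042/quote/
ProxyPassReverse /quote/ http://127.0.0.1:8042/quote/

# Backend (heavy mod_perl apache), listening only on localhost:
Listen 127.0.0.1:8042
MaxClients 3
```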

Also, I would suggest not forking the robots directly from apache, but
running them as a separate preforked server. Your apache should then
display a refreshing "please wait" page until a result is fetched into
the cache by one of the robots.
That also means you should cache the robots' results aggressively where
possible.
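
The refreshing page can be as small as the sketch below
(is_result_cached() and the /quote/ URL are hypothetical):

```perl
# Emits either a redirect to the finished quote page or a page that
# reloads itself every 5 seconds while the robots are still working.
sub wait_page {
    my ($request_id) = @_;
    if (is_result_cached($request_id)) {
        return qq{<meta http-equiv="refresh" content="0; url=/quote/$request_id">};
    }
    return qq{<meta http-equiv="refresh" content="5">}
         . qq{<p>Fetching your quotes, please wait...</p>};
}
```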

Surely, I can be completely wrong since I don't know all the details.
This is just a suggestion :)

Sincerely,
Alex





Re: ithreads with modperl

Posted by Ged Haywood <ge...@www2.jubileegroup.co.uk>.
Hi there,

I'd echo what Perrin has said, but if only for the archives I need to ask:

On Fri, 9 Jan 2004, Simon Clewer wrote:

> [snip] we're using heaps of memory.
> 
> Does anybody know how we can reduce the amount of memory we use ? - is there
> some smart way to actually share the images.

There are some tips in the mod_perl Guide, have you seen them?

> Up to now the problem has been OK because we simply paid to put 2
> Gig on board, but we had nearly 1000 users yesterday

Not concurrently? :)

> We're on a 2.4.20 Linux with Apache 1.3.28  and mod_perl 1.28

Can you be a bit more specific about the kernel?  You've seen

http://www.gossamer-threads.com/perl/mailarc/gforum.cgi?post=99891;search_string=shared%20children;guest=1993236&t=search_engine#99891

I take it?

It seems to me that your application has grown out of control.  You
need to work on a redesign (which should not limit its scope purely to
the technical issues) but obviously the customers are coming in *now*
so in parallel with firefighting you have to maintain what's there.

I'm sure there's no need to hammer the memory consumption like that.
There's absolutely no need to use a 60M process to get a user to fill
in a form.  Have you considered using multiple servers?  Some people
use a light frontend/heavy backend approach, some use mod_perl servers
for both front and back ends.  You might be able to split the codebase
that way without too much disruption.

Is the load growing rapidly?  Seems to me you aren't far from the
point where you'd benefit from load balancing and failover, even if
only for your peace of mind.  OTOH that's a whole new can of worms.

73,
Ged.


