Posted to dev@apr.apache.org by Mladen Turk <mt...@apache.org> on 2009/03/26 12:29:35 UTC

Poor performance with new apr_pool

Just did a quick bench of the current trunk,
and the results are not very encouraging.

static void test_performance(abts_case *tc, void *data)
{
     int i;
     void *m;
     apr_time_t end, now = apr_time_now();

     for (i = 0; i < 1000000; i++) {
         apr_pool_t *p;

         apr_pool_create(&p, NULL);
         m = apr_palloc(p, 1234);
         apr_pool_destroy(p);

     }
     end = apr_time_now();

     printf("\n\nFinished in %" APR_TIME_T_FMT "\n", end - now);

}

Apr 1.4
Finished in 371490

Apr trunk
Finished in 1359345

So the trunk takes almost 4 times longer to finish!

I thought the problem might be that
each pool creates its own mutex. Disabling that with
#if 0
     (void)apr_thread_mutex_create(&pool->mutex,
                                   APR_THREAD_MUTEX_NESTED, pool);
#endif
inside apr_pool_create_ex gives better results:
Finished in 798145

However, this is still twice as slow!

Tests performed on
Linux 2.6.18-92.1.13.el5 #1 SMP Thu Sep 4 03:51:01 EDT 2008 i686 i686 i386 GNU/Linux



Regards
-- 
^(TM)

Re: Poor performance with new apr_pool

Posted by Jim Jagielski <ji...@jaguNET.com>.
On Mar 26, 2009, at 11:24 AM, Joe Orton wrote:

> On Thu, Mar 26, 2009 at 03:10:56PM +0100, Mladen Turk wrote:
>> What's the point?
>
> The null hypothesis is: modern malloc implementations do exactly the
> same optimisation work (e.g. maintaining freelists) that we duplicate in
> APR pools.  By avoiding that duplication, and relying on malloc
> optimisation, we might get better/equivalent performance whilst reducing
> the complexity of APR.
>
> So, we're testing that hypothesis.  If it's shown to be false, then, we
> revert back to the old allocator.  That doesn't mean it's not worth
> trying.
>
> Also, I think it would be more useful to benchmark something like
> Subversion's "make check", or an httpd load test.
>
>

+1


Re: Poor performance with new apr_pool

Posted by Mladen Turk <mt...@apache.org>.
Joe Orton wrote:
 > On Thu, Mar 26, 2009 at 03:10:56PM +0100, Mladen Turk wrote:
 >> What's the point?
 >
 > The null hypothesis is: modern malloc implementations do exactly the
 > same optimisation work (e.g. maintaining freelists) that we duplicate in
 > APR pools.  By avoiding that duplication, and relying on malloc
 > optimisation, we might get better/equivalent performance whilst reducing
 > the complexity of APR.
 >

That's all true, but the pool's purpose is not to be a malloc replacement.
The new concept remembers every allocated chunk of data,
so its performance is hard to predict.
Smaller chunks cause larger memory usage, because you have
to remember many tiny pointers. With something like concatenating
an 8-char string, you actually double the memory usage
(8+ bytes for the data, depending on the malloc implementation,
plus 4 (or 8) bytes for storing the pointer to it).
So memory usage depends not only on the allocated size,
but on the number of allocations as well.
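That bookkeeping cost can be sketched with a toy tracking allocator (an illustration only, not the actual trunk code): every allocation is a separate malloc remembered in a list so the whole pool can be freed at once, and each one therefore pays for a list pointer on top of malloc's own per-chunk header.

```c
#include <stdlib.h>

/* Toy illustration of per-allocation bookkeeping (NOT the real trunk
 * code): each allocation is its own malloc, linked into a list so that
 * destroying the pool can free everything. */
struct chunk {
    struct chunk *next;   /* one pointer of bookkeeping per allocation */
    /* payload follows */
};

struct toy_pool {
    struct chunk *head;
    size_t overhead;      /* bytes spent on bookkeeping, not payload */
};

void *toy_palloc(struct toy_pool *p, size_t size)
{
    struct chunk *c = malloc(sizeof(struct chunk) + size);
    if (!c)
        return NULL;
    c->next = p->head;    /* remember the chunk for later teardown */
    p->head = c;
    p->overhead += sizeof(struct chunk);
    return c + 1;         /* payload starts right after the header */
}

void toy_pool_destroy(struct toy_pool *p)
{
    struct chunk *c = p->head;
    while (c) {
        struct chunk *next = c->next;
        free(c);
        c = next;
    }
    p->head = NULL;
}
```

For 10000 eight-byte allocations, the list pointers alone cost 10000 extra words before counting malloc's own headers, which is the doubling described above.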

That's why I think this new concept doesn't entirely fit
apr_pool usage, at least not for the string, table and
hash operations that are used heavily across httpd.

For non-string operations, like managing
system objects, it probably makes no difference.

> 
> Also, I think it would be more useful to benchmark something like 
> Subversion's "make check", or an httpd load test.
> 

Probably.


Regards
-- 
^(TM)

Re: Poor performance with new apr_pool

Posted by Branko Čibej <br...@xbc.nu>.
Joe Orton wrote:
> On Thu, Mar 26, 2009 at 03:10:56PM +0100, Mladen Turk wrote:
>   
>> What's the point?
>>     
>
> The null hypothesis is: modern malloc implementations do exactly the 
> same optimisation work (e.g. maintaining freelists) that we duplicate in 
> APR pools.  By avoiding that duplication, and relying on malloc 
> optimisation, we might get better/equivalent performance whilst reducing 
> the complexity of APR.
>
> So, we're testing that hypothesis.  If it's shown to be false, then, we 
> revert back to the old allocator.  That doesn't mean it's not worth 
> trying.
>   

Nah, we're not testing that hypothesis. We're testing the hypothesis
that "most recent C libraries have a modern malloc implementation",
which is clearly false. And we're ignoring not-so-recent C libraries,
like Mladen's RHEL-5 example.

If someone is certain that she can do better than pools, she can still
write her own allocator and use that.

> Also, I think it would be more useful to benchmark something like 
> Subversion's "make check", or an httpd load test.
>   

Subversion's "make check" is strongly I/O-bound, so timings would
probably not show a thing. We don't have any load tests, I'm afraid.

-- Brane


Re: Poor performance with new apr_pool

Posted by Joe Orton <jo...@redhat.com>.
On Thu, Mar 26, 2009 at 03:10:56PM +0100, Mladen Turk wrote:
> What's the point?

The null hypothesis is: modern malloc implementations do exactly the 
same optimisation work (e.g. maintaining freelists) that we duplicate in 
APR pools.  By avoiding that duplication, and relying on malloc 
optimisation, we might get better/equivalent performance whilst reducing 
the complexity of APR.

So, we're testing that hypothesis.  If it's shown to be false, then, we 
revert back to the old allocator.  That doesn't mean it's not worth 
trying.

Also, I think it would be more useful to benchmark something like 
Subversion's "make check", or an httpd load test.

joe

Re: Poor performance with new apr_pool

Posted by Mladen Turk <mt...@apache.org>.
Paul Querna wrote:
> Attached is a program that you can use for this.
> 
> please upgrade to trunk, i've eliminated some callocs, and switched
> them to malloc where possible.
> 
> compile with:
> gcc -o pspeed13 `apr-1-config   --link-ld     --cppflags    --cflags
>    --includes    --ldflags    --libs `  poolspeed.c

           VIRT   RES
Apr 1.4   10644  4288
Trunk     13288  9296

So the memory usage is doubled and the speed is
seven (!) times lower. Your numbers on the other
platforms are better than on RHEL, but still
about 3 times worse.
We are not talking about percentages; we are
talking about hundreds of them.

What's the point?

Regards
-- 
^(TM)

Re: Poor performance with new apr_pool

Posted by Mladen Turk <mt...@apache.org>.
Paul Querna wrote:
> Attached is a program that you can use for this.
> 
> please upgrade to trunk, i've eliminated some callocs, and switched
> them to malloc where possible.
> 
> compile with:
> gcc -o pspeed13 `apr-1-config   --link-ld     --cppflags    --cflags
>    --includes    --ldflags    --libs `  poolspeed.c

The results are better, but still 7.5 times slower than apr 1.4 :(

Apr 1.4 	takes 2 minutes
Trunk r758624	takes 15 minutes to finish


I didn't do memory profiling, but it looks like memory usage
is much higher.
With apr 1.4 the test program shows a constant 3.6 MB memory usage.
With trunk it cycles from 4.4 to 9.6 MB.

One pragmatic question.
What was wrong with the good old implementation?

Regards
-- 
^(TM)

Re: Poor performance with new apr_pool

Posted by Mladen Turk <mt...@apache.org>.
Paul Querna wrote:
> On Thu, Mar 26, 2009 at 6:05 PM, Branko Čibej <br...@xbc.nu> wrote:
>>
>> We have JNI bindings for Subversion, which uses APR, whose packaging and
>> compilation options we don't control. *boom*
> 
> That is only talking about loading tcmalloc using the normal library.
> 
> you can compile tcmalloc to not use the malloc symbol -- APR would
> have to compile it with different symbol names if we were to bundle
> it.
>

Remind me why we don't bundle expat or pcre :)

Cheers
-- 
^(TM)

Re: Poor performance with new apr_pool

Posted by Ruediger Pluem <rp...@apache.org>.
On 27.03.2009 14:00, Jim Jagielski wrote:
> 
> On Mar 26, 2009, at 7:51 PM, Justin Erenkrantz wrote:
> 
>> 2009/3/26 Branko Čibej <br...@xbc.nu>:
>>> Maybe it's just me, but all that seems like a monumental waste of time.
>>
>> If we can't beat the old system by COB tomorrow consistently, then I
>> think we can simply revert it or we add tcmalloc as a compile-time
>> option if it's not too complex to use that.
> 
> Before we do that, why not spin off a dev branch of what we currently have
> so those of us interested in profiling can still do some work
> on it...
> 
> 

+1

Regards

Rüdiger

Re: Poor performance with new apr_pool

Posted by Jim Jagielski <ji...@jagunet.com>.
On Apr 9, 2009, at 3:32 PM, Bill Stoddard wrote:

> Jim Jagielski wrote:
>>
>> On Apr 8, 2009, at 8:33 PM, Jeff Trawick wrote:
>>
>>> On Wed, Apr 8, 2009 at 11:00 AM, Jim Jagielski <ji...@jagunet.com>  
>>> wrote:
>>>
>>> On Mar 30, 2009, at 3:02 PM, Jim Jagielski wrote:
>>>
>>> Yep... I will try to recreate on Ubuntu in addition to the
>>> OS X testing.
>>>
>>>
>>> OK.... This seems unrelated (maybe) to the new pool...
>>>
>>> The cause for the slowdown was due to httpd constantly dumping
>>> core. Looking at the cores, I see
>>>
>>> (gdb) where
>>> #0  0x9035eeda in read$UNIX2003 ()
>>>
>>> read() is a pretty safe place to sit; what about other threads?
>>>
>>
>> That's the one that dumped...
> You sure about that?
>

Yep. Which is what's weird.


Re: Poor performance with new apr_pool

Posted by Bill Stoddard <wg...@gmail.com>.
Jim Jagielski wrote:
>
> On Apr 8, 2009, at 8:33 PM, Jeff Trawick wrote:
>
>> On Wed, Apr 8, 2009 at 11:00 AM, Jim Jagielski <ji...@jagunet.com> wrote:
>>
>> On Mar 30, 2009, at 3:02 PM, Jim Jagielski wrote:
>>
>> Yep... I will try to recreate on Ubuntu in addition to the
>> OS X testing.
>>
>>
>> OK.... This seems unrelated (maybe) to the new pool...
>>
>> The cause for the slowdown was due to httpd constantly dumping
>> core. Looking at the cores, I see
>>
>> (gdb) where
>> #0  0x9035eeda in read$UNIX2003 ()
>>
>> read() is a pretty safe place to sit; what about other threads?
>>
>
> That's the one that dumped...
You sure about that?

Bill

Re: Poor performance with new apr_pool

Posted by Jim Jagielski <ji...@jaguNET.com>.
On Apr 8, 2009, at 8:33 PM, Jeff Trawick wrote:

> On Wed, Apr 8, 2009 at 11:00 AM, Jim Jagielski <ji...@jagunet.com>  
> wrote:
>
> On Mar 30, 2009, at 3:02 PM, Jim Jagielski wrote:
>
> Yep... I will try to recreate on Ubuntu in addition to the
> OS X testing.
>
>
> OK.... This seems unrelated (maybe) to the new pool...
>
> The cause for the slowdown was due to httpd constantly dumping
> core. Looking at the cores, I see
>
> (gdb) where
> #0  0x9035eeda in read$UNIX2003 ()
>
> read() is a pretty safe place to sit; what about other threads?
>

That's the one that dumped... 

Re: Poor performance with new apr_pool

Posted by Jeff Trawick <tr...@gmail.com>.
On Wed, Apr 8, 2009 at 11:00 AM, Jim Jagielski <ji...@jagunet.com> wrote:

>
> On Mar 30, 2009, at 3:02 PM, Jim Jagielski wrote:
>
>>
>> Yep... I will try to recreate on Ubuntu in addition to the
>> OS X testing.
>>
>>
> OK.... This seems unrelated (maybe) to the new pool...
>
> The cause for the slowdown was due to httpd constantly dumping
> core. Looking at the cores, I see
>
> (gdb) where
> #0  0x9035eeda in read$UNIX2003 ()


read() is a pretty safe place to sit; what about other threads?

Re: Poor performance with new apr_pool

Posted by Jim Jagielski <ji...@jaguNET.com>.
On Mar 30, 2009, at 3:02 PM, Jim Jagielski wrote:
>
> Yep... I will try to recreate on Ubuntu in addition to the
> OS X testing.
>

OK.... This seems unrelated (maybe) to the new pool...

The cause for the slowdown was due to httpd constantly dumping
core. Looking at the cores, I see

(gdb) where
#0  0x9035eeda in read$UNIX2003 ()
#1  0x000e13cc in ap_event_pod_check (pod=0x34ad10) at pod.c:56
#2  0x000de1d7 in child_main (child_num_arg=2) at event.c:1797
#3  0x000de388 in make_child (s=0x319120, slot=2) at event.c:1883
#4  0x000de9eb in perform_idle_server_maintenance () at event.c:2096
#5  0x000dec7a in server_main_loop (remaining_children_to_start=0) at event.c:2200
#6  0x000dee75 in event_run (_pconf=0x300590, plog=0x315fa0, s=0x319120) at event.c:2258
#7  0x00015865 in ap_run_mpm (pconf=0x300590, plog=0x315fa0, s=0x319120) at mpm_common.c:88
#8  0x0000acd8 in main (argc=7, argv=0xbffff36c) at main.c:781

This seems to only happen in Darwin. Looking into other
1.4/2.0 related changes that would affect this

Re: Poor performance with new apr_pool

Posted by Jim Jagielski <ji...@jaguNET.com>.
On Mar 30, 2009, at 2:57 PM, Ruediger Pluem wrote:

>
>
> On 03/30/2009 08:20 PM, Jim Jagielski wrote:
>> OK... I've tested httpd-trunk (head) with apr-1.4 (head)
>> and apr-trunk (head). This is on OS X (10.5.6) with both
>> worker and event MPM. I've also tried different default
>> mutexes (OSX is SysV, but I've also forced fcntl) and the
>> rub is that when using the test framework, the
>> t/module/dir.t test is very, very VERY slow.
>>
>> With apr-1.4 I get FULL framework test times of:
>>
>>  t/TEST > /dev/null 2>&1  20.24s user 5.02s system 53% cpu 46.932 total
>>  t/TEST > /dev/null 2>&1  20.39s user 5.08s system 54% cpu 46.692 total
>>
>> with apr-2.0, I get times like
>>
>>  t/TEST t/modules/dir.t > /dev/null 2>&1  1.22s user 0.49s system 0% cpu 7:17.12 total
>>
>> *just* for the dir.t test.
>>
>> When I get back from San Mateo I intend to enable pool debugging and
>> doing some profiling/tracing
>
> Still weird as I cannot see the same on Linux with pthread used as a  
> mutex.
> I assume your APR 2.0 included r759519, correct?
>

Yep... I will try to recreate on Ubuntu in addition to the
OS X testing.


Re: Poor performance with new apr_pool

Posted by Ruediger Pluem <rp...@apache.org>.

On 03/30/2009 08:20 PM, Jim Jagielski wrote:
> OK... I've tested httpd-trunk (head) with apr-1.4 (head)
> and apr-trunk (head). This is on OS X (10.5.6) with both
> worker and event MPM. I've also tried different default
> mutexes (OSX is SysV, but I've also forced fcntl) and the
> rub is that when using the test framework, the
> t/module/dir.t test is very, very VERY slow.
> 
> With apr-1.4 I get FULL framework test times of:
> 
>   t/TEST > /dev/null 2>&1  20.24s user 5.02s system 53% cpu 46.932 total
>   t/TEST > /dev/null 2>&1  20.39s user 5.08s system 54% cpu 46.692 total
> 
> with apr-2.0, I get times like
> 
>   t/TEST t/modules/dir.t > /dev/null 2>&1  1.22s user 0.49s system 0%
> cpu 7:17.12 total
> 
> *just* for the dir.t test.
> 
> When I get back from San Mateo I intend to enable pool debugging and
> doing some profiling/tracing

Still weird as I cannot see the same on Linux with pthread used as a mutex.
I assume your APR 2.0 included r759519, correct?

Regards

Rüdiger


Re: Poor performance with new apr_pool

Posted by Jim Jagielski <ji...@jagunet.com>.
OK... I've tested httpd-trunk (head) with apr-1.4 (head)
and apr-trunk (head). This is on OS X (10.5.6) with both
worker and event MPM. I've also tried different default
mutexes (OSX is SysV, but I've also forced fcntl) and the
rub is that when using the test framework, the
t/module/dir.t test is very, very VERY slow.

With apr-1.4 I get FULL framework test times of:

   t/TEST > /dev/null 2>&1  20.24s user 5.02s system 53% cpu 46.932 total
   t/TEST > /dev/null 2>&1  20.39s user 5.08s system 54% cpu 46.692 total

with apr-2.0, I get times like

   t/TEST t/modules/dir.t > /dev/null 2>&1  1.22s user 0.49s system 0% cpu 7:17.12 total

*just* for the dir.t test.

When I get back from San Mateo I intend to enable pool debugging and
doing some profiling/tracing

Re: Poor performance with new apr_pool

Posted by Sander Striker <s....@striker.nl>.
On Fri, Mar 27, 2009 at 4:40 PM, Jim Jagielski <ji...@jagunet.com> wrote:
>
> On Mar 27, 2009, at 11:17 AM, Mladen Turk wrote:
>
>> Jim Jagielski wrote:
>>>
>>> On Mar 26, 2009, at 7:51 PM, Justin Erenkrantz wrote:
>>>>
>>>> 2009/3/26 Branko Čibej <br...@xbc.nu>:
>>>>>
>>>>> Maybe it's just me, but all that seems like a monumental waste of time.
>>>>
>>>> If we can't beat the old system by COB tomorrow consistently, then I
>>>> think we can simply revert it or we add tcmalloc as a compile-time
>>>> option if it's not too complex to use that.
>>>
>>> Before we do that, why not spin off a dev branch of what we currently
>>> have
>>> so those of us interested in profiling can still do some work
>>> on it...
>>
>> Sure, you can do that.
>>
>> The problem with design concept will however stay.
>> apr_pool and malloc/free simply don't fit with each other.
>>
>
> Well, kind of depends on what apr_pool actually is :)
>
> BTW: shared this with some of the team. Did a framework test
> against httpd-trunk, apr-trunk and apr-1.4. Using apr-trunk
> the test absolutely fell down when doing t/modules/dir
>
> The current thinking is that it's the whole subpool crud

The whole subpool crud?  Can you clarify?  Do you mean the
"hierarchical lifetime management"?

Cheers,

Sander
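For context, "hierarchical lifetime management" refers to the contract that pools form a tree and destroying a pool also destroys all of its subpools. A minimal sketch of just that contract (plain C toy code, not APR's actual implementation):

```c
#include <stdlib.h>

/* Toy pool implementing only the hierarchy contract: destroying a pool
 * recursively destroys its child pools first.  The real apr_pool_t also
 * manages allocations and cleanups; this sketch tracks children only. */
struct toy_pool {
    struct toy_pool *parent;
    struct toy_pool *first_child;
    struct toy_pool *sibling;   /* next child of the same parent */
};

struct toy_pool *toy_pool_create(struct toy_pool *parent)
{
    struct toy_pool *p = calloc(1, sizeof(*p));
    if (!p)
        return NULL;
    p->parent = parent;
    if (parent) {               /* link into the parent's child list */
        p->sibling = parent->first_child;
        parent->first_child = p;
    }
    return p;
}

static void destroy_subtree(struct toy_pool *p)
{
    struct toy_pool *c = p->first_child;
    while (c) {                 /* children go first, depth-first */
        struct toy_pool *next = c->sibling;
        destroy_subtree(c);
        c = next;
    }
    free(p);
}

void toy_pool_destroy(struct toy_pool *p)
{
    if (p->parent) {            /* unlink from the parent's child list */
        struct toy_pool **pp = &p->parent->first_child;
        while (*pp != p)
            pp = &(*pp)->sibling;
        *pp = p->sibling;
    }
    destroy_subtree(p);
}
```

With APR itself, the equivalent is apr_pool_create(&child, parent) followed by apr_pool_destroy(parent), which tears down child before parent.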

Re: Poor performance with new apr_pool

Posted by Jim Jagielski <ji...@jagunet.com>.
On Mar 27, 2009, at 11:17 AM, Mladen Turk wrote:

> Jim Jagielski wrote:
>> On Mar 26, 2009, at 7:51 PM, Justin Erenkrantz wrote:
>>> 2009/3/26 Branko Čibej <br...@xbc.nu>:
>>>> Maybe it's just me, but all that seems like a monumental waste of  
>>>> time.
>>>
>>> If we can't beat the old system by COB tomorrow consistently, then I
>>> think we can simply revert it or we add tcmalloc as a compile-time
>>> option if it's not too complex to use that.
>> Before we do that, why not spin off a dev branch of what we  
>> currently have
>> so those of us interested in profiling can still do some work
>> on it...
>
> Sure, you can do that.
>
> The problem with design concept will however stay.
> apr_pool and malloc/free simply don't fit with each other.
>

Well, kind of depends on what apr_pool actually is :)

BTW: shared this with some of the team. Did a framework test
against httpd-trunk, apr-trunk and apr-1.4. Using apr-trunk
the test absolutely fell down when doing t/modules/dir

The current thinking is that it's the whole subpool crud


Re: Poor performance with new apr_pool

Posted by Mladen Turk <mt...@apache.org>.
Jim Jagielski wrote:
> 
> On Mar 26, 2009, at 7:51 PM, Justin Erenkrantz wrote:
> 
>> 2009/3/26 Branko Čibej <br...@xbc.nu>:
>>> Maybe it's just me, but all that seems like a monumental waste of time.
>>
>> If we can't beat the old system by COB tomorrow consistently, then I
>> think we can simply revert it or we add tcmalloc as a compile-time
>> option if it's not too complex to use that.
> 
> Before we do that, why not spin off a dev branch of what we currently have
> so those of us interested in profiling can still do some work
> on it...
> 

Sure, you can do that.

The problem with the design concept will stay, however:
apr_pool and malloc/free simply don't fit each other.

We should still work on allowing the initial
allocator code to be resized. In theory, with a zero
size, each apr_palloc would end in a malloc call.
What's left is to effectively merge the memnode with
the block_list, and everyone is happy :)
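The kind of allocator being described, where small allocations are bumped out of larger memnode-style blocks, can be sketched as follows (a simplification for illustration; the BLOCK_SIZE value and block layout are assumptions, not APR's actual allocator code):

```c
#include <stdlib.h>

#define BLOCK_SIZE 8192   /* illustrative default; with 0 every palloc
                             would fall through to its own malloc */

/* Simplified bump allocator: carve small allocations out of large
 * malloc'd blocks.  The common case is a pointer bump, not a malloc,
 * and destroy returns whole blocks at once. */
struct block {
    struct block *next;
    char *cur;            /* next free byte */
    char *end;            /* one past the last byte */
    char data[];          /* payload area */
};

struct bump_pool {
    struct block *head;
};

static struct block *block_new(size_t payload)
{
    struct block *b = malloc(sizeof(struct block) + payload);
    if (!b)
        return NULL;
    b->next = NULL;
    b->cur = b->data;
    b->end = b->data + payload;
    return b;
}

void *bump_alloc(struct bump_pool *p, size_t size)
{
    size = (size + 7) & ~(size_t)7;              /* 8-byte alignment */
    if (!p->head || (size_t)(p->head->end - p->head->cur) < size) {
        size_t payload = size > BLOCK_SIZE ? size : BLOCK_SIZE;
        struct block *b = block_new(payload);    /* slow path: new block */
        if (!b)
            return NULL;
        b->next = p->head;
        p->head = b;
    }
    void *m = p->head->cur;                      /* fast path: bump */
    p->head->cur += size;
    return m;
}

void bump_pool_destroy(struct bump_pool *p)
{
    struct block *b = p->head;
    while (b) {
        struct block *next = b->next;
        free(b);                                 /* whole blocks at once */
        b = next;
    }
    p->head = NULL;
}
```

The fast path is a pointer increment with no malloc at all, which is why per-allocation cost stays flat; shrinking the block size toward zero degenerates into one malloc per allocation, as described above.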

Regards
-- 
^(TM)

Re: Poor performance with new apr_pool

Posted by Jim Jagielski <ji...@jaguNET.com>.
On Mar 26, 2009, at 7:51 PM, Justin Erenkrantz wrote:

> 2009/3/26 Branko Čibej <br...@xbc.nu>:
>> Maybe it's just me, but all that seems like a monumental waste of  
>> time.
>
> If we can't beat the old system by COB tomorrow consistently, then I
> think we can simply revert it or we add tcmalloc as a compile-time
> option if it's not too complex to use that.

Before we do that, why not spin off a dev branch of what we currently  
have
so those of us interested in profiling can still do some work
on it...


Re: Poor performance with new apr_pool

Posted by Branko Čibej <br...@xbc.nu>.
Justin Erenkrantz wrote:
> 2009/3/26 Branko Čibej <br...@xbc.nu>:
>   
>> Maybe it's just me, but all that seems like a monumental waste of time.
>>     
>
> If we can't beat the old system by COB tomorrow consistently, then I
> think we can simply revert it or we add tcmalloc as a compile-time
> option if it's not too complex to use that.  Either way, it's not that
> big of a deal - and we've spent more time testing it than it did to
> code it.
>   

Well the message I got was, "we're ripping this ancient slow pool stuff
out 'cause malloc is faster."

> Many many folks had claimed that libc's had gotten a lot better - if
> we've now proven they haven't, then that's very useful information and
> we can go back to what we had.  The last time we had really touched
> the pool code was back in 2001, so it was reasonable to explore
> whether or not things had fundamentally changed.  -- justin
>   

Oh, libc's *have* gotten a lot better, at least some of those I've had
the misfortune to be acquainted with. :-P But there's a difference
between "better" and "good".


-- Brane

Re: Poor performance with new apr_pool

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
2009/3/26 Branko Čibej <br...@xbc.nu>:
> Maybe it's just me, but all that seems like a monumental waste of time.

If we can't beat the old system by COB tomorrow consistently, then I
think we can simply revert it or we add tcmalloc as a compile-time
option if it's not too complex to use that.  Either way, it's not that
big of a deal - and we've spent more time testing it than it did to
code it.

Many many folks had claimed that libc's had gotten a lot better - if
we've now proven they haven't, then that's very useful information and
we can go back to what we had.  The last time we had really touched
the pool code was back in 2001, so it was reasonable to explore
whether or not things had fundamentally changed.  -- justin

Re: Poor performance with new apr_pool

Posted by Bojan Smojver <bo...@rexursive.com>.
On Thu, 2009-03-26 at 21:32 +0100, Branko Čibej wrote:
> Maybe it's just me, but all that seems like a monumental waste of
> time.

I don't really have the right to comment here, because I put zero effort
into this, but it does look that way, doesn't it?

On the other hand, there are patches that enable configurable minimal
pool size. Is that something that we should maybe spend some effort on
integrating? It may be good for situations where large number of pools
needs to be created without using too much memory.

https://issues.apache.org/bugzilla/show_bug.cgi?id=40939

-- 
Bojan


Re: Poor performance with new apr_pool

Posted by Branko Čibej <br...@xbc.nu>.
Paul Querna wrote:
> On Thu, Mar 26, 2009 at 6:05 PM, Branko Čibej <br...@xbc.nu> wrote:
>   
>> Paul Querna wrote:
>>     
>>>> FreeBSD 7.1-STABLE (amd opterons; people.apache.org):
>>>> APR 1.3: 242582048
>>>> APR 2.0: 537562071 (+221%)
>>>>
>>>>         
>>> Same FreeBSD 7.1 machine, using tcmalloc[1]
>>> APR 1.3: 243307182
>>> APR 2.0: 214131712 (-22%)
>>>
>>> I think we should consider bundling tcmalloc, or making it a compile
>>> time option.
>>>
>>>       
>>    For some systems, TCMalloc may not work correctly with
>>    applications that aren't linked against libpthread.so (or the
>>    equivalent on your OS). It should work on Linux using glibc 2.3, but
>>    other OS/libc combinations have not been tested.
>>
>> *shudder*
>>
>> and:
>>
>>    Don't try to load TCMalloc into a running binary (e.g., using JNI in
>>    Java programs). The binary will have allocated some objects using
>>    the system malloc, and may try to pass them to TCMalloc for
>>    deallocation. TCMalloc will not be able to handle such objects.
>>
>> We have JNI bindings for Subversion, which uses APR, whose packaging and
>> compilation options we don't control. *boom*
>>     
>
> That is only talking about loading tcmalloc using the normal library.
>
> you can compile tcmalloc to not use the malloc symbol -- APR would
> have to compile it with different symbol names if we were to bundle
> it.
>   

I don't get it. You threw out one allocator implementation, replaced it
with another one that uses malloc with higher granularity, now you're
thinking about bundling a fast malloc implementation to fix the
performance regression that your first change caused.

Maybe it's just me, but all that seems like a monumental waste of time.

-- Brane

Re: Poor performance with new apr_pool

Posted by Paul Querna <pa...@querna.org>.
On Thu, Mar 26, 2009 at 6:05 PM, Branko Čibej <br...@xbc.nu> wrote:
> Paul Querna wrote:
>>> FreeBSD 7.1-STABLE (amd opterons; people.apache.org):
>>> APR 1.3: 242582048
>>> APR 2.0: 537562071 (+221%)
>>>
>>
>>
>> Same FreeBSD 7.1 machine, using tcmalloc[1]
>> APR 1.3: 243307182
>> APR 2.0: 214131712 (-22%)
>>
>> I think we should consider bundling tcmalloc, or making it a compile
>> time option.
>>
>
>    For some systems, TCMalloc may not work correctly with
>    applications that aren't linked against libpthread.so (or the
>    equivalent on your OS). It should work on Linux using glibc 2.3, but
>    other OS/libc combinations have not been tested.
>
> *shudder*
>
> and:
>
>    Don't try to load TCMalloc into a running binary (e.g., using JNI in
>    Java programs). The binary will have allocated some objects using
>    the system malloc, and may try to pass them to TCMalloc for
>    deallocation. TCMalloc will not be able to handle such objects.
>
> We have JNI bindings for Subversion, which uses APR, whose packaging and
> compilation options we don't control. *boom*

That is only talking about loading tcmalloc using the normal library.

you can compile tcmalloc to not use the malloc symbol -- APR would
have to compile it with different symbol names if we were to bundle
it.

Re: Poor performance with new apr_pool

Posted by Branko Čibej <br...@xbc.nu>.
Paul Querna wrote:
>> FreeBSD 7.1-STABLE (amd opterons; people.apache.org):
>> APR 1.3: 242582048
>> APR 2.0: 537562071 (+221%)
>>     
>
>
> Same FreeBSD 7.1 machine, using tcmalloc[1]
> APR 1.3: 243307182
> APR 2.0: 214131712 (-22%)
>
> I think we should consider bundling tcmalloc, or making it a compile
> time option.
>   

    For some systems, TCMalloc may not work correctly with
    applications that aren't linked against libpthread.so (or the
    equivalent on your OS). It should work on Linux using glibc 2.3, but
    other OS/libc combinations have not been tested.

*shudder*

and:

    Don't try to load TCMalloc into a running binary (e.g., using JNI in
    Java programs). The binary will have allocated some objects using
    the system malloc, and may try to pass them to TCMalloc for
    deallocation. TCMalloc will not be able to handle such objects.

We have JNI bindings for Subversion, which uses APR, whose packaging and
compilation options we don't control. *boom*

So tell me again, why do you have to rip out perfectly good pool code
and replace it with something slower and/or less portable?

> This shows that many libcs have a crappy malloc....
>   

Why am I not surprised? :) Come on, we've known that for ages, it's one
of the major reasons why we have pools in APR.

-- Brane

Re: Poor performance with new apr_pool

Posted by Paul Querna <pa...@querna.org>.
> FreeBSD 7.1-STABLE (amd opterons; people.apache.org):
> APR 1.3: 242582048
> APR 2.0: 537562071 (+221%)


Same FreeBSD 7.1 machine, using tcmalloc[1]
APR 1.3: 243307182
APR 2.0: 214131712 (-22%)

I think we should consider bundling tcmalloc, or making it a compile
time option.

This shows that many libcs have a crappy malloc....

[1] - http://goog-perftools.sourceforge.net/

Re: Poor performance with new apr_pool

Posted by Paul Querna <pa...@querna.org>.
OSX 10.5.6 (MBP, Intel Core 2 2.8 Ghz):
APR 1.3: 66583433
APR 2.0: 211770082 (+318%)

FreeBSD 7.1-STABLE (amd opterons; people.apache.org):
APR 1.3: 242582048
APR 2.0: 537562071 (+221%)

Solaris 10 (sparc t2000; eos.apache.org):
APR 1.3: 425809665
APR 2.0: 1344579431 (+315%)

I'd be interested in seeing tcmalloc results if anyone has that set up
on a Linux box. I was hoping that the newer jemalloc on FreeBSD 7.x
would be closer; it does show better results than the default glibc
malloc, but not by enough.



On Thu, Mar 26, 2009 at 1:49 PM, Paul Querna <pa...@querna.org> wrote:
> Attached is a program that you can use for this.
>
> please upgrade to trunk, i've eliminated some callocs, and switched
> them to malloc where possible.
>
> compile with:
> gcc -o pspeed13 `apr-1-config   --link-ld     --cppflags    --cflags
>   --includes    --ldflags    --libs `  poolspeed.c
>

Re: Poor performance with new apr_pool

Posted by Paul Querna <pa...@querna.org>.
Attached is a program that you can use for this.

please upgrade to trunk, i've eliminated some callocs, and switched
them to malloc where possible.

compile with:
gcc -o pspeed13 `apr-1-config --link-ld --cppflags --cflags \
    --includes --ldflags --libs` poolspeed.c

Re: Poor performance with new apr_pool

Posted by Mladen Turk <mt...@apache.org>.
Mladen Turk wrote:
> Just did some quick bench of the current trunk,
> and the results are not very much encouraging.
> 

Inner loops are even worse.
Calling 10000 x 32-byte allocations per pool
is more than 10 times slower.

This takes less than a second on my box to finish
with apr 1.4, and 11 seconds with trunk :)

apr_pool_t *p;
void *m;
int i, j;

for (i = 0; i < 10000; i++) {
    apr_pool_create(&p, NULL);
    for (j = 0; j < 10000; j++) {
        m = apr_palloc(p, 32);
    }
    apr_pool_destroy(p);
}


Regards
-- 
^(TM)