Posted to dev@cocoon.apache.org by Alexander Klimetschek <al...@mindquarry.com> on 2006/11/01 20:06:48 UTC

Performance with blocks protocol

Hi Daniel,

we are noticing that the blocks protocol is currently quite slow. We have 
not yet set up proper profiling, so I am asking if you know what 
the bottlenecks might be and how to resolve them.

One thing might be the logging of all requests: apart from commenting 
the log statements out, I did not manage to turn them off, since they use 
the servlet context logger and I don't know how to configure that one 
specifically to be more silent.

And when using the blocks protocol together with CForms in the 
"backend" block, we quickly get java.lang.OutOfMemoryError (when 
opening or reloading a form about 10 times). It looks like some 
continuations are still open, but can it be that they are much larger in 
the blocks case? Or am I doing something terribly wrong? The 
continuation request is passed on to the super block via a URI that 
contains all request params (via that InputModule I talked about), and 
then the continuation is called inside that block.


Alex

-- 
Alexander Klimetschek
http://www.mindquarry.com


Re: Performance with blocks protocol

Posted by Alexander Klimetschek <al...@mindquarry.com>.
Daniel Fagerstrom wrote:
> I don't see any reason why they should be larger in the blocks 
> case. Again, profiling would give some clues.
> 
> One possibility is that there is some fault in the component handling so 
> that the block servlet component or tree processor is recreated instead 
> of reused in some cases.

I have done some profiling with JProfiler. First of all, there is no 
obvious leak to be seen (not least because of the massive amount of 
reflection that happens inside the sitemap). The heap simply grows and 
grows, even with simple page requests without much dynamics like forms. 
Garbage collection does not free much, so you get linear growth up to 
something like 70 MB, at which point the maximum is reached and an error 
is thrown.

But from what I have seen, I suppose that the BufferedOutputStreams of 
resources (CSS, images, JavaScript), which we handle via a 
ResourceReader that calls a super block containing all the 
resources, are not properly closed or released. Such a stream holds all 
the data in a byte array. Can it be that nobody actually closes or clears 
the OutputStream that is present when the ResourceReader gets a 
BlockSource as input?

I.e. the call looks like: URL -> DispatcherServlet -> BlockServlet -> 
sitemap.SitemapServlet -> internal processing -> ResourceReader -> 
BlockServlet -> SitemapServlet reading the resource from a file.
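
As an illustration of that suspicion, here is a minimal sketch using plain 
java.io classes (not the actual BlockSource/BlockConnection code): as long 
as nobody closes the buffered stream, every resource that went through the 
block call stays buffered and reachable on the heap.

import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BufferedBlockCallSketch {

    // the whole response body of the block call ends up in this buffer
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final OutputStream out = new BufferedOutputStream(buffer);

    public OutputStream getOutputStream() {
        return out;
    }

    public InputStream getInputStream() throws IOException {
        // close() flushes the BufferedOutputStream into the byte array;
        // without it, buffered bytes can be lost and the stream object
        // (plus its buffer) stays open and reachable
        out.close();
        return new ByteArrayInputStream(buffer.toByteArray());
    }
}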


Alex

-- 
Alexander Klimetschek
http://www.mindquarry.com


Re: Performance with blocks protocol

Posted by Daniel Fagerstrom <da...@nada.kth.se>.
Lars Trieloff wrote:
> Results from a JMeter-drilling test: accessing 
> http://localhost:8080/cocoon/cocoon-blocks-fw-sample3/sub/test in 10 
> parallel threads using the cocoon-blocks-fw-sample.
>
> - 1 DispatcherServlet
> - 4 BlockServlet
> - 4 SitemapServlet
>
> - 6 TreeProcessor
> - 6 EnvironmentHelper
Seems like the singleton settings work this far. It is a little bit unclear 
why there are two extra TreeProcessors, though. One of them might be from a 
"standard" SitemapServlet.

> - 2731 BlockCallHttpServletRequest
>
> - 1 ReadNode
>
> - 2731 ResourceReader
>
> - 5463 HttpEnvironment
> - 5463 HttpRequest
> - 5463 HttpResponse
> - 5463 NonCachingProcessingPipeline
Here it seems like something is wrong with cleanup after sitemap 
processing. The SitemapServlet creates a new HttpEnvironment object that 
refers to the incoming request and response objects at each request. The 
environment object is in turn referenced by the 
NonCachingProcessingPipeline object, which is also responsible for the 
ResourceReader object.

The problem seems to be that the recycle method of the pipeline is not 
called and that the pooling doesn't work. It is not that easy to follow 
the code, so I don't yet know exactly what happens or where.
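
For reference, a minimal sketch of the lifecycle the pooling relies on, 
using the Avalon ServiceManager contract (the role string and surrounding 
code are illustrative, not the actual TreeProcessor/pipeline code):

import org.apache.avalon.framework.service.ServiceException;
import org.apache.avalon.framework.service.ServiceManager;

public class PoolingSketch {

    // release() is what hands a pooled component back to the container and
    // triggers recycle() on Recyclable implementations; if it is never
    // reached, instances accumulate instead of being reused.
    public void process(ServiceManager manager, String role) throws ServiceException {
        Object component = manager.lookup(role);
        try {
            // ... use the component for this request ...
        } finally {
            manager.release(component);
        }
    }
}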

Do you have similar figures from Cocoon used without the blocks fw? It 
would help in tracking down the problem to know whether it is block 
specific or not.

/Daniel


Re: Performance with blocks protocol

Posted by Lars Trieloff <la...@trieloff.net>.
These are the results after applying Alexander's patch. I've modified  
the test case to use 32 threads for drilling, as with 10 threads only  
10 ResourceReaders will be created.

> - 2731 BlockCallHttpServletRequest
none
> - 2731 ResourceReader
32
> - 5463 HttpEnvironment
none
> - 5463 HttpRequest
none
> - 5463 HttpResponse
none
> - 5463 NonCachingProcessingPipeline
32

Looks very good! I will commit the patch.
--
Lars Trieloff
visit http://www.mindquarry.com/




Re: Performance with blocks protocol

Posted by Lars Trieloff <la...@trieloff.net>.
Results from a JMeter-drilling test: accessing 
http://localhost:8080/cocoon/cocoon-blocks-fw-sample3/sub/test in 10 
parallel threads using the cocoon-blocks-fw-sample.

- 1 DispatcherServlet
- 4 BlockServlet
- 4 SitemapServlet

- 6 TreeProcessor
- 6 EnvironmentHelper

- 2731 BlockCallHttpServletRequest

- 1 ReadNode

- 2731 ResourceReader

- 5463 HttpEnvironment
- 5463 HttpRequest
- 5463 HttpResponse
- 5463 NonCachingProcessingPipeline


>
> Meanwhile I will try to reproduce this problem in the 
> cocoon-blocks-fw-demo so that it can be tested using JMeter and YourKit 
> Java Profiler.

--
Lars Trieloff
visit http://www.mindquarry.com/




Re: Performance with blocks protocol

Posted by Lars Trieloff <la...@trieloff.net>.
Hi,

we should check that there are really no more than 32 instances of  
ResourceReader on the heap. If this is the case, all we need to do is  
to raise the -Xmx limit and add a note to the docs that blocks might  
need some more memory.

Meanwhile I will try to reproduce this problem in the cocoon-blocks-fw-demo 
so that it can be tested using JMeter and YourKit Java Profiler.

Lars

On 07.11.2006 at 10:25, Daniel Fagerstrom wrote:

> Alexander Klimetschek wrote:
>> This seems to be the real problem. There are new ResourceReader  
>> instances for each request (along with the BlockSource,  
>> BlockConnection etc. connected with them), but they are never  
>> reused. Instead they stay in memory, referenced by some  
>> ThreadLocal storage.
>>
>> Something in the special handling of o.a.c.sitemap.SitemapServlet  
>> might be buggy, but we cannot see what. Do you have time to  
>> further look at that this week? When using forms, the amount of  
>> data that is collected this way increases so fast that we  
>> sometimes get an OutOfMemory after 3-4 reloads...
>
> Looking at the ResourceReader, it is Recyclable and has a default  
> max-pool-size of 32. I don't know enough about the Avalon lifestyle  
> implementation to know exactly what happens. As long as there  
> are no more than 32 instances of ResourceReader they are, AFAIU,  
> supposed to be kept in a pool in memory. But we would get a  
> problem if the recycle method of the ResourceReader is called too  
> late or not at all; in that case the other objects would be kept in  
> memory when they should have been garbage collected.


--
Lars Trieloff
visit http://www.mindquarry.com/




Re: Solution found [Re: Performance with blocks protocol]

Posted by Lars Trieloff <la...@trieloff.net>.
Alex,

please submit a patch to JIRA. I will review and test it with the  
same setup as yesterday evening (JMeter + YourKit), post the figures,  
and commit it if Daniel has no objections.

Lars

On 08.11.2006 at 11:28, Alexander Klimetschek wrote:

> Alexander Klimetschek wrote:
>>> Looking at the RequestProcessor that is called by the standard  
>>> SitemapServlet: it calls ProcessingUtil.cleanup() when leaving  
>>> the processing of a request. I don't call that in the  
>>> corresponding code in the o.a.c.sitemap.SitemapServlet, which  
>>> might be a problem.
>> I have copied that, because I tried to figure out what is  
>> different between the sitemap.SitemapServlet and the  
>> RequestProcessor, but it did not change anything with the memory  
>> leak.
>
> My fault, I didn't build the core module after changing that. It is  
> indeed the missing ProcessingUtil.cleanup() call. You probably  
> did not put it into the sitemap.SitemapServlet because it cleans  
> everything, including the current request data, so when called in a  
> block that is called by another block, upon return no further  
> processing is possible because you get NPEs when accessing the  
> original HttpRequest...
>
> So I put that call into the DispatcherServlet, right at the end of  
> the service() method and it seems to work. We will now do some  
> testing with the blocks-fw-samples and then commit it.
>
>
> Alex
>
> -- 
> Alexander Klimetschek
> http://www.mindquarry.com
>

--
Lars Trieloff
visit http://www.mindquarry.com/




Re: Solution found [Re: Performance with blocks protocol]

Posted by Daniel Fagerstrom <da...@nada.kth.se>.
Alexander Klimetschek wrote:
> Alexander Klimetschek wrote:
>>> Looking at the RequestProcessor that is called by the standard 
>>> SitemapServlet: it calls ProcessingUtil.cleanup() when leaving the 
>>> processing of a request. I don't call that in the corresponding code 
>>> in the o.a.c.sitemap.SitemapServlet, which might be a problem.
>>
>> I have copied that, because I tried to figure out what is different 
>> between the sitemap.SitemapServlet and the RequestProcessor, but it 
>> did not change anything with the memory leak.
> 
> My fault, I didn't build the core module after changing that. It is indeed 
> the missing ProcessingUtil.cleanup() call.

Great!

> You probably did not put it 
> into the sitemap.SitemapServlet because it cleans everything, including 
> the current request data, so when called in a block that is called by 
> another block, upon return no further processing is possible because you 
> get NPEs when accessing the original HttpRequest...

That was a kind interpretation :) I didn't understand the importance of 
the ProcessingUtil.cleanup() and probably just forgot to call it.

AFAICS the ProcessingUtil.cleanup() is only used for handling poolable 
beans at the end of the request.

> So I put that call into the DispatcherServlet, right at the end of the 
> service() method and it seems to work. We will now do some testing with 
> the blocks-fw-samples and then commit it.

We should merge the servlet.SitemapServlet and the 
sitemap.SitemapServlet, and then this solution won't work anymore. But 
right now the important thing is to get rid of the memory leak, so we can 
solve it like that for the moment.

In the long term the handling of poolable components probably needs some 
review. IIRC there have been discussions on the list that most of the 
sitemap components should be created by a factory instead of being 
poolable (see [Generator|Transformer|Serializer]Factory).
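
A generic sketch of that factory idea (the real 
[Generator|Transformer|Serializer]Factory interfaces may look different): 
a stateless singleton factory hands out a fresh, throw-away instance per 
request, so no recycle()/release() bookkeeping is needed at all.

public class FactoryIdeaSketch {

    // hypothetical per-request component; a real sitemap component has a
    // much richer contract
    public interface PerRequestReader {
        void generate() throws Exception;
    }

    // the factory itself is a thread-safe singleton because it holds no
    // per-request state; the instances it creates are simply dropped after
    // the request and garbage collected
    public interface ReaderFactory {
        PerRequestReader newInstance();
    }
}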

Also, Spring doesn't have the concept of returning a component to the 
container as Avalon has (ServiceManager.release(Object o)). So it is not 
possible to implement true poolables. Instead the poolables are 
destroyed at the end of the request, which makes them closer to 
request-scoped components in Spring.

One possibility would be to put the ProcessingUtil.cleanup() call in a 
servlet request listener's requestDestroyed() method.
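
A minimal sketch of that listener idea, against the Servlet 2.4 API (the 
import of ProcessingUtil is omitted because its package is not shown in 
this thread; the listener would also need a <listener> entry in web.xml):

import javax.servlet.ServletRequestEvent;
import javax.servlet.ServletRequestListener;

public class CleanupRequestListener implements ServletRequestListener {

    public void requestInitialized(ServletRequestEvent sre) {
        // nothing to set up
    }

    public void requestDestroyed(ServletRequestEvent sre) {
        // release the poolable per-request components once the container
        // is completely done with the request
        // (import of ProcessingUtil omitted: it lives in the Cocoon core module)
        ProcessingUtil.cleanup();
    }
}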

/Daniel

Solution found [Re: Performance with blocks protocol]

Posted by Alexander Klimetschek <al...@mindquarry.com>.
Alexander Klimetschek wrote:
>> Looking at the RequestProcessor that is called by the standard 
>> SitemapServlet: it calls ProcessingUtil.cleanup() when leaving the 
>> processing of a request. I don't call that in the corresponding code 
>> in the o.a.c.sitemap.SitemapServlet, which might be a problem.
> 
> I have copied that, because I tried to figure out what is different 
> between the sitemap.SitemapServlet and the RequestProcessor, but it did 
> not change anything with the memory leak.

My fault, I didn't build the core module after changing that. It is indeed 
the missing ProcessingUtil.cleanup() call. You probably did not put it 
into the sitemap.SitemapServlet because it cleans everything, including 
the current request data, so when called in a block that is called by 
another block, upon return no further processing is possible because you 
get NPEs when accessing the original HttpRequest...

So I put that call into the DispatcherServlet, right at the end of the 
service() method and it seems to work. We will now do some testing with 
the blocks-fw-samples and then commit it.
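
A sketch of the shape of that fix (the actual DispatcherServlet dispatches 
to the mounted BlockServlets, which is elided here, and the ProcessingUtil 
import is omitted because its package is not shown in this thread); putting 
the cleanup in a finally block would additionally make sure it also runs 
when a block throws:

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class DispatcherServletSketch extends HttpServlet {

    protected void service(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        try {
            // ... look up the matching BlockServlet and forward the request ...
        } finally {
            // release all poolable per-request components, exactly once,
            // after the whole block call chain has returned
            // (import of ProcessingUtil omitted: it lives in the Cocoon core module)
            ProcessingUtil.cleanup();
        }
    }
}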


Alex

-- 
Alexander Klimetschek
http://www.mindquarry.com


Re: Performance with blocks protocol

Posted by Alexander Klimetschek <al...@mindquarry.com>.
Daniel Fagerstrom wrote:
> First, the BlockServlets are configured to be singletons and created by 
> the Spring container. A BlockServlet in turn creates a SitemapServlet 
> (or whatever servlet type it is configured to create) in its init 
> method. The SitemapServlet creates a TreeProcessor in its init method. 
> The DispatcherServlet connects to the BlockServlets in its init method. 
> All components mentioned so far should AFAICS be singletons. Does this 
> hold in practice in your tests?

Yes. The exact numbers, with n=7 blocks, after 2 reloads of a static page 
with a number of resources (images, CSS), are:

- 1 DispatcherServlet
- 7 BlockServlet
- 7 SitemapServlet

- 8 (n+1) TreeProcessor
- 8 (n+1) EnvironmentHelper

- 24 BlockCallHttpServletRequest

- 27 ReadNode

- 67 ResourceReader

- 69 HttpEnvironment
- 69 HttpRequest
- 69 HttpResponse
- 69 NonCachingProcessingPipeline

> The TreeProcessor is a singleton but it gets and populates a new 
> pipeline at each request, so new ResourceReaders are supposed to be 
> created. Sources are factory based so a new BlockSource is created and 
> it creates in turn a new BlockConnection. All this is expected, so the 
> question is if they are destroyed correctly.

It very much looks like they are not destroyed.

> Looking at the ResourceReader, it is Recyclable and has a default 
> max-pool-size of 32. I don't know enough about the Avalon lifestyle 
> implementation to know exactly what happens. As long as there are no 
> more than 32 instances of ResourceReader they are, AFAIU, supposed to be 
> kept in a pool in memory. But we would get a problem if the recycle 
> method of the ResourceReader is called too late or not at all; in that 
> case the other objects would be kept in memory when they should have been 
> garbage collected.

There is no config, so the default value of 32 is used, but as listed 
above, there are actually more than 32 instances.

> The BlockCallStack contains stacks in thread local memory that contain 
> BlockContexts. AFAICS they don't have any references to things that 
> should be garbage collected, but if they do we would have a problem.

I will look into that now.

> Looking at the RequestProcessor that is called by the standard 
> SitemapServlet: it calls ProcessingUtil.cleanup() when leaving the 
> processing of a request. I don't call that in the corresponding code in 
> the o.a.c.sitemap.SitemapServlet, which might be a problem.

I have copied that, because I tried to figure out what is different 
between the sitemap.SitemapServlet and the RequestProcessor, but it did 
not change anything with the memory leak.

> Also it would be worthwhile to scrutinize that all streams are closed as 
> they should.

I also implemented the flush() and close() methods in 
BlockCallHttpServletResponse.getOutputStream() (in that anonymous inner 
class), delegating the calls to the internal stream; there flush() quite 
often gets called, but close() is not. Then I saw that there "should" be a 
close() call in BlockConnection.getInputStream(), so I added an 
"os.close();" right before "return new ByteArrayInputStream(out);".


Alex

-- 
Alexander Klimetschek
http://www.mindquarry.com


Re: Performance with blocks protocol

Posted by Daniel Fagerstrom <da...@nada.kth.se>.
Alexander Klimetschek wrote:
> Daniel Fagerstrom wrote:
>> One possibility is that there is some fault in the component handling 
>> so that the block servlet component or tree processor is recreated 
>> instead of reused in some cases.
>
> This seems to be the real problem. There are new ResourceReader 
> instances for each request (along with the BlockSource, 
> BlockConnection etc. connected with them), but they are never reused. 
> Instead they stay in memory, referenced by some ThreadLocal storage.
>
> Something in the special handling of o.a.c.sitemap.SitemapServlet might 
> be buggy, but we cannot see what. Do you have time to further look at 
> that this week? When using forms, the amount of data that is collected 
> this way increases so fast that we sometimes get an OutOfMemory after 
> 3-4 reloads...
I have not done any tests yet but have looked at the code.

First, the BlockServlets are configured to be singletons and created by 
the Spring container. A BlockServlet in turn creates a SitemapServlet 
(or whatever servlet type it is configured to create) in its init 
method. The SitemapServlet creates a TreeProcessor in its init method. 
The DispatcherServlet connects to the BlockServlets in its init method. 
All components mentioned so far should AFAICS be singletons. Does this 
hold in practice in your tests?

The TreeProcessor is a singleton but it gets and populates a new 
pipeline at each request, so new ResourceReaders are supposed to be 
created. Sources are factory based so a new BlockSource is created and 
it creates in turn a new BlockConnection. All this is expected, so the 
question is if they are destroyed correctly.
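
A sketch of the per-request source handling this implies, using the 
Excalibur SourceResolver contract (the URI string and surrounding code are 
illustrative; the real ResourceReader does more setup): the resolved source 
must be released, and its stream closed, when the request is done.

import java.io.InputStream;
import java.io.OutputStream;
import org.apache.excalibur.source.Source;
import org.apache.excalibur.source.SourceResolver;

public class SourceLifecycleSketch {

    public void copy(SourceResolver resolver, String uri, OutputStream out) throws Exception {
        Source source = resolver.resolveURI(uri); // a new source per request, e.g. a BlockSource
        try {
            InputStream in = source.getInputStream();
            try {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
            } finally {
                in.close();      // otherwise the buffered data stays reachable
            }
        } finally {
            resolver.release(source);
        }
    }
}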

Looking at the ResourceReader, it is Recyclable and has a default 
max-pool-size of 32. I don't know enough about the Avalon lifestyle 
implementation to know exactly what happens. As long as there are no 
more than 32 instances of ResourceReader they are, AFAIU, supposed to be 
kept in a pool in memory. But we would get a problem if the recycle 
method of the ResourceReader is called too late or not at all; in that 
case the other objects would be kept in memory when they should have been 
garbage collected.
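
A generic sketch of what Recyclable means here, assuming the Excalibur pool 
interface with a single recycle() method (this is not the real 
ResourceReader): recycle() is supposed to drop all per-request state before 
the instance goes back into the pool.

import java.io.OutputStream;

import org.apache.avalon.excalibur.pool.Recyclable;

public class PooledReaderSketch implements Recyclable {

    // per-request state of the pooled instance
    private Object source;      // e.g. the BlockSource resolved for the last request
    private OutputStream out;   // the response stream of the last request

    public void setup(Object source, OutputStream out) {
        this.source = source;
        this.out = out;
    }

    // if recycle() runs too late or never, everything referenced above
    // stays reachable through the pooled (or leaked) reader instance
    public void recycle() {
        this.source = null;
        this.out = null;
    }
}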

The BlockCallStack contains stacks in thread local memory that contain 
BlockContexts. AFAICS they don't have any references to things that 
should be garbage collected, but if they do we would have a problem.
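
A generic illustration of that pattern (not the actual BlockCallStack 
code): as long as every push is paired with a pop in a finally block, 
nothing leaks; a missing pop keeps the context, and everything it 
references, alive for the lifetime of the worker thread.

import java.util.Stack;

public class ThreadLocalStackSketch {

    private static final ThreadLocal<Stack<Object>> STACK = new ThreadLocal<Stack<Object>>() {
        protected Stack<Object> initialValue() {
            return new Stack<Object>();
        }
    };

    public static void enterBlock(Object blockContext) {
        STACK.get().push(blockContext);
    }

    public static void leaveBlock() {
        // callers must invoke this in a finally block; otherwise the context
        // stays referenced by the thread-local stack of the pooled worker thread
        STACK.get().pop();
    }
}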

Looking at the RequestProcessor that is called by the standard 
SitemapServlet: it calls ProcessingUtil.cleanup() when leaving the 
processing of a request. I don't call that in the corresponding code in 
the o.a.c.sitemap.SitemapServlet, which might be a problem.

Also it would be worthwhile to scrutinize that all streams are closed as 
they should be.

/Daniel


Re: Performance with blocks protocol

Posted by Alexander Klimetschek <al...@mindquarry.com>.
Daniel Fagerstrom wrote:
> One possibility is that there is some fault in the component handling so 
> that the block servlet component or tree processor is recreated instead 
> of reused in some cases.

This seems to be the real problem. There are new ResourceReader 
instances for each request (along with the BlockSource, BlockConnection 
etc. connected with them), but they are never reused. Instead they stay 
in memory, referenced by some ThreadLocal storage.

Something in the special handling of o.a.c.sitemap.SitemapServlet might be 
buggy, but we cannot see what. Do you have time to further look at that 
this week? When using forms, the amount of data that is collected this 
way increases so fast that we sometimes get an OutOfMemory after 3-4 
reloads...

Alex

-- 
Alexander Klimetschek
http://www.mindquarry.com


Re: Performance with blocks protocol

Posted by Daniel Fagerstrom <da...@nada.kth.se>.
Alexander Klimetschek wrote:
> Hi Daniel,
> 
> we are noticing that the blocks protocol is currently quite slow. We have 
> not yet set up proper profiling, so I am asking if you know what 
> the bottlenecks might be and how to resolve them.

While I have not optimized the code, there are no obvious bottlenecks, 
except that XML output (and input) should be done with SAX in some 
cases, so that we get rid of the serialization-followed-by-parsing step.
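
As a generic JAXP illustration of that point (not Cocoon's pipeline API): 
with an identity transform, the XML events go straight from the source into 
the consumer's ContentHandler, with no intermediate serialize-and-reparse 
round trip.

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.ContentHandler;
import org.xml.sax.helpers.DefaultHandler;

public class SaxPassThroughSketch {

    // feed the source document into the consuming ContentHandler as SAX events
    public static void feed(StreamSource source, ContentHandler consumer) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.transform(source, new SAXResult(consumer));
    }

    public static void main(String[] args) throws Exception {
        // DefaultHandler is an empty ContentHandler, used here only as a
        // stand-in for the consuming pipeline stage
        feed(new StreamSource("input.xml"), new DefaultHandler());
    }
}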

Calling blocks should not add more than some minimal overhead, so I would 
guess that something unintended is happening.

Profiling would probably be the easiest way to find out where the 
bottleneck is.

> One thing might be the logging of all requests: apart from commenting 
> the log statements out, I did not manage to turn them off, since they use 
> the servlet context logger and I don't know how to configure that one 
> specifically to be more silent.

The use of the servlet context logger is a leftover from before, when 
component setup was much more complicated and done within Cocoon. Back then 
I needed to minimize component requirements within the blocks fw. Now that 
the components are already set up in a servlet context listener, that is 
not a problem anymore and the code should be changed to use the ordinary 
logger.

> And when using the blocks protocol together with CForms in the 
> "backend" block, we quickly get java.lang.OutOfMemoryError (when 
> opening or reloading a form about 10 times). It looks like some 
> continuations are still open, but can it be that they are much larger in 
> the blocks case?

I don't see any reason why they should be larger in the blocks 
case. Again, profiling would give some clues.

One possibility is that there is some fault in the component handling so 
that the block servlet component or tree processor is recreated instead 
of reused in some cases.

> Or am I doing something terribly wrong? The 
> continuation request is passed on to the super block via a URI that 
> contains all request params (via that InputModule I talked about), and 
> then the continuation is called inside that block.

Doesn't seem like something that should be a problem.

/Daniel