Posted to dev@cocoon.apache.org by Joerg Heinicke <jo...@gmx.de> on 2008/04/28 05:43:52 UTC

Re: Avoiding OutOfMemory Errors by limiting data in pipeline

On 24.04.2008 16:08, Bruce Atherton wrote:
> Thanks for the response. About setting the buffer size, this looks like 
> it could be what I am looking for. A few questions:
> 
> 1. Do I have to set the buffer size on each transformer and the 
> serializer as well as the generator? What about setting it on the pipeline?

It is on the pipeline and only there. You can set it on the map:pipe 
element in the map:components section, so that it is applied to each 
pipeline of that type. Or on any individual map:pipeline element in the 
map:pipelines section.
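For illustration, a sketch of those two configuration spots in a 2.1-style sitemap. The parameter name outputBufferSize is taken from this thread, but the pipe class name and exact element syntax are assumptions from memory and may differ between Cocoon versions:

```xml
<!-- Hedged sketch; verify element names against your Cocoon version. -->

<!-- (a) On the map:pipe element in map:components, applying to every
     pipeline of that type: -->
<map:components>
  <map:pipes default="caching">
    <map:pipe name="caching"
              src="org.apache.cocoon.components.pipeline.impl.CachingProcessingPipeline">
      <parameter name="outputBufferSize" value="1048576"/>
    </map:pipe>
  </map:pipes>
</map:components>

<!-- (b) On an individual map:pipeline element in map:pipelines: -->
<map:pipelines>
  <map:pipeline>
    <map:parameter name="outputBufferSize" value="1048576"/>
    <!-- matchers, generators, transformers, serializer ... -->
  </map:pipeline>
</map:pipelines>
```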

> 2. Does the full amount of the buffer automatically get allocated for 
> each request, or does it grow gradually based on the xml stream size?
> 
> I have a lot of steps in the pipeline, so I am worried about the impact 
> of creating too many buffers even if they are relatively small. A 1 Meg 
> buffer might be too much if it is created for every element of every 
> pipeline for every request.

That's a very good question - with a negative answer: A buffer of that 
particular size is created initially. That's why I want to bring this 
issue up on dev again: With my changes for COCOON-2168 [1] it's now not 
only a problem for applications with over-sized downloads but 
potentially for everyone relying on Cocoon's default configuration. One 
idea would be to change our BufferedOutputStream implementation to take 
2 parameters: one for the initial buffer size and one for the flush 
size. The flush threshold would be the configurable outputBufferSize; the 
initial buffer size does not need to be configurable, I think.
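As a rough sketch of that idea (illustrative only, not Cocoon's actual o.a.c.util class; all names here are invented): start with a small buffer, grow it on demand, and flush to the target once the configurable threshold is reached instead of allocating more memory.

```java
import java.io.IOException;
import java.io.OutputStream;

// Illustrative sketch of a two-parameter buffered stream: a small initial
// buffer that doubles on demand, capped at a configurable flush threshold.
class GrowingBufferedOutputStream extends OutputStream {

    private final OutputStream target;
    private final int flushSize; // the configurable flush threshold (e.g. 1 MB)
    private byte[] buffer;       // starts small (e.g. 8K) and grows on demand
    private int count;           // number of buffered bytes

    GrowingBufferedOutputStream(OutputStream target, int initialSize, int flushSize) {
        this.target = target;
        this.buffer = new byte[initialSize];
        this.flushSize = flushSize;
    }

    @Override
    public void write(int b) throws IOException {
        if (count == buffer.length) {
            if (buffer.length >= flushSize) {
                // Threshold reached: flush instead of growing further.
                flushBuffer();
            } else {
                // Double the buffer, capped at the flush threshold.
                byte[] grown = new byte[Math.min(buffer.length * 2, flushSize)];
                System.arraycopy(buffer, 0, grown, 0, count);
                buffer = grown;
            }
        }
        buffer[count++] = (byte) b;
    }

    private void flushBuffer() throws IOException {
        target.write(buffer, 0, count);
        count = 0;
    }

    @Override
    public void flush() throws IOException {
        flushBuffer();
        target.flush();
    }
}
```

With an initial size of 8192 and a flush size of 1048576, most responses would never grow past a few doublings, and only over-sized responses would ever hit the flush threshold.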

What do others think?

> On an unrelated note, is there some way to configure caching so that 
> nothing is cached that is larger than a certain size? I'm worried that 
> this might be a caching issue rather than a buffer issue.

Not that I'm aware of. Why do you think it's caching? Caching is at 
least configurable in terms of number of cache entries and I also think 
in terms of max cache size. But beyond a certain cache size the cache 
entries are written to disk anyway, so it's unlikely to result in a 
memory issue.

> How do you read the object graph from the heap dump? To tell you the 
> truth, I'm not sure. This is the hierarchy generated by the Heap 
> Analyzer tool from IBM, and is from a heap dump on an AIX box running 
> the IBM JRE. My guess as to the Object referencing the 
> ComponentsSelector is that the ArrayList is not generified, so the 
> analyzer doesn't know the actual type of the Object being referenced. 
> What the object actually is would depend on what 
> CachingProcessorPipeline put into the ArrayList. That is just a guess, 
> though. And I have no explanation for the link between 
> FOM_Cocoon$CallContext and ConcreteCallProcessor. Perhaps things were 
> different in the 2.1.9 release?

No serious changes since 2.1.9 which is rev 392241 [2].

Joerg

[1] https://issues.apache.org/jira/browse/COCOON-2168
[2] 
http://svn.apache.org/viewvc/cocoon/branches/BRANCH_2_1_X/src/java/org/apache/cocoon/components/flow/javascript/fom/FOM_Cocoon.java?view=log

Re: Avoiding OutOfMemory Errors by limiting data in pipeline

Posted by Bruce Atherton <br...@callenish.com>.
Thanks for the effort, Joerg. I appreciate it. (And that of all the 
other developers for all the hard work).

Joerg Heinicke wrote:
> Joerg Heinicke <joerg.heinicke <at> gmx.de> writes:
>
>   
>>> 2. Does the full amount of the buffer automatically get allocated for 
>>> each request, or does it grow gradually based on the xml stream size?
>>>       
>> That's a very good question - with a negative answer: A buffer of that 
>> particular size is created initially. [..] One idea would be to change our
>> BufferedOutputStream implementation to take 2 parameters: one for the initial
>> buffer size and one for the flush size.
>>     
>
> Hi Bruce (et al.),
>
> This functionality is now in SVN. The initial buffer size is 8192 bytes (but not
> yet configurable); the default flush buffer size is 1 MB and is configurable on
> the pipeline.
>
> Joerg
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@cocoon.apache.org
> For additional commands, e-mail: users-help@cocoon.apache.org
>
>   




Re: Avoiding OutOfMemory Errors by limiting data in pipeline

Posted by Joerg Heinicke <jo...@gmx.de>.
Joerg Heinicke <joerg.heinicke <at> gmx.de> writes:

> > 2. Does the full amount of the buffer automatically get allocated for 
> > each request, or does it grow gradually based on the xml stream size?
> 
> That's a very good question - with a negative answer: A buffer of that 
> particular size is created initially. [..] One idea would be to change our
> BufferedOutputStream implementation to take 2 parameters: one for the initial
> buffer size and one for the flush size.

Hi Bruce (et al.),

This functionality is now in SVN. The initial buffer size is 8192 bytes (but not
yet configurable); the default flush buffer size is 1 MB and is configurable on
the pipeline.

Joerg




Re: Avoiding OutOfMemory Errors by limiting data in pipeline

Posted by Peter Hunsberger <pe...@gmail.com>.
On Sun, May 11, 2008 at 6:59 PM, Joerg Heinicke <jo...@gmx.de> wrote:
> On 09.05.2008 09:41, Peter Hunsberger wrote:
> > I haven't looked at the code here, but couldn't you just introduce a
> > second getOutputStream( int bufferSize ) method where the current
> > interface method continues with the current default logic if it is
> > used?
>  getOutputStream() actually already takes an int parameter, the flush buffer
> size.

Yeah, I saw that...

> Whether to add another getOutputStream() method or modify the existing
> one, there is not really a difference IMO. Environment is a kind of internal
> interface (or SPI, as it has been called lately, hasn't it?). This means
> there should be only very few implementations besides the one we provide, if
> any at all (Forrest, Lenya, CLI environment?). And in Cocoon we would change all
> usages of the single-parameter method to the one with 2 parameters. So
> whoever provides such an Environment implementation has to adapt his
> implementation in a meaningful way anyway (an empty implementation returning
> null, throwing NotSupportedException, or whatever, would not work). So it's the
> same effort for them whether we add a new method or change the existing one on
> the interface.

I don't see that; can't you keep the existing behaviour for those
who don't change?

>  IMO the decision should be made purely from a design perspective. Should a
> configuration parameter be passed around as a method parameter even though it
> is static through the whole lifecycle of the Environment instance? In a perfect
> world I'd say no :)

That makes sense.  Guess the question in that case is: are there any
use cases where people could use such a parameter as non-static?

> Which leaves the question how to inject the parameter.
> One place is on instantiation (e.g. CocoonServlet.getEnvironment(..) in 2.1,
> RequestProcessor.getEnvironment(..) in 2.2) which leaves us with the web.xml
> init parameter (or analogous alternatives for other environments) as
> described.
>
>  Another option I found is to setup the environment (i.e. injecting the
> parameter) while setting up the pipeline. AbstractProcessingPipeline is the
> place where we have access to the current flush buffer size parameter and
> call getOutputStream(..) on the environment. It has a method
> setupPipeline(Environment). Why not inject the parameter(s) here? Due to
> its lifecycle, changing a property of the environment should not cause any
> problem since it's a one-time-use object: no threading problems or
> anything like that.
>

Seems reasonable.

[snip/]

-- 
Peter Hunsberger

Re: Avoiding OutOfMemory Errors by limiting data in pipeline

Posted by Joerg Heinicke <jo...@gmx.de>.
On 09.05.2008 09:41, Peter Hunsberger wrote:

>> I think this is rather hard to do. The place where we instantiate the
>> BufferedOutputStreams (both java.io and o.a.c.util) is
>> AbstractEnvironment.getOutputStream(int bufferSize). So in order to pass a
>> second buffer size argument to the BufferedOutputStream constructor we need
>> to have it available there. One option would be to add it to
>> getOutputStream() - which is an interface change and not really nice.
> 
> I haven't looked at the code here, but couldn't you just introduce a
> second getOutputStream( int bufferSize ) method where the current
> interface method continues with the current default logic if it is
> used?

getOutputStream() actually already takes an int parameter, the flush 
buffer size. Whether to add another getOutputStream() method or modify 
the existing one, there is not really a difference IMO. Environment is a 
kind of internal interface (or SPI, as it has been called lately, 
hasn't it?). This means there should be only very few implementations 
besides the one we provide, if any at all (Forrest, Lenya, CLI 
environment?). And in Cocoon we would change all usages of the 
single-parameter method to the one with 2 parameters. So whoever 
provides such an Environment implementation has to adapt his 
implementation in a meaningful way anyway (an empty implementation 
returning null, throwing NotSupportedException, or whatever, would not 
work). So it's the same effort for them whether we add a new method or 
change the existing one on the interface.

IMO the decision should be made purely from a design perspective. Should 
a configuration parameter be passed around as a method parameter even 
though it is static through the whole lifecycle of the Environment 
instance? In a perfect world I'd say no :) Which leaves the question how 
to inject the parameter. One place is on instantiation (e.g. 
CocoonServlet.getEnvironment(..) in 2.1, 
RequestProcessor.getEnvironment(..) in 2.2), which leaves us with the 
web.xml init parameter (or analogous alternatives for other 
environments) as described.

Another option I found is to setup the environment (i.e. injecting the 
parameter) while setting up the pipeline. AbstractProcessingPipeline is 
the place where we have access to the current flush buffer size 
parameter and call getOutputStream(..) on the environment. It has a 
method setupPipeline(Environment). Why not inject the parameter(s) 
here? Due to its lifecycle, changing a property of the environment 
should not cause any problem since it's a one-time-use object: no 
threading problems or anything like that.
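To make the proposal concrete, a minimal hypothetical sketch of that injection. The setter name and fields are invented for illustration, not actual Cocoon API:

```java
// Hypothetical sketch of the injection idea: the pipeline, which knows the
// configured buffer sizes, pushes them into the Environment while setting it
// up, instead of passing them through getOutputStream(..). All names here
// are illustrative, not Cocoon's real interfaces.
interface Environment {
    void setOutputBufferSizes(int initialSize, int flushSize); // proposed setter
    java.io.OutputStream getOutputStream() throws java.io.IOException;
}

abstract class AbstractProcessingPipeline {
    protected int initialBufferSize = 8192;      // possibly fixed, see discussion
    protected int flushBufferSize = 1024 * 1024; // from the pipeline configuration

    protected void setupPipeline(Environment environment) {
        // The environment is a one-time-use object per request, so mutating
        // it here raises no threading concerns.
        environment.setOutputBufferSizes(initialBufferSize, flushBufferSize);
    }
}
```

The design point is that getOutputStream() could then stay parameterless: the environment already carries everything it needs by the time the serializer asks for the stream.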

I'm just curious what the original reason was to pass the parameter 
along rather than injecting it. Maybe there is a flaw in my thoughts :) 
Whoever knows the code: are my statements correct, and what do you think 
about the approach of injecting the parameters rather than passing them 
along? Second, if it is a valid approach, which way should we go?

1) Don't provide a separate configuration option for initial buffer size.
2) Pass both parameters to getOutputStream(..).
3) Leave the current flush buffer size as parameter to 
getOutputStream(..) but inject the other one
a) from web.xml.
b) from pipeline configuration.
4) Inject both buffer sizes, possibly reactivating/reintroducing 
getOutputStream() without any parameter and deprecating the other one.

Many questions, yet another one: What do you think? :)

Joerg

Re: Avoiding OutOfMemory Errors by limiting data in pipeline

Posted by Peter Hunsberger <pe...@gmail.com>.
On Fri, May 9, 2008 at 12:08 AM, Joerg Heinicke <jo...@gmx.de> wrote:
> On 08.05.2008 11:53, Bruce Atherton wrote:

<snip/>

> I think this is rather hard to do. The place where we instantiate the
> BufferedOutputStreams (both java.io and o.a.c.util) is
> AbstractEnvironment.getOutputStream(int bufferSize). So in order to pass a
> second buffer size argument to the BufferedOutputStream constructor we need
> to have it available there. One option would be to add it to
> getOutputStream() - which is an interface change and not really nice.

I haven't looked at the code here, but couldn't you just introduce a
second getOutputStream( int bufferSize ) method where the current
interface method continues with the current default logic if it is
used?

>
> The second option would be to pass it to the Environment instance. Since
> environments can be wrapped, it again requires an interface change (but just
> adding a method, which is much better). And you have to look where
> environments are instantiated, e.g. HttpServletEnvironment in CocoonServlet.
> From what I see from a quick look, the only potential way to provide a
> configuration would be as a servlet init parameter. That makes it two
> different places to configure these two different buffer sizes - not very
> intuitive.

Yuck.

-- 
Peter Hunsberger

Re: Avoiding OutOfMemory Errors by limiting data in pipeline

Posted by Joerg Heinicke <jo...@gmx.de>.
On 08.05.2008 11:53, Bruce Atherton wrote:

> My only comment is that I think it would be good to allow the initial 
> buffer size to be configurable. If you know the bulk of your responses 
> are greater than 32K, then performing the ramp-up from 8K every time 
> would be a waste of resources. For another web site, if most responses 
> were smaller than 6K then an 8K buffer would be perfect. Allowing 
> someone to tweak that based on their situation seems useful to me.
> 
> Not critical though, if it is hard to do. Allowing the buffer to scale 
> is the important thing.

I think this is rather hard to do. The place where we instantiate the 
BufferedOutputStreams (both java.io and o.a.c.util) is 
AbstractEnvironment.getOutputStream(int bufferSize). So in order to pass 
a second buffer size argument to the BufferedOutputStream constructor we 
need to have it available there. One option would be to add it to 
getOutputStream() - which is an interface change and not really nice.

The second option would be to pass it to the Environment instance. Since 
environments can be wrapped, it again requires an interface change (but 
just adding a method, which is much better). And you have to look where 
environments are instantiated, e.g. HttpServletEnvironment in 
CocoonServlet. From what I see from a quick look, the only potential way 
to provide a configuration would be as a servlet init parameter. That 
makes it two different places to configure these two different buffer 
sizes - not very intuitive.

Joerg

Re: Avoiding OutOfMemory Errors by limiting data in pipeline

Posted by Bruce Atherton <br...@callenish.com>.
My only comment is that I think it would be good to allow the initial 
buffer size to be configurable. If you know the bulk of your responses 
are greater than 32K, then performing the ramp-up from 8K every time 
would be a waste of resources. For another web site, if most responses 
were smaller than 6K then an 8K buffer would be perfect. Allowing 
someone to tweak that based on their situation seems useful to me.

Not critical though, if it is hard to do. Allowing the buffer to scale 
is the important thing.

Joerg Heinicke wrote:
> On 27.04.2008 23:43, Joerg Heinicke wrote:
>
>>> 2. Does the full amount of the buffer automatically get allocated 
>>> for each request, or does it grow gradually based on the xml stream 
>>> size?
>>>
>>> I have a lot of steps in the pipeline, so I am worried about the 
>>> impact of creating too many buffers even if they are relatively 
>>> small. A 1 Meg buffer might be too much if it is created for every 
>>> element of every pipeline for every request.
>>
>> That's a very good question - with a negative answer: A buffer of 
>> that particular size is created initially. That's why I want to bring 
>> this issue up on dev again: With my changes for COCOON-2168 [1] it's 
>> now not only a problem for applications with over-sized downloads but 
>> potentially for everyone relying on Cocoon's default configuration. 
>> One idea would be to change our BufferedOutputStream implementation 
>> to take 2 parameters: one for the initial buffer size and one for the 
>> flush size. The flush threshold would be the configurable 
>> outputBufferSize; the initial buffer size does not need to be 
>> configurable, I think.
>>
>> What do others think?
>
> No interest or no objections? :)
>
> Joerg


Re: Avoiding OutOfMemory Errors by limiting data in pipeline

Posted by Joerg Heinicke <jo...@gmx.de>.
On 08.05.2008 12:16, Antonio Gallardo wrote:

> One question, what are supposed to be the default values for both 
> parameters?

For the initial buffer size I thought of 8K, maybe 16K. It should be a 
reasonable size that's not overly large (i.e. unnecessarily reserved 
memory) for most of the resources.

For the flush buffer size we already talked about 1 MB as the default 
value [1]. This size should almost never be hit.

Joerg

[1] http://marc.info/?t=120473411300003&r=1&w=4

Re: Avoiding OutOfMemory Errors by limiting data in pipeline

Posted by Antonio Gallardo <ag...@agssa.net>.
Hi Joerg,

I am +1.

One question, what are supposed to be the default values for both 
parameters?

Best Regards,

Antonio Gallardo.

Joerg Heinicke wrote:
> On 27.04.2008 23:43, Joerg Heinicke wrote:
>
>>> 2. Does the full amount of the buffer automatically get allocated 
>>> for each request, or does it grow gradually based on the xml stream 
>>> size?
>>>
>>> I have a lot of steps in the pipeline, so I am worried about the 
>>> impact of creating too many buffers even if they are relatively 
>>> small. A 1 Meg buffer might be too much if it is created for every 
>>> element of every pipeline for every request.
>>
>> That's a very good question - with a negative answer: A buffer of 
>> that particular size is created initially. That's why I want to bring 
>> this issue up on dev again: With my changes for COCOON-2168 [1] it's 
>> now not only a problem for applications with over-sized downloads but 
>> potentially for everyone relying on Cocoon's default configuration. 
>> One idea would be to change our BufferedOutputStream implementation 
>> to take 2 parameters: one for the initial buffer size and one for the 
>> flush size. The flush threshold would be the configurable 
>> outputBufferSize; the initial buffer size does not need to be 
>> configurable, I think.
>>
>> What do others think?
>
> No interest or no objections? :)
>
> Joerg


Re: Avoiding OutOfMemory Errors by limiting data in pipeline

Posted by Joerg Heinicke <jo...@gmx.de>.
On 27.04.2008 23:43, Joerg Heinicke wrote:

>> 2. Does the full amount of the buffer automatically get allocated for 
>> each request, or does it grow gradually based on the xml stream size?
>>
>> I have a lot of steps in the pipeline, so I am worried about the 
>> impact of creating too many buffers even if they are relatively small. 
>> A 1 Meg buffer might be too much if it is created for every element of 
>> every pipeline for every request.
> 
> That's a very good question - with a negative answer: A buffer of that 
> particular size is created initially. That's why I want to bring this 
> issue up on dev again: With my changes for COCOON-2168 [1] it's now not 
> only a problem for applications with over-sized downloads but 
> potentially for everyone relying on Cocoon's default configuration. One 
> idea would be to change our BufferedOutputStream implementation to take 
> 2 parameters: one for the initial buffer size and one for the flush 
> size. The flush threshold would be the configurable outputBufferSize; the 
> initial buffer size does not need to be configurable, I think.
> 
> What do others think?

No interest or no objections? :)

Joerg