You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@cocoon.apache.org by Sylvain Wallez <sy...@apache.org> on 2005/06/07 19:02:57 UTC

Weird multithreading bug in Cron block

Hi all,

I'm currently working on a publication application with complex database 
queries where we want to prefetch some of the pages linked to by the 
page currently being produced, in order to speed up response time on 
pages that are likely to be asked for by users.

To achieve this, we have a "PrefetchTransformer" that grabs elements 
having a prefetch="true" attribute and starts a background job to load 
the corresponding "src" or "href" URL using a "cocoon:".

At first I used JobScheduler.fireJob() to schedule for immediate 
execution, but went into *weird* bugs with strange NPEs all around in 
pipeline components. After analysis, it appeared that while the 
scheduler thread was processing the pipeline, the http thread was 
recycling the *background environment*, thus nulling the object model 
and other class attributes used by pipeline components.

I spent the *whole day* trying to find the cause for this, without 
success (how frustrating).

Then I decided to try another approach and use 
JobScheduler.fireJobAt(new Date()), meaning "schedule the job for later 
execution... now!". And it worked!

Weird, weird, weird! Anybody having a hint about why fireJob() is doing 
this environment mixture?

Sylvain

-- 
Sylvain Wallez                        Anyware Technologies
http://apache.org/~sylvain            http://anyware-tech.com
Apache Software Foundation Member     Research & Technology Director


Re: Weird multithreading bug in Cron block

Posted by Sylvain Wallez <sy...@apache.org>.
Vadim Gritsenko wrote:

> Sylvain Wallez wrote:
>
>> Vadim Gritsenko wrote:
>>
>>> Anyways, RunnableManager's pools MUST NOT inherit any variables, 
>>> IMHO. I think you meant the same.
>>
>>
>>
>> Exactly. And my modifications ensure this CAN NOT happen, by simply 
>> suppressing the automatic inheritance of environment stack between 
>> child threads and their parent.
>
>
> Ok, Good. But now there is an incompatibility - hope you mention it at 
> least in status file.


Yep, I will. In the meantime, I've seen that DefaultIncludeCacheManager 
in 2.2 uses RunnableManager instead of "new Thread()", which is buggy as 
the thread used to load the URI won't inherit the environment of the 
request processing thread.

So I decided to refactor the CocoonThread I just added in a 
CocoonRunnable, which wraps a Runnable and sets up the environment stack 
before calling the wrapped object's run() method. That allows to use 
both "new Thread()" and thread pools.

>> For places where we *need* inheritance to happen such as in parallel 
>> include transformer, we now MUST use CocoonThread that does the 
>> environment stack copy, thus preserving the information needed to 
>> correctly resolve URIs in background threads. And in that case, we 
>> cannot use RunnableManager since its threads don't (and shouldn't) 
>> inherit the environement stack.
>
>
> IncludeTransformer does not rely on inheritable thread locals, as it 
> is now. It used to, though.


Note that with the InheritableThreadLocal, thread had no choice but 
inherit the environment stack of the parent thread. And using runnable 
manager means using a inherited context which isn't the right one. The 
CocoonRunnable should fix this.

>>> What's missing? :-?
>>
>>
>> The asynchronous nature of threads combined with the create-on-demand 
>> behaviour of pools.
>
>
> Gotcha.


Yeah, not easy :-)

Sylvain

-- 
Sylvain Wallez                        Anyware Technologies
http://apache.org/~sylvain            http://anyware-tech.com
Apache Software Foundation Member     Research & Technology Director


Re: Weird multithreading bug in Cron block

Posted by Vadim Gritsenko <va...@reverycodes.com>.
Sylvain Wallez wrote:
> Vadim Gritsenko wrote:
> 
>> Anyways, RunnableManager's pools MUST NOT inherit any variables, IMHO. 
>> I think you meant the same.
> 
> 
> Exactly. And my modifications ensure this CAN NOT happen, by simply 
> suppressing the automatic inheritance of environment stack between child 
> threads and their parent.

Ok, Good. But now there is an incompatibility - hope you mention it at least in 
status file.


> For places where we *need* inheritance to happen such as in parallel 
> include transformer, we now MUST use CocoonThread that does the 
> environment stack copy, thus preserving the information needed to 
> correctly resolve URIs in background threads. And in that case, we 
> cannot use RunnableManager since its threads don't (and shouldn't) 
> inherit the environement stack.

IncludeTransformer does not rely on inheritable thread locals, as it is now. It 
used to, though.


>> What's missing? :-?
> 
> The asynchronous nature of threads combined with the create-on-demand 
> behaviour of pools.

Gotcha.

Vadim

Re: Weird multithreading bug in Cron block

Posted by Sylvain Wallez <sy...@apache.org>.
Vadim Gritsenko wrote:

> Sylvain Wallez wrote:
>
>> Vadim Gritsenko wrote:
>>
>>> Remedy is to use thread pool(s), and not create local Threads - with 
>>> the exception of situation when local thread lives no longer than 
>>> original request.
>>
>>
>> I used the scheduler's thread pool, and that's when the problem 
>> appeared.
>
>
> Hm...
>
>
>>> If thread lives longer than request, use RunnableManager (or Cron), 
>>> which are using thread pools. Thread from the thread pool should be 
>>> set up with environment / processor, and it will be independent of 
>>> http environment which triggered the job.
>>
>>
>>
>> Yes, I know that (having largely contributed to the 
>> CocoonJobExecutor). The problem is the inheritance of the enviroment 
>> stack between threads (either in the pool or brand new ones) and 
>> their parent, which happens to be the servlet engine's thread that 
>> processed the very first request in the case of RunnableManager.
>
>
> Wait a sec, RunnableManager is setup when Cocoon servlet is 
> initialized, and this is happening on the web application startup. At 
> that time, there is no any environemnt setup - as no Servlet.process() 
> was called yet. If yours RunnableManager is setup during 
> Servlet.process(), it's some other problem, IMHO.


You're right that when the first http request hits the servlet engine, 
the following occurs
- call CocoonServlet.init()
-     CocoonServlet.createCocoon()
-     Cocoon.initialize()
-     Load all components including RunnableManager
- end CocoonServlet.init()
- CocoonServlet.service()

Now CocoonComponentManager uses an InheritableThreadLocal that clones 
the environment stack in child threads.

So this stack is cloned not when the RunnableManager is created, but 
when it needs to create a new thread. So actually I'm wrong in saying 
the parent thread is the one processing the very first request. It 
actually can be any of the servlet engine's threads... or none if the 
server is idle when a background thread is created.

Which actually makes this bug likely to happen when the server is under 
high load.

> Anyways, RunnableManager's pools MUST NOT inherit any variables, IMHO. 
> I think you meant the same.


Exactly. And my modifications ensure this CAN NOT happen, by simply 
suppressing the automatic inheritance of environment stack between child 
threads and their parent.

For places where we *need* inheritance to happen such as in parallel 
include transformer, we now MUST use CocoonThread that does the 
environment stack copy, thus preserving the information needed to 
correctly resolve URIs in background threads. And in that case, we 
cannot use RunnableManager since its threads don't (and shouldn't) 
inherit the environement stack.

>>> CocoonQuartzJobExecutor already has enterEnv/leaveEnv, so it should 
>>> work, if you have issues with it - what are they?
>>
>>
>> Well, read the explanation above :-)
>
>
> Did not made sense to me :-( I clearly see each time in the logs:
>
> INFO  (2005-05-30) 23:18.23:295 [core.runnable] (Unknown-URI) 
> Unknown-Thread/DefaultRunnableManager: ThreadPool named "daemon" 
> created with no 
> queue,max-pool-size=2147483647,min-pool-size=1,priority=5,isDaemon=true,keep-alive-time-ms=60000,block-policy=ABORT,shutdown-wait-time-ms=-1


The difference between min-pool-size and max-pool-size is key here, as 
it indicates other threads will be created later, i.e. after 
CocoonServlet.init().

> So pool has been created before any http environment has been created, 
> and from my reading of InheritableThreadLocal javadoc, it creates a 
> copy of self for the child thread, not reference - so threads are not 
> sharing same ThreadLocal, so this pool should not ever get a http 
> environment.
>
> What's missing? :-?


The asynchronous nature of threads combined with the create-on-demand 
behaviour of pools.

Sylvain

-- 
Sylvain Wallez                        Anyware Technologies
http://apache.org/~sylvain            http://anyware-tech.com
Apache Software Foundation Member     Research & Technology Director


Re: Weird multithreading bug in Cron block

Posted by Vadim Gritsenko <va...@reverycodes.com>.
Sylvain Wallez wrote:
> Vadim Gritsenko wrote:
> 
>> Remedy is to use thread pool(s), and not create local Threads - with 
>> the exception of situation when local thread lives no longer than 
>> original request.
> 
> I used the scheduler's thread pool, and that's when the problem appeared.

Hm...


>> If thread lives longer than request, use RunnableManager (or Cron), 
>> which are using thread pools. Thread from the thread pool should be 
>> set up with environment / processor, and it will be independent of 
>> http environment which triggered the job.
> 
> 
> Yes, I know that (having largely contributed to the CocoonJobExecutor). 
> The problem is the inheritance of the enviroment stack between threads 
> (either in the pool or brand new ones) and their parent, which happens 
> to be the servlet engine's thread that processed the very first request 
> in the case of RunnableManager.

Wait a sec, RunnableManager is setup when Cocoon servlet is initialized, and 
this is happening on the web application startup. At that time, there is no any 
environemnt setup - as no Servlet.process() was called yet. If yours 
RunnableManager is setup during Servlet.process(), it's some other problem, IMHO.

Anyways, RunnableManager's pools MUST NOT inherit any variables, IMHO. I think 
you meant the same.


>> CocoonQuartzJobExecutor already has enterEnv/leaveEnv, so it should 
>> work, if you have issues with it - what are they?
> 
> Well, read the explanation above :-)

Did not made sense to me :-( I clearly see each time in the logs:

INFO  (2005-05-30) 23:18.23:295 [core.runnable] (Unknown-URI) 
Unknown-Thread/DefaultRunnableManager: ThreadPool named "daemon" created with no 
queue,max-pool-size=2147483647,min-pool-size=1,priority=5,isDaemon=true,keep-alive-time-ms=60000,block-policy=ABORT,shutdown-wait-time-ms=-1

So pool has been created before any http environment has been created, and from 
my reading of InheritableThreadLocal javadoc, it creates a copy of self for the 
child thread, not reference - so threads are not sharing same ThreadLocal, so 
this pool should not ever get a http environment.

What's missing? :-?


> If a CronJob uses the SourceResolver to process a "cocoon:" URI, the 
> corresponding pipeline occasionally gets recycled in the middle of its 
> processing.

Is it reproducable with the samples? Probably adding pipeline with a delay will 
show this?


Sylvain Wallez wrote:
 >
 > Vadim, please read my explanation. Threads from the pool HAVE inherited
 > variables from the thread that processed the first http request, that
 > led to loading cocoon.xconf and created the thread pool.

I did read now (email is async!) :-)

I don't see the behavior you are describing. Pools are created way before that.

Vadim

Re: Weird multithreading bug in Cron block

Posted by Sylvain Wallez <sy...@apache.org>.
Vadim Gritsenko wrote:

> Sylvain Wallez wrote:
>
>> Sylvain Wallez wrote:
>>
>>> Weird, weird, weird! Anybody having a hint about why fireJob() is 
>>> doing this environment mixture?
>>
>>
>>
>> Actually fireJobAt() is broken also when using another test case. 
>
>
> What's wrong with it?
>
>
>> Desperately searching for the cause, I went back to basics, i.e. "new 
>> Thread(runnable).start()". Also broken, but helped me to finally find 
>> the cause :-)
>>
>> The problems lies in CocoonComponentManager.addForAutomaticRelease().
>>
>> The environmentStack is a CloningInheritableThreadLocal. That means 
>> that when we create a new thread, it inherits the environment stack 
>> of its parent thread.
>>
>> The result is that threads created by Cocoon *always* inherit an 
>> environment stack of at least size 1:
>> - in the cron block, that's the environment of the first http 
>> request, which created the Cocoon object
>> - for "new Thread()", that's the same as above, plus all sitemaps 
>> that we've been through when we create the thread.
>>
>> Now let's look at InvokeContext.getProcessingPipeline() (in 
>> treeprocessor): if this is an internal request, the pipeline object 
>> is added for automatic release. I guess this is to avoid memory leaks 
>> if ever we forget to call resolver.release() on a sitemap source.
>>
>> Following this path, let's go to 
>> CocoonComponentManager.addForAutomaticRelease(). The component that 
>> has to be autoreleased is added to a list attached to the *first* 
>> environment of the stack (because of "stack.get(0)"), and is 
>> therefore released when we exit this environment.
>>
>> Now what happens when we create a thread that runs in the background? 
>> The end of processing of the *http* request releases pipeline objects 
>> of all child threads of the servlet engine's thread (the one which 
>> processes the http request). If the background thread uses a 
>> "cocoon:" URL and is currently executing the corresponding pipeline, 
>> recycle() is called on all pipeline components and bang, NPEs all 
>> around the place!!
>>
>> And this leads to very random bugs: since servlet engines uses a 
>> thread pools, this erroneous pipeline release happens only when the 
>> servlet engine reuses the thread that intially loaded CocoonServlet. 
>> And NPEs happen when this first thread is used *and* a scheduler 
>> thread is executing a "cocoon:" pipeline. Weird...
>>
>> So the question is:
>> - why does the environment stack have to be inherited by child 
>> threads? Is it to keep the current context? Then isn't inheriting the 
>> current processor and uri context enough?
>> - why is the pipeline automatically released? Is to avoid memory leaks?
>>
>> Possible remedies would be to remove one of the above features, but I 
>> guest they're there for a reason.
>
>
> Remedy is to use thread pool(s), and not create local Threads - with 
> the exception of situation when local thread lives no longer than 
> original request.


I used the scheduler's thread pool, and that's when the problem appeared.

> If thread lives longer than request, use RunnableManager (or Cron), 
> which are using thread pools. Thread from the thread pool should be 
> set up with environment / processor, and it will be independent of 
> http environment which triggered the job.


Yes, I know that (having largely contributed to the CocoonJobExecutor). 
The problem is the inheritance of the enviroment stack between threads 
(either in the pool or brand new ones) and their parent, which happens 
to be the servlet engine's thread that processed the very first request 
in the case of RunnableManager.

> CocoonQuartzJobExecutor already has enterEnv/leaveEnv, so it should 
> work, if you have issues with it - what are they?


Well, read the explanation above :-)

If a CronJob uses the SourceResolver to process a "cocoon:" URI, the 
corresponding pipeline occasionally gets recycled in the middle of its 
processing.

Sylvain

-- 
Sylvain Wallez                        Anyware Technologies
http://apache.org/~sylvain            http://anyware-tech.com
Apache Software Foundation Member     Research & Technology Director


Re: Weird multithreading bug in Cron block

Posted by Vadim Gritsenko <va...@reverycodes.com>.
Sylvain Wallez wrote:
> Sylvain Wallez wrote:
> 
>> Weird, weird, weird! Anybody having a hint about why fireJob() is 
>> doing this environment mixture?
> 
> 
> Actually fireJobAt() is broken also when using another test case. 

What's wrong with it?


> Desperately searching for the cause, I went back to basics, i.e. "new 
> Thread(runnable).start()". Also broken, but helped me to finally find 
> the cause :-)
> 
> The problems lies in CocoonComponentManager.addForAutomaticRelease().
> 
> The environmentStack is a CloningInheritableThreadLocal. That means that 
> when we create a new thread, it inherits the environment stack of its 
> parent thread.
> 
> The result is that threads created by Cocoon *always* inherit an 
> environment stack of at least size 1:
> - in the cron block, that's the environment of the first http request, 
> which created the Cocoon object
> - for "new Thread()", that's the same as above, plus all sitemaps that 
> we've been through when we create the thread.
> 
> Now let's look at InvokeContext.getProcessingPipeline() (in 
> treeprocessor): if this is an internal request, the pipeline object is 
> added for automatic release. I guess this is to avoid memory leaks if 
> ever we forget to call resolver.release() on a sitemap source.
> 
> Following this path, let's go to 
> CocoonComponentManager.addForAutomaticRelease(). The component that has 
> to be autoreleased is added to a list attached to the *first* 
> environment of the stack (because of "stack.get(0)"), and is therefore 
> released when we exit this environment.
> 
> Now what happens when we create a thread that runs in the background? 
> The end of processing of the *http* request releases pipeline objects of 
> all child threads of the servlet engine's thread (the one which 
> processes the http request). If the background thread uses a "cocoon:" 
> URL and is currently executing the corresponding pipeline, recycle() is 
> called on all pipeline components and bang, NPEs all around the place!!
> 
> And this leads to very random bugs: since servlet engines uses a thread 
> pools, this erroneous pipeline release happens only when the servlet 
> engine reuses the thread that intially loaded CocoonServlet. And NPEs 
> happen when this first thread is used *and* a scheduler thread is 
> executing a "cocoon:" pipeline. Weird...
> 
> So the question is:
> - why does the environment stack have to be inherited by child threads? 
> Is it to keep the current context? Then isn't inheriting the current 
> processor and uri context enough?
> - why is the pipeline automatically released? Is to avoid memory leaks?
> 
> Possible remedies would be to remove one of the above features, but I 
> guest they're there for a reason.

Remedy is to use thread pool(s), and not create local Threads - with the 
exception of situation when local thread lives no longer than original request.

If thread lives longer than request, use RunnableManager (or Cron), which are 
using thread pools. Thread from the thread pool should be set up with 
environment / processor, and it will be independent of http environment which 
triggered the job.

CocoonQuartzJobExecutor already has enterEnv/leaveEnv, so it should work, if you 
have issues with it - what are they?

Vadim


> So what about adding CocoonComponentManager.clearEnvironmentStack(), 
> that could be called by CocoonQuartzJobExecutor, or even 
> DefaultThreadFactory (in o.a.c.c.thread) before running the job?
> 
> Thoughts?
> 
> Sylvain

Re: Weird multithreading bug in Cron block

Posted by Vadim Gritsenko <va...@reverycodes.com>.
Sylvain Wallez wrote:
> Vadim Gritsenko wrote:
> 
>> After second look, does not look too harmful, even though unnecessary.
> 
> 
> Man, I spend two full days chasing the bug and finding its cause. I 
> don't call this unnecessary...

Now your turn to read email ;-) My guess there is a problem in your setup.

Vadim

Re: Weird multithreading bug in Cron block

Posted by Sylvain Wallez <sy...@apache.org>.
Vadim Gritsenko wrote:

> After second look, does not look too harmful, even though unnecessary.


Man, I spend two full days chasing the bug and finding its cause. I 
don't call this unnecessary...

Sylvain

-- 
Sylvain Wallez                        Anyware Technologies
http://apache.org/~sylvain            http://anyware-tech.com
Apache Software Foundation Member     Research & Technology Director


Re: Weird multithreading bug in Cron block

Posted by Vadim Gritsenko <va...@reverycodes.com>.
Sylvain Wallez wrote:
> Carsten Ziegeler wrote:
> 
>> Sylvain Wallez wrote:
>>  
>>> So the question is:
>>> - why is the pipeline automatically released? Is to avoid memory leaks?
>>>   
>>
>> Yes, in 2.1.x there was no good place for explicitly releasing the
>> pipeline; this required some incompatible changes we did in 2.2; the
>> pipelines are released there explicitly and not automatically.
>> If the pipeline is not released properly, this means that all pipeline
>> components are not released, creating a hugh memory leak.
>>
> 
> I see, and cleaning up is better than letting memory leak.
> 
> But I have a problem with the InheritableThreadLocal, which can also 
> create leaks, as one of its effects is that child threads inherits the 
> environment stack of their parent thread at the time where they (the 
> children) were created.
> 
> If these child threads are created and terminated within the scope of 
> the parent's thread request, then this is no problem (e.g. in parallel 
> cinclude), but it is a problem for long-running background threads such 
> as those created by the RunnableManager (used by the scheduler)

RunnableManager does not create threads by itself.


> or, as in my case, by user code.

Please advise user to use RunnableManager, he should not (as far as possible), 
create threads.


> So IMO, we should change the way environment inheritance between threads 
> is managed:
> - if a child thread runs in the scope of its parent, then a special 
> CocoonThread class (extends Thread) should be used, which will copy the 
> needed environment data before launching its Runnable.
> - normal threads created with "new Thread()" inherit nothing and can 
> therefore work in a clean environment.
> 
> This approach requires two changes in the 2.1 code base (haven't checked 
> 2.2 yet):
> - DefaultIncludeCacheManager.load() in o.a.c.transformation.helpers

DefaultIncludeCacheManager has bigger problems than that; in one place it uses 
RunnableManager (line 116), while in another it uses new Thread (line 186). It's 
better be changed to RunnableManager, and thread.join() must go (line 234), so 
in the end there is no need for CocoonThread here.


> - maybe PortalManagerImpl.loadCoplet() in o.a.c.webapps.portal.components

Same, should be switched to RunnableManager.


> In these two places, the CocoonThread proposed above have to be used to 
> inherit the parent environment. In all other places where "new Thread()" 
> is called, inheriting the parent environment is not needed and worse, 
> can be harmful.
> 
> WDYT?

After second look, does not look too harmful, even though unnecessary. Might 
break some folks who are using Threads, though. Not sure if it can be changed in 
2.1...

Vadim

Re: Weird multithreading bug in Cron block

Posted by Sylvain Wallez <sy...@apache.org>.
Vadim Gritsenko wrote:

> Sylvain Wallez wrote:
>
>> Carsten Ziegeler wrote:
>>
>>> Sylvain Wallez wrote:
>>>
>>>> So IMO, we should change the way environment inheritance between 
>>>> threads is managed:
>>>> - if a child thread runs in the scope of its parent, then a special 
>>>> CocoonThread class (extends Thread) should be used, which will copy 
>>>> the needed environment data before launching its Runnable.
>>>> - normal threads created with "new Thread()" inherit nothing and 
>>>> can therefore work in a clean environment.
>>>>
>>>> This approach requires two changes in the 2.1 code base (haven't 
>>>> checked 2.2 yet):
>>>> - DefaultIncludeCacheManager.load() in o.a.c.transformation.helpers
>>>> - maybe PortalManagerImpl.loadCoplet() in 
>>>> o.a.c.webapps.portal.components
>>>>
>>>> In these two places, the CocoonThread proposed above have to be 
>>>> used to inherit the parent environment. In all other places where 
>>>> "new Thread()" is called, inheriting the parent environment is not 
>>>> needed and worse, can be harmful.
>>>>
>>>
>>> Looks ok for me; I always thought that we should create our own thread
>>> class, but never had time... ;)
>>>  
>>
>>
>> Great! I'm doing it right now as my customer is waiting for the 
>> bugfix :-)
>
>
> Please stop.
>
> RunnableManager does not create threads, it uses thread pool, threads 
> from the pool HAVE NO inherited variables.


Vadim, please read my explanation. Threads from the pool HAVE inherited 
variables from the thread that processed the first http request, that 
led to loading cocoon.xconf and created the thread pool.

Sylvain

-- 
Sylvain Wallez                        Anyware Technologies
http://apache.org/~sylvain            http://anyware-tech.com
Apache Software Foundation Member     Research & Technology Director


Re: Weird multithreading bug in Cron block

Posted by Vadim Gritsenko <va...@reverycodes.com>.
Sylvain Wallez wrote:
> Carsten Ziegeler wrote:
> 
>> Sylvain Wallez wrote:
>>
>>> So IMO, we should change the way environment inheritance between 
>>> threads is managed:
>>> - if a child thread runs in the scope of its parent, then a special 
>>> CocoonThread class (extends Thread) should be used, which will copy 
>>> the needed environment data before launching its Runnable.
>>> - normal threads created with "new Thread()" inherit nothing and can 
>>> therefore work in a clean environment.
>>>
>>> This approach requires two changes in the 2.1 code base (haven't 
>>> checked 2.2 yet):
>>> - DefaultIncludeCacheManager.load() in o.a.c.transformation.helpers
>>> - maybe PortalManagerImpl.loadCoplet() in 
>>> o.a.c.webapps.portal.components
>>>
>>> In these two places, the CocoonThread proposed above have to be used 
>>> to inherit the parent environment. In all other places where "new 
>>> Thread()" is called, inheriting the parent environment is not needed 
>>> and worse, can be harmful.
>>>
>>
>> Looks ok for me; I always thought that we should create our own thread
>> class, but never had time... ;)
>>  
> 
> Great! I'm doing it right now as my customer is waiting for the bugfix :-)

Please stop.

RunnableManager does not create threads, it uses thread pool, threads from the 
pool HAVE NO inherited variables.

Vadim

Re: Weird multithreading bug in Cron block

Posted by Sylvain Wallez <sy...@apache.org>.
Carsten Ziegeler wrote:

>Sylvain Wallez wrote:
>
>  
>
>>So IMO, we should change the way environment inheritance between threads 
>>is managed:
>>- if a child thread runs in the scope of its parent, then a special 
>>CocoonThread class (extends Thread) should be used, which will copy the 
>>needed environment data before launching its Runnable.
>>- normal threads created with "new Thread()" inherit nothing and can 
>>therefore work in a clean environment.
>>
>>This approach requires two changes in the 2.1 code base (haven't checked 
>>2.2 yet):
>>- DefaultIncludeCacheManager.load() in o.a.c.transformation.helpers
>>- maybe PortalManagerImpl.loadCoplet() in o.a.c.webapps.portal.components
>>
>>In these two places, the CocoonThread proposed above have to be used to 
>>inherit the parent environment. In all other places where "new Thread()" 
>>is called, inheriting the parent environment is not needed and worse, 
>>can be harmful.
>>
>>    
>>
>Looks ok for me; I always thought that we should create our own thread
>class, but never had time... ;)
>  
>

Great! I'm doing it right now as my customer is waiting for the bugfix :-)

Sylvain

-- 
Sylvain Wallez                        Anyware Technologies
http://apache.org/~sylvain            http://anyware-tech.com
Apache Software Foundation Member     Research & Technology Director


Re: Weird multithreading bug in Cron block

Posted by Carsten Ziegeler <cz...@apache.org>.
Sylvain Wallez wrote:

> So IMO, we should change the way environment inheritance between threads 
> is managed:
> - if a child thread runs in the scope of its parent, then a special 
> CocoonThread class (extends Thread) should be used, which will copy the 
> needed environment data before launching its Runnable.
> - normal threads created with "new Thread()" inherit nothing and can 
> therefore work in a clean environment.
> 
> This approach requires two changes in the 2.1 code base (haven't checked 
> 2.2 yet):
> - DefaultIncludeCacheManager.load() in o.a.c.transformation.helpers
> - maybe PortalManagerImpl.loadCoplet() in o.a.c.webapps.portal.components
> 
> In these two places, the CocoonThread proposed above have to be used to 
> inherit the parent environment. In all other places where "new Thread()" 
> is called, inheriting the parent environment is not needed and worse, 
> can be harmful.
> 
Looks ok for me; I always thought that we should create our own thread
class, but never had time... ;)

Carsten

-- 
Carsten Ziegeler - Open Source Group, S&N AG
http://www.s-und-n.de
http://www.osoco.org/weblogs/rael/

Re: Weird multithreading bug in Cron block

Posted by Sylvain Wallez <sy...@apache.org>.
Carsten Ziegeler wrote:

>Sylvain Wallez wrote:
>  
>
>>So the question is:
>>- why is the pipeline automatically released? Is to avoid memory leaks?
>>    
>>
>Yes, in 2.1.x there was no good place for explicitly releasing the
>pipeline; this required some incompatible changes we did in 2.2; the
>pipelines are released there explicitly and not automatically.
>If the pipeline is not released properly, this means that all pipeline
>components are not released, creating a hugh memory leak.
>  
>

I see, and cleaning up is better than letting memory leak.

But I have a problem with the InheritableThreadLocal, which can also 
create leaks, as one of its effects is that child threads inherits the 
environment stack of their parent thread at the time where they (the 
children) were created.

If these child threads are created and terminated within the scope of 
the parent's thread request, then this is no problem (e.g. in parallel 
cinclude), but it is a problem for long-running background threads such 
as those created by the RunnableManager (used by the scheduler) or, as 
in my case, by user code.

So IMO, we should change the way environment inheritance between threads 
is managed:
- if a child thread runs in the scope of its parent, then a special 
CocoonThread class (extends Thread) should be used, which will copy the 
needed environment data before launching its Runnable.
- normal threads created with "new Thread()" inherit nothing and can 
therefore work in a clean environment.

This approach requires two changes in the 2.1 code base (haven't checked 
2.2 yet):
- DefaultIncludeCacheManager.load() in o.a.c.transformation.helpers
- maybe PortalManagerImpl.loadCoplet() in o.a.c.webapps.portal.components

In these two places, the CocoonThread proposed above have to be used to 
inherit the parent environment. In all other places where "new Thread()" 
is called, inheriting the parent environment is not needed and worse, 
can be harmful.

WDYT?

Sylvain

-- 
Sylvain Wallez                        Anyware Technologies
http://apache.org/~sylvain            http://anyware-tech.com
Apache Software Foundation Member     Research & Technology Director


Re: Weird multithreading bug in Cron block

Posted by Carsten Ziegeler <cz...@apache.org>.
Sylvain Wallez wrote:
> So the question is:
> - why is the pipeline automatically released? Is to avoid memory leaks?
> 
Yes, in 2.1.x there was no good place for explicitly releasing the
pipeline; this required some incompatible changes we did in 2.2; the
pipelines are released there explicitly and not automatically.
If the pipeline is not released properly, this means that all pipeline
components are not released, creating a hugh memory leak.

Carsten

-- 
Carsten Ziegeler - Open Source Group, S&N AG
http://www.s-und-n.de
http://www.osoco.org/weblogs/rael/

Re: Weird multithreading bug in Cron block

Posted by Sylvain Wallez <sy...@apache.org>.
Sylvain Wallez wrote:

> Hi all,
>
> I'm currently working on a publication application with complex 
> database queries where we want to prefetch some of the pages linked to 
> by the page currently being produced, in order to speed up response 
> time on pages that are likely to be asked for by users.
>
> To achieve this, we have a "PrefetchTransformer" that grabs elements 
> having a prefetch="true" attribute and starts a background job to load 
> the corresponding "src" or "href" URL using a "cocoon:".
>
> At first I used JobScheduler.fireJob() to schedule for immediate 
> execution, but went into *weird* bugs with strange NPEs all around in 
> pipeline components. After analysis, it appeared that while the 
> scheduler thread was processing the pipeline, the http thread was 
> recycling the *background environment*, thus nulling the object model 
> and other class attributes used by pipeline components.
>
> I spent the *whole day* trying to find the cause for this, without 
> success (how frustrating).
>
> Then I decided to try another approach and use 
> JobScheduler.fireJobAt(new Date()), meaning "schedule the job for 
> later execution... now!". And it worked!
>
> Weird, weird, weird! Anybody having a hint about why fireJob() is 
> doing this environment mixture?


Actually fireJobAt() is broken also when using another test case. 
Desperately searching for the cause, I went back to basics, i.e. "new 
Thread(runnable).start()". Also broken, but helped me to finally find 
the cause :-)

The problems lies in CocoonComponentManager.addForAutomaticRelease().

The environmentStack is a CloningInheritableThreadLocal. That means that 
when we create a new thread, it inherits the environment stack of its 
parent thread.

The result is that threads created by Cocoon *always* inherit an 
environment stack of at least size 1:
- in the cron block, that's the environment of the first http request, 
which created the Cocoon object
- for "new Thread()", that's the same as above, plus all sitemaps that 
we've been through when we create the thread.

Now let's look at InvokeContext.getProcessingPipeline() (in 
treeprocessor): if this is an internal request, the pipeline object is 
added for automatic release. I guess this is to avoid memory leaks if 
ever we forget to call resolver.release() on a sitemap source.

Following this path, let's go to 
CocoonComponentManager.addForAutomaticRelease(). The component that has 
to be autoreleased is added to a list attached to the *first* 
environment of the stack (because of "stack.get(0)"), and is therefore 
released when we exit this environment.

Now what happens when we create a thread that runs in the background? 
The end of processing of the *http* request releases pipeline objects of 
all child threads of the servlet engine's thread (the one which 
processes the http request). If the background thread uses a "cocoon:" 
URL and is currently executing the corresponding pipeline, recycle() is 
called on all pipeline components and bang, NPEs all around the place!!

And this leads to very random bugs: since servlet engines uses a thread 
pools, this erroneous pipeline release happens only when the servlet 
engine reuses the thread that intially loaded CocoonServlet. And NPEs 
happen when this first thread is used *and* a scheduler thread is 
executing a "cocoon:" pipeline. Weird...

So the question is:
- why does the environment stack have to be inherited by child threads? 
Is it to keep the current context? Then isn't inheriting the current 
processor and uri context enough?
- why is the pipeline automatically released? Is to avoid memory leaks?

Possible remedies would be to remove one of the above features, but I 
guest they're there for a reason.

So what about adding CocoonComponentManager.clearEnvironmentStack(), 
that could be called by CocoonQuartzJobExecutor, or even 
DefaultThreadFactory (in o.a.c.c.thread) before running the job?

Thoughts?

Sylvain

-- 
Sylvain Wallez                        Anyware Technologies
http://apache.org/~sylvain            http://anyware-tech.com
Apache Software Foundation Member     Research & Technology Director