Posted to dev@cxf.apache.org by Alessio Soldano <as...@redhat.com> on 2016/09/14 20:55:58 UTC

Concurrency issue with EHCacheTokenStore

Hi,

I'm currently seeing an intermittent issue in the JBossWS-CXF testsuite
(stacktrace at https://paste.fedoraproject.org/428145/14738847/raw/ ),
with the EHCacheTokenStore creation failing because the CacheManager
has been shut down. The testsuite includes multiple tests; almost all
of them create JAX-WS clients, and most of them use the current thread
bus (a few of them create a new bus, set it as the default thread bus,
run, and eventually shut down the bus). I suspect some kind of
concurrency issue in the CacheManager lifecycle management.

I've looked a bit at the code and noticed that there's basically a 1-1
relationship between Bus instances and CacheManager instances. Given
that I have some tests that do not explicitly shut down the bus (or the
client) after execution, it can happen that a client is closed because
the JDK eventually finalizes ClientProxy, which in the end causes the
CacheCleanupListener to close the token store and hence to
release/shut down the cache manager (see the invocation flow at
https://paste.fedoraproject.org/428150/47388530/raw/ ). Unfortunately,
that exact cache manager could still be in use to serve another
client running in the same bus. AFAICS, there's an attempt to avoid
problems like this in WSS4J's EHCacheManagerHolder (which handles CXF
requests to create/release cache managers), as it has a
ConcurrentHashMap<String, AtomicInteger> attribute to keep track of how
many consumers a given cache manager has and to avoid shutting down a
manager that is still in use. Looking at its getCacheManager and
releaseCacheManager methods, I can see a possible concurrency flaw which
could be the root of my failure. The releaseCacheManager method could be
called with cacheManager X as parameter while a different thread is
running getCacheManager and is just before line 106 (that is, just before
the AtomicInteger is retrieved from the map) with the local cacheManager
variable already resolved to X. That should later lead to an attempt to
use an already shut down cache manager. I would be tempted to suggest
making those two methods synchronized (the map could then probably be a
plain hash map).
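
The suggestion above can be sketched roughly as follows. This is a
minimal illustration under stated assumptions, not the WSS4J source:
FakeManager and SafeHolder are hypothetical stand-ins for the cache
manager and for EHCacheManagerHolder, and the two plain HashMaps are an
assumed layout. The point it shows is that when lookup, counting and
shutdown all run under one lock, a release can no longer slip in
between "manager resolved" and "count incremented".

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; all names here are hypothetical. */
public class CacheManagerHolderSketch {

    /** Hypothetical stand-in for a cache manager with a shutdown state. */
    public static final class FakeManager {
        private final String name;
        private volatile boolean shutdown;

        FakeManager(String name) { this.name = name; }

        public String getName() { return name; }
        public boolean isShutdown() { return shutdown; }
        void shutdown() { shutdown = true; }
    }

    /** Reference-counted holder; get and release share the class lock. */
    public static final class SafeHolder {
        // Plain maps are enough because every access is synchronized.
        private static final Map<String, FakeManager> MANAGERS = new HashMap<>();
        private static final Map<String, Integer> COUNTS = new HashMap<>();

        private SafeHolder() { }

        // Resolving the manager and incrementing its count is one atomic step.
        public static synchronized FakeManager get(String name) {
            FakeManager manager = MANAGERS.get(name);
            if (manager == null) {
                manager = new FakeManager(name);
                MANAGERS.put(name, manager);
            }
            COUNTS.merge(name, 1, Integer::sum);
            return manager;
        }

        // Shuts the manager down only when its last consumer releases it.
        public static synchronized void release(FakeManager manager) {
            String name = manager.getName();
            if (MANAGERS.get(name) != manager) {
                return; // unknown or already released manager
            }
            int remaining = COUNTS.get(name) - 1;
            if (remaining == 0) {
                COUNTS.remove(name);
                MANAGERS.remove(name);
                manager.shutdown();
            } else {
                COUNTS.put(name, remaining);
            }
        }
    }

    public static void main(String[] args) {
        FakeManager first = SafeHolder.get("bus-1");
        FakeManager second = SafeHolder.get("bus-1"); // second consumer, same manager
        SafeHolder.release(first);
        System.out.println(second.isShutdown()); // prints false: one consumer left
        SafeHolder.release(second);
        System.out.println(second.isShutdown()); // prints true: last consumer released
    }
}
```

With an unsynchronized variant, the thread calling get could be paused
after resolving the manager but before the increment, which is the
window described above.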

WDYT? I might be missing something, so posting here before opening up a 
jira. Any idea?

Cheers

Alessio


-- 
Alessio Soldano
Web Service Lead, JBoss


Re: Concurrency issue with EHCacheTokenStore

Posted by Colm O hEigeartaigh <co...@apache.org>.
Please go ahead and commit the fix, Alessio.

Colm.

On Mon, Sep 19, 2016 at 9:18 AM, Alessio Soldano <as...@redhat.com>
wrote:

> OK, no failures during the weekend testsuite runs.
> I've created https://issues.apache.org/jira/browse/WSS-587 and here is the
> patch I've tried:
> https://github.com/asoldano/wss4j/commit/5a7897f7440940a11c0c853fbb8fb26c644fa898.diff
> Colm, if that's fine with you I can go ahead and commit and/or send a PR.
>
> Cheers
> Alessio


-- 
Colm O hEigeartaigh

Talend Community Coder
http://coders.talend.com

Re: Concurrency issue with EHCacheTokenStore

Posted by Alessio Soldano <as...@redhat.com>.
OK, no failures during the weekend testsuite runs.
I've created https://issues.apache.org/jira/browse/WSS-587 and here is
the patch I've tried:
https://github.com/asoldano/wss4j/commit/5a7897f7440940a11c0c853fbb8fb26c644fa898.diff
Colm, if that's fine with you I can go ahead and commit and/or send a PR.

Cheers
Alessio

On 16/09/2016 22:41, Alessio Soldano wrote:
> OK, I have a patched wss4j 2.1.8-asoldano-SNAPSHOT on the snapshot
> repository and I'm letting the CI server here run with it for a few
> days. Let's see if the failures pop up or not...
>
> Cheers
> Alessio


-- 
Alessio Soldano
Web Service Lead, JBoss


Re: Concurrency issue with EHCacheTokenStore

Posted by Alessio Soldano <as...@redhat.com>.
OK, I have a patched wss4j 2.1.8-asoldano-SNAPSHOT on the snapshot
repository and I'm letting the CI server here run with it for a few days.
Let's see if the failures pop up or not...

Cheers
Alessio

On 15/09/2016 11:20, Alessio Soldano wrote:
> mmh... I need to build a patched wss4j snapshot and have it consumed 
> by the remote machine that is reproducing the issue a bit more 
> frequently (locally it's very rare). Will let you know :-)


-- 
Alessio Soldano
Web Service Lead, JBoss


Re: Concurrency issue with EHCacheTokenStore

Posted by Alessio Soldano <as...@redhat.com>.
mmh... I need to build a patched wss4j snapshot and have it consumed by 
the remote machine that is reproducing the issue a bit more frequently 
(locally it's very rare). Will let you know :-)

On 15/09/2016 10:35, Colm O hEigeartaigh wrote:
> Hi Alessio,
>
> Yes, that makes sense to me. If you perform the fix locally, do the
> intermittent failures go away?
>
> Colm.


-- 
Alessio Soldano
Web Service Lead, JBoss


Re: Concurrency issue with EHCacheTokenStore

Posted by Colm O hEigeartaigh <co...@apache.org>.
Hi Alessio,

Yes, that makes sense to me. If you perform the fix locally, do the
intermittent failures go away?

Colm.



-- 
Colm O hEigeartaigh

Talend Community Coder
http://coders.talend.com

Re: Concurrency issue with EHCacheTokenStore

Posted by Alessio Soldano <as...@redhat.com>.
On 14/09/2016 22:55, Alessio Soldano wrote:
> That should later deal to an attempt to use an already shutdown cache 
> manager.
s/deal/lead/


-- 
Alessio Soldano
Web Service Lead, JBoss