You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@curator.apache.org by poetwang <gi...@git.apache.org> on 2018/03/22 21:41:07 UTC

[GitHub] curator pull request #261: Fix race condition on cache in ServiceCacheImpl

GitHub user poetwang opened a pull request:

    https://github.com/apache/curator/pull/261

    Fix race condition on cache in ServiceCacheImpl

    There is a race condition on variable cache, both the main thread(from start) and the "Curator-ServiceProvider" thread(from childEvent-> addInstance->cache.clearDataBytes) try to access the data in cache.
    The following case will cause NPE in sonInstanceSerializer:
    main thread finishes the cache.start(true);
    and then an event(CHILD_ADDED or CHILD_UPDATED) comes, it(Curator-ServiceProvider thread) will try to clear the cache data(in addInstance).
    Then main thread continues to call addInstance on cached data, which is null. It will cause NPE on line 193.
    
    I think this is the cause of https://stackoverflow.com/questions/42007102/apache-curator-npe-in-jsoninstanceserializer .
    And it causes the druid-0.11.0 indexing tasks fail to start.
    `
    2018-03-22T12:44:17,004 ERROR [main] io.druid.cli.CliPeon - Error when starting up.  Failing.
    java.lang.reflect.InvocationTargetException
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_121]
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_121]
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_121]
            at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_121]
            at io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler.start(Lifecycle.java:414) ~[java-util-0.11.0.jar:0.11.0]
            at io.druid.java.util.common.lifecycle.Lifecycle.start(Lifecycle.java:311) ~[java-util-0.11.0.jar:0.11.0]
            at io.druid.guice.LifecycleModule$2.start(LifecycleModule.java:156) ~[druid-api-0.11.0.jar:0.11.0]
            at io.druid.cli.GuiceRunnable.initLifecycle(GuiceRunnable.java:101) [druid-services-0.11.0.jar:0.11.0]
            at io.druid.cli.CliPeon.run(CliPeon.java:283) [druid-services-0.11.0.jar:0.11.0]
            at io.druid.cli.Main.main(Main.java:108) [druid-services-0.11.0.jar:0.11.0]
    Caused by: java.lang.NullPointerException
            at org.codehaus.jackson.JsonFactory.createJsonParser(JsonFactory.java:604) ~[jackson-core-asl-1.9.13.jar:1.9.13]
            at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1973) ~[jackson-mapper-asl-1.9.13.jar:1.9.13]
            at org.apache.curator.x.discovery.details.JsonInstanceSerializer.deserialize(JsonInstanceSerializer.java:86) ~[curator-x-discovery-4.0.0.jar:?]
            at org.apache.curator.x.discovery.details.ServiceCacheImpl.addInstance(ServiceCacheImpl.java:200) ~[curator-x-discovery-4.0.0.jar:?]
            at org.apache.curator.x.discovery.details.ServiceCacheImpl.start(ServiceCacheImpl.java:102) ~[curator-x-discovery-4.0.0.jar:?]
            at org.apache.curator.x.discovery.details.ServiceProviderImpl.start(ServiceProviderImpl.java:75) ~[curator-x-discovery-4.0.0.jar:?]
            at io.druid.curator.discovery.ServerDiscoverySelector.start(ServerDiscoverySelector.java:132) ~[druid-server-0.11.0.jar:0.11.0]
            ... 10 more
    `


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/poetwang/curator fix-cache-race-condition

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/curator/pull/261.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #261
    
----
commit abf78960c7000478e90d220845d0eda873ec3c7e
Author: Wei Wang <we...@...>
Date:   2018-03-22T18:31:13Z

    Fix race condition on cache in ServiceCacheImpl

----


---

[GitHub] curator issue #261: CURATOR-452 Fix race condition on cache in ServiceCacheI...

Posted by poetwang <gi...@git.apache.org>.
Github user poetwang commented on the issue:

    https://github.com/apache/curator/pull/261
  
    @Randgalt I think this is the bug mentioned in CURATOR-452, so I just link it to the issue.
    https://issues.apache.org/jira/browse/CURATOR-452


---

[GitHub] curator issue #261: [CURATOR-452] Fix race condition on cache in ServiceCach...

Posted by Randgalt <gi...@git.apache.org>.
Github user Randgalt commented on the issue:

    https://github.com/apache/curator/pull/261
  
    Can you create a test that shows the problem and include it in this PR?


---

[GitHub] curator pull request #261: [CURATOR-452] Fix race condition on cache in Serv...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/curator/pull/261


---

[GitHub] curator issue #261: [CURATOR-452] Fix race condition on cache in ServiceCach...

Posted by Randgalt <gi...@git.apache.org>.
Github user Randgalt commented on the issue:

    https://github.com/apache/curator/pull/261
  
    Sorry for the delay - I'll try to get to this soon.


---

[GitHub] curator issue #261: [CURATOR-452] Fix race condition on cache in ServiceCach...

Posted by Randgalt <gi...@git.apache.org>.
Github user Randgalt commented on the issue:

    https://github.com/apache/curator/pull/261
  
    Thank you - one of the committers will try to get this soon. Please be patient. 


---

[GitHub] curator issue #261: [CURATOR-452] Fix race condition on cache in ServiceCach...

Posted by poetwang <gi...@git.apache.org>.
Github user poetwang commented on the issue:

    https://github.com/apache/curator/pull/261
  
    @Randgalt It is really hard to test without breaking the start method into tow parts. I tried my best to do so.


---

[GitHub] curator issue #261: [CURATOR-452] Fix race condition on cache in ServiceCach...

Posted by poetwang <gi...@git.apache.org>.
Github user poetwang commented on the issue:

    https://github.com/apache/curator/pull/261
  
    The event (CHILD_ADDED or CHILD_UPDATED) comes because PathChildrenCache will force refresh when ConnectionState turns to CONNECTED(connectionStateListener), 


---

[GitHub] curator issue #261: Fix race condition on cache in ServiceCacheImpl

Posted by Randgalt <gi...@git.apache.org>.
Github user Randgalt commented on the issue:

    https://github.com/apache/curator/pull/261
  
    Please create an issue in our Jira. https://cwiki.apache.org/confluence/display/CURATOR/Submitting+Pull+Requests


---

[GitHub] curator issue #261: [CURATOR-452] Fix race condition on cache in ServiceCach...

Posted by cmckn <gi...@git.apache.org>.
Github user cmckn commented on the issue:

    https://github.com/apache/curator/pull/261
  
    I realize this issue has been closed, but I'm still seeing unexpected exceptions in some of my applications during `ServiceCacheImpl#start`.
    The merged changes to address this added a null-check around calls to `ServiceCacheImpl#addInstance` within `ServiceCacheImpl#start` (correct me if this is wrong).
    From what I've seen in my own apps, the issue is not caused by a `null` `char[]` being passed to `ServiceCacheImpl#addInstance` -- it is caused by an *empty* `char[]` being passed. This empty array causes a `NullPointerException` to be thrown by my deserializer, or (if the empty case is handled there, returning `null` from the deserializer) causes a `NullPointerException` when the `null` instance is put in `instances` (a `ConcurrentMap`).
    The implementation of `ServiceCacheImpl#addInstance` below is what resolved things for me, patched into the `4.0.1` release. The `if` at the open of the function is the only change. Just wanted to throw this out there, because I've only been able to resolve the issue by using this patch -- no dice on the current `master` branch. This patch doesn't appear to cause any issues, but I'm not sure if the `cache.clearDataBytes...` call should be performed regardless of whether or not the `byte[]` is empty (i.e. I'm not sure if bailing out of the function at this particular step is appropriate or will have side-effects).
    
    ```
    private void addInstance(ChildData childData, boolean onlyIfAbsent) throws Exception
        {
    +    	if( childData.getData() == null || childData.getData().length == 0 )
    +    	{
    +    		return;
    +    	}
            String                  instanceId = instanceIdFromData(childData);
            ServiceInstance<T>      serviceInstance = discovery.getSerializer().deserialize(childData.getData());
            if ( onlyIfAbsent )
            {
                instances.putIfAbsent(instanceId, serviceInstance);
            }
            else
            {
                instances.put(instanceId, serviceInstance);
            }
            cache.clearDataBytes(childData.getPath(), childData.getStat().getVersion());
        }
    ```
    Happy to provide more information if needed.


---

[GitHub] curator issue #261: [CURATOR-452] Fix race condition on cache in ServiceCach...

Posted by poetwang <gi...@git.apache.org>.
Github user poetwang commented on the issue:

    https://github.com/apache/curator/pull/261
  
    Glad to hear that. Thanks


---

[GitHub] curator issue #261: Fix race condition on cache in ServiceCacheImpl

Posted by poetwang <gi...@git.apache.org>.
Github user poetwang commented on the issue:

    https://github.com/apache/curator/pull/261
  
    thanks, will do


---

[GitHub] curator issue #261: [CURATOR-452] Fix race condition on cache in ServiceCach...

Posted by egor-ryashin <gi...@git.apache.org>.
Github user egor-ryashin commented on the issue:

    https://github.com/apache/curator/pull/261
  
    This bug is constantly reproduced in our production environment.


---