You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@curator.apache.org by poetwang <gi...@git.apache.org> on 2018/03/22 21:41:07 UTC
[GitHub] curator pull request #261: Fix race condition on cache in ServiceCacheImpl
GitHub user poetwang opened a pull request:
https://github.com/apache/curator/pull/261
Fix race condition on cache in ServiceCacheImpl
There is a race condition on variable cache, both the main thread(from start) and the "Curator-ServiceProvider" thread(from childEvent-> addInstance->cache.clearDataBytes) try to access the data in cache.
The following case will cause NPE in sonInstanceSerializer:
main thread finishes the cache.start(true);
and then an event(CHILD_ADDED or CHILD_UPDATED) comes, it(Curator-ServiceProvider thread) will try to clear the cache data(in addInstance).
Then main thread continues to call addInstance on cached data, which is null. It will cause NPE on line 193.
I think this is the cause of https://stackoverflow.com/questions/42007102/apache-curator-npe-in-jsoninstanceserializer .
And it causes the druid-0.11.0 indexing tasks fail to start.
`
2018-03-22T12:44:17,004 ERROR [main] io.druid.cli.CliPeon - Error when starting up. Failing.
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_121]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_121]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_121]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_121]
at io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler.start(Lifecycle.java:414) ~[java-util-0.11.0.jar:0.11.0]
at io.druid.java.util.common.lifecycle.Lifecycle.start(Lifecycle.java:311) ~[java-util-0.11.0.jar:0.11.0]
at io.druid.guice.LifecycleModule$2.start(LifecycleModule.java:156) ~[druid-api-0.11.0.jar:0.11.0]
at io.druid.cli.GuiceRunnable.initLifecycle(GuiceRunnable.java:101) [druid-services-0.11.0.jar:0.11.0]
at io.druid.cli.CliPeon.run(CliPeon.java:283) [druid-services-0.11.0.jar:0.11.0]
at io.druid.cli.Main.main(Main.java:108) [druid-services-0.11.0.jar:0.11.0]
Caused by: java.lang.NullPointerException
at org.codehaus.jackson.JsonFactory.createJsonParser(JsonFactory.java:604) ~[jackson-core-asl-1.9.13.jar:1.9.13]
at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1973) ~[jackson-mapper-asl-1.9.13.jar:1.9.13]
at org.apache.curator.x.discovery.details.JsonInstanceSerializer.deserialize(JsonInstanceSerializer.java:86) ~[curator-x-discovery-4.0.0.jar:?]
at org.apache.curator.x.discovery.details.ServiceCacheImpl.addInstance(ServiceCacheImpl.java:200) ~[curator-x-discovery-4.0.0.jar:?]
at org.apache.curator.x.discovery.details.ServiceCacheImpl.start(ServiceCacheImpl.java:102) ~[curator-x-discovery-4.0.0.jar:?]
at org.apache.curator.x.discovery.details.ServiceProviderImpl.start(ServiceProviderImpl.java:75) ~[curator-x-discovery-4.0.0.jar:?]
at io.druid.curator.discovery.ServerDiscoverySelector.start(ServerDiscoverySelector.java:132) ~[druid-server-0.11.0.jar:0.11.0]
... 10 more
`
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/poetwang/curator fix-cache-race-condition
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/curator/pull/261.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #261
----
commit abf78960c7000478e90d220845d0eda873ec3c7e
Author: Wei Wang <we...@...>
Date: 2018-03-22T18:31:13Z
Fix race condition on cache in ServiceCacheImpl
----
---
[GitHub] curator issue #261: CURATOR-452 Fix race condition on cache in ServiceCacheI...
Posted by poetwang <gi...@git.apache.org>.
Github user poetwang commented on the issue:
https://github.com/apache/curator/pull/261
@Randgalt I think this is the bug mentioned in CURATOR-452, so I just link it to the issue.
https://issues.apache.org/jira/browse/CURATOR-452
---
[GitHub] curator issue #261: [CURATOR-452] Fix race condition on cache in ServiceCach...
Posted by Randgalt <gi...@git.apache.org>.
Github user Randgalt commented on the issue:
https://github.com/apache/curator/pull/261
Can you create a test that shows the problem and include it in this PR?
---
[GitHub] curator pull request #261: [CURATOR-452] Fix race condition on cache in Serv...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/curator/pull/261
---
[GitHub] curator issue #261: [CURATOR-452] Fix race condition on cache in ServiceCach...
Posted by Randgalt <gi...@git.apache.org>.
Github user Randgalt commented on the issue:
https://github.com/apache/curator/pull/261
Sorry for the delay - I'll try to get to this soon.
---
[GitHub] curator issue #261: [CURATOR-452] Fix race condition on cache in ServiceCach...
Posted by Randgalt <gi...@git.apache.org>.
Github user Randgalt commented on the issue:
https://github.com/apache/curator/pull/261
Thank you - one of the committers will try to get this soon. Please be patient.
---
[GitHub] curator issue #261: [CURATOR-452] Fix race condition on cache in ServiceCach...
Posted by poetwang <gi...@git.apache.org>.
Github user poetwang commented on the issue:
https://github.com/apache/curator/pull/261
@Randgalt It is really hard to test without breaking the start method into tow parts. I tried my best to do so.
---
[GitHub] curator issue #261: [CURATOR-452] Fix race condition on cache in ServiceCach...
Posted by poetwang <gi...@git.apache.org>.
Github user poetwang commented on the issue:
https://github.com/apache/curator/pull/261
The event (CHILD_ADDED or CHILD_UPDATED) comes because PathChildrenCache will force refresh when ConnectionState turns to CONNECTED(connectionStateListener),
---
[GitHub] curator issue #261: Fix race condition on cache in ServiceCacheImpl
Posted by Randgalt <gi...@git.apache.org>.
Github user Randgalt commented on the issue:
https://github.com/apache/curator/pull/261
Please create an issue in our Jira. https://cwiki.apache.org/confluence/display/CURATOR/Submitting+Pull+Requests
---
[GitHub] curator issue #261: [CURATOR-452] Fix race condition on cache in ServiceCach...
Posted by cmckn <gi...@git.apache.org>.
Github user cmckn commented on the issue:
https://github.com/apache/curator/pull/261
I realize this issue has been closed, but I'm still seeing unexpected exceptions in some of my applications during `ServiceCacheImpl#start`.
The merged changes to address this added a null-check around calls to `ServiceCacheImpl#addInstance` within `ServiceCacheImpl#start` (correct me if this is wrong).
From what I've seen in my own apps, the issue is not caused by a `null` `char[]` being passed to `ServiceCacheImpl#addInstance` -- it is caused by an *empty* `char[]` being passed. This empty array causes a `NullPointerException` to be thrown by my deserializer, or (if the empty case is handled there, returning `null` from the deserializer) causes a `NullPointerException` when the `null` instance is put in `instances` (a `ConcurrentMap`).
The implementation of `ServiceCacheImpl#addInstance` below is what resolved things for me, patched into the `4.0.1` release. The `if` at the open of the function is the only change. Just wanted to throw this out there, because I've only been able to resolve the issue by using this patch -- no dice on the current `master` branch. This patch doesn't appear to cause any issues, but I'm not sure if the `cache.clearDataBytes...` call should be performed regardless of whether or not the `byte[]` is empty (i.e. I'm not sure if bailing out of the function at this particular step is appropriate or will have side-effects).
```
private void addInstance(ChildData childData, boolean onlyIfAbsent) throws Exception
{
+ if( childData.getData() == null || childData.getData().length == 0 )
+ {
+ return;
+ }
String instanceId = instanceIdFromData(childData);
ServiceInstance<T> serviceInstance = discovery.getSerializer().deserialize(childData.getData());
if ( onlyIfAbsent )
{
instances.putIfAbsent(instanceId, serviceInstance);
}
else
{
instances.put(instanceId, serviceInstance);
}
cache.clearDataBytes(childData.getPath(), childData.getStat().getVersion());
}
```
Happy to provide more information if needed.
---
[GitHub] curator issue #261: [CURATOR-452] Fix race condition on cache in ServiceCach...
Posted by poetwang <gi...@git.apache.org>.
Github user poetwang commented on the issue:
https://github.com/apache/curator/pull/261
Glad to hear that. Thanks
---
[GitHub] curator issue #261: Fix race condition on cache in ServiceCacheImpl
Posted by poetwang <gi...@git.apache.org>.
Github user poetwang commented on the issue:
https://github.com/apache/curator/pull/261
thanks, will do
---
[GitHub] curator issue #261: [CURATOR-452] Fix race condition on cache in ServiceCach...
Posted by egor-ryashin <gi...@git.apache.org>.
Github user egor-ryashin commented on the issue:
https://github.com/apache/curator/pull/261
This bug is constantly reproduced in our production environment.
---