You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Uwe Schindler (JIRA)" <ji...@apache.org> on 2015/05/14 22:47:00 UTC

[jira] [Commented] (LUCENE-6482) Class loading deadlock relating to NamedSPILoader

    [ https://issues.apache.org/jira/browse/LUCENE-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544336#comment-14544336 ] 

Uwe Schindler commented on LUCENE-6482:
---------------------------------------

Hi,
the problem here has nothing to do with NamedSPI loader. The problem can be one of the following:
- there is one codec in the classpath that uses a wrong initialization like the issue you mentioned. The problematic thing is in most cases a Codec/PostingsFormat/DocvaluesFormat.forName() in a static initializer. We also have this in Lucene, but the order is important here. I don't like this, but code is not manageable otherwise. So order of static class initialization is important. If one of the codecs hangs in one of such clinit locks, the stack trace is easy. All those threads that seem to hang while RUNNING are blocked, because they access a class that is currently in initialization phase (as you said).
- Elasticsearch has several own codecs, maybe those had a bug as described before. 1.3.4 is a older one, maybe update to latest 1.3.9 version. We have never seen this with plain Lucene.

In addition, make sure that you use latest JVM versions. several Java 7 realeases had class loading bugs with deadlocks (e.g. 1.7.0_25). To me it looks more like one of those issues, because otherwise other people would have reported bugs like this already.

What is you Java version? Any special JVM settings?

Uwe

> Class loading deadlock relating to NamedSPILoader
> -------------------------------------------------
>
>                 Key: LUCENE-6482
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6482
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Shikhar Bhushan
>
> This issue came up for us several times with Elasticsearch 1.3.4 (Lucene 4.9.1), with many threads seeming deadlocked but RUNNABLE:
> {noformat}
> "elasticsearch[search77-es2][generic][T#43]" #160 daemon prio=5 os_prio=0 tid=0x00007f79180c5800 nid=0x3d1f in Object.wait() [0x00007f79d9289000]
>    java.lang.Thread.State: RUNNABLE
> 	at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:359)
> 	at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:457)
> 	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:912)
> 	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:758)
> 	at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:453)
> 	at org.elasticsearch.common.lucene.Lucene.readSegmentInfos(Lucene.java:98)
> 	at org.elasticsearch.index.store.Store.readSegmentsInfo(Store.java:126)
> 	at org.elasticsearch.index.store.Store.access$300(Store.java:76)
> 	at org.elasticsearch.index.store.Store$MetadataSnapshot.buildMetadata(Store.java:465)
> 	at org.elasticsearch.index.store.Store$MetadataSnapshot.<init>(Store.java:456)
> 	at org.elasticsearch.index.store.Store.readMetadataSnapshot(Store.java:281)
> 	at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.listStoreMetaData(TransportNodesListShardStoreMetaData.java:186)
> 	at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:140)
> 	at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:61)
> 	at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:277)
> 	at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:268)
> 	at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}
> It didn't really make sense to see RUNNABLE threads in Object.wait(), but this seems to be symptomatic of deadlocks in static initialization (http://ternarysearch.blogspot.ru/2013/07/static-initialization-deadlock.html).
> I found LUCENE-5573 as an instance of this having come up with Lucene code before.
> I'm not sure what exactly is going on, but the deadlock in this case seems to involve these threads:
> {noformat}
> "elasticsearch[search77-es2][clusterService#updateTask][T#1]" #79 daemon prio=5 os_prio=0 tid=0x00007f7b155ff800 nid=0xd49 in Object.wait() [0x00007f79daed8000]
>    java.lang.Thread.State: RUNNABLE
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
> 	at java.lang.Class.newInstance(Class.java:433)
> 	at org.apache.lucene.util.NamedSPILoader.reload(NamedSPILoader.java:67)
> 	- locked <0x000000061fef4968> (a org.apache.lucene.util.NamedSPILoader)
> 	at org.apache.lucene.util.NamedSPILoader.<init>(NamedSPILoader.java:47)
> 	at org.apache.lucene.util.NamedSPILoader.<init>(NamedSPILoader.java:37)
> 	at org.apache.lucene.codecs.PostingsFormat.<clinit>(PostingsFormat.java:44)
> 	at org.elasticsearch.index.codec.postingsformat.PostingFormats.<clinit>(PostingFormats.java:67)
> 	at org.elasticsearch.index.codec.CodecModule.configurePostingsFormats(CodecModule.java:126)
> 	at org.elasticsearch.index.codec.CodecModule.configure(CodecModule.java:178)
> 	at org.elasticsearch.common.inject.AbstractModule.configure(AbstractModule.java:60)
> 	- locked <0x000000061fef49e8> (a org.elasticsearch.index.codec.CodecModule)
> 	at org.elasticsearch.common.inject.spi.Elements$RecordingBinder.install(Elements.java:204)
> 	at org.elasticsearch.common.inject.spi.Elements.getElements(Elements.java:85)
> 	at org.elasticsearch.common.inject.InjectorShell$Builder.build(InjectorShell.java:130)
> 	at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:99)
> 	- locked <0x000000061fef4c10> (a org.elasticsearch.common.inject.InheritingState)
> 	at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(InjectorImpl.java:131)
> 	at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(ModulesBuilder.java:69)
> 	at org.elasticsearch.indices.InternalIndicesService.createIndex(InternalIndicesService.java:296)
> 	- locked <0x000000061fef4cd0> (a org.elasticsearch.indices.InternalIndicesService)
> 	at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewIndices(IndicesClusterStateService.java:312)
> 	at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:181)
> 	- locked <0x000000061fef4e70> (a java.lang.Object)
> 	at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:444)
> 	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:153)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}
> {noformat}
> "elasticsearch[search77-es2][generic][T#1]" #80 daemon prio=5 os_prio=0 tid=0x00007f794400a000 nid=0xd4b in Object.wait() [0x00007f79dac56000]
>    java.lang.Thread.State: RUNNABLE
> 	at org.apache.lucene.codecs.simpletext.SimpleTextCodec.<init>(SimpleTextCodec.java:37)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
> 	at java.lang.Class.newInstance(Class.java:433)
> 	at org.apache.lucene.util.NamedSPILoader.reload(NamedSPILoader.java:67)
> 	- locked <0x000000061fcf1f50> (a org.apache.lucene.util.NamedSPILoader)
> 	at org.apache.lucene.util.NamedSPILoader.<init>(NamedSPILoader.java:47)
> 	at org.apache.lucene.util.NamedSPILoader.<init>(NamedSPILoader.java:37)
> 	at org.apache.lucene.codecs.Codec.<clinit>(Codec.java:41)
> 	at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:359)
> 	at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:457)
> 	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:912)
> 	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:758)
> 	at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:453)
> 	at org.elasticsearch.common.lucene.Lucene.readSegmentInfos(Lucene.java:98)
> 	at org.elasticsearch.index.store.Store.readSegmentsInfo(Store.java:126)
> 	at org.elasticsearch.index.store.Store.access$300(Store.java:76)
> 	at org.elasticsearch.index.store.Store$MetadataSnapshot.buildMetadata(Store.java:465)
> 	at org.elasticsearch.index.store.Store$MetadataSnapshot.<init>(Store.java:456)
> 	at org.elasticsearch.index.store.Store.readMetadataSnapshot(Store.java:281)
> 	at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.listStoreMetaData(TransportNodesListShardStoreMetaData.java:186)
> 	at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:140)
> 	at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:61)
> 	at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:277)
> 	at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:268)
> 	at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Full thread dump: https://gist.github.com/shikhar/d0f6d2d008f45d2d4f91



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org