You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@solr.apache.org by "Uwe Schindler (Jira)" <ji...@apache.org> on 2023/01/05 18:21:00 UTC

[jira] [Comment Edited] (SOLR-15254) Random deadlock during cores initialization

    [ https://issues.apache.org/jira/browse/SOLR-15254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17655085#comment-17655085 ] 

Uwe Schindler edited comment on SOLR-15254 at 1/5/23 6:20 PM:
--------------------------------------------------------------

At least this was the last deadlock regarding to SPI, all others looking similar were fixed before (including one with TermsEnum). In generel the problem is when a superclass in its static initializers somehow indriectly initializes subclasses. This can lead to a deadlock, because while the Superclass initializes itsself with clinit() method the subclasses load and those refer indirectly to their superclass. If they call something from there it breaks completely. The new code also detects this "call something before init was done".

Actually there are checks in static analysis, but neither forbiddenapis, ECJ compiler, errorprone have abilities to detect it. Interestingly the Netbeans IDE finds some of them, but not those that rely on reflection/dynamic loading (like SPI).


was (Author: thetaphi):
At least this was the last deadlock regarding to SPI, all others looking similar were fixed before (including one with TermsEnum). In generel the problem is when a superclass in its static initializers somehow indriectly initializes subclasses. This can lead to a deadlock.

> Random deadlock during cores initialization
> -------------------------------------------
>
>                 Key: SOLR-15254
>                 URL: https://issues.apache.org/jira/browse/SOLR-15254
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 8.5.1
>         Environment: openjdk 11.0.9.1
>            Reporter: Thomas Mortagne
>            Priority: Major
>
> At XWiki we are using Solr in an embbedded way with 5 cores, and sometimes we end up with what appears to be a race condition leading to a deadlock between several of the cores's configuration loading while loading some required classes.
> Here is an extract of a thread dump (also attached is a more complete version of the thread dump) we get in such a case:
> {noformat}
> "coreLoadExecutor-9-thread-1" #41 prio=5 os_prio=0 cpu=305.06ms elapsed=101.24s tid=0x00007fb5cfc8f800 nid=0x57a in Object.wait()  [0x00007fb5717ec000]
>    java.lang.Thread.State: RUNNABLE
> 	at java.lang.Class.forName0(java.base@11.0.9.1/Native Method)
> 	at java.lang.Class.forName(java.base@11.0.9.1/Class.java:398)
> 	at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:554)
> 	at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:639)
> 	at org.apache.solr.schema.FieldTypePluginLoader$3.create(FieldTypePluginLoader.java:339)
> 	at org.apache.solr.schema.FieldTypePluginLoader$3.create(FieldTypePluginLoader.java:332)
> 	at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:152)
> 	at org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:356)
> 	at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:90)
> 	at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:48)
> 	at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:152)
> 	at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:502)
> 	at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:178)
> 	at org.apache.solr.schema.ManagedIndexSchema.<init>(ManagedIndexSchema.java:105)
> 	at org.apache.solr.schema.ManagedIndexSchemaFactory.create(ManagedIndexSchemaFactory.java:178)
> 	at org.apache.solr.schema.ManagedIndexSchemaFactory.create(ManagedIndexSchemaFactory.java:46)
> 	at org.apache.solr.core.ConfigSetService.createIndexSchema(ConfigSetService.java:145)
> 	at org.apache.solr.core.ConfigSetService.loadConfigSet(ConfigSetService.java:83)
> 	at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1302)
> 	at org.apache.solr.core.CoreContainer.lambda$load$13(CoreContainer.java:802)
> 	at org.apache.solr.core.CoreContainer$$Lambda$569/0x0000000840bc7040.call(Unknown Source)
> 	at org.glowroot.agent.plugin.executor.CallableWrapper.call(CallableWrapper.java:38)
> 	at com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:202)
> 	at java.util.concurrent.FutureTask.run(java.base@11.0.9.1/FutureTask.java:264)
> 	at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210)
> 	at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$438/0x0000000840b29440.run(Unknown Source)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.9.1/ThreadPoolExecutor.java:1128)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.9.1/ThreadPoolExecutor.java:628)
> 	at java.lang.Thread.run(java.base@11.0.9.1/Thread.java:834)
> "coreLoadExecutor-9-thread-3" #43 prio=5 os_prio=0 cpu=294.37ms elapsed=101.24s tid=0x00007fb5cfcdf800 nid=0x57c in Object.wait()  [0x00007fb5715ea000]
>    java.lang.Thread.State: RUNNABLE
> 	at java.lang.Class.forName0(java.base@11.0.9.1/Native Method)
> 	at java.lang.Class.forName(java.base@11.0.9.1/Class.java:398)
> 	at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:554)
> 	at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:639)
> 	at org.apache.solr.schema.FieldTypePluginLoader$3.create(FieldTypePluginLoader.java:339)
> 	at org.apache.solr.schema.FieldTypePluginLoader$3.create(FieldTypePluginLoader.java:332)
> 	at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:152)
> 	at org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:356)
> 	at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:90)
> 	at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:48)
> 	at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:152)
> 	at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:502)
> 	at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:178)
> 	at org.apache.solr.schema.ManagedIndexSchema.<init>(ManagedIndexSchema.java:105)
> 	at org.apache.solr.schema.ManagedIndexSchemaFactory.create(ManagedIndexSchemaFactory.java:178)
> 	at org.apache.solr.schema.ManagedIndexSchemaFactory.create(ManagedIndexSchemaFactory.java:46)
> 	at org.apache.solr.core.ConfigSetService.createIndexSchema(ConfigSetService.java:145)
> 	at org.apache.solr.core.ConfigSetService.loadConfigSet(ConfigSetService.java:83)
> 	at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1302)
> 	at org.apache.solr.core.CoreContainer.lambda$load$13(CoreContainer.java:802)
> 	at org.apache.solr.core.CoreContainer$$Lambda$569/0x0000000840bc7040.call(Unknown Source)
> 	at org.glowroot.agent.plugin.executor.CallableWrapper.call(CallableWrapper.java:38)
> 	at com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:202)
> 	at java.util.concurrent.FutureTask.run(java.base@11.0.9.1/FutureTask.java:264)
> 	at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210)
> 	at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$438/0x0000000840b29440.run(Unknown Source)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.9.1/ThreadPoolExecutor.java:1128)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.9.1/ThreadPoolExecutor.java:628)
> 	at java.lang.Thread.run(java.base@11.0.9.1/Thread.java:834)
> "coreLoadExecutor-9-thread-2" #42 prio=5 os_prio=0 cpu=388.81ms elapsed=101.24s tid=0x00007fb5cfcde800 nid=0x57b in Object.wait()  [0x00007fb5716eb000]
>    java.lang.Thread.State: RUNNABLE
> 	at jdk.internal.misc.Unsafe.ensureClassInitialized0(java.base@11.0.9.1/Native Method)
> 	at jdk.internal.misc.Unsafe.ensureClassInitialized(java.base@11.0.9.1/Unsafe.java:1042)
> 	at jdk.internal.reflect.UnsafeFieldAccessorFactory.newFieldAccessor(java.base@11.0.9.1/UnsafeFieldAccessorFactory.java:43)
> 	at jdk.internal.reflect.ReflectionFactory.newFieldAccessor(java.base@11.0.9.1/ReflectionFactory.java:186)
> 	at java.lang.reflect.Field.acquireFieldAccessor(java.base@11.0.9.1/Field.java:1105)
> 	at java.lang.reflect.Field.getFieldAccessor(java.base@11.0.9.1/Field.java:1086)
> 	at java.lang.reflect.Field.get(java.base@11.0.9.1/Field.java:418)
> 	at org.apache.lucene.analysis.util.AbstractAnalysisFactory.lookupSPIName(AbstractAnalysisFactory.java:358)
> 	at org.apache.lucene.analysis.util.AnalysisSPILoader.reload(AnalysisSPILoader.java:90)
> 	- locked <0x00000000e98e20a0> (a org.apache.lucene.analysis.util.AnalysisSPILoader)
> 	at org.apache.lucene.analysis.util.AnalysisSPILoader.<init>(AnalysisSPILoader.java:66)
> 	at org.apache.lucene.analysis.util.AnalysisSPILoader.<init>(AnalysisSPILoader.java:52)
> 	at org.apache.lucene.analysis.util.TokenFilterFactory.<clinit>(TokenFilterFactory.java:33)
> 	at org.apache.solr.core.SolrResourceLoader.reloadLuceneSPI(SolrResourceLoader.java:256)
> 	at org.apache.solr.core.SolrResourceLoader.<init>(SolrResourceLoader.java:196)
> 	at org.apache.solr.core.ConfigSetService$Standalone.createCoreResourceLoader(ConfigSetService.java:207)
> 	at org.apache.solr.core.ConfigSetService.loadConfigSet(ConfigSetService.java:66)
> 	at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1302)
> 	at org.apache.solr.core.CoreContainer.lambda$load$13(CoreContainer.java:802)
> 	at org.apache.solr.core.CoreContainer$$Lambda$569/0x0000000840bc7040.call(Unknown Source)
> 	at org.glowroot.agent.plugin.executor.CallableWrapper.call(CallableWrapper.java:38)
> 	at com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:202)
> 	at java.util.concurrent.FutureTask.run(java.base@11.0.9.1/FutureTask.java:264)
> 	at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210)
> 	at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$438/0x0000000840b29440.run(Unknown Source)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.9.1/ThreadPoolExecutor.java:1128)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.9.1/ThreadPoolExecutor.java:628)
> 	at java.lang.Thread.run(java.base@11.0.9.1/Thread.java:834)
> "main" #1 prio=5 os_prio=0 cpu=25108.09ms elapsed=133.50s tid=0x00007fb5cc018800 nid=0x541 waiting on condition  [0x00007fb5d2f73000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
> 	at jdk.internal.misc.Unsafe.park(java.base@11.0.9.1/Native Method)
> 	- parking to wait for  <0x00000000ea480000> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 	at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.9.1/LockSupport.java:234)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@11.0.9.1/AbstractQueuedSynchronizer.java:2123)
> 	at java.util.concurrent.ThreadPoolExecutor.awaitTermination(java.base@11.0.9.1/ThreadPoolExecutor.java:1454)
> 	at com.codahale.metrics.InstrumentedExecutorService.awaitTermination(InstrumentedExecutorService.java:163)
> 	at org.apache.solr.common.util.ExecutorUtil.awaitTermination(ExecutorUtil.java:85)
> 	at org.apache.solr.common.util.ExecutorUtil.shutdownAndAwaitTermination(ExecutorUtil.java:77)
> 	at org.apache.solr.core.CoreContainer.load(CoreContainer.java:843)
> "h2sc-4-thread-1" #38 prio=5 os_prio=0 cpu=0.20ms elapsed=101.89s tid=0x00007fb5cfc16800 nid=0x577 runnable  [0x00007fb571cf0000]
>    java.lang.Thread.State: RUNNABLE
> 	at sun.nio.ch.EPoll.wait(java.base@11.0.9.1/Native Method)
> 	at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.9.1/EPollSelectorImpl.java:120)
> 	at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.9.1/SelectorImpl.java:124)
> 	- locked <0x00000000e9819cf0> (a sun.nio.ch.Util$2)
> 	- locked <0x00000000e9819c98> (a sun.nio.ch.EPollSelectorImpl)
> 	at sun.nio.ch.SelectorImpl.select(java.base@11.0.9.1/SelectorImpl.java:141)
> 	at org.eclipse.jetty.io.ManagedSelector.nioSelect(ManagedSelector.java:149)
> 	at org.eclipse.jetty.io.ManagedSelector.select(ManagedSelector.java:156)
> 	at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.select(ManagedSelector.java:572)
> 	at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:509)
> 	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produceTask(EatWhatYouKill.java:360)
> 	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:184)
> 	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
> 	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:135)
> 	at org.eclipse.jetty.io.ManagedSelector$$Lambda$437/0x0000000840b28c40.run(Unknown Source)
> 	at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210)
> 	at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$438/0x0000000840b29440.run(Unknown Source)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.9.1/ThreadPoolExecutor.java:1128
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.9.1/ThreadPoolExecutor.java:628)
> 	at java.lang.Thread.run(java.base@11.0.9.1/Thread.java:834)
> {noformat}
> Notice the coreLoadExecutor-9-thread-* thread which despite their "RUNNABLE" state are actually internally blocked waiting on some unknown object. The main thread is stuck because it's waiting for the cores's initialization to finish.
> I included "h2sc-4-thread-1" to have all the threads doing stuff but it does not seems to have much to do with the deadlock (other than being a Solr thing too).
> It's not very easy to debug since we cannot always reproduce but maybe you guys have a clue for where this issue might be comming from based on those informations.
> When we hit this problem we can "workaround" it by restarting a few times until it pass the Solr init.
> We never had this issue back when we only had 1 or 2 cores but having 5 of those is very recent for us so I cannot really tell if it's some recent Solr regression or an old bug we did not had before because we did not had enough cores to trigger it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org