You are viewing a plain text version of this content. The canonical link for it is here.
Posted to solr-user@lucene.apache.org by Arnon Yogev <ar...@gmail.com> on 2015/09/01 08:53:18 UTC
Solr cloud hangs, log4j contention issue observed
We have a Solr cloud (4.7) consisting of 5 servers.
At some point we noticed that one of the servers had a very high CPU and
was not responding. A few minutes later, the other 4 servers were
responding very slowly. A restart was required.
Looking at the Solr logs, we mainly saw symptoms, i.e. errors that happened
a few minutes after the high CPU started (connection timeouts etc).
When looking at the javacore of the problematic server, we found that one
thread was waiting on a log4j method, and 538 threads (!) were waiting on
the same lock.
The thread's stack trace is:
3XMTHREADINFO "http-bio-8443-exec-37460"
J9VMThread:0x00007FED88044600, j9thread_t:0x00007FE73E4D04A0,
java/lang/Thread:0x00007FF267995468, state:CW, prio=5
3XMJAVALTHREAD (java/lang/Thread getId:0xA1AC9, isDaemon:true)
3XMTHREADINFO1 (native thread ID:0x17F8, native priority:0x5,
native policy:UNKNOWN)
3XMTHREADINFO2 (native stack address range
from:0x00007FEA9487B000, to:0x00007FEA948BC000, size:0x41000)
3XMCPUTIME CPU usage total: 55.216798962 secs
3XMHEAPALLOC Heap bytes allocated since last GC cycle=3176200
(0x307708)
3XMTHREADINFO3 Java callstack:
4XESTACKTRACE at
org/apache/log4j/Category.callAppenders(Category.java:204)
4XESTACKTRACE at
org/apache/log4j/Category.forcedLog(Category.java:391(Compiled Code))
4XESTACKTRACE at
org/apache/log4j/Category.log(Category.java:856(Compiled Code))
4XESTACKTRACE at
org/slf4j/impl/Log4jLoggerAdapter.error(Log4jLoggerAdapter.java:498)
4XESTACKTRACE at
org/apache/solr/common/SolrException.log(SolrException.java:109)
4XESTACKTRACE at
org/apache/solr/handler/RequestHandlerBase.handleRequest(RequestHandlerBase.java:153(Compiled
Code))
4XESTACKTRACE at
org/apache/solr/core/SolrCore.execute(SolrCore.java:1916(Compiled Code))
4XESTACKTRACE at
org/apache/solr/servlet/SolrDispatchFilter.execute(SolrDispatchFilter.java:780(Compiled
Code))
4XESTACKTRACE at
org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427(Compiled
Code))
4XESTACKTRACE at
org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217(Compiled
...
Our logging is done to a local file.
After searching the web, we found similar problems:
https://bz.apache.org/bugzilla/show_bug.cgi?id=50213
https://bz.apache.org/bugzilla/show_bug.cgi?id=51047
https://dzone.com/articles/log4j-thread-deadlock-case
However, seems like the fixes were made for log4j 2.X. And Solr uses log4j
1.2.X (even the new Solr 5.3.0, from what I've seen).
Is this a known problem?
Is it possible to upgrade Solr log4j version to 2.X?
Thanks,
Arnon
Re: Solr cloud hangs, log4j contention issue observed
Posted by Arnon Yogev <AR...@il.ibm.com>.
Thank you Shawn,
We are indeed using Tomcat, maxThreads was set to 2000 (Normally seen <600
active threads under load).
I attached the complete stack trace of http-bio-8443-exec-37460 below.
The thread is marked as "Waiting on Condition", and does not mention any
lock it's waiting for.
Looking at the code of Category.callAppenders, the thread hangs in line
204.
198 public
199 void callAppenders(LoggingEvent event) {
200 int writes = 0;
201
202 for(Category c = this; c != null; c=c.parent) {
203 // Protected against simultaneous call to addAppender,
removeAppender,...
204 synchronized(c) {
205 if(c.aai != null) {
206 writes += c.aai.appendLoopOnAppenders(event);
207 }
208 if(!c.additive) {
209 break;
210 }
211 }
212 }
213
214 if(writes == 0) {
215 repository.emitNoAppenderWarning(this);
216 }
217 }
Of course, I understand this is Solr's mailing list and not log4j's. So
wanted to know if this problem happens to be familiar in Solr, and whether
a fix or some solution exists.
Thanks,
Arnon
3XMTHREADINFO "http-bio-8443-exec-37460"
J9VMThread:0x00007FED88044600, j9thread_t:0x00007FE73E4D04A0,
java/lang/Thread:0x00007FF267995468, state:CW, prio=5
3XMJAVALTHREAD (java/lang/Thread getId:0xA1AC9, isDaemon:true)
3XMTHREADINFO1 (native thread ID:0x17F8, native priority:0x5,
native policy:UNKNOWN)
3XMTHREADINFO2 (native stack address range
from:0x00007FEA9487B000, to:0x00007FEA948BC000, size:0x41000)
3XMCPUTIME CPU usage total: 55.216798962 secs
3XMHEAPALLOC Heap bytes allocated since last GC cycle=3176200
(0x307708)
3XMTHREADINFO3 Java callstack:
4XESTACKTRACE at
org/apache/log4j/Category.callAppenders(Category.java:204)
4XESTACKTRACE at
org/apache/log4j/Category.forcedLog(Category.java:391(Compiled Code))
4XESTACKTRACE at
org/apache/log4j/Category.log(Category.java:856(Compiled Code))
4XESTACKTRACE at
org/slf4j/impl/Log4jLoggerAdapter.error(Log4jLoggerAdapter.java:498)
4XESTACKTRACE at
org/apache/solr/common/SolrException.log(SolrException.java:109)
4XESTACKTRACE at
org/apache/solr/handler/RequestHandlerBase.handleRequest(RequestHandlerBase.java:153(Compiled
Code))
4XESTACKTRACE at
org/apache/solr/core/SolrCore.execute(SolrCore.java:1916(Compiled Code))
4XESTACKTRACE at
org/apache/solr/servlet/SolrDispatchFilter.execute(SolrDispatchFilter.java:780(Compiled
Code))
4XESTACKTRACE at
org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427(Compiled
Code))
4XESTACKTRACE at
org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217(Compiled
Code))
4XESTACKTRACE at
org/apache/catalina/core/ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241(Compiled
Code))
4XESTACKTRACE at
org/apache/catalina/core/ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208(Compiled
Code))
4XESTACKTRACE at
org/apache/catalina/core/StandardWrapperValve.invoke(StandardWrapperValve.java:220(Compiled
Code))
4XESTACKTRACE at
org/apache/catalina/core/StandardContextValve.invoke(StandardContextValve.java:122(Compiled
Code))
4XESTACKTRACE at
org/apache/catalina/core/StandardHostValve.invoke(StandardHostValve.java:171(Compiled
Code))
4XESTACKTRACE at
org/apache/catalina/valves/ErrorReportValve.invoke(ErrorReportValve.java:102(Compiled
Code))
4XESTACKTRACE at
org/apache/catalina/valves/AccessLogValve.invoke(AccessLogValve.java:950(Compiled
Code))
4XESTACKTRACE at
org/apache/catalina/core/StandardEngineValve.invoke(StandardEngineValve.java:116(Compiled
Code))
4XESTACKTRACE at
org/apache/catalina/connector/CoyoteAdapter.service(CoyoteAdapter.java:408(Compiled
Code))
4XESTACKTRACE at
org/apache/coyote/http11/AbstractHttp11Processor.process(AbstractHttp11Processor.java:1040(Compiled
Code))
4XESTACKTRACE at
org/apache/coyote/AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607(Compiled
Code))
4XESTACKTRACE at
org/apache/tomcat/util/net/JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:314(Compiled
Code))
5XESTACKTRACE (entered lock:
org/apache/tomcat/util/net/SocketWrapper@0x00007FF253C18FA8, entry count:
1)
4XESTACKTRACE at
java/util/concurrent/ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1156(Compiled
Code))
4XESTACKTRACE at
java/util/concurrent/ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:626(Compiled
Code))
4XESTACKTRACE at
org/apache/tomcat/util/threads/TaskThread$WrappingRunnable.run(TaskThread.java:61(Compiled
Code))
4XESTACKTRACE at
java/lang/Thread.run(Thread.java:804(Compiled Code))
3XMTHREADINFO3 No native callstack available for this thread
From: Shawn Heisey <ap...@elyograg.org>
To: solr-user@lucene.apache.org
Date: 01/09/2015 06:09 PM
Subject: Re: Solr cloud hangs, log4j contention issue observed
On 9/1/2015 12:53 AM, Arnon Yogev wrote:
> We have a Solr cloud (4.7) consisting of 5 servers.
> At some point we noticed that one of the servers had a very high CPU and
> was not responding. A few minutes later, the other 4 servers were
> responding very slowly. A restart was required.
> Looking at the Solr logs, we mainly saw symptoms, i.e. errors that
happened
> a few minutes after the high CPU started (connection timeouts etc).
>
> When looking at the javacore of the problematic server, we found that
one
> thread was waiting on a log4j method, and 538 threads (!) were waiting
on
> the same lock.
> The thread's stack trace is:
<snip>
> Our logging is done to a local file.
> After searching the web, we found similar problems:
> https://bz.apache.org/bugzilla/show_bug.cgi?id=50213
> https://bz.apache.org/bugzilla/show_bug.cgi?id=51047
> https://dzone.com/articles/log4j-thread-deadlock-case
>
> However, seems like the fixes were made for log4j 2.X. And Solr uses
log4j
> 1.2.X (even the new Solr 5.3.0, from what I've seen).
>
> Is this a known problem?
> Is it possible to upgrade Solr log4j version to 2.X?
We have an issue to upgrde log4j. I know because I'm the one that
opened it. I haven't had any time to work on it, and until I can
actually research it, I am fairly clueless about how to proceed.
https://issues.apache.org/jira/browse/SOLR-7887
What container are you running in? The stacktrace was not complete
enough for me to figure that out myself. What is that container's
maxThreads setting? The thread name including "http-bio-8443" makes me
thing it's probably Tomcat, not the jetty included in the example found
in the download, which makes the maxThreads parameter particularly
relevant.
I do not see any mention of locks in the information that you included,
either held or waiting. If a lot of threads are waiting on a single
lock, then you should be able to find which thread is holding that lock
... and I don't think it will be the thread that you mentioned.
Thanks,
Shawn
Re: Solr cloud hangs, log4j contention issue observed
Posted by Shawn Heisey <ap...@elyograg.org>.
On 9/1/2015 12:53 AM, Arnon Yogev wrote:
> We have a Solr cloud (4.7) consisting of 5 servers.
> At some point we noticed that one of the servers had a very high CPU and
> was not responding. A few minutes later, the other 4 servers were
> responding very slowly. A restart was required.
> Looking at the Solr logs, we mainly saw symptoms, i.e. errors that happened
> a few minutes after the high CPU started (connection timeouts etc).
>
> When looking at the javacore of the problematic server, we found that one
> thread was waiting on a log4j method, and 538 threads (!) were waiting on
> the same lock.
> The thread's stack trace is:
<snip>
> Our logging is done to a local file.
> After searching the web, we found similar problems:
> https://bz.apache.org/bugzilla/show_bug.cgi?id=50213
> https://bz.apache.org/bugzilla/show_bug.cgi?id=51047
> https://dzone.com/articles/log4j-thread-deadlock-case
>
> However, seems like the fixes were made for log4j 2.X. And Solr uses log4j
> 1.2.X (even the new Solr 5.3.0, from what I've seen).
>
> Is this a known problem?
> Is it possible to upgrade Solr log4j version to 2.X?
We have an issue to upgrde log4j. I know because I'm the one that
opened it. I haven't had any time to work on it, and until I can
actually research it, I am fairly clueless about how to proceed.
https://issues.apache.org/jira/browse/SOLR-7887
What container are you running in? The stacktrace was not complete
enough for me to figure that out myself. What is that container's
maxThreads setting? The thread name including "http-bio-8443" makes me
thing it's probably Tomcat, not the jetty included in the example found
in the download, which makes the maxThreads parameter particularly relevant.
I do not see any mention of locks in the information that you included,
either held or waiting. If a lot of threads are waiting on a single
lock, then you should be able to find which thread is holding that lock
... and I don't think it will be the thread that you mentioned.
Thanks,
Shawn