Posted to solr-user@lucene.apache.org by Arnon Yogev <ar...@gmail.com> on 2015/09/01 08:53:18 UTC

Solr cloud hangs, log4j contention issue observed

We have a Solr cloud (4.7) consisting of 5 servers.
At some point we noticed that one of the servers had a very high CPU and
was not responding. A few minutes later, the other 4 servers were
responding very slowly. A restart was required.
Looking at the Solr logs, we mainly saw symptoms, i.e. errors that happened
a few minutes after the high CPU started (connection timeouts etc).

When looking at the javacore of the problematic server, we found that one
thread was waiting on a log4j method, and 538 threads (!) were waiting on
the same lock.
The thread's stack trace is:

3XMTHREADINFO      "http-bio-8443-exec-37460" J9VMThread:0x00007FED88044600, j9thread_t:0x00007FE73E4D04A0, java/lang/Thread:0x00007FF267995468, state:CW, prio=5
3XMJAVALTHREAD            (java/lang/Thread getId:0xA1AC9, isDaemon:true)
3XMTHREADINFO1            (native thread ID:0x17F8, native priority:0x5, native policy:UNKNOWN)
3XMTHREADINFO2            (native stack address range from:0x00007FEA9487B000, to:0x00007FEA948BC000, size:0x41000)
3XMCPUTIME               CPU usage total: 55.216798962 secs
3XMHEAPALLOC             Heap bytes allocated since last GC cycle=3176200 (0x307708)
3XMTHREADINFO3           Java callstack:
4XESTACKTRACE                at org/apache/log4j/Category.callAppenders(Category.java:204)
4XESTACKTRACE                at org/apache/log4j/Category.forcedLog(Category.java:391(Compiled Code))
4XESTACKTRACE                at org/apache/log4j/Category.log(Category.java:856(Compiled Code))
4XESTACKTRACE                at org/slf4j/impl/Log4jLoggerAdapter.error(Log4jLoggerAdapter.java:498)
4XESTACKTRACE                at org/apache/solr/common/SolrException.log(SolrException.java:109)
4XESTACKTRACE                at org/apache/solr/handler/RequestHandlerBase.handleRequest(RequestHandlerBase.java:153(Compiled Code))
4XESTACKTRACE                at org/apache/solr/core/SolrCore.execute(SolrCore.java:1916(Compiled Code))
4XESTACKTRACE                at org/apache/solr/servlet/SolrDispatchFilter.execute(SolrDispatchFilter.java:780(Compiled Code))
4XESTACKTRACE                at org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427(Compiled Code))
4XESTACKTRACE                at org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217(Compiled Code))
...
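
As an aside, a figure like the 538 above can be roughly reproduced by counting
the javacore stack frames at Category.callAppenders, since each blocked thread
contributes one such frame. A minimal sketch, assuming the javacore is a plain
text file (the class name here is made up; pass the javacore path as the
argument):

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class CountBlockedLoggers {
    public static void main(String[] args) throws Exception {
        // args[0] is the path to the javacore text file.
        try (Stream<String> lines = Files.lines(Paths.get(args[0]))) {
            long n = lines.filter(l -> l.contains("Category.callAppenders")).count();
            System.out.println(n + " stack frames at Category.callAppenders");
        }
    }
}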

Our logging is done to a local file.
After searching the web, we found similar problems:
https://bz.apache.org/bugzilla/show_bug.cgi?id=50213
https://bz.apache.org/bugzilla/show_bug.cgi?id=51047
https://dzone.com/articles/log4j-thread-deadlock-case

However, it seems those fixes were made only for log4j 2.x, while Solr uses log4j
1.2.x (even the new Solr 5.3.0, from what I've seen).

Is this a known problem?
Is it possible to upgrade Solr's log4j to 2.x?

Thanks,
Arnon

Re: Solr cloud hangs, log4j contention issue observed

Posted by Arnon Yogev <AR...@il.ibm.com>.
Thank you Shawn,

We are indeed using Tomcat; maxThreads is set to 2000 (we normally see fewer
than 600 active threads under load).

I attached the complete stack trace of http-bio-8443-exec-37460 below.
The thread is marked as "Waiting on Condition" and does not mention any
lock it's waiting for.
Looking at the code of Category.callAppenders, the thread hangs at line 204:

198     public
199     void callAppenders(LoggingEvent event) {
200       int writes = 0;
201   
202       for(Category c = this; c != null; c=c.parent) {
203         // Protected against simultaneous call to addAppender, removeAppender,...
204         synchronized(c) {
205              if(c.aai != null) {
206                writes += c.aai.appendLoopOnAppenders(event);
207              }
208              if(!c.additive) {
209                break;
210              }
211         }
212       }
213   
214       if(writes == 0) {
215         repository.emitNoAppenderWarning(this);
216       }
217     }
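
For what it's worth, a common mitigation for this pattern in log4j 1.2 is to
put an org.apache.log4j.AsyncAppender in front of the file appender, so the
synchronized block above only enqueues the event rather than writing to disk
while holding the Category monitor. A rough programmatic sketch (the layout
pattern, buffer size, and log path are placeholders, not our actual config):

import org.apache.log4j.AsyncAppender;
import org.apache.log4j.FileAppender;
import org.apache.log4j.Logger;
import org.apache.log4j.PatternLayout;

public class AsyncLoggingSetup {
    public static void main(String[] args) throws Exception {
        // Placeholder file appender; in practice reuse the one from the
        // existing log4j configuration.
        FileAppender file = new FileAppender(
                new PatternLayout("%d %p %c - %m%n"), "/var/log/solr/solr.log");

        // AsyncAppender enqueues events and drains them on its own thread,
        // so the synchronized block in callAppenders() only pays for a
        // queue insert instead of a disk write.
        AsyncAppender async = new AsyncAppender();
        async.setBufferSize(8192); // events buffered before producers block
        async.addAppender(file);

        Logger root = Logger.getRootLogger();
        root.removeAllAppenders();
        root.addAppender(async);
    }
}

This does not remove the contention entirely (callAppenders still synchronizes
on each Category), but it shortens the time the lock is held.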


Of course, I understand this is Solr's mailing list and not log4j's. So I
wanted to know whether this problem is familiar to Solr users, and whether
a fix or workaround exists.

Thanks,
Arnon


3XMTHREADINFO      "http-bio-8443-exec-37460" J9VMThread:0x00007FED88044600, j9thread_t:0x00007FE73E4D04A0, java/lang/Thread:0x00007FF267995468, state:CW, prio=5
3XMJAVALTHREAD            (java/lang/Thread getId:0xA1AC9, isDaemon:true)
3XMTHREADINFO1            (native thread ID:0x17F8, native priority:0x5, native policy:UNKNOWN)
3XMTHREADINFO2            (native stack address range from:0x00007FEA9487B000, to:0x00007FEA948BC000, size:0x41000)
3XMCPUTIME               CPU usage total: 55.216798962 secs
3XMHEAPALLOC             Heap bytes allocated since last GC cycle=3176200 (0x307708)
3XMTHREADINFO3           Java callstack:
4XESTACKTRACE                at org/apache/log4j/Category.callAppenders(Category.java:204)
4XESTACKTRACE                at org/apache/log4j/Category.forcedLog(Category.java:391(Compiled Code))
4XESTACKTRACE                at org/apache/log4j/Category.log(Category.java:856(Compiled Code))
4XESTACKTRACE                at org/slf4j/impl/Log4jLoggerAdapter.error(Log4jLoggerAdapter.java:498)
4XESTACKTRACE                at org/apache/solr/common/SolrException.log(SolrException.java:109)
4XESTACKTRACE                at org/apache/solr/handler/RequestHandlerBase.handleRequest(RequestHandlerBase.java:153(Compiled Code))
4XESTACKTRACE                at org/apache/solr/core/SolrCore.execute(SolrCore.java:1916(Compiled Code))
4XESTACKTRACE                at org/apache/solr/servlet/SolrDispatchFilter.execute(SolrDispatchFilter.java:780(Compiled Code))
4XESTACKTRACE                at org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427(Compiled Code))
4XESTACKTRACE                at org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217(Compiled Code))
4XESTACKTRACE                at org/apache/catalina/core/ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241(Compiled Code))
4XESTACKTRACE                at org/apache/catalina/core/ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208(Compiled Code))
4XESTACKTRACE                at org/apache/catalina/core/StandardWrapperValve.invoke(StandardWrapperValve.java:220(Compiled Code))
4XESTACKTRACE                at org/apache/catalina/core/StandardContextValve.invoke(StandardContextValve.java:122(Compiled Code))
4XESTACKTRACE                at org/apache/catalina/core/StandardHostValve.invoke(StandardHostValve.java:171(Compiled Code))
4XESTACKTRACE                at org/apache/catalina/valves/ErrorReportValve.invoke(ErrorReportValve.java:102(Compiled Code))
4XESTACKTRACE                at org/apache/catalina/valves/AccessLogValve.invoke(AccessLogValve.java:950(Compiled Code))
4XESTACKTRACE                at org/apache/catalina/core/StandardEngineValve.invoke(StandardEngineValve.java:116(Compiled Code))
4XESTACKTRACE                at org/apache/catalina/connector/CoyoteAdapter.service(CoyoteAdapter.java:408(Compiled Code))
4XESTACKTRACE                at org/apache/coyote/http11/AbstractHttp11Processor.process(AbstractHttp11Processor.java:1040(Compiled Code))
4XESTACKTRACE                at org/apache/coyote/AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607(Compiled Code))
4XESTACKTRACE                at org/apache/tomcat/util/net/JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:314(Compiled Code))
5XESTACKTRACE                   (entered lock: org/apache/tomcat/util/net/SocketWrapper@0x00007FF253C18FA8, entry count: 1)
4XESTACKTRACE                at java/util/concurrent/ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1156(Compiled Code))
4XESTACKTRACE                at java/util/concurrent/ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:626(Compiled Code))
4XESTACKTRACE                at org/apache/tomcat/util/threads/TaskThread$WrappingRunnable.run(TaskThread.java:61(Compiled Code))
4XESTACKTRACE                at java/lang/Thread.run(Thread.java:804(Compiled Code))
3XMTHREADINFO3           No native callstack available for this thread




From:   Shawn Heisey <ap...@elyograg.org>
To:     solr-user@lucene.apache.org
Date:   01/09/2015 06:09 PM
Subject:        Re: Solr cloud hangs, log4j contention issue observed



On 9/1/2015 12:53 AM, Arnon Yogev wrote:
> We have a Solr cloud (4.7) consisting of 5 servers.
> At some point we noticed that one of the servers had a very high CPU and
> was not responding. A few minutes later, the other 4 servers were
> responding very slowly. A restart was required.
> Looking at the Solr logs, we mainly saw symptoms, i.e. errors that happened
> a few minutes after the high CPU started (connection timeouts etc).
>
> When looking at the javacore of the problematic server, we found that one
> thread was waiting on a log4j method, and 538 threads (!) were waiting on
> the same lock.
> The thread's stack trace is:

<snip>

> Our logging is done to a local file.
> After searching the web, we found similar problems:
> https://bz.apache.org/bugzilla/show_bug.cgi?id=50213
> https://bz.apache.org/bugzilla/show_bug.cgi?id=51047
> https://dzone.com/articles/log4j-thread-deadlock-case
>
> However, it seems those fixes were made only for log4j 2.x, while Solr uses log4j
> 1.2.x (even the new Solr 5.3.0, from what I've seen).
>
> Is this a known problem?
> Is it possible to upgrade Solr's log4j to 2.x?

We have an issue to upgrade log4j.  I know because I'm the one that
opened it.  I haven't had any time to work on it, and until I can
actually research it, I am fairly clueless about how to proceed.

https://issues.apache.org/jira/browse/SOLR-7887

What container are you running in?  The stacktrace was not complete
enough for me to figure that out myself.  What is that container's
maxThreads setting?  The thread name including "http-bio-8443" makes me
think it's probably Tomcat, not the Jetty included in the example found
in the download, which makes the maxThreads parameter particularly 
relevant.

I do not see any mention of locks in the information that you included,
either held or waiting.  If a lot of threads are waiting on a single
lock, then you should be able to find which thread is holding that lock
... and I don't think it will be the thread that you mentioned.

Thanks,
Shawn



Re: Solr cloud hangs, log4j contention issue observed

Posted by Shawn Heisey <ap...@elyograg.org>.
On 9/1/2015 12:53 AM, Arnon Yogev wrote:
> We have a Solr cloud (4.7) consisting of 5 servers.
> At some point we noticed that one of the servers had a very high CPU and
> was not responding. A few minutes later, the other 4 servers were
> responding very slowly. A restart was required.
> Looking at the Solr logs, we mainly saw symptoms, i.e. errors that happened
> a few minutes after the high CPU started (connection timeouts etc).
>
> When looking at the javacore of the problematic server, we found that one
> thread was waiting on a log4j method, and 538 threads (!) were waiting on
> the same lock.
> The thread's stack trace is:

<snip>

> Our logging is done to a local file.
> After searching the web, we found similar problems:
> https://bz.apache.org/bugzilla/show_bug.cgi?id=50213
> https://bz.apache.org/bugzilla/show_bug.cgi?id=51047
> https://dzone.com/articles/log4j-thread-deadlock-case
>
> However, it seems those fixes were made only for log4j 2.x, while Solr uses log4j
> 1.2.x (even the new Solr 5.3.0, from what I've seen).
>
> Is this a known problem?
> Is it possible to upgrade Solr's log4j to 2.x?

We have an issue to upgrade log4j.  I know because I'm the one that
opened it.  I haven't had any time to work on it, and until I can
actually research it, I am fairly clueless about how to proceed.

https://issues.apache.org/jira/browse/SOLR-7887

What container are you running in?  The stacktrace was not complete
enough for me to figure that out myself.  What is that container's
maxThreads setting?  The thread name including "http-bio-8443" makes me
think it's probably Tomcat, not the Jetty included in the example found
in the download, which makes the maxThreads parameter particularly relevant.

I do not see any mention of locks in the information that you included,
either held or waiting.  If a lot of threads are waiting on a single
lock, then you should be able to find which thread is holding that lock
... and I don't think it will be the thread that you mentioned.
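
On an IBM JVM, the javacore should also include a LOCKS section where each
inflated monitor is listed with its owner (a 3LKMONOBJECT line) followed by
the threads queued on it (3LKWAITER lines). A rough sketch that tallies
waiters per monitor, with the caveat that those tag names are from my memory
of the J9 format and should be checked against your dump:

import java.io.BufferedReader;
import java.io.FileReader;

public class FindLockHolder {
    public static void main(String[] args) throws Exception {
        // args[0] is the path to the javacore text file.
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
            String line, monitor = null;
            int waiters = 0;
            while ((line = in.readLine()) != null) {
                if (line.startsWith("3LKMONOBJECT")) {
                    // New monitor entry: report the previous one first.
                    if (monitor != null && waiters > 0)
                        System.out.println(waiters + " waiters on " + monitor);
                    monitor = line.trim();
                    waiters = 0;
                } else if (line.startsWith("3LKWAITER")
                        && !line.startsWith("3LKWAITERQ")) {
                    waiters++;
                }
            }
            if (monitor != null && waiters > 0)
                System.out.println(waiters + " waiters on " + monitor);
        }
    }
}

The monitor with hundreds of waiters should be the log4j Category, and its
3LKMONOBJECT line names the thread that actually holds the lock.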

Thanks,
Shawn