You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tomcat.apache.org by Mihir Patel <ex...@gmail.com> on 2016/04/26 07:07:08 UTC

Tomcat unresponsive, and http acceptor thread disappearing

Hello,

I would like to share my learnings on a recent issue I came across where
tomcat became unresponsive under load, and couldn't recover without
restart. More importantly, it was surprising to find tomcat acceptor thread
(http-bio-8080-acceptor-0) disappearing resulting into the unresponsive
state of the tomcat/app.

*Env:* Tomcat 7.0.29 on Java 1.7.0_51 on CentOS 6.3, with BIO (JioEndpoint)
connector

*Issue:* Came across an issue under heavy load (generated through Gatling)
where during the load test tomcat/application becomes unresponsive, and
doesn't recover even after load test. As a side effect, I see lots of TCP
sockets in CLOSE_WAIT state with non-empty Recv-Q.

*Observation:* I am able to telnet to the port, even curl works up to the
point of "sending request", but then hangs while receiving response from
server and eventually fails with "curl: (56) Recv failure: Connection reset
by peer". Gatling requests are failing with request timed out errors
consistently. It doesn't take more than 10 seconds to reproduce the error.
Application is all normal from threads and memory perspective while it's
unresponsive.

Luckily I was able to successfully reproduce the issue. So took heap dump
and thread dump before and after multiple load tests. There were no errors
in catalina.out or application logs. Suddenly, noticed that "after" we see
application in unresponsive state, thread "http-bio-8080-Acceptor" is not
in thread dump, it disappeared. So, thought that if thread is exiting it's
run method, then it would end and we won't see it in thread dump. Went back
and looked at source code of tomcat 7.0.29, remote debugged with tomcat in
following method in
*org/apache/tomcat/tomcat-coyote/7.0.29/tomcat-coyote-7.0.29.jar!/org/apache/tomcat/util/net/JIoEndpoint.class*
:

// --------------------------------------------------- Acceptor Inner Class
   /**
    * The background thread that listens for incoming TCP/IP connections and
    * hands them off to an appropriate processor.
    */
   protected class Acceptor extends AbstractEndpoint.Acceptor {

       @Override
       public void run() {

           int errorDelay = 0;

           // Loop until we receive a shutdown command
           while (running) {

               // Loop if endpoint is paused
               while (paused && running) {
                   state = AcceptorState.PAUSED;
                   try {
                       Thread.sleep(50);
                   } catch (InterruptedException e) {
                       // Ignore
                   }
               }

               if (!running) {
                   break;
               }
               state = AcceptorState.RUNNING;

               try {
                   //if we have reached max connections, wait
                   countUpOrAwaitConnection();

                   Socket socket = null;
                   try {
                       // Accept the next incoming connection from the
server
                       // socket
                       socket =
serverSocketFactory.acceptSocket(serverSocket);
                   } catch (IOException ioe) {
                       countDownConnection();
                       // Introduce delay if necessary
                       errorDelay = handleExceptionWithDelay(errorDelay);
                       // re-throw
                       throw ioe;
                   }
                   // Successful accept, reset the error delay
                   errorDelay = 0;

                   // Configure the socket
                   if (running && !paused && setSocketOptions(socket)) {
                       // Hand this socket off to an appropriate processor
                       if (!processSocket(socket)) {
                           countDownConnection();
                           // Close socket right away
                           closeSocket(socket);
                       }
                   } else {
                       countDownConnection();
                       // Close socket right away
                       closeSocket(socket);
                   }
               } catch (IOException x) {
                   if (running) {
                       log.error(sm.getString("endpoint.accept.fail"), x);
                   }
               } catch (NullPointerException npe) {
                   if (running) {
                       log.error(sm.getString("endpoint.accept.fail"), npe);
                   }
               } catch (Throwable t) {
                   ExceptionUtils.handleThrowable(t);
                   *log.error(sm.getString("endpoint.accept.fail"), t);*
               }
           }
           state = AcceptorState.ENDED;
       }
   }

Looking at the code above where tomcat is accepting socket connections, put
debug points within each catch block during load test (with remote debug),
and got the execution stopped in last catch (Throwable t) block. The
ExecptionUtils.handleThrowable checks for type of throwable and converts
them to appropriate type and re-throws. At this point, we do not have any
more try/catch nor do we have the log statements (one after the
ExceptionUtils.handleThrowable) will be skipped as the handleThrowable
method itself is re-throwing, causing the thread to exit out of it's run
method and end.

I saw (googled) only couple instances of such issue reported, so thought to
provide details here for future reference.

*Root cause: *Tomcat was getting OutOfMemoryError and going to last catch
block. Specific error was *"OutOfMemoryError: Unable to Create New Native
Thread*". Tomcat had maxConnections (and max executor thread size) set to
1000, and the underlying linux user had set nproc limit to 1024 (file
descriptor limit was set to 65k though). Given that each thread in jvm gets
a LWP in OS, the user was running out of the set nproc limit causing the
OOM with "Unable to create new native thread" error. The *processSocket* call
in acceptor is where the socket is assigned to a thread in executor thread
pool, so if there isn't a free worker thread available, it will attempt to
create a new thread, while OS denies creating new LWP resulting into the
error.

*Solution:* Once I found the issue with lower ulimits set for number of
processes for users, I ended up solving this by increasing nproc limit in
/etc/security/limits.d/90-nproc.conf (which is where user specific nproc
ulimits were applied in our case)

*Ask/Comment:* First of all, It was disappointing to not find any
errors/log (unless my log config was messed up, but I did enable
org.apache.tomcat.util to FINEST level, in tomcat/conf/logging.properties,
and even saw LimitLatch printing countDown and countUp entries in
catalina.out) for this scenario. So, at the very least, would it make sense
to swap ExceptionUtils.handleThrowable and log.error in last catch block so
that we do see at least error reported? Additionally, it doesn't make sense
to "end" the main acceptor thread, so we should report the error in logs
but continue accepting new threads.

I did check JioEndpoint.Acceptor code in Tomcat 8.0.24
<http://grepcode.com/file/repo1.maven.org/maven2/org.apache.tomcat/tomcat-coyote/8.0.24/org/apache/tomcat/util/net/JIoEndpoint.java>
version
as well, and see same code which would make this reproducible in Tomcat 8
as well.

Appreciate feedback on the expectations/ask above.

Thanks,
Mihir Patel

Re: Tomcat unresponsive, and http acceptor thread disappearing

Posted by Mark Thomas <ma...@apache.org>.
On 26/04/2016 06:07, Mihir Patel wrote:

<snip/>

> *Ask/Comment:* First of all, It was disappointing to not find any
> errors/log

Any exceptions thrown at that point should be logged to standard out.
Depending on how bad the error condition is, the JVM might not be able
to manage that but such a situation would be pretty unusual.

> So, at the very least, would it make sense
> to swap ExceptionUtils.handleThrowable and log.error in last catch block so
> that we do see at least error reported?

No. The errors being discussed are typically fatal to the JVM. In many
cases attempting to use the logging system would simply trigger more
errors and possibly mask the original error.

> Additionally, it doesn't make sense
> to "end" the main acceptor thread, so we should report the error in logs
> but continue accepting new threads.

See above.

Mark


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org