
Posted to dev@thrift.apache.org by "Qiao Mu (JIRA)" <ji...@apache.org> on 2014/12/02 11:57:12 UTC

[jira] [Commented] (THRIFT-2789) TNonblockingServer leaks socket FD's under load

    [ https://issues.apache.org/jira/browse/THRIFT-2789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14231305#comment-14231305 ] 

Qiao Mu commented on THRIFT-2789:
---------------------------------

I finally reproduced the issue and fixed it. Sergey was right about the pipe, although his patch contains some irrelevant code. I've uploaded a cleaner version.

The root cause is that TNonblockingServer::TConnection::Task::run() throws a TException when notifyIOThread returns false. ThreadManager then simply ignores the exception (the comment in ThreadManager says "XXX need to log this", but it never does). So from the user's point of view there is no error at all, just a connection that never returns, and we have to rely on a timeout to work around it.
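A minimal sketch of the failure mode described above, under a deliberately simplified model: `runSwallowingExceptions`, `FakeConnection` and `taskRun` are hypothetical stand-ins, not Thrift's ThreadManager or TConnection. It only shows that an exception thrown after a failed notify vanishes in a catch-all, so the connection is neither answered nor closed.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Hypothetical stand-in for a worker loop that catches exceptions thrown
// by a task and drops them, as the comment above describes.
inline void runSwallowingExceptions(const std::function<void()>& task) {
  try {
    task();
  } catch (...) {
    // "XXX need to log this": nothing is logged and nothing is rethrown.
  }
}

// Hypothetical connection state, used only to make the symptom visible.
struct FakeConnection {
  bool responded = false;  // IO thread took the connection back
  bool closed = false;     // client socket was released
};

// Hypothetical model of Task::run(): after processing, the worker must hand
// the connection back to the IO thread. If that hand-off fails, it throws
// instead of cleaning up the connection.
inline void taskRun(FakeConnection& conn, bool notifySucceeds) {
  if (!notifySucceeds) {
    throw std::runtime_error("failed to notify IO thread");
  }
  conn.responded = true;
}
```

With `notifySucceeds == false`, both flags stay `false` after `runSwallowingExceptions` returns: the caller gets no error, and the socket is never closed, which is the leak.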

When the IO thread is under high load, it's common for notifyIOThread to fail. More specifically, the send call inside TNonblockingIOThread::notify returns -1 with errno set to EAGAIN.
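The EAGAIN behaviour can be reproduced outside Thrift with a plain POSIX socketpair whose write end is non-blocking and whose read end is never drained, which roughly models an IO thread that has fallen behind. This is a standalone sketch: `fillUntilFailure` is a hypothetical helper, not Thrift code, and the number of successful writes before the failure depends on the OS socket buffer size.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: creates an AF_UNIX socketpair, makes the write end
// non-blocking, and keeps sending pointer-sized messages (like a worker
// handing a connection pointer to the IO thread) without ever reading.
// Returns the errno of the first failed send, or 0 if all maxSends succeeded.
// *sendsOk receives the number of successful sends.
inline int fillUntilFailure(int maxSends, int* sendsOk) {
  assert(maxSends > 0);
  if (sendsOk != nullptr) {
    *sendsOk = 0;
  }
  int fds[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) {
    return errno;
  }
  int flags = fcntl(fds[0], F_GETFL, 0);
  if (flags < 0 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) < 0) {
    int err = errno;
    close(fds[0]);
    close(fds[1]);
    return err;
  }
  void* payload = &fds;  // any pointer value; only its size matters here
  int ok = 0;
  int err = 0;
  for (int i = 0; i < maxSends; ++i) {
    ssize_t n = send(fds[0], &payload, sizeof(payload), 0);
    if (n < 0) {
      err = errno;  // EAGAIN/EWOULDBLOCK once the socket buffer is full
      break;
    }
    ++ok;
  }
  close(fds[0]);
  close(fds[1]);
  if (sendsOk != nullptr) {
    *sendsOk = ok;
  }
  return err;
}
```

Calling `fillUntilFailure(1000000, &ok)` should stop well before the limit and return EAGAIN (or EWOULDBLOCK, which is the same value on Linux but may differ on other platforms).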

The patch closes the connection in that case, as Sergey's did. It's simple and sufficient for me. I also tried select with a short timeout, but that did not work well.
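A minimal sketch of the "close the connection when notify fails" approach, assuming a hypothetical `Connection` that holds only the client FD and a `notify` callback standing in for notifyIOThread(). It is not the attached patch: in the real server, closing a connection also has to release the TConnection bookkeeping and must not race with the IO thread, which this sketch ignores.

```cpp
#include <cassert>
#include <functional>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical connection record: just the client socket FD.
struct Connection {
  int clientFd = -1;
};

// Called by a worker after processing a request. `notify` stands in for
// notifyIOThread() and returns false when the notification write fails
// (e.g. EAGAIN). Instead of throwing, close the client socket so the FD is
// released and the client sees the connection drop instead of hanging.
// Returns true if the IO thread was notified.
inline bool finishTask(Connection& conn, const std::function<bool()>& notify) {
  if (notify()) {
    return true;  // the IO thread takes the connection over as usual
  }
  if (conn.clientFd >= 0) {
    close(conn.clientFd);
    conn.clientFd = -1;
  }
  return false;
}

// Usage: simulate a failed notify and check that the peer sees EOF, i.e.
// the client socket was actually closed rather than leaked.
inline bool demoCloseOnFailedNotify() {
  int fds[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) {
    return false;
  }
  Connection conn;
  conn.clientFd = fds[0];
  bool notified = finishTask(conn, [] { return false; });
  char byte;
  ssize_t n = recv(fds[1], &byte, 1, 0);  // 0 means orderly EOF from the peer
  close(fds[1]);
  return !notified && conn.clientFd == -1 && n == 0;
}
```

Closing the socket trades a silent hang for a visible disconnect, which the client can retry; a bounded retry of the notify (as with the select-with-timeout attempt mentioned above) is the other option, but it still needs a fallback when the retry also fails.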

This bug has existed for a very long time and is still not fixed. Could anybody please look into it?

> TNonblockingServer leaks socket FD's under load
> -----------------------------------------------
>
>                 Key: THRIFT-2789
>                 URL: https://issues.apache.org/jira/browse/THRIFT-2789
>             Project: Thrift
>          Issue Type: Bug
>          Components: C++ - Library
>            Reporter: Sergey
>         Attachments: 0001-Close-connection-when-failed-to-notify-IO-thread.patch, D10015.diff
>
>
> I checked 0.9.2 and 1.0, but the code didn't seem to change in 1.2 either.
> The problem is that network threads and worker threads use a non-blocking socket (pipe) to communicate. Under heavy load, writes to that pipe might fail with EAGAIN. While the 'notifyIOThread' method carefully checks for the error and communicates the result via its return value, not all callers check the result of 'notify'.
> In general it's hard to tell what the appropriate handling of such a failure would be, but it's clear that sockets shouldn't leak. Please use the attached patch as a reference, but I don't insist that what I did there is the best way to fix the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)