Posted to dev@thrift.apache.org by "Mikhail Bautin (JIRA)" <ji...@apache.org> on 2012/07/20 22:27:36 UTC

[jira] [Created] (THRIFT-1653) TThreadedSelectorServer leaks CLOSE_WAIT sockets

Mikhail Bautin created THRIFT-1653:
--------------------------------------

             Summary: TThreadedSelectorServer leaks CLOSE_WAIT sockets 
                 Key: THRIFT-1653
                 URL: https://issues.apache.org/jira/browse/THRIFT-1653
             Project: Thrift
          Issue Type: Bug
            Reporter: Mikhail Bautin


We are using TThreadedSelectorServer in the HBase regionserver. We are observing that under high load, thousands of sockets in the CLOSE_WAIT state are not being cleaned up, leading to a server crash. Is it possible that the sockets are not being closed on the server side, or that the process of closing sockets already closed by the client is being starved on the server because normal I/O takes priority?
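
To put a number on the observation above, the per-state socket counts can be read from the kernel's socket table on Linux. The following is a minimal sketch, assuming the usual /proc/net/tcp layout (a header line, then one socket per line with the hex state code in the fourth column). The function names are illustrative and not part of Thrift or HBase.

```python
from collections import Counter

# Linux kernel TCP state codes as they appear (in hex) in /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per TCP state from the text of /proc/net/tcp."""
    counts = Counter()
    # The first line is the column header, so skip it.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated lines
        # Columns: sl, local_address, rem_address, st, ...
        counts[TCP_STATES.get(fields[3].upper(), "UNKNOWN")] += 1
    return counts


def count_local_tcp_states(path="/proc/net/tcp"):
    """Read the socket table from `path` (Linux only) and count states."""
    with open(path) as f:
        return count_tcp_states(f.read())
```

Calling count_local_tcp_states()["CLOSE_WAIT"] periodically on the server host would show whether the count keeps growing under load. A JVM that listens on IPv6 sockets may list its connections in a separate IPv6 table instead, which can be passed as `path`.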

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (THRIFT-1653) TThreadedSelectorServer leaks CLOSE_WAIT sockets

Posted by "Mikhail Bautin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420927#comment-13420927 ] 

Mikhail Bautin commented on THRIFT-1653:
----------------------------------------

Dominik, Jake: thank you for your replies. We are using Linux with a /proc/sys/net/ipv4/tcp_fin_timeout value of 5 seconds. Is that low enough, based on your experience? If there is no bug in TThreadedSelectorServer (we could not find anything obvious by reading the code), then another thing we could try is to reconfigure the application to keep more connections open and, as a result, create new connections less frequently.
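
For reference when reading server code for such a bug: a TCP socket enters CLOSE_WAIT once the peer has closed its end, and it stays there until the local application closes its own end. The following is a minimal sketch of the pattern a server loop needs, using plain Python sockets rather than Thrift's selector code. The function name and the `handle` callback are hypothetical.

```python
import socket


def serve_until_eof(conn, handle=lambda data: None):
    """Read from `conn` until the peer closes, then close our end too."""
    try:
        while True:
            data = conn.recv(4096)
            if not data:
                # An empty read means the peer has closed its side.
                break
            handle(data)
    finally:
        # Without this close(), a TCP socket whose peer has already
        # closed would remain in CLOSE_WAIT for as long as the process
        # holds the descriptor.
        conn.close()
```

A quick local check is possible with socket.socketpair(): send a few bytes from one end, close that end, and pass the other end to serve_until_eof.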

                

        

[jira] [Commented] (THRIFT-1653) TThreadedSelectorServer leaks CLOSE_WAIT sockets

Posted by "Dominik Psenner (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421225#comment-13421225 ] 

Dominik Psenner commented on THRIFT-1653:
-----------------------------------------

This timeout exists for cases where the remote peer crashes but may keep sending data to the socket. It prevents a port from being reused too early, because a new session on that port could otherwise receive data from the earlier session if the remote peer suddenly manages to send packets, or if packets were routed oddly and travelled longer than expected.

Therefore the choice of tcp_fin_timeout depends mostly on the network infrastructure, and it should be lowered no further than the worst-case round trip time of a client. If your server resides in the USA and one client is in China, your worst-case round trip time may be longer than 5 seconds. If all your clients are on a LAN, the worst-case round trip time may be much shorter than a second.

Connection pooling, on the other hand, improves the situation whenever only a few clients query a service very frequently, because a single-threaded client will use at most one connection. When there are many clients, it effectively worsens the situation, because it keeps connections open and locks down the server with idle connections.
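
The trade-off described above can be sketched as a small client-side pool that reuses idle connections and caps how many it keeps open. This is a minimal illustration under stated assumptions, not Thrift's client API: `ConnectionPool`, `factory`, and `max_size` are hypothetical names, and a connection is assumed to be any object with a close() method.

```python
import queue


class ConnectionPool:
    """Reuse idle connections; keep at most `max_size` of them open."""

    def __init__(self, factory, max_size):
        self._factory = factory                       # creates a new connection
        self._idle = queue.LifoQueue(maxsize=max_size)
        self.created = 0                              # connections opened so far

    def acquire(self):
        """Return an idle connection if one exists, else open a new one."""
        try:
            return self._idle.get_nowait()
        except queue.Empty:
            self.created += 1
            return self._factory()

    def release(self, conn):
        """Hand a connection back; close it if the pool is already full."""
        try:
            self._idle.put_nowait(conn)
        except queue.Full:
            conn.close()
```

A single-threaded client that acquires and releases around each call opens one connection in total. The `max_size` cap limits how many idle connections each client holds open against the server, which is the cost that grows with the number of clients.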
                

        

[jira] [Comment Edited] (THRIFT-1653) TThreadedSelectorServer leaks CLOSE_WAIT sockets

Posted by "Dominik Psenner (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420436#comment-13420436 ] 

Dominik Psenner edited comment on THRIFT-1653 at 7/23/12 5:15 AM:
------------------------------------------------------------------

Howdie

It's probably not a bug, but rather common policy. Windows and Linux keep ports open for about a minute after they have been used. This means that if you have thousands of short sessions that are opened and closed very frequently, each connection is disposed of in the process, but the operating system keeps the handle open. There is a registry entry that configures how long a closed socket is kept in TIME_WAIT until it is really closed and can be reused by someone else. See here (http://msdn.microsoft.com/en-us/library/aa560610%28v=bts.20%29.aspx) or here (http://www.speedguide.net/articles/linux-tweaking-121, the section about "net.ipv4.tcp_fin_timeout").

Historically this timeout is exactly one minute (60 s), but it can be lowered to 15 seconds, provided the operating system supports it.

Cheers
                
                  

        

[jira] [Comment Edited] (THRIFT-1653) TThreadedSelectorServer leaks CLOSE_WAIT sockets

Posted by "Dominik Psenner (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420436#comment-13420436 ] 

Dominik Psenner edited comment on THRIFT-1653 at 7/23/12 5:13 AM:
------------------------------------------------------------------

Howdie

It's probably not a bug, but rather common policy. Windows and Linux keep ports open for about a minute after they have been used. This means that if you have thousands of short sessions that are opened and closed very frequently, each connection is disposed of in the process, but the operating system keeps the handle open. There is a registry entry that configures how long a closed socket is kept in TIME_WAIT until it is really closed and can be reused by someone else. See here (http://msdn.microsoft.com/en-us/library/aa560610%28v=bts.20%29.aspx) or here (http://www.speedguide.net/articles/linux-tweaking-121, the section about "net.ipv4.tcp_fin_timeout").

Cheers
                
                  

        

[jira] [Commented] (THRIFT-1653) TThreadedSelectorServer leaks CLOSE_WAIT sockets

Posted by "Dominik Psenner (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420436#comment-13420436 ] 

Dominik Psenner commented on THRIFT-1653:
-----------------------------------------

Howdie

It's probably not a bug, but rather common policy. Windows and Linux keep ports open for about a minute after they have been used, for some reason that I have not understood yet. This means that if you have thousands of short sessions that are opened and closed very frequently, each connection is disposed of in the process, but Windows keeps the handle open. There is a registry entry that configures how long a closed socket is kept in TIME_WAIT until it is really closed and can be reused by someone else. See here (http://msdn.microsoft.com/en-us/library/aa560610%28v=bts.20%29.aspx) or here (http://www.speedguide.net/articles/linux-tweaking-121, the section about "net.ipv4.tcp_fin_timeout").

Cheers
                

        

[jira] [Closed] (THRIFT-1653) TThreadedSelectorServer leaks CLOSE_WAIT sockets

Posted by "Jake Farrell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/THRIFT-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jake Farrell closed THRIFT-1653.
--------------------------------

       Resolution: Invalid
    Fix Version/s: 0.9
         Assignee: Jake Farrell

This is a server config issue, not a Thrift-specific issue. We use a modified kernel with a low time_wait value to resolve this.
                