Posted to dev@qpid.apache.org by "Alan Conway (JIRA)" <ji...@apache.org> on 2018/03/14 17:30:00 UTC

[jira] [Comment Edited] (PROTON-1791) TCP sockets remain open in CLOSE_WAIT state

    [ https://issues.apache.org/jira/browse/PROTON-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398937#comment-16398937 ] 

Alan Conway edited comment on PROTON-1791 at 3/14/18 5:29 PM:
--------------------------------------------------------------

[~miha-plesko] Please try the fix at: https://github.com/alanconway/qpid-proton/tree/ruby-close-fd
It's a one-line fix and looks a lot like it might be your problem: proton was calling IO#close_read and IO#close_write but never actually IO#close.
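
The difference described here is between shutting down the two directions of a connection and actually releasing the file descriptor. The following is a minimal sketch using a plain Ruby TCPSocket. It assumes nothing about proton's internals, and the helper name close_socket_fully is hypothetical:

```ruby
require "socket"

# Hypothetical helper (not part of qpid_proton): shut down both directions of
# a socket, then always release the file descriptor with IO#close.
def close_socket_fully(sock)
  [:close_read, :close_write].each do |half|
    begin
      # Each call shuts down one direction of the connection.
      sock.public_send(half)
    rescue IOError, SystemCallError
      # Already shut down, already closed, or reset by the peer: keep going.
    end
  end
ensure
  # IO#close is what frees the descriptor. Since Ruby 2.3 it does nothing on
  # an already-closed IO, so calling it unconditionally is safe.
  sock.close
end
```

The design choice is to never rely on the half-close calls to free the descriptor. The final IO#close runs in `ensure`, so it happens even if something unexpected is raised earlier.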

However, I haven't reproduced the problem yet, so if that does not fix things for you I need more information on how you produce it. Ideally, isolate some example code that demonstrates the problem; if you can't, show me the code you're using and the steps you take to see it.

I have tried using the proton examples/ruby programs to create many connections to broker.rb and disconnect them, but lsof does not show file descriptors piling up. I also tried writing a test client that creates connections and closes them in a variety of ways, including killing processes with open connections. There is probably some specific tear-down sequence causing the problem for you that I have not hit on. Your Ruby and OS versions may also be relevant.
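
The check described above can also be scripted inside a single process: open many connections, close them, and see whether socket descriptors pile up. The following is a minimal sketch. It assumes Linux, since it reads /proc/self/fd, and it uses plain Ruby sockets rather than proton. The names open_socket_fds and leaked_sockets_after are hypothetical:

```ruby
require "socket"

# Hypothetical leak check; Linux only, since it reads /proc/self/fd.
# Counts the descriptors of this process that refer to sockets.
def open_socket_fds
  (Dir.entries("/proc/self/fd") - %w[. ..]).count do |fd|
    begin
      File.readlink("/proc/self/fd/#{fd}").start_with?("socket:")
    rescue SystemCallError
      false # e.g. the descriptor used to list the directory, now closed
    end
  end
end

# Opens and closes `count` connections to a local listener and returns how
# many socket descriptors are left over compared with the starting count.
def leaked_sockets_after(count)
  server = TCPServer.new("127.0.0.1", 0)
  before = open_socket_fds
  count.times do
    client = TCPSocket.new("127.0.0.1", server.addr[1])
    peer = server.accept
    peer.close   # the remote side closes first, as the broker did in the report
    client.close # without this, each client socket sits in CLOSE_WAIT (until GC)
  end
  open_socket_fds - before
ensure
  server&.close
end
```

A non-zero result would point at a teardown path that skips IO#close. The same before/after counting could wrap a real client loop.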

Thanks!
Alan. 

> TCP sockets remain open in CLOSE_WAIT state
> -------------------------------------------
>
>                 Key: PROTON-1791
>                 URL: https://issues.apache.org/jira/browse/PROTON-1791
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: ruby-binding
>    Affects Versions: proton-c-0.21.0
>         Environment: Confirmed on Ubuntu 16.04 and RHEL 7.4
> Confirmed on qpid_proton 0.19.0 and 0.21.0
>            Reporter: Miha Plesko
>            Assignee: Alan Conway
>            Priority: Major
>              Labels: bug
>             Fix For: proton-c-0.22.0
>
>
> Hi guys,
> thanks for developing the awesome qpid_proton ruby gem, we're using it on a daily basis!
> However, we recently noticed the following error in our server log:
> Too many open files - socket(2) for "172.16.117.189" port 5672
> After some research, it turns out that the qpid_proton process keeps
> accumulating more and more file descriptors like these:
> $ lsof -ap 108533
> ruby    108533 miha  116u  IPv4             562438      0t0      TCP 172.16.117.189:53626->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  197u  IPv4             561644      0t0      TCP 172.16.117.189:53630->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  311u  IPv4             560657      0t0      TCP 172.16.117.189:53634->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  549u  IPv4             565342      0t0      TCP 172.16.117.189:53642->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  576u  IPv4             565122      0t0      TCP 172.16.117.189:53650->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  603u  IPv4             565738      0t0      TCP 172.16.117.189:53654->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  630u  IPv4             563021      0t0      TCP 172.16.117.189:53658->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  657u  IPv4             568361      0t0      TCP 172.16.117.189:53662->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  666u  IPv4             563027      0t0      TCP 172.16.117.189:53666->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  675u  IPv4             567538      0t0      TCP 172.16.117.189:53670->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  684u  IPv4             567998      0t0      TCP 172.16.117.189:53678->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  690u  IPv4             574709      0t0      TCP 172.16.117.189:53686->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  693u  IPv4             578725      0t0      TCP 172.16.117.189:53694->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  696u  IPv4             576840      0t0      TCP 172.16.117.189:53698->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  699u  IPv4             577819      0t0      TCP 172.16.117.189:53702->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  702u  IPv4             582192      0t0      TCP 172.16.117.189:53710->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  705u  IPv4             582861      0t0      TCP 172.16.117.189:53714->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  708u  IPv4             577363      0t0      TCP 172.16.117.189:53718->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  711u  IPv4             578175      0t0      TCP 172.16.117.189:53722->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  714u  IPv4             587172      0t0      TCP 172.16.117.189:53730->147.75.102.132:amqp (CLOSE_WAIT)
> ruby    108533 miha  717u  IPv4             584387      0t0      TCP 172.16.117.189:53734->147.75.102.132:amqp (CLOSE_WAIT)
> ...
> I think the CLOSE_WAIT state indicates that the remote end has already closed
> the TCP connection, but our side never closed the file descriptor.
> After 9 hours or so, enough of these descriptors accumulate for the OS to
> complain.
> We did all we could to close connections gracefully:
> connection.container.stop
> connection.close
> connection = nil
> but nothing seems to help. A simple but expensive workaround is to manually invoke Ruby's garbage collection,
> but ideally `connection.close` would close the file descriptor.
> May I kindly ask you to look at this?
> Thank you and Best Regards,
> Miha
> PS: The error occurs both on Ubuntu 16.04 and RHEL 7.4
> PS2: The error occurs both on qpid_proton 0.19.0 and 0.21.0
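
In TCP, CLOSE_WAIT means the remote end has sent its FIN and the local kernel is waiting for the application to close its socket. Both the state and the descriptor persist until the process calls close. The following is a minimal sketch of detecting that condition with plain Ruby sockets. The helper close_if_peer_closed is hypothetical and not part of qpid_proton:

```ruby
require "socket"

# Hypothetical helper: if the peer has already closed its end of the connection
# (its FIN has arrived, so our socket is in CLOSE_WAIT), close our descriptor.
# Returns true if the socket was closed, false if the peer still looks open.
def close_if_peer_closed(sock, timeout = 0)
  # Wait up to `timeout` seconds for the socket to become readable.
  return false unless IO.select([sock], nil, nil, timeout)
  # Readable but not at EOF means data is pending; eof? keeps it buffered.
  return false unless sock.eof?
  # EOF: the peer is gone, so leave CLOSE_WAIT by closing our side.
  sock.close
  true
end
```

With a timeout of 0 the check never blocks, so it could be run periodically over idle connections.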



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)