Posted to dev@qpid.apache.org by "Alan Conway (JIRA)" <ji...@apache.org> on 2018/03/14 17:30:00 UTC
[jira] [Comment Edited] (PROTON-1791) TCP sockets remain open in CLOSE_WAIT state
[ https://issues.apache.org/jira/browse/PROTON-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398937#comment-16398937 ]
Alan Conway edited comment on PROTON-1791 at 3/14/18 5:29 PM:
--------------------------------------------------------------
[~miha-plesko] Please try the fix at: https://github.com/alanconway/qpid-proton/tree/ruby-close-fd
It's a one-line fix, and it looks a lot like your problem: proton was calling IO#close_read and IO#close_write but never actually calling IO#close.
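A minimal sketch of the teardown pattern the fix describes, assuming a plain Ruby socket object rather than proton's internal IO handling (the helper name `close_socket` is hypothetical and not part of the qpid_proton API): attempt the half-closes, then always finish with an explicit IO#close so the descriptor is released immediately instead of whenever the garbage collector finalizes the object.

```ruby
require 'socket'

# Hypothetical helper, not proton's code: half-close both directions,
# then make sure the descriptor itself is released.
def close_socket(io)
  io.close_read    # half-close: stop reading
  io.close_write   # half-close: stop writing
rescue IOError, SystemCallError
  # Already closed, or the peer is gone; the ensure clause still runs.
ensure
  io.close unless io.closed?  # releases the descriptor if the half-closes did not
end
```

Checking `IO#closed?` in the `ensure` clause makes the helper safe to call twice, and the final release no longer depends on which half-closes succeeded beforehand.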
However, I haven't reproduced the problem yet, so if that does not fix things for you, I need more information on how you produce the problem. Ideally, isolate some example code that demonstrates it; failing that, show me the code you're using and the steps you take to see it.
I have tried using the proton examples/ruby programs to create many connections to a broker.rb and then disconnect them, but lsof does not show file descriptors piling up. I also tried writing a test client that creates connections and closes them in a variety of ways, kills processes with open connections, etc. There is probably some specific tear-down sequence causing the problem for you that I have not hit on. Your Ruby and OS versions may also be relevant.
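One way to look for the pile-up without lsof is to count the process's own descriptors before and after a batch of connections. The sketch below is Linux-only (it lists `/proc/self/fd`), uses a local `TCPServer` as a stand-in for the broker rather than proton itself, and its helper names are hypothetical. The server side closes each connection first, as a broker dropping a client would; if the client never calls `#close`, its socket stays in CLOSE_WAIT and holds a descriptor.

```ruby
require 'socket'

# Number of open descriptors in this process (Linux-only: lists /proc/self/fd).
# Returns nil where /proc is unavailable.
def open_fd_count
  dir = "/proc/self/fd"
  File.directory?(dir) ? Dir.children(dir).size : nil
end

# Hypothetical reproduction helper: open `n` loopback connections, let the
# server side close each one first, and report how many extra descriptors
# remain open. With close_client: false the client sockets are kept
# referenced (so GC cannot finalize them) and stay in CLOSE_WAIT.
def leaked_fds(n, close_client:)
  server = TCPServer.new("127.0.0.1", 0)
  port = server.addr[1]
  kept = []

  # Warm-up round trip so lazily created runtime descriptors are not counted.
  TCPSocket.new("127.0.0.1", port).close
  server.accept.close

  before = open_fd_count
  n.times do
    client = TCPSocket.new("127.0.0.1", port)
    server.accept.close   # peer closes first, like a broker dropping us
    client.read           # returns "" once the peer's FIN has arrived
    if close_client
      client.close
    else
      kept << client
    end
  end
  after = open_fd_count
  before && after ? after - before : nil
ensure
  kept&.each(&:close)
  server&.close
end
```

With `close_client: true` the difference should stay at zero; with `close_client: false` it grows with `n`, mirroring the kind of listing lsof shows in the issue.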
Thanks!
Alan.
> TCP sockets remain open in CLOSE_WAIT state
> -------------------------------------------
>
> Key: PROTON-1791
> URL: https://issues.apache.org/jira/browse/PROTON-1791
> Project: Qpid Proton
> Issue Type: Bug
> Components: ruby-binding
> Affects Versions: proton-c-0.21.0
> Environment: Confirmed on Ubuntu 16.04 and RHEL 7.4
> Confirmed on qpid_proton 0.19.0 and 0.21.0
> Reporter: Miha Plesko
> Assignee: Alan Conway
> Priority: Major
> Labels: bug
> Fix For: proton-c-0.22.0
>
>
> Hi guys,
> thanks for developing the awesome qpid_proton ruby gem, we're using it on a daily basis!
> However, we recently noticed the following error in our server log:
> Too many open files - socket(2) for "172.16.117.189" port 5672
> After some research, it turns out that the qpid_proton process keeps accumulating
> more and more open file descriptors like these:
> $ lsof -ap 108533
> ruby 108533 miha 116u IPv4 562438 0t0 TCP 172.16.117.189:53626->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 197u IPv4 561644 0t0 TCP 172.16.117.189:53630->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 311u IPv4 560657 0t0 TCP 172.16.117.189:53634->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 549u IPv4 565342 0t0 TCP 172.16.117.189:53642->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 576u IPv4 565122 0t0 TCP 172.16.117.189:53650->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 603u IPv4 565738 0t0 TCP 172.16.117.189:53654->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 630u IPv4 563021 0t0 TCP 172.16.117.189:53658->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 657u IPv4 568361 0t0 TCP 172.16.117.189:53662->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 666u IPv4 563027 0t0 TCP 172.16.117.189:53666->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 675u IPv4 567538 0t0 TCP 172.16.117.189:53670->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 684u IPv4 567998 0t0 TCP 172.16.117.189:53678->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 690u IPv4 574709 0t0 TCP 172.16.117.189:53686->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 693u IPv4 578725 0t0 TCP 172.16.117.189:53694->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 696u IPv4 576840 0t0 TCP 172.16.117.189:53698->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 699u IPv4 577819 0t0 TCP 172.16.117.189:53702->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 702u IPv4 582192 0t0 TCP 172.16.117.189:53710->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 705u IPv4 582861 0t0 TCP 172.16.117.189:53714->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 708u IPv4 577363 0t0 TCP 172.16.117.189:53718->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 711u IPv4 578175 0t0 TCP 172.16.117.189:53722->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 714u IPv4 587172 0t0 TCP 172.16.117.189:53730->147.75.102.132:amqp (CLOSE_WAIT)
> ruby 108533 miha 717u IPv4 584387 0t0 TCP 172.16.117.189:53734->147.75.102.132:amqp (CLOSE_WAIT)
> ...
> I think the CLOSE_WAIT state indicates that the remote end has already closed the TCP
> connection, but the file descriptor on our side was never closed.
> After 9 hours or so, enough of these file descriptors accumulate for the OS to
> complain about it.
> We did all we could to close connections gracefully:
> connection.container.stop
> connection.close
> connection = nil
> but nothing seems to help. A simple but expensive workaround is to manually invoke Ruby's garbage collection,
> but ideally `connection.close` would close the file descriptor.
> May I kindly ask you to look at this?
> Thank you and Best Regards,
> Miha
> PS: The error occurs both on Ubuntu 16.04 and RHEL 7.4
> PS2: The error occurs both on qpid_proton 0.19.0 and 0.21.0
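CLOSE_WAIT means the remote end (here the broker on the AMQP port) has sent its FIN, while the local socket's descriptor stays allocated until the application closes it, either explicitly or when GC finalizes the Ruby IO object. The Linux-only sketch below counts such sockets by reading `/proc/net/tcp`, where state code `08` is CLOSE_WAIT. It is a monitoring aid built on stated assumptions, not something proton provides: it is system-wide rather than per-process like `lsof -ap PID`, it ignores IPv6 sockets (listed in `/proc/net/tcp6`), and the helper names are hypothetical.

```ruby
require 'socket'

# TCP state code for CLOSE_WAIT in the "st" column of Linux /proc/net/tcp.
CLOSE_WAIT_STATE = "08"

# Count IPv4 sockets in CLOSE_WAIT whose remote port is `remote_port`
# (5672 is the AMQP port shown as "amqp" in the lsof output).
# System-wide, IPv4 only; returns nil where /proc/net/tcp is unavailable.
def close_wait_count(remote_port)
  path = "/proc/net/tcp"
  return nil unless File.readable?(path)

  File.readlines(path).drop(1).count do |line|
    fields = line.split   # [sl, local_address, rem_address, st, ...]
    rem_port = fields[2].split(":").last.to_i(16)
    fields[3] == CLOSE_WAIT_STATE && rem_port == remote_port
  end
end

# Demo on loopback: the server side closes first, the client stays open.
def demo_close_wait
  server = TCPServer.new("127.0.0.1", 0)
  port = server.addr[1]
  client = TCPSocket.new("127.0.0.1", port)
  server.accept.close   # peer sends its FIN
  client.read           # wait until the FIN has arrived ("" at EOF)
  close_wait_count(port)
ensure
  client&.close
  server&.close
end
```

Sampling `close_wait_count(5672)` periodically on an affected host is one hedged way to watch whether the count keeps climbing after a candidate fix.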
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)