Posted to dev@qpid.apache.org by "Chuck Rolke (Created) (JIRA)" <ji...@apache.org> on 2012/01/13 18:50:39 UTC

[jira] [Created] (QPID-3759) Heartbeat timeout in Windows does not lead to timely reconnect

Heartbeat timeout in Windows does not lead to timely reconnect
--------------------------------------------------------------

                 Key: QPID-3759
                 URL: https://issues.apache.org/jira/browse/QPID-3759
             Project: Qpid
          Issue Type: Bug
          Components: C++ Client
    Affects Versions: 0.14
         Environment: Windows C++ messaging
            Reporter: Chuck Rolke


Reported by Wolf Wolfswinkel on Qpid users http://qpid.2158936.n2.nabble.com/Heartbeats-in-C-broker-on-Windows-td7118702.html 22-Dec-2011

The simplest test case is in attached main.cpp. Establish a good network connection to the broker and then start the program. It creates a connection, sends two messages, and then pauses for 15 seconds. During the pause disconnect the network connection to the broker for at least two heartbeat timeouts (12 seconds).

After the heartbeat timeout the timer task fires and a debug trace shows:
 Traffic timeout,  TCPConnector::abort, TCPConnector::eof, TCPConnector::close

But the connection is not actually closed until something happens on the network to wake up the thread waiting in Poller::run().

The timer event appears unable to interrupt the IO thread waiting for the completion port.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:dev-subscribe@qpid.apache.org


[jira] [Commented] (QPID-3759) Heartbeat timeout in Windows does not lead to timely reconnect

Posted by "Chuck Rolke (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235680#comment-13235680 ] 

Chuck Rolke commented on QPID-3759:
-----------------------------------

The fix in r1301636 uses the function CancelIoEx (Kernel32.lib, .dll), which raises the minimum operating system version to Windows Vista for the client and Windows Server 2008 for the server. I compiled and tested without realizing that this won't work on XP.
                
> Heartbeat timeout in Windows does not lead to timely reconnect
> --------------------------------------------------------------
>
>                 Key: QPID-3759
>                 URL: https://issues.apache.org/jira/browse/QPID-3759
>             Project: Qpid
>          Issue Type: Bug
>          Components: C++ Client
>    Affects Versions: 0.14
>         Environment: Windows C++ messaging
>            Reporter: Chuck Rolke
>            Assignee: Cliff Jansen
>             Fix For: 0.17
>
>         Attachments: main.cpp
>
>
> Reported by Wolf Wolfswinkel on Qpid users http://qpid.2158936.n2.nabble.com/Heartbeats-in-C-broker-on-Windows-td7118702.html 22-Dec-2011
> The simplest test case is in attached main.cpp. Establish a good network connection to the broker and then start the program. It creates a connection, sends two messages, and then pauses for 15 seconds. During the pause disconnect the network connection to the broker for at least two heartbeat timeouts (12 seconds).
> After the heartbeat timeout the timer task fires and a debug trace shows:
>  Traffic timeout,  TCPConnector::abort, TCPConnector::eof, TCPConnector::close
> But the connection is not actually closed until something happens on the network to wake up the thread waiting in Poller::run().
> The timer event appears unable to interrupt the IO thread waiting for the completion port.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


[jira] [Commented] (QPID-3759) Heartbeat timeout in Windows does not lead to timely reconnect

Posted by "Cliff Jansen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249416#comment-13249416 ] 

Cliff Jansen commented on QPID-3759:
------------------------------------

Thanks for the heads up.  I can reproduce and have a fix in mind.

Will post asap.
                


[jira] [Resolved] (QPID-3759) Heartbeat timeout in Windows does not lead to timely reconnect

Posted by "Robbie Gemmell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/QPID-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robbie Gemmell resolved QPID-3759.
----------------------------------

    Resolution: Fixed

There are 4 commits for this from 5+ months ago, so I'm going to assume it's done; resolving.
                


[jira] [Commented] (QPID-3759) Heartbeat timeout in Windows does not lead to timely reconnect

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251269#comment-13251269 ] 

jiraposter@reviews.apache.org commented on QPID-3759:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4383/
-----------------------------------------------------------

(Updated 2012-04-11 02:57:20.281375)


Review request for qpid, Andrew Stitcher, Ted Ross, Chug Rolke, and Steve Huston.


Changes
-------

The cancelled read usually results in an "aborted" status, but depending on how far the socket close has progressed when the completion is posted, other statuses such as connection reset can be returned instead.  This results in a spurious notifyDisconnect() and general mayhem from the deleted Socket.

Since closesocket() is a relatively long operation and the cancel operation occurred outside the completionQueue loop, the dislodged read completion could be processed in a separate thread resulting in a concurrent Socket::close() which, on occasion, yielded an exception.  This was fixed by moving the cancel inside the completionQueue loop so that the resulting completion would be serialized after the cancel.

So round three involves:

1. just using queuedClose to indicate a drained read
2. moving the socket.close() to serialize the read completion
3. adding a queuedDelete check before using a non-existent socket

Presumably #2 would never have occurred with CancelIoEx.  But it is probable that #1 would have been lurking, just occurring very rarely (depending on whether the other side closed its connection at just the right/wrong time).  #3 can be attributed solely to my paranoia.
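
As a rough illustration of items 1 and 3 above, the decision made for a completed read could be modelled as below. This is only a sketch and not the code in AsynchIO.cpp: the enums ReadStatus and ReadAction, the struct CloseState, and the functions makeState and classifyReadCompletion are hypothetical names invented here, and the real code sees Windows error codes rather than this simplified status enum. Only the flag names queuedClose and queuedDelete come from the review text.

```cpp
#include <cassert>

// Hypothetical completion statuses; real code would see Windows error codes.
enum class ReadStatus { Ok, Aborted, ConnectionReset };

// Hypothetical outcomes for a finished read.
enum class ReadAction { DeliverData, NotifyDisconnect, TreatAsDrained, Ignore };

// Flags named after those mentioned in the review; the struct itself is assumed.
struct CloseState {
    bool queuedClose;
    bool queuedDelete;
};

// Helper for building a state value (illustrative only).
CloseState makeState(bool closing, bool deleting) {
    CloseState s;
    s.queuedClose = closing;
    s.queuedDelete = deleting;
    return s;
}

ReadAction classifyReadCompletion(const CloseState& state, ReadStatus status) {
    // Item 3: once deletion is queued, the socket may no longer exist,
    // so the completion must not touch it.
    if (state.queuedDelete) return ReadAction::Ignore;
    // Item 1: while a close is queued, the flag alone marks the read as
    // drained, whatever status the cancelled read happened to return.
    if (state.queuedClose) return ReadAction::TreatAsDrained;
    // Normal operation: only here does an error mean a real disconnect.
    return status == ReadStatus::Ok ? ReadAction::DeliverData
                                    : ReadAction::NotifyDisconnect;
}

// Small self-check: a connection reset seen while closing is not a disconnect.
inline void selfCheck() {
    assert(classifyReadCompletion(makeState(true, false),
                                  ReadStatus::ConnectionReset)
           == ReadAction::TreatAsDrained);
}
```

Keying on the flag rather than on an "aborted" status is what keeps a late connection reset from being reported as a spurious disconnect in this model.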


Summary
-------

The cause of the hang was an outstanding read side completion when the AsynchIO object in charge of the socket was in the queuedClose state.

The completion handler drains outstanding async requests before closing the socket.  Since the cable had been pulled, the async read would never complete until Windows gave up on the socket altogether (some time much later).

This patch remembers the last aio read and, if the object is in the queuedClose state, cancels that read before blocking again.
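
The idea of remembering the last read and cancelling it when a close is queued can be sketched in portable C++ as follows. This is a model under stated assumptions, not the patch itself: ReadTracker, its methods, and cancelsIssued are hypothetical names, and the cancel callback stands in for whatever mechanism actually unsticks the read (the thread discusses CancelIoEx and closesocket for that role).

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical model: tracks the most recent outstanding read so that it
// can be cancelled once a close has been queued.
class ReadTracker {
public:
    // cancelFn is an assumed hook that receives the id of the read to cancel.
    explicit ReadTracker(std::function<void(int)> cancelFn)
        : cancel(std::move(cancelFn)) {}

    void readStarted(int readId) {
        lastRead = readId;
        readOutstanding = true;
        cancelIssued = false;
    }

    // The completion (normal or aborted) has been dequeued.
    void readCompleted() { readOutstanding = false; }

    void queueClose() { queuedClose = true; }

    // Call just before the IO thread blocks again.
    // Returns true if a cancel was issued on this call.
    bool cancelIfClosing() {
        if (!queuedClose || !readOutstanding || cancelIssued) return false;
        assert(static_cast<bool>(cancel));
        cancel(lastRead);
        cancelIssued = true;  // the aborted completion is still to come
        return true;
    }

private:
    std::function<void(int)> cancel;
    int lastRead = -1;
    bool readOutstanding = false;
    bool cancelIssued = false;
    bool queuedClose = false;
};

// Usage example: counts cancels over two idle passes of an imagined IO loop.
int cancelsIssued(bool closing, bool readPending) {
    int calls = 0;
    ReadTracker tracker([&calls](int) { ++calls; });
    if (readPending) tracker.readStarted(7);
    if (closing) tracker.queueClose();
    tracker.cancelIfClosing();
    tracker.cancelIfClosing();  // second pass must not cancel again
    return calls;
}
```

In this model the cancel fires at most once per read, and only when both a close is queued and a read is still outstanding.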


Aside from the basic description from the Jira, I also removed an unused test for restartRead, which doesn't change the logic of the section, but may indicate an intention that wasn't fully coded or something left over from a previous change.


This addresses bug QPID-3759.
    https://issues.apache.org/jira/browse/QPID-3759


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/qpid/trunk/qpid/cpp/src/qpid/sys/windows/AsynchIO.cpp 1301636 

Diff: https://reviews.apache.org/r/4383/diff


Testing
-------

qpid-perftest, qpid-send, qpid-receive, cable pulls, broker pause/resumes


Thanks,

Cliff


                


[jira] [Commented] (QPID-3759) Heartbeat timeout in Windows does not lead to timely reconnect

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257314#comment-13257314 ] 

jiraposter@reviews.apache.org commented on QPID-3759:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4383/
-----------------------------------------------------------

(Updated 2012-04-19 06:52:43.361572)


Review request for qpid, Andrew Stitcher, Ted Ross, Chug Rolke, and Steve Huston.


Changes
-------

Load tests over a period of time reveal a threading bug when closing a connection.

The tests for opsInProgress == 0 and for the queuedDelete and queuedClose states occur outside the lock.  If an IO thread is suspended right after releasing the lock (with opsInProgress == 1) and resumes some time later, after another IO thread has decremented opsInProgress to zero, both threads will conclude that they are the last IO completion.  This results variously in double deletes of the underlying socket or of the AsynchIO object itself.

This patch moves the test inside the lock.

It also uses the same lock to protect the setting of either queuedDelete or queuedClose and the handoff (if any) to the IO thread.  This has the effect of adding two additional lock acquisitions over the life of the connection, but should have no effect on throughput or latency.
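
The effect of moving the test inside the lock can be modelled with a small portable sketch. It is not the Qpid code: Connection, requestClose, completeOp, and countCloseClaims are hypothetical names, and std::mutex stands in for whatever lock AsynchIO.cpp actually uses. The point shown is that the decrement of opsInProgress and the "am I the last completion?" test happen under one lock, so only one thread can conclude that it must perform the close.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical model of the close handshake; names are illustrative only.
class Connection {
public:
    explicit Connection(int outstandingOps) : opsInProgress(outstandingOps) {}

    // Called once by the thread that wants the connection closed.
    // Returns true if no IO is outstanding, so the caller must close.
    bool requestClose() {
        std::lock_guard<std::mutex> guard(lock);
        queuedClose = true;
        return opsInProgress == 0;
    }

    // Called by an IO thread when one operation completes.
    // Returns true only for the completion that must perform the close.
    bool completeOp() {
        std::lock_guard<std::mutex> guard(lock);
        assert(opsInProgress > 0);
        --opsInProgress;
        // Decrement and test under the same lock: no other thread can
        // change opsInProgress between the two steps.
        return queuedClose && opsInProgress == 0;
    }

private:
    std::mutex lock;
    int opsInProgress;
    bool queuedClose = false;
};

// Runs ioThreads concurrent completions and counts how many callers
// concluded that they were responsible for the close.
int countCloseClaims(int ioThreads) {
    Connection conn(ioThreads);
    std::atomic<int> claims(0);
    if (conn.requestClose()) ++claims;
    std::vector<std::thread> pool;
    for (int i = 0; i < ioThreads; ++i) {
        pool.emplace_back([&conn, &claims] {
            if (conn.completeOp()) ++claims;
        });
    }
    for (std::thread& t : pool) t.join();
    return claims.load();
}
```

With the test outside the lock, two completions could both observe zero and both claim the close; in this model exactly one caller ever does.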


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/qpid/trunk/qpid/cpp/src/qpid/sys/windows/AsynchIO.cpp 1327776 

Diff: https://reviews.apache.org/r/4383/diff


Testing
-------

qpid-perftest, qpid-send, qpid-receive, cable pulls, broker pause/resumes


Thanks,

Cliff


                


[jira] [Commented] (QPID-3759) Heartbeat timeout in Windows does not lead to timely reconnect

Posted by "Steve Huston (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247776#comment-13247776 ] 

Steve Huston commented on QPID-3759:
------------------------------------

With the closesocket instead of CancelIoEx, the nightly quick qpid-perftest crashes trying to close the socket after its impl has been freed.

Test case:

- Start broker
- qpid-perftest --summary --count 100


                


[jira] [Commented] (QPID-3759) Heartbeat timeout in Windows does not lead to timely reconnect

Posted by "Cliff Jansen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235731#comment-13235731 ] 

Cliff Jansen commented on QPID-3759:
------------------------------------

I chose CancelIoEx because it worked most naturally with the existing code.  The XP-friendly CancelIo can probably be substituted with minor modification.  I will get a new patch for review asap.


                


[jira] [Updated] (QPID-3759) Heartbeat timeout in Windows does not lead to timely reconnect

Posted by "Chuck Rolke (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/QPID-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chuck Rolke updated QPID-3759:
------------------------------

    Attachment: main.cpp

Heartbeat timeout test code.
                


[jira] [Commented] (QPID-3759) Heartbeat timeout in Windows does not lead to timely reconnect

Posted by "Chuck Rolke (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232675#comment-13232675 ] 

Chuck Rolke commented on QPID-3759:
-----------------------------------

Verified AsynchIO.cpp r1301636 on Windows Server 2008 R2 Datacenter 64-bit OS, 32-bit app.
                


[jira] [Commented] (QPID-3759) Heartbeat timeout in Windows does not lead to timely reconnect

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258253#comment-13258253 ] 

jiraposter@reviews.apache.org commented on QPID-3759:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4383/#review7069
-----------------------------------------------------------

Ship it!


I spun up VS2008 and VS2010, x86 and x64, debug and release versions of the C++ and .NET binding tools and ran tens of thousands of these executables against each other with no problem. Previous versions of tests built with patches on this review (on 64-bit Server 2008 R2 Datacenter) usually showed some executable failures before this many executions.

- Chug




                


[jira] [Commented] (QPID-3759) Heartbeat timeout in Windows does not lead to timely reconnect

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238653#comment-13238653 ] 

jiraposter@reviews.apache.org commented on QPID-3759:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4383/
-----------------------------------------------------------

(Updated 2012-03-26 18:26:34.827957)


Review request for qpid, Andrew Stitcher, Ted Ross, Chug Rolke, and Steve Huston.


Changes
-------

This patch follows the same logic as the previous one while avoiding CancelIoEx.

CancelIo was considered as a substitute for CancelIoEx, but it has thread restrictions that would have required a major rewrite of the base code.

I have substituted a much blunter instrument to achieve the completion, namely a full closesocket to unstick the read.  It forces all pending overlapped operations to complete, which in our case means the last read.


Summary
-------

The cause of the hang was an outstanding read side completion when the AsynchIO object in charge of the socket was in the queuedClose state.

The completion handler drains outstanding async requests before closing the socket.  Since the cable had been pulled, the async read would never complete until Windows gave up on the socket altogether (some time much later).

This patch remembers the last aio read and, if the object is in the queuedClose state, cancels that read before blocking again.
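
The drain-before-close behaviour described above can be illustrated with a small portable simulation. This is only a sketch under stated assumptions: SimulatedAsynchIO, forcePendingReadIfClosing and closesAfterCablePull are hypothetical names invented for illustration, not members of the real AsynchIO.cpp, and no Windows API is called. How the forced completion is actually delivered (cancelling the read or closing the socket) is deliberately abstracted away.

```cpp
#include <cassert>

// Hypothetical stand-in for the Windows AsynchIO object. All names here are
// illustrative assumptions; they are not taken from the qpid sources.
class SimulatedAsynchIO {
public:
    // Post an asynchronous read. It stays outstanding until a completion arrives.
    void startRead() {
        assert(!closed_);
        readPending_ = true;
    }

    // Request a close. The socket is only really closed once nothing is outstanding.
    void queueClose() {
        queuedClose_ = true;
        tryClose();
    }

    // A completion for the outstanding read arrives (data, error, or abort).
    void completeRead() {
        assert(readPending_);
        readPending_ = false;
        tryClose();
    }

    // The idea of the patch: if a close is queued while a read is still
    // outstanding, force that read to complete rather than wait on the network.
    void forcePendingReadIfClosing() {
        if (queuedClose_ && readPending_) {
            completeRead();
        }
    }

    bool closed() const { return closed_; }

private:
    // Close only when a close was requested and no read is outstanding.
    void tryClose() {
        if (queuedClose_ && !readPending_) {
            closed_ = true;
        }
    }

    bool readPending_ = false;
    bool queuedClose_ = false;
    bool closed_ = false;
};

// Models the cable pull: a read is posted, the network never completes it,
// and the heartbeat timeout then queues a close.
bool closesAfterCablePull(bool withPatch) {
    SimulatedAsynchIO aio;
    aio.startRead();
    aio.queueClose();
    if (withPatch) {
        aio.forcePendingReadIfClosing();
    }
    return aio.closed();
}
```

In this model closesAfterCablePull(false) leaves the object unclosed because the read never completes, which corresponds to the reported hang, while closesAfterCablePull(true) closes it as soon as the pending read is forced to complete.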


Aside from the basic description from the Jira, I also removed an unused test for restartRead, which doesn't change the logic of the section, but may indicate an intention that wasn't fully coded or something left over from a previous change.


This addresses bug QPID-3759.
    https://issues.apache.org/jira/browse/QPID-3759


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/qpid/trunk/qpid/cpp/src/qpid/sys/windows/AsynchIO.cpp 1301636 

Diff: https://reviews.apache.org/r/4383/diff


Testing
-------

qpid-perftest, qpid-send, qpid-receive, cable pulls, broker pause/resumes


Thanks,

Cliff


                


[jira] [Commented] (QPID-3759) Heartbeat timeout in Windows does not lead to timely reconnect

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251865#comment-13251865 ] 

jiraposter@reviews.apache.org commented on QPID-3759:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4383/#review6859
-----------------------------------------------------------

Ship it!


- Chug


On 2012-04-11 02:57:20, Cliff Jansen wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4383/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-04-11 02:57:20)
bq.  
bq.  
bq.  Review request for qpid, Andrew Stitcher, Ted Ross, Chug Rolke, and Steve Huston.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  The cause of the hang was an outstanding read side completion when the AsynchIO object in charge of the socket was in the queuedClose state.
bq.  
bq.  The completion handler drains outstanding async requests before closing the socket.  Since the cable had been pulled, the async read would never complete until Windows gave up on the socket altogether (some time much later).
bq.  
bq.  This patch remembers the last aio read and will cancel it  if in the queuedClose state before blocking again.
bq.  
bq.  
bq.  Aside from the basic description from the Jira, I also removed an unused test for restartRead, which doesn't change the logic of the section, but may indicate an intention that wasn't fully coded or something left over from a previous change.
bq.  
bq.  
bq.  This addresses bug QPID-3759.
bq.      https://issues.apache.org/jira/browse/QPID-3759
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    http://svn.apache.org/repos/asf/qpid/trunk/qpid/cpp/src/qpid/sys/windows/AsynchIO.cpp 1301636 
bq.  
bq.  Diff: https://reviews.apache.org/r/4383/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  qpid-perftest, qpid-send, qpid-receive, cable pulls, broker pause/resumes
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Cliff
bq.  
bq.


                


[jira] [Commented] (QPID-3759) Heartbeat timeout in Windows does not lead to timely reconnect

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231411#comment-13231411 ] 

jiraposter@reviews.apache.org commented on QPID-3759:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4383/
-----------------------------------------------------------

Review request for qpid, Andrew Stitcher, Ted Ross, Chug Rolke, and Steve Huston.


Summary
-------

The cause of the hang was an outstanding read side completion when the AsynchIO object in charge of the socket was in the queuedClose state.

The completion handler drains outstanding async requests before closing the socket.  Since the cable had been pulled, the async read would never complete until Windows gave up on the socket altogether (some time much later).

This patch remembers the last aio read and, if the object is in the queuedClose state, cancels that read before blocking again.


Aside from the basic description from the Jira, I also removed an unused test for restartRead, which doesn't change the logic of the section, but may indicate an intention that wasn't fully coded or something left over from a previous change.


This addresses bug QPID-3759.
    https://issues.apache.org/jira/browse/QPID-3759


Diffs
-----

  http://svn.apache.org/repos/asf/qpid/trunk/qpid/cpp/src/qpid/sys/windows/AsynchIO.cpp 1301636 

Diff: https://reviews.apache.org/r/4383/diff


Testing
-------

qpid-perftest, qpid-send, qpid-receive, cable pulls, broker pause/resumes


Thanks,

Cliff


                


[jira] [Commented] (QPID-3759) Heartbeat timeout in Windows does not lead to timely reconnect

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231466#comment-13231466 ] 

jiraposter@reviews.apache.org commented on QPID-3759:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4383/#review6045
-----------------------------------------------------------

Ship it!


Looks good to me.

- Steve


On 2012-03-16 17:32:02, Cliff Jansen wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4383/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-16 17:32:02)
bq.  
bq.  
bq.  Review request for qpid, Andrew Stitcher, Ted Ross, Chug Rolke, and Steve Huston.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  The cause of the hang was an outstanding read side completion when the AsynchIO object in charge of the socket was in the queuedClose state.
bq.  
bq.  The completion handler drains outstanding async requests before closing the socket.  Since the cable had been pulled, the async read would never complete until Windows gave up on the socket altogether (some time much later).
bq.  
bq.  This patch remembers the last aio read and will cancel it  if in the queuedClose state before blocking again.
bq.  
bq.  
bq.  Aside from the basic description from the Jira, I also removed an unused test for restartRead, which doesn't change the logic of the section, but may indicate an intention that wasn't fully coded or something left over from a previous change.
bq.  
bq.  
bq.  This addresses bug QPID-3759.
bq.      https://issues.apache.org/jira/browse/QPID-3759
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    http://svn.apache.org/repos/asf/qpid/trunk/qpid/cpp/src/qpid/sys/windows/AsynchIO.cpp 1301636 
bq.  
bq.  Diff: https://reviews.apache.org/r/4383/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  qpid-perftest, qpid-send, qpid-receive, cable pulls, broker pause/resumes
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Cliff
bq.  
bq.


                


[jira] [Commented] (QPID-3759) Heartbeat timeout in Windows does not lead to timely reconnect

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231494#comment-13231494 ] 

jiraposter@reviews.apache.org commented on QPID-3759:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4383/#review6048
-----------------------------------------------------------

Ship it!


- Chug




                


[jira] [Updated] (QPID-3759) Heartbeat timeout in Windows does not lead to timely reconnect

Posted by "Justin Ross (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/QPID-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Justin Ross updated QPID-3759:
------------------------------

    Fix Version/s: 0.17
         Assignee: Cliff Jansen
    