You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@qpid.apache.org by Ernie Allen <ea...@redhat.com> on 2015/05/21 21:48:10 UTC

Review Request 34560: [python] receiver.fetch raises KeyError after network glitch

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34560/
-----------------------------------------------------------

Review request for qpid, Alan Conway and Kenneth Giusti.


Repository: qpid


Description
-------

Calling receiver.fetch(timeout=10) in a loop, when network drops packages for a while causes uncaught exception KeyError in python-qpid-0.22. It causes on semi-infinite recursion on python-qpid-0.30.

The recursion problem was solved independently.

The attached patch does two things:
1) session.close() checks to see if the session is already closed. If so, it just returns. This prevents an exception from being displayed when the session is already closed.
2) In driver.py, if we get a do_session_detached() event, check to see if the channel is in our list of sessions before using it. If it isn't, close the session.

Here is my estimation on what is happening when the network drops:
- The driver detects the socket error, closes the engine and goes into its retry loop.
- Once the network comes back, the engine is restarted and all the sessions on the connection are re-attached.
- However, the broker sees the attempt to attach using a channel that it thinks is already attached.
- The broker logs the following: 2015-05-21 14:51:35 [Broker] error Channel exception: session-busy: Session already attached: anonymous.5c6f079c-571e-46f8-8ce6-72997da200a3:0 (/home/eallen/workspace/32/rh-qpid/qpid/cpp/src/qpid/broker/SessionManager.cpp:55)
2015-05-21 14:51:35 [Broker] error Channel exception: not-attached: Channel 0 is not attached (/home/eallen/workspace/32/rh-qpid/qpid/cpp/src/qpid/amqp_0_10/SessionHandler.cpp:39)
- This results in a do_session_detached() event in the engine.
- However, since the engine was closed when the socket error occurred and reopened when it cleared, it doesn't know about the old session.

If I test to see if the channel number being detached is associated with a session, and just return, then the client is hung. So.. when I see an event to detach an unknown session, I'm closing the engine and raising a ConnectionError back to the client.

Ideally the driver/engine would recover, but I don't see how we can get the broker and client back into agreement.


Diffs
-----

  trunk/qpid/python/qpid/messaging/driver.py 1680941 
  trunk/qpid/python/qpid/messaging/endpoints.py 1680941 

Diff: https://reviews.apache.org/r/34560/diff/


Testing
-------

1. Run this script against a qpidd broker:
#!/usr/bin/env python
from qpid.messaging import *
import datetime

conn = Connection("localhost:5672", reconnect=10)
timeout=10

try:
  conn.open()
  sess = conn.session()

  recv = sess.receiver("testQueue;{create:always}")
  
  while (1):
    print "%s: before fetch, timeout=%s" %(datetime.datetime.now(), timeout)
    msg = Message()
    try:
      msg = recv.fetch(timeout=timeout)
    except ReceiverError, e:
      print e
    except ConnectError, e:
      print "ConnectError", str(e)
      break
    print "%s: after fetch, msg=%s"  (datetime.datetime.now(), msg)

  print "about to close session"
  sess.close()

except ReceiverError, e:
  print e
except KeyboardInterrupt:
  pass

print "about to close connection"
conn.close()

2. Simulate network outage:
iptables -A OUTPUT -p tcp --dport 5672 -j REJECT; date

3. Once python script writes "No handlers could be found for logger "qpid.messaging"", flush iptables (iptables -F)

4. Wait up to 10 seconds

The ConnectError is received by the client and the loop can be exited.


Thanks,

Ernie Allen


Re: Review Request 34560: [python] receiver.fetch raises KeyError after network glitch

Posted by Alan Conway <ac...@redhat.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34560/#review86709
-----------------------------------------------------------


I would strongly suggest fixing session force. This scenario is exactly why that feature exists, and it is the right way to solve the problem. It takes care of cleanup on the broker end and it doesn't require renaming the sessions. Renaming the sessions seems like something that will bite us somewhere down the line when the user's idea of the session name fails to match the client and brokers - I don't have a concrete example now but it feels like an accident waiting to happen. I can give a hand with fixing session force if need be.

- Alan Conway


On June 4, 2015, 6:10 p.m., Ernie Allen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34560/
> -----------------------------------------------------------
> 
> (Updated June 4, 2015, 6:10 p.m.)
> 
> 
> Review request for qpid, Alan Conway and Kenneth Giusti.
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Calling receiver.fetch(timeout=10) in a loop, when network drops packages for a while causes uncaught exception KeyError in python-qpid-0.22. It causes on semi-infinite recursion on python-qpid-0.30.
> 
> The recursion problem was solved independently.
> 
> The attached patch does two things:
> 1) session.close() checks to see if the session is already closed. If so, it just returns. This prevents an exception from being displayed when the session is already closed.
> 2) In driver.py, if we get a do_session_detached() event, check to see if the channel is in our list of sessions before using it. If it isn't, close the session.
> 
> Here is my estimation on what is happening when the network drops:
> - The driver detects the socket error, closes the engine and goes into its retry loop.
> - Once the network comes back, the engine is restarted and all the sessions on the connection are re-attached.
> - However, the broker sees the attempt to attach using a channel that it thinks is already attached.
> - The broker logs the following: 2015-05-21 14:51:35 [Broker] error Channel exception: session-busy: Session already attached: anonymous.5c6f079c-571e-46f8-8ce6-72997da200a3:0 (/home/eallen/workspace/32/rh-qpid/qpid/cpp/src/qpid/broker/SessionManager.cpp:55)
> 2015-05-21 14:51:35 [Broker] error Channel exception: not-attached: Channel 0 is not attached (/home/eallen/workspace/32/rh-qpid/qpid/cpp/src/qpid/amqp_0_10/SessionHandler.cpp:39)
> - This results in a do_session_detached() event in the engine.
> - However, since the engine was closed when the socket error occurred and reopened when it cleared, it doesn't know about the old session.
> 
> If I test to see if the channel number being detached is associated with a session, and just return, then the client is hung. So.. when I see an event to detach an unknown session, I'm closing the engine and raising a ConnectionError back to the client.
> 
> Ideally the driver/engine would recover, but I don't see how we can get the broker and client back into agreement.
> 
> 
> Diffs
> -----
> 
>   trunk/qpid/python/qpid/messaging/driver.py 1680941 
>   trunk/qpid/python/qpid/messaging/endpoints.py 1680941 
> 
> Diff: https://reviews.apache.org/r/34560/diff/
> 
> 
> Testing
> -------
> 
> 1. Run this script against a qpidd broker:
> #!/usr/bin/env python
> from qpid.messaging import *
> import datetime
> 
> conn = Connection("localhost:5672", reconnect=10)
> timeout=10
> 
> try:
>   conn.open()
>   sess = conn.session()
> 
>   recv = sess.receiver("testQueue;{create:always}")
>   
>   while (1):
>     print "%s: before fetch, timeout=%s" %(datetime.datetime.now(), timeout)
>     msg = Message()
>     try:
>       msg = recv.fetch(timeout=timeout)
>     except ReceiverError, e:
>       print e
>     except ConnectError, e:
>       print "ConnectError", str(e)
>       break
>     print "%s: after fetch, msg=%s"  (datetime.datetime.now(), msg)
> 
>   print "about to close session"
>   sess.close()
> 
> except ReceiverError, e:
>   print e
> except KeyboardInterrupt:
>   pass
> 
> print "about to close connection"
> conn.close()
> 
> 2. Simulate network outage:
> iptables -A OUTPUT -p tcp --dport 5672 -j REJECT; date
> 
> 3. Once python script writes "No handlers could be found for logger "qpid.messaging"", flush iptables (iptables -F)
> 
> 4. Wait up to 10 seconds
> 
> The ConnectError is received by the client and the loop can be exited.
> 
> 
> Thanks,
> 
> Ernie Allen
> 
>


Re: Review Request 34560: [python] receiver.fetch raises KeyError after network glitch

Posted by Alan Conway <ac...@redhat.com>.

> On June 8, 2015, 1:12 p.m., Alan Conway wrote:
> > What's there will work for the case  you are trying to fix, where the original connection has failed. There is a missing piece however. The spec says of "force": "If set then a busy session will be forcibly detached from its other transport and reattached to the current transport. A session MUST NOT be attached to more than one transport at a time." So there willl be a problem if the original session has NOT failed and another connection forces one of its sessions. In that case we will have 2 connections with the same named session. The SessionManager is not designed to handle this, it does not have any way to find the connection owns the original session to detach it. I'll have a think and send you a sketch of a fix.

Typo: "if the original session has NOT failed" should be "if the original *connection* has NOT failed"


- Alan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34560/#review87025
-----------------------------------------------------------


On June 5, 2015, 2:46 p.m., Ernie Allen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34560/
> -----------------------------------------------------------
> 
> (Updated June 5, 2015, 2:46 p.m.)
> 
> 
> Review request for qpid, Alan Conway and Kenneth Giusti.
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Calling receiver.fetch(timeout=10) in a loop, when network drops packages for a while causes uncaught exception KeyError in python-qpid-0.22. It causes on semi-infinite recursion on python-qpid-0.30.
> 
> The recursion problem was solved independently.
> 
> The attached patch does two things:
> 1) session.close() checks to see if the session is already closed. If so, it just returns. This prevents an exception from being displayed when the session is already closed.
> 2) In driver.py, if we get a do_session_detached() event, check to see if the channel is in our list of sessions before using it. If it isn't, close the session.
> 
> Here is my estimation on what is happening when the network drops:
> - The driver detects the socket error, closes the engine and goes into its retry loop.
> - Once the network comes back, the engine is restarted and all the sessions on the connection are re-attached.
> - However, the broker sees the attempt to attach using a channel that it thinks is already attached.
> - The broker logs the following: 2015-05-21 14:51:35 [Broker] error Channel exception: session-busy: Session already attached: anonymous.5c6f079c-571e-46f8-8ce6-72997da200a3:0 (/home/eallen/workspace/32/rh-qpid/qpid/cpp/src/qpid/broker/SessionManager.cpp:55)
> 2015-05-21 14:51:35 [Broker] error Channel exception: not-attached: Channel 0 is not attached (/home/eallen/workspace/32/rh-qpid/qpid/cpp/src/qpid/amqp_0_10/SessionHandler.cpp:39)
> - This results in a do_session_detached() event in the engine.
> - However, since the engine was closed when the socket error occurred and reopened when it cleared, it doesn't know about the old session.
> 
> If I test to see if the channel number being detached is associated with a session, and just return, then the client is hung. So.. when I see an event to detach an unknown session, I'm closing the engine and raising a ConnectionError back to the client.
> 
> Ideally the driver/engine would recover, but I don't see how we can get the broker and client back into agreement.
> 
> 
> Diffs
> -----
> 
>   trunk/qpid/cpp/src/qpid/broker/SessionManager.cpp 1680941 
>   trunk/qpid/python/qpid/messaging/driver.py 1680941 
> 
> Diff: https://reviews.apache.org/r/34560/diff/
> 
> 
> Testing
> -------
> 
> 1. Run this script against a qpidd broker:
> #!/usr/bin/env python
> from qpid.messaging import *
> import datetime
> 
> conn = Connection("localhost:5672", reconnect=10)
> timeout=10
> 
> try:
>   conn.open()
>   sess = conn.session()
> 
>   recv = sess.receiver("testQueue;{create:always}")
>   
>   while (1):
>     print "%s: before fetch, timeout=%s" %(datetime.datetime.now(), timeout)
>     msg = Message()
>     try:
>       msg = recv.fetch(timeout=timeout)
>     except ReceiverError, e:
>       print e
>     except ConnectError, e:
>       print "ConnectError", str(e)
>       break
>     print "%s: after fetch, msg=%s"  (datetime.datetime.now(), msg)
> 
>   print "about to close session"
>   sess.close()
> 
> except ReceiverError, e:
>   print e
> except KeyboardInterrupt:
>   pass
> 
> print "about to close connection"
> conn.close()
> 
> 2. Simulate network outage:
> iptables -A OUTPUT -p tcp --dport 5672 -j REJECT; date
> 
> 3. Once python script writes "No handlers could be found for logger "qpid.messaging"", flush iptables (iptables -F)
> 
> 4. Wait up to 10 seconds
> 
> The ConnectError is received by the client and the loop can be exited.
> 
> 
> Thanks,
> 
> Ernie Allen
> 
>


Re: Review Request 34560: [python] receiver.fetch raises KeyError after network glitch

Posted by Ernie Allen <ea...@redhat.com>.

> On June 8, 2015, 1:12 p.m., Alan Conway wrote:
> > What's there will work for the case  you are trying to fix, where the original connection has failed. There is a missing piece however. The spec says of "force": "If set then a busy session will be forcibly detached from its other transport and reattached to the current transport. A session MUST NOT be attached to more than one transport at a time." So there willl be a problem if the original session has NOT failed and another connection forces one of its sessions. In that case we will have 2 connections with the same named session. The SessionManager is not designed to handle this, it does not have any way to find the connection owns the original session to detach it. I'll have a think and send you a sketch of a fix.
> 
> Alan Conway wrote:
>     Typo: "if the original session has NOT failed" should be "if the original *connection* has NOT failed"

I agree that this doesn't fully support the session attach force flag. However, since it does work in the case of a failed connection, and this bz deals exclusively with failed connections, I recommend we use this fix for now and fully implement the force flag in a future enhancement bz.


- Ernie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34560/#review87025
-----------------------------------------------------------


On June 5, 2015, 2:46 p.m., Ernie Allen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34560/
> -----------------------------------------------------------
> 
> (Updated June 5, 2015, 2:46 p.m.)
> 
> 
> Review request for qpid, Alan Conway and Kenneth Giusti.
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Calling receiver.fetch(timeout=10) in a loop, when network drops packages for a while causes uncaught exception KeyError in python-qpid-0.22. It causes on semi-infinite recursion on python-qpid-0.30.
> 
> The recursion problem was solved independently.
> 
> The attached patch does two things:
> 1) session.close() checks to see if the session is already closed. If so, it just returns. This prevents an exception from being displayed when the session is already closed.
> 2) In driver.py, if we get a do_session_detached() event, check to see if the channel is in our list of sessions before using it. If it isn't, close the session.
> 
> Here is my estimation on what is happening when the network drops:
> - The driver detects the socket error, closes the engine and goes into its retry loop.
> - Once the network comes back, the engine is restarted and all the sessions on the connection are re-attached.
> - However, the broker sees the attempt to attach using a channel that it thinks is already attached.
> - The broker logs the following: 2015-05-21 14:51:35 [Broker] error Channel exception: session-busy: Session already attached: anonymous.5c6f079c-571e-46f8-8ce6-72997da200a3:0 (/home/eallen/workspace/32/rh-qpid/qpid/cpp/src/qpid/broker/SessionManager.cpp:55)
> 2015-05-21 14:51:35 [Broker] error Channel exception: not-attached: Channel 0 is not attached (/home/eallen/workspace/32/rh-qpid/qpid/cpp/src/qpid/amqp_0_10/SessionHandler.cpp:39)
> - This results in a do_session_detached() event in the engine.
> - However, since the engine was closed when the socket error occurred and reopened when it cleared, it doesn't know about the old session.
> 
> If I test to see if the channel number being detached is associated with a session, and just return, then the client is hung. So.. when I see an event to detach an unknown session, I'm closing the engine and raising a ConnectionError back to the client.
> 
> Ideally the driver/engine would recover, but I don't see how we can get the broker and client back into agreement.
> 
> 
> Diffs
> -----
> 
>   trunk/qpid/cpp/src/qpid/broker/SessionManager.cpp 1680941 
>   trunk/qpid/python/qpid/messaging/driver.py 1680941 
> 
> Diff: https://reviews.apache.org/r/34560/diff/
> 
> 
> Testing
> -------
> 
> 1. Run this script against a qpidd broker:
> #!/usr/bin/env python
> from qpid.messaging import *
> import datetime
> 
> conn = Connection("localhost:5672", reconnect=10)
> timeout=10
> 
> try:
>   conn.open()
>   sess = conn.session()
> 
>   recv = sess.receiver("testQueue;{create:always}")
>   
>   while (1):
>     print "%s: before fetch, timeout=%s" %(datetime.datetime.now(), timeout)
>     msg = Message()
>     try:
>       msg = recv.fetch(timeout=timeout)
>     except ReceiverError, e:
>       print e
>     except ConnectError, e:
>       print "ConnectError", str(e)
>       break
>     print "%s: after fetch, msg=%s"  (datetime.datetime.now(), msg)
> 
>   print "about to close session"
>   sess.close()
> 
> except ReceiverError, e:
>   print e
> except KeyboardInterrupt:
>   pass
> 
> print "about to close connection"
> conn.close()
> 
> 2. Simulate network outage:
> iptables -A OUTPUT -p tcp --dport 5672 -j REJECT; date
> 
> 3. Once python script writes "No handlers could be found for logger "qpid.messaging"", flush iptables (iptables -F)
> 
> 4. Wait up to 10 seconds
> 
> The ConnectError is received by the client and the loop can be exited.
> 
> 
> Thanks,
> 
> Ernie Allen
> 
>


Re: Review Request 34560: [python] receiver.fetch raises KeyError after network glitch

Posted by Alan Conway <ac...@redhat.com>.

> On June 8, 2015, 1:12 p.m., Alan Conway wrote:
> > What's there will work for the case  you are trying to fix, where the original connection has failed. There is a missing piece however. The spec says of "force": "If set then a busy session will be forcibly detached from its other transport and reattached to the current transport. A session MUST NOT be attached to more than one transport at a time." So there willl be a problem if the original session has NOT failed and another connection forces one of its sessions. In that case we will have 2 connections with the same named session. The SessionManager is not designed to handle this, it does not have any way to find the connection owns the original session to detach it. I'll have a think and send you a sketch of a fix.
> 
> Alan Conway wrote:
>     Typo: "if the original session has NOT failed" should be "if the original *connection* has NOT failed"
> 
> Ernie Allen wrote:
>     I agree that this doesn't fully support the session attach force flag. However, since it does work in the case of a failed connection, and this bz deals exclusively with failed connections, I recommend we use this fix for now and fully implement the force flag in a future enhancement bz.

+1


- Alan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34560/#review87025
-----------------------------------------------------------


On June 5, 2015, 2:46 p.m., Ernie Allen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34560/
> -----------------------------------------------------------
> 
> (Updated June 5, 2015, 2:46 p.m.)
> 
> 
> Review request for qpid, Alan Conway and Kenneth Giusti.
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Calling receiver.fetch(timeout=10) in a loop, when network drops packages for a while causes uncaught exception KeyError in python-qpid-0.22. It causes on semi-infinite recursion on python-qpid-0.30.
> 
> The recursion problem was solved independently.
> 
> The attached patch does two things:
> 1) session.close() checks to see if the session is already closed. If so, it just returns. This prevents an exception from being displayed when the session is already closed.
> 2) In driver.py, if we get a do_session_detached() event, check to see if the channel is in our list of sessions before using it. If it isn't, close the session.
> 
> Here is my estimation on what is happening when the network drops:
> - The driver detects the socket error, closes the engine and goes into its retry loop.
> - Once the network comes back, the engine is restarted and all the sessions on the connection are re-attached.
> - However, the broker sees the attempt to attach using a channel that it thinks is already attached.
> - The broker logs the following: 2015-05-21 14:51:35 [Broker] error Channel exception: session-busy: Session already attached: anonymous.5c6f079c-571e-46f8-8ce6-72997da200a3:0 (/home/eallen/workspace/32/rh-qpid/qpid/cpp/src/qpid/broker/SessionManager.cpp:55)
> 2015-05-21 14:51:35 [Broker] error Channel exception: not-attached: Channel 0 is not attached (/home/eallen/workspace/32/rh-qpid/qpid/cpp/src/qpid/amqp_0_10/SessionHandler.cpp:39)
> - This results in a do_session_detached() event in the engine.
> - However, since the engine was closed when the socket error occurred and reopened when it cleared, it doesn't know about the old session.
> 
> If I test to see if the channel number being detached is associated with a session, and just return, then the client is hung. So.. when I see an event to detach an unknown session, I'm closing the engine and raising a ConnectionError back to the client.
> 
> Ideally the driver/engine would recover, but I don't see how we can get the broker and client back into agreement.
> 
> 
> Diffs
> -----
> 
>   trunk/qpid/cpp/src/qpid/broker/SessionManager.cpp 1680941 
>   trunk/qpid/python/qpid/messaging/driver.py 1680941 
> 
> Diff: https://reviews.apache.org/r/34560/diff/
> 
> 
> Testing
> -------
> 
> 1. Run this script against a qpidd broker:
> #!/usr/bin/env python
> from qpid.messaging import *
> import datetime
> 
> conn = Connection("localhost:5672", reconnect=10)
> timeout=10
> 
> try:
>   conn.open()
>   sess = conn.session()
> 
>   recv = sess.receiver("testQueue;{create:always}")
>   
>   while (1):
>     print "%s: before fetch, timeout=%s" %(datetime.datetime.now(), timeout)
>     msg = Message()
>     try:
>       msg = recv.fetch(timeout=timeout)
>     except ReceiverError, e:
>       print e
>     except ConnectError, e:
>       print "ConnectError", str(e)
>       break
>     print "%s: after fetch, msg=%s"  (datetime.datetime.now(), msg)
> 
>   print "about to close session"
>   sess.close()
> 
> except ReceiverError, e:
>   print e
> except KeyboardInterrupt:
>   pass
> 
> print "about to close connection"
> conn.close()
> 
> 2. Simulate network outage:
> iptables -A OUTPUT -p tcp --dport 5672 -j REJECT; date
> 
> 3. Once python script writes "No handlers could be found for logger "qpid.messaging"", flush iptables (iptables -F)
> 
> 4. Wait up to 10 seconds
> 
> The ConnectError is received by the client and the loop can be exited.
> 
> 
> Thanks,
> 
> Ernie Allen
> 
>


Re: Review Request 34560: [python] receiver.fetch raises KeyError after network glitch

Posted by Alan Conway <ac...@redhat.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34560/#review87025
-----------------------------------------------------------


What's there will work for the case  you are trying to fix, where the original connection has failed. There is a missing piece however. The spec says of "force": "If set then a busy session will be forcibly detached from its other transport and reattached to the current transport. A session MUST NOT be attached to more than one transport at a time." So there willl be a problem if the original session has NOT failed and another connection forces one of its sessions. In that case we will have 2 connections with the same named session. The SessionManager is not designed to handle this, it does not have any way to find the connection owns the original session to detach it. I'll have a think and send you a sketch of a fix.

- Alan Conway


On June 5, 2015, 2:46 p.m., Ernie Allen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34560/
> -----------------------------------------------------------
> 
> (Updated June 5, 2015, 2:46 p.m.)
> 
> 
> Review request for qpid, Alan Conway and Kenneth Giusti.
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Calling receiver.fetch(timeout=10) in a loop, when network drops packages for a while causes uncaught exception KeyError in python-qpid-0.22. It causes on semi-infinite recursion on python-qpid-0.30.
> 
> The recursion problem was solved independently.
> 
> The attached patch does two things:
> 1) session.close() checks to see if the session is already closed. If so, it just returns. This prevents an exception from being displayed when the session is already closed.
> 2) In driver.py, if we get a do_session_detached() event, check to see if the channel is in our list of sessions before using it. If it isn't, close the session.
> 
> Here is my estimation on what is happening when the network drops:
> - The driver detects the socket error, closes the engine and goes into its retry loop.
> - Once the network comes back, the engine is restarted and all the sessions on the connection are re-attached.
> - However, the broker sees the attempt to attach using a channel that it thinks is already attached.
> - The broker logs the following: 2015-05-21 14:51:35 [Broker] error Channel exception: session-busy: Session already attached: anonymous.5c6f079c-571e-46f8-8ce6-72997da200a3:0 (/home/eallen/workspace/32/rh-qpid/qpid/cpp/src/qpid/broker/SessionManager.cpp:55)
> 2015-05-21 14:51:35 [Broker] error Channel exception: not-attached: Channel 0 is not attached (/home/eallen/workspace/32/rh-qpid/qpid/cpp/src/qpid/amqp_0_10/SessionHandler.cpp:39)
> - This results in a do_session_detached() event in the engine.
> - However, since the engine was closed when the socket error occurred and reopened when it cleared, it doesn't know about the old session.
> 
> If I test to see if the channel number being detached is associated with a session, and just return, then the client is hung. So.. when I see an event to detach an unknown session, I'm closing the engine and raising a ConnectionError back to the client.
> 
> Ideally the driver/engine would recover, but I don't see how we can get the broker and client back into agreement.
> 
> 
> Diffs
> -----
> 
>   trunk/qpid/cpp/src/qpid/broker/SessionManager.cpp 1680941 
>   trunk/qpid/python/qpid/messaging/driver.py 1680941 
> 
> Diff: https://reviews.apache.org/r/34560/diff/
> 
> 
> Testing
> -------
> 
> 1. Run this script against a qpidd broker:
> #!/usr/bin/env python
> from qpid.messaging import *
> import datetime
> 
> conn = Connection("localhost:5672", reconnect=10)
> timeout=10
> 
> try:
>   conn.open()
>   sess = conn.session()
> 
>   recv = sess.receiver("testQueue;{create:always}")
>   
>   while (1):
>     print "%s: before fetch, timeout=%s" %(datetime.datetime.now(), timeout)
>     msg = Message()
>     try:
>       msg = recv.fetch(timeout=timeout)
>     except ReceiverError, e:
>       print e
>     except ConnectError, e:
>       print "ConnectError", str(e)
>       break
>     print "%s: after fetch, msg=%s"  (datetime.datetime.now(), msg)
> 
>   print "about to close session"
>   sess.close()
> 
> except ReceiverError, e:
>   print e
> except KeyboardInterrupt:
>   pass
> 
> print "about to close connection"
> conn.close()
> 
> 2. Simulate network outage:
> iptables -A OUTPUT -p tcp --dport 5672 -j REJECT; date
> 
> 3. Once python script writes "No handlers could be found for logger "qpid.messaging"", flush iptables (iptables -F)
> 
> 4. Wait up to 10 seconds
> 
> The ConnectError is received by the client and the loop can be exited.
> 
> 
> Thanks,
> 
> Ernie Allen
> 
>


Re: Review Request 34560: [python] receiver.fetch raises KeyError after network glitch

Posted by Alan Conway <ac...@redhat.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34560/#review87063
-----------------------------------------------------------

Ship it!


I agree with comitting this fix for the current issue and raising a JIRA to fix force later. 0-10 is an old protocol now and force doesn't apply to 1.0, plus we never implemented it before and nobody complained so it's not a top priority.

- Alan Conway


On June 5, 2015, 2:46 p.m., Ernie Allen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34560/
> -----------------------------------------------------------
> 
> (Updated June 5, 2015, 2:46 p.m.)
> 
> 
> Review request for qpid, Alan Conway and Kenneth Giusti.
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Calling receiver.fetch(timeout=10) in a loop, when network drops packages for a while causes uncaught exception KeyError in python-qpid-0.22. It causes on semi-infinite recursion on python-qpid-0.30.
> 
> The recursion problem was solved independently.
> 
> The attached patch does two things:
> 1) session.close() checks to see if the session is already closed. If so, it just returns. This prevents an exception from being displayed when the session is already closed.
> 2) In driver.py, if we get a do_session_detached() event, check to see if the channel is in our list of sessions before using it. If it isn't, close the session.
> 
> Here is my estimation on what is happening when the network drops:
> - The driver detects the socket error, closes the engine and goes into its retry loop.
> - Once the network comes back, the engine is restarted and all the sessions on the connection are re-attached.
> - However, the broker sees the attempt to attach using a channel that it thinks is already attached.
> - The broker logs the following: 2015-05-21 14:51:35 [Broker] error Channel exception: session-busy: Session already attached: anonymous.5c6f079c-571e-46f8-8ce6-72997da200a3:0 (/home/eallen/workspace/32/rh-qpid/qpid/cpp/src/qpid/broker/SessionManager.cpp:55)
> 2015-05-21 14:51:35 [Broker] error Channel exception: not-attached: Channel 0 is not attached (/home/eallen/workspace/32/rh-qpid/qpid/cpp/src/qpid/amqp_0_10/SessionHandler.cpp:39)
> - This results in a do_session_detached() event in the engine.
> - However, since the engine was closed when the socket error occurred and reopened when it cleared, it doesn't know about the old session.
> 
> If I test to see if the channel number being detached is associated with a session, and just return, then the client is hung. So.. when I see an event to detach an unknown session, I'm closing the engine and raising a ConnectionError back to the client.
> 
> Ideally the driver/engine would recover, but I don't see how we can get the broker and client back into agreement.
> 
> 
> Diffs
> -----
> 
>   trunk/qpid/cpp/src/qpid/broker/SessionManager.cpp 1680941 
>   trunk/qpid/python/qpid/messaging/driver.py 1680941 
> 
> Diff: https://reviews.apache.org/r/34560/diff/
> 
> 
> Testing
> -------
> 
> 1. Run this script against a qpidd broker:
> #!/usr/bin/env python
> from qpid.messaging import *
> import datetime
> 
> conn = Connection("localhost:5672", reconnect=10)
> timeout=10
> 
> try:
>   conn.open()
>   sess = conn.session()
> 
>   recv = sess.receiver("testQueue;{create:always}")
>   
>   while (1):
>     print "%s: before fetch, timeout=%s" %(datetime.datetime.now(), timeout)
>     msg = Message()
>     try:
>       msg = recv.fetch(timeout=timeout)
>     except ReceiverError, e:
>       print e
>     except ConnectError, e:
>       print "ConnectError", str(e)
>       break
>     print "%s: after fetch, msg=%s"  (datetime.datetime.now(), msg)
> 
>   print "about to close session"
>   sess.close()
> 
> except ReceiverError, e:
>   print e
> except KeyboardInterrupt:
>   pass
> 
> print "about to close connection"
> conn.close()
> 
> 2. Simulate network outage:
> iptables -A OUTPUT -p tcp --dport 5672 -j REJECT; date
> 
> 3. Once python script writes "No handlers could be found for logger "qpid.messaging"", flush iptables (iptables -F)
> 
> 4. Wait up to 10 seconds
> 
> The ConnectError is received by the client and the loop can be exited.
> 
> 
> Thanks,
> 
> Ernie Allen
> 
>


Re: Review Request 34560: [python] receiver.fetch raises KeyError after network glitch

Posted by Ernie Allen <ea...@redhat.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34560/
-----------------------------------------------------------

(Updated June 5, 2015, 2:46 p.m.)


Review request for qpid, Alan Conway and Kenneth Giusti.


Changes
-------

Using force flag on SessionManager::attach.
When a new python engine is created, if the passed connection already has sessions, then attach those sessions with force=True.

Note: It can't be that easy. Alan, what else needs to be done in SessionManager::attach to support force? All I did was not raise an exception if the session was already in the attached set and the force flag was True. This seems to work in my test though. The existing broker session was used and messages can be sent/received after blocking and unblocking the firewall.


Repository: qpid


Description
-------

Calling receiver.fetch(timeout=10) in a loop, when network drops packages for a while causes uncaught exception KeyError in python-qpid-0.22. It causes on semi-infinite recursion on python-qpid-0.30.

The recursion problem was solved independently.

The attached patch does two things:
1) session.close() checks to see if the session is already closed. If so, it just returns. This prevents an exception from being displayed when the session is already closed.
2) In driver.py, if we get a do_session_detached() event, check to see if the channel is in our list of sessions before using it. If it isn't, close the session.

Here is my estimation on what is happening when the network drops:
- The driver detects the socket error, closes the engine and goes into its retry loop.
- Once the network comes back, the engine is restarted and all the sessions on the connection are re-attached.
- However, the broker sees the attempt to attach using a channel that it thinks is already attached.
- The broker logs the following: 2015-05-21 14:51:35 [Broker] error Channel exception: session-busy: Session already attached: anonymous.5c6f079c-571e-46f8-8ce6-72997da200a3:0 (/home/eallen/workspace/32/rh-qpid/qpid/cpp/src/qpid/broker/SessionManager.cpp:55)
2015-05-21 14:51:35 [Broker] error Channel exception: not-attached: Channel 0 is not attached (/home/eallen/workspace/32/rh-qpid/qpid/cpp/src/qpid/amqp_0_10/SessionHandler.cpp:39)
- This results in a do_session_detached() event in the engine.
- However, since the engine was closed when the socket error occurred and reopened when it cleared, it doesn't know about the old session.

If I test to see if the channel number being detached is associated with a session, and just return, then the client is hung. So.. when I see an event to detach an unknown session, I'm closing the engine and raising a ConnectionError back to the client.

Ideally the driver/engine would recover, but I don't see how we can get the broker and client back into agreement.


Diffs (updated)
-----

  trunk/qpid/cpp/src/qpid/broker/SessionManager.cpp 1680941 
  trunk/qpid/python/qpid/messaging/driver.py 1680941 

Diff: https://reviews.apache.org/r/34560/diff/


Testing
-------

1. Run this script against a qpidd broker:
#!/usr/bin/env python
from qpid.messaging import *
import datetime

conn = Connection("localhost:5672", reconnect=10)
timeout=10

try:
  conn.open()
  sess = conn.session()

  recv = sess.receiver("testQueue;{create:always}")
  
  while (1):
    print "%s: before fetch, timeout=%s" %(datetime.datetime.now(), timeout)
    msg = Message()
    try:
      msg = recv.fetch(timeout=timeout)
    except ReceiverError, e:
      print e
    except ConnectError, e:
      print "ConnectError", str(e)
      break
    print "%s: after fetch, msg=%s"  (datetime.datetime.now(), msg)

  print "about to close session"
  sess.close()

except ReceiverError, e:
  print e
except KeyboardInterrupt:
  pass

print "about to close connection"
conn.close()

2. Simulate network outage:
iptables -A OUTPUT -p tcp --dport 5672 -j REJECT; date

3. Once python script writes "No handlers could be found for logger "qpid.messaging"", flush iptables (iptables -F)

4. Wait up to 10 seconds

The ConnectError is received by the client and the loop can be exited.


Thanks,

Ernie Allen


Re: Review Request 34560: [python] receiver.fetch raises KeyError after network glitch

Posted by Ernie Allen <ea...@redhat.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34560/
-----------------------------------------------------------

(Updated June 4, 2015, 6:10 p.m.)


Review request for qpid, Alan Conway and Kenneth Giusti.


Changes
-------

Until SessionManager::attach supports the force flag, we can't reconnect using the existing session name. So in this special case (temporary network glitch after session is opened), rename the sessions. 
This of course results in stranded sessions on the broker. Is there a way to close those when we reconnect?


Repository: qpid


Description
-------

Calling receiver.fetch(timeout=10) in a loop, when network drops packages for a while causes uncaught exception KeyError in python-qpid-0.22. It causes on semi-infinite recursion on python-qpid-0.30.

The recursion problem was solved independently.

The attached patch does two things:
1) session.close() checks to see if the session is already closed. If so, it just returns. This prevents an exception from being displayed when the session is already closed.
2) In driver.py, if we get a do_session_detached() event, check to see if the channel is in our list of sessions before using it. If it isn't, close the session.

Here is my estimation on what is happening when the network drops:
- The driver detects the socket error, closes the engine and goes into its retry loop.
- Once the network comes back, the engine is restarted and all the sessions on the connection are re-attached.
- However, the broker sees the attempt to attach using a channel that it thinks is already attached.
- The broker logs the following: 2015-05-21 14:51:35 [Broker] error Channel exception: session-busy: Session already attached: anonymous.5c6f079c-571e-46f8-8ce6-72997da200a3:0 (/home/eallen/workspace/32/rh-qpid/qpid/cpp/src/qpid/broker/SessionManager.cpp:55)
2015-05-21 14:51:35 [Broker] error Channel exception: not-attached: Channel 0 is not attached (/home/eallen/workspace/32/rh-qpid/qpid/cpp/src/qpid/amqp_0_10/SessionHandler.cpp:39)
- This results in a do_session_detached() event in the engine.
- However, since the engine was closed when the socket error occurred and reopened when it cleared, it doesn't know about the old session.

If I test to see if the channel number being detached is associated with a session, and just return, then the client is hung. So.. when I see an event to detach an unknown session, I'm closing the engine and raising a ConnectionError back to the client.

Ideally the driver/engine would recover, but I don't see how we can get the broker and client back into agreement.


Diffs (updated)
-----

  trunk/qpid/python/qpid/messaging/driver.py 1680941 
  trunk/qpid/python/qpid/messaging/endpoints.py 1680941 

Diff: https://reviews.apache.org/r/34560/diff/


Testing
-------

1. Run this script against a qpidd broker:
#!/usr/bin/env python
from qpid.messaging import *
import datetime

conn = Connection("localhost:5672", reconnect=10)
timeout=10

try:
  conn.open()
  sess = conn.session()

  recv = sess.receiver("testQueue;{create:always}")
  
  while (1):
    print "%s: before fetch, timeout=%s" %(datetime.datetime.now(), timeout)
    msg = Message()
    try:
      msg = recv.fetch(timeout=timeout)
    except ReceiverError, e:
      print e
    except ConnectError, e:
      print "ConnectError", str(e)
      break
    print "%s: after fetch, msg=%s"  (datetime.datetime.now(), msg)

  print "about to close session"
  sess.close()

except ReceiverError, e:
  print e
except KeyboardInterrupt:
  pass

print "about to close connection"
conn.close()

2. Simulate network outage:
iptables -A OUTPUT -p tcp --dport 5672 -j REJECT; date

3. Once python script writes "No handlers could be found for logger "qpid.messaging"", flush iptables (iptables -F)

4. Wait up to 10 seconds

The ConnectError is received by the client and the loop can be exited.


Thanks,

Ernie Allen


Re: Review Request 34560: [python] receiver.fetch raises KeyError after network glitch

Posted by Ernie Allen <ea...@redhat.com>.

> On May 22, 2015, 6:17 p.m., Gordon Sim wrote:
> > trunk/qpid/python/qpid/messaging/driver.py, line 452
> > <https://reviews.apache.org/r/34560/diff/2/?file=969418#file969418line452>
> >
> >     The whole point of the reconnect is to allow the application to keep using the connection and associated sessions and links in the face of a network failure.

Agreed. The new patch uses the session.force flag to re-attach the existing session.


- Ernie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34560/#review84948
-----------------------------------------------------------


On June 5, 2015, 2:46 p.m., Ernie Allen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34560/
> -----------------------------------------------------------
> 
> (Updated June 5, 2015, 2:46 p.m.)
> 
> 
> Review request for qpid, Alan Conway and Kenneth Giusti.
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Calling receiver.fetch(timeout=10) in a loop, when network drops packages for a while causes uncaught exception KeyError in python-qpid-0.22. It causes on semi-infinite recursion on python-qpid-0.30.
> 
> The recursion problem was solved independently.
> 
> The attached patch does two things:
> 1) session.close() checks to see if the session is already closed. If so, it just returns. This prevents an exception from being displayed when the session is already closed.
> 2) In driver.py, if we get a do_session_detached() event, check to see if the channel is in our list of sessions before using it. If it isn't, close the session.
> 
> Here is my estimation on what is happening when the network drops:
> - The driver detects the socket error, closes the engine and goes into its retry loop.
> - Once the network comes back, the engine is restarted and all the sessions on the connection are re-attached.
> - However, the broker sees the attempt to attach using a channel that it thinks is already attached.
> - The broker logs the following: 2015-05-21 14:51:35 [Broker] error Channel exception: session-busy: Session already attached: anonymous.5c6f079c-571e-46f8-8ce6-72997da200a3:0 (/home/eallen/workspace/32/rh-qpid/qpid/cpp/src/qpid/broker/SessionManager.cpp:55)
> 2015-05-21 14:51:35 [Broker] error Channel exception: not-attached: Channel 0 is not attached (/home/eallen/workspace/32/rh-qpid/qpid/cpp/src/qpid/amqp_0_10/SessionHandler.cpp:39)
> - This results in a do_session_detached() event in the engine.
> - However, since the engine was closed when the socket error occurred and reopened when it cleared, it doesn't know about the old session.
> 
> If I test to see if the channel number being detached is associated with a session, and just return, then the client is hung. So.. when I see an event to detach an unknown session, I'm closing the engine and raising a ConnectionError back to the client.
> 
> Ideally the driver/engine would recover, but I don't see how we can get the broker and client back into agreement.
> 
> 
> Diffs
> -----
> 
>   trunk/qpid/cpp/src/qpid/broker/SessionManager.cpp 1680941 
>   trunk/qpid/python/qpid/messaging/driver.py 1680941 
> 
> Diff: https://reviews.apache.org/r/34560/diff/
> 
> 
> Testing
> -------
> 
> 1. Run this script against a qpidd broker:
> #!/usr/bin/env python
> from qpid.messaging import *
> import datetime
> 
> conn = Connection("localhost:5672", reconnect=10)
> timeout=10
> 
> try:
>   conn.open()
>   sess = conn.session()
> 
>   recv = sess.receiver("testQueue;{create:always}")
>   
>   while (1):
>     print "%s: before fetch, timeout=%s" %(datetime.datetime.now(), timeout)
>     msg = Message()
>     try:
>       msg = recv.fetch(timeout=timeout)
>     except ReceiverError, e:
>       print e
>     except ConnectError, e:
>       print "ConnectError", str(e)
>       break
>     print "%s: after fetch, msg=%s"  (datetime.datetime.now(), msg)
> 
>   print "about to close session"
>   sess.close()
> 
> except ReceiverError, e:
>   print e
> except KeyboardInterrupt:
>   pass
> 
> print "about to close connection"
> conn.close()
> 
> 2. Simulate network outage:
> iptables -A OUTPUT -p tcp --dport 5672 -j REJECT; date
> 
> 3. Once python script writes "No handlers could be found for logger "qpid.messaging"", flush iptables (iptables -F)
> 
> 4. Wait up to 10 seconds
> 
> The ConnectError is received by the client and the loop can be exited.
> 
> 
> Thanks,
> 
> Ernie Allen
> 
>


Re: Review Request 34560: [python] receiver.fetch raises KeyError after network glitch

Posted by Gordon Sim <gs...@redhat.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34560/#review84948
-----------------------------------------------------------



trunk/qpid/python/qpid/messaging/driver.py
<https://reviews.apache.org/r/34560/#comment136405>

    The whole point of the reconnect is to allow the application to keep using the connection and associated sessions and links in the face of a network failure.


- Gordon Sim


On May 22, 2015, 4:52 p.m., Ernie Allen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34560/
> -----------------------------------------------------------
> 
> (Updated May 22, 2015, 4:52 p.m.)
> 
> 
> Review request for qpid, Alan Conway and Kenneth Giusti.
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Calling receiver.fetch(timeout=10) in a loop, when network drops packages for a while causes uncaught exception KeyError in python-qpid-0.22. It causes on semi-infinite recursion on python-qpid-0.30.
> 
> The recursion problem was solved independently.
> 
> The attached patch does two things:
> 1) session.close() checks to see if the session is already closed. If so, it just returns. This prevents an exception from being displayed when the session is already closed.
> 2) In driver.py, if we get a do_session_detached() event, check to see if the channel is in our list of sessions before using it. If it isn't, close the session.
> 
> Here is my estimation on what is happening when the network drops:
> - The driver detects the socket error, closes the engine and goes into its retry loop.
> - Once the network comes back, the engine is restarted and all the sessions on the connection are re-attached.
> - However, the broker sees the attempt to attach using a channel that it thinks is already attached.
> - The broker logs the following: 2015-05-21 14:51:35 [Broker] error Channel exception: session-busy: Session already attached: anonymous.5c6f079c-571e-46f8-8ce6-72997da200a3:0 (/home/eallen/workspace/32/rh-qpid/qpid/cpp/src/qpid/broker/SessionManager.cpp:55)
> 2015-05-21 14:51:35 [Broker] error Channel exception: not-attached: Channel 0 is not attached (/home/eallen/workspace/32/rh-qpid/qpid/cpp/src/qpid/amqp_0_10/SessionHandler.cpp:39)
> - This results in a do_session_detached() event in the engine.
> - However, since the engine was closed when the socket error occurred and reopened when it cleared, it doesn't know about the old session.
> 
> If I test to see if the channel number being detached is associated with a session, and just return, then the client is hung. So.. when I see an event to detach an unknown session, I'm closing the engine and raising a ConnectionError back to the client.
> 
> Ideally the driver/engine would recover, but I don't see how we can get the broker and client back into agreement.
> 
> 
> Diffs
> -----
> 
>   trunk/qpid/python/qpid/messaging/driver.py 1680941 
> 
> Diff: https://reviews.apache.org/r/34560/diff/
> 
> 
> Testing
> -------
> 
> 1. Run this script against a qpidd broker:
> #!/usr/bin/env python
> from qpid.messaging import *
> import datetime
> 
> conn = Connection("localhost:5672", reconnect=10)
> timeout=10
> 
> try:
>   conn.open()
>   sess = conn.session()
> 
>   recv = sess.receiver("testQueue;{create:always}")
>   
>   while (1):
>     print "%s: before fetch, timeout=%s" %(datetime.datetime.now(), timeout)
>     msg = Message()
>     try:
>       msg = recv.fetch(timeout=timeout)
>     except ReceiverError, e:
>       print e
>     except ConnectError, e:
>       print "ConnectError", str(e)
>       break
>     print "%s: after fetch, msg=%s"  (datetime.datetime.now(), msg)
> 
>   print "about to close session"
>   sess.close()
> 
> except ReceiverError, e:
>   print e
> except KeyboardInterrupt:
>   pass
> 
> print "about to close connection"
> conn.close()
> 
> 2. Simulate network outage:
> iptables -A OUTPUT -p tcp --dport 5672 -j REJECT; date
> 
> 3. Once python script writes "No handlers could be found for logger "qpid.messaging"", flush iptables (iptables -F)
> 
> 4. Wait up to 10 seconds
> 
> The ConnectError is received by the client and the loop can be exited.
> 
> 
> Thanks,
> 
> Ernie Allen
> 
>


Re: Review Request 34560: [python] receiver.fetch raises KeyError after network glitch

Posted by Ernie Allen <ea...@redhat.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34560/
-----------------------------------------------------------

(Updated May 22, 2015, 4:52 p.m.)


Review request for qpid, Alan Conway and Kenneth Giusti.


Changes
-------

This patch sets the connection.reconnect flag to false when a socket error occurs.


Repository: qpid


Description
-------

Calling receiver.fetch(timeout=10) in a loop, when network drops packages for a while causes uncaught exception KeyError in python-qpid-0.22. It causes on semi-infinite recursion on python-qpid-0.30.

The recursion problem was solved independently.

The attached patch does two things:
1) session.close() checks to see if the session is already closed. If so, it just returns. This prevents an exception from being displayed when the session is already closed.
2) In driver.py, if we get a do_session_detached() event, check to see if the channel is in our list of sessions before using it. If it isn't, close the session.

Here is my estimation on what is happening when the network drops:
- The driver detects the socket error, closes the engine and goes into its retry loop.
- Once the network comes back, the engine is restarted and all the sessions on the connection are re-attached.
- However, the broker sees the attempt to attach using a channel that it thinks is already attached.
- The broker logs the following: 2015-05-21 14:51:35 [Broker] error Channel exception: session-busy: Session already attached: anonymous.5c6f079c-571e-46f8-8ce6-72997da200a3:0 (/home/eallen/workspace/32/rh-qpid/qpid/cpp/src/qpid/broker/SessionManager.cpp:55)
2015-05-21 14:51:35 [Broker] error Channel exception: not-attached: Channel 0 is not attached (/home/eallen/workspace/32/rh-qpid/qpid/cpp/src/qpid/amqp_0_10/SessionHandler.cpp:39)
- This results in a do_session_detached() event in the engine.
- However, since the engine was closed when the socket error occurred and reopened when it cleared, it doesn't know about the old session.

If I test to see if the channel number being detached is associated with a session, and just return, then the client is hung. So.. when I see an event to detach an unknown session, I'm closing the engine and raising a ConnectionError back to the client.

Ideally the driver/engine would recover, but I don't see how we can get the broker and client back into agreement.


Diffs (updated)
-----

  trunk/qpid/python/qpid/messaging/driver.py 1680941 

Diff: https://reviews.apache.org/r/34560/diff/


Testing
-------

1. Run this script against a qpidd broker:
#!/usr/bin/env python
from qpid.messaging import *
import datetime

conn = Connection("localhost:5672", reconnect=10)
timeout=10

try:
  conn.open()
  sess = conn.session()

  recv = sess.receiver("testQueue;{create:always}")
  
  while (1):
    print "%s: before fetch, timeout=%s" %(datetime.datetime.now(), timeout)
    msg = Message()
    try:
      msg = recv.fetch(timeout=timeout)
    except ReceiverError, e:
      print e
    except ConnectError, e:
      print "ConnectError", str(e)
      break
    print "%s: after fetch, msg=%s"  (datetime.datetime.now(), msg)

  print "about to close session"
  sess.close()

except ReceiverError, e:
  print e
except KeyboardInterrupt:
  pass

print "about to close connection"
conn.close()

2. Simulate network outage:
iptables -A OUTPUT -p tcp --dport 5672 -j REJECT; date

3. Once python script writes "No handlers could be found for logger "qpid.messaging"", flush iptables (iptables -F)

4. Wait up to 10 seconds

The ConnectError is received by the client and the loop can be exited.


Thanks,

Ernie Allen


Re: Review Request 34560: [python] receiver.fetch raises KeyError after network glitch

Posted by Gordon Sim <gs...@redhat.com>.

> On May 21, 2015, 8:51 p.m., Alan Conway wrote:
> > There is something more going on here. If the client is re-connecting then it should be re-connecting on an entirely new connection so sessions should not be able to clash, the new connection should have no session on it.
> > 
> > It sounds to me like somehow the client is re-using the original connection after you unblock it at the firewall, which is definitely not right - there could be all kinds of invalid state in that connections sessions. If the client decides a connection is faulty it should definitively close it and forget it before re-connecting and re-establishing sessions. You need to track down how/why it is managing to use the old connection after it has failed.

'If the client is re-connecting then it should be re-connecting on an entirely new connection so sessions should not be able to clash' - this is true in the sense that it is a new connection as far as the broker is concerned, however the client will keep using the qpid.message.Connection and will reattach all the corresponding sessions. If the broker hasn't determined that the old connection is now dead, it won't have closed the old sessions, which could indeed result in a naming clash as the text above hypothesises.


- Gordon


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34560/#review84786
-----------------------------------------------------------


On May 22, 2015, 4:52 p.m., Ernie Allen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34560/
> -----------------------------------------------------------
> 
> (Updated May 22, 2015, 4:52 p.m.)
> 
> 
> Review request for qpid, Alan Conway and Kenneth Giusti.
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Calling receiver.fetch(timeout=10) in a loop, when network drops packages for a while causes uncaught exception KeyError in python-qpid-0.22. It causes on semi-infinite recursion on python-qpid-0.30.
> 
> The recursion problem was solved independently.
> 
> The attached patch does two things:
> 1) session.close() checks to see if the session is already closed. If so, it just returns. This prevents an exception from being displayed when the session is already closed.
> 2) In driver.py, if we get a do_session_detached() event, check to see if the channel is in our list of sessions before using it. If it isn't, close the session.
> 
> Here is my estimation on what is happening when the network drops:
> - The driver detects the socket error, closes the engine and goes into its retry loop.
> - Once the network comes back, the engine is restarted and all the sessions on the connection are re-attached.
> - However, the broker sees the attempt to attach using a channel that it thinks is already attached.
> - The broker logs the following: 2015-05-21 14:51:35 [Broker] error Channel exception: session-busy: Session already attached: anonymous.5c6f079c-571e-46f8-8ce6-72997da200a3:0 (/home/eallen/workspace/32/rh-qpid/qpid/cpp/src/qpid/broker/SessionManager.cpp:55)
> 2015-05-21 14:51:35 [Broker] error Channel exception: not-attached: Channel 0 is not attached (/home/eallen/workspace/32/rh-qpid/qpid/cpp/src/qpid/amqp_0_10/SessionHandler.cpp:39)
> - This results in a do_session_detached() event in the engine.
> - However, since the engine was closed when the socket error occurred and reopened when it cleared, it doesn't know about the old session.
> 
> If I test to see if the channel number being detached is associated with a session, and just return, then the client is hung. So.. when I see an event to detach an unknown session, I'm closing the engine and raising a ConnectionError back to the client.
> 
> Ideally the driver/engine would recover, but I don't see how we can get the broker and client back into agreement.
> 
> 
> Diffs
> -----
> 
>   trunk/qpid/python/qpid/messaging/driver.py 1680941 
> 
> Diff: https://reviews.apache.org/r/34560/diff/
> 
> 
> Testing
> -------
> 
> 1. Run this script against a qpidd broker:
> #!/usr/bin/env python
> from qpid.messaging import *
> import datetime
> 
> conn = Connection("localhost:5672", reconnect=10)
> timeout=10
> 
> try:
>   conn.open()
>   sess = conn.session()
> 
>   recv = sess.receiver("testQueue;{create:always}")
>   
>   while (1):
>     print "%s: before fetch, timeout=%s" %(datetime.datetime.now(), timeout)
>     msg = Message()
>     try:
>       msg = recv.fetch(timeout=timeout)
>     except ReceiverError, e:
>       print e
>     except ConnectError, e:
>       print "ConnectError", str(e)
>       break
>     print "%s: after fetch, msg=%s"  (datetime.datetime.now(), msg)
> 
>   print "about to close session"
>   sess.close()
> 
> except ReceiverError, e:
>   print e
> except KeyboardInterrupt:
>   pass
> 
> print "about to close connection"
> conn.close()
> 
> 2. Simulate network outage:
> iptables -A OUTPUT -p tcp --dport 5672 -j REJECT; date
> 
> 3. Once python script writes "No handlers could be found for logger "qpid.messaging"", flush iptables (iptables -F)
> 
> 4. Wait up to 10 seconds
> 
> The ConnectError is received by the client and the loop can be exited.
> 
> 
> Thanks,
> 
> Ernie Allen
> 
>


Re: Review Request 34560: [python] receiver.fetch raises KeyError after network glitch

Posted by Alan Conway <ac...@redhat.com>.

> On May 21, 2015, 8:51 p.m., Alan Conway wrote:
> > There is something more going on here. If the client is re-connecting then it should be re-connecting on an entirely new connection so sessions should not be able to clash, the new connection should have no session on it.
> > 
> > It sounds to me like somehow the client is re-using the original connection after you unblock it at the firewall, which is definitely not right - there could be all kinds of invalid state in that connections sessions. If the client decides a connection is faulty it should definitively close it and forget it before re-connecting and re-establishing sessions. You need to track down how/why it is managing to use the old connection after it has failed.
> 
> Gordon Sim wrote:
>     'If the client is re-connecting then it should be re-connecting on an entirely new connection so sessions should not be able to clash' - this is true in the sense that it is a new connection as far as the broker is concerned, however the client will keep using the qpid.message.Connection and will reattach all the corresponding sessions. If the broker hasn't determined that the old connection is now dead, it won't have closed the old sessions, which could indeed result in a naming clash as the text above hypothesises.

Ah yes - because 0-10 session names are not scoped to the connection, I had forgotten that. So we either have to rename the session on the client or force the broker to allow the new session. 0-10 has a "force" flag on attach for exactly that but looking at the broker/SessionManager code I see "// FIXME aconway 2008-04-29: implement force " :( It probably wouldn't be hard to implement though.


- Alan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34560/#review84786
-----------------------------------------------------------


On May 22, 2015, 4:52 p.m., Ernie Allen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34560/
> -----------------------------------------------------------
> 
> (Updated May 22, 2015, 4:52 p.m.)
> 
> 
> Review request for qpid, Alan Conway and Kenneth Giusti.
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Calling receiver.fetch(timeout=10) in a loop, when network drops packages for a while causes uncaught exception KeyError in python-qpid-0.22. It causes on semi-infinite recursion on python-qpid-0.30.
> 
> The recursion problem was solved independently.
> 
> The attached patch does two things:
> 1) session.close() checks to see if the session is already closed. If so, it just returns. This prevents an exception from being displayed when the session is already closed.
> 2) In driver.py, if we get a do_session_detached() event, check to see if the channel is in our list of sessions before using it. If it isn't, close the session.
> 
> Here is my estimation on what is happening when the network drops:
> - The driver detects the socket error, closes the engine and goes into its retry loop.
> - Once the network comes back, the engine is restarted and all the sessions on the connection are re-attached.
> - However, the broker sees the attempt to attach using a channel that it thinks is already attached.
> - The broker logs the following: 2015-05-21 14:51:35 [Broker] error Channel exception: session-busy: Session already attached: anonymous.5c6f079c-571e-46f8-8ce6-72997da200a3:0 (/home/eallen/workspace/32/rh-qpid/qpid/cpp/src/qpid/broker/SessionManager.cpp:55)
> 2015-05-21 14:51:35 [Broker] error Channel exception: not-attached: Channel 0 is not attached (/home/eallen/workspace/32/rh-qpid/qpid/cpp/src/qpid/amqp_0_10/SessionHandler.cpp:39)
> - This results in a do_session_detached() event in the engine.
> - However, since the engine was closed when the socket error occurred and reopened when it cleared, it doesn't know about the old session.
> 
> If I test to see if the channel number being detached is associated with a session, and just return, then the client is hung. So.. when I see an event to detach an unknown session, I'm closing the engine and raising a ConnectionError back to the client.
> 
> Ideally the driver/engine would recover, but I don't see how we can get the broker and client back into agreement.
> 
> 
> Diffs
> -----
> 
>   trunk/qpid/python/qpid/messaging/driver.py 1680941 
> 
> Diff: https://reviews.apache.org/r/34560/diff/
> 
> 
> Testing
> -------
> 
> 1. Run this script against a qpidd broker:
> #!/usr/bin/env python
> from qpid.messaging import *
> import datetime
> 
> conn = Connection("localhost:5672", reconnect=10)
> timeout=10
> 
> try:
>   conn.open()
>   sess = conn.session()
> 
>   recv = sess.receiver("testQueue;{create:always}")
>   
>   while (1):
>     print "%s: before fetch, timeout=%s" %(datetime.datetime.now(), timeout)
>     msg = Message()
>     try:
>       msg = recv.fetch(timeout=timeout)
>     except ReceiverError, e:
>       print e
>     except ConnectError, e:
>       print "ConnectError", str(e)
>       break
>     print "%s: after fetch, msg=%s"  (datetime.datetime.now(), msg)
> 
>   print "about to close session"
>   sess.close()
> 
> except ReceiverError, e:
>   print e
> except KeyboardInterrupt:
>   pass
> 
> print "about to close connection"
> conn.close()
> 
> 2. Simulate network outage:
> iptables -A OUTPUT -p tcp --dport 5672 -j REJECT; date
> 
> 3. Once python script writes "No handlers could be found for logger "qpid.messaging"", flush iptables (iptables -F)
> 
> 4. Wait up to 10 seconds
> 
> The ConnectError is received by the client and the loop can be exited.
> 
> 
> Thanks,
> 
> Ernie Allen
> 
>


Re: Review Request 34560: [python] receiver.fetch raises KeyError after network glitch

Posted by Alan Conway <ac...@redhat.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34560/#review84786
-----------------------------------------------------------


There is something more going on here. If the client is re-connecting then it should be re-connecting on an entirely new connection so sessions should not be able to clash, the new connection should have no session on it.

It sounds to me like somehow the client is re-using the original connection after you unblock it at the firewall, which is definitely not right - there could be all kinds of invalid state in that connections sessions. If the client decides a connection is faulty it should definitively close it and forget it before re-connecting and re-establishing sessions. You need to track down how/why it is managing to use the old connection after it has failed.

- Alan Conway


On May 21, 2015, 7:48 p.m., Ernie Allen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34560/
> -----------------------------------------------------------
> 
> (Updated May 21, 2015, 7:48 p.m.)
> 
> 
> Review request for qpid, Alan Conway and Kenneth Giusti.
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Calling receiver.fetch(timeout=10) in a loop, when network drops packages for a while causes uncaught exception KeyError in python-qpid-0.22. It causes on semi-infinite recursion on python-qpid-0.30.
> 
> The recursion problem was solved independently.
> 
> The attached patch does two things:
> 1) session.close() checks to see if the session is already closed. If so, it just returns. This prevents an exception from being displayed when the session is already closed.
> 2) In driver.py, if we get a do_session_detached() event, check to see if the channel is in our list of sessions before using it. If it isn't, close the session.
> 
> Here is my estimation on what is happening when the network drops:
> - The driver detects the socket error, closes the engine and goes into its retry loop.
> - Once the network comes back, the engine is restarted and all the sessions on the connection are re-attached.
> - However, the broker sees the attempt to attach using a channel that it thinks is already attached.
> - The broker logs the following: 2015-05-21 14:51:35 [Broker] error Channel exception: session-busy: Session already attached: anonymous.5c6f079c-571e-46f8-8ce6-72997da200a3:0 (/home/eallen/workspace/32/rh-qpid/qpid/cpp/src/qpid/broker/SessionManager.cpp:55)
> 2015-05-21 14:51:35 [Broker] error Channel exception: not-attached: Channel 0 is not attached (/home/eallen/workspace/32/rh-qpid/qpid/cpp/src/qpid/amqp_0_10/SessionHandler.cpp:39)
> - This results in a do_session_detached() event in the engine.
> - However, since the engine was closed when the socket error occurred and reopened when it cleared, it doesn't know about the old session.
> 
> If I test to see if the channel number being detached is associated with a session, and just return, then the client is hung. So.. when I see an event to detach an unknown session, I'm closing the engine and raising a ConnectionError back to the client.
> 
> Ideally the driver/engine would recover, but I don't see how we can get the broker and client back into agreement.
> 
> 
> Diffs
> -----
> 
>   trunk/qpid/python/qpid/messaging/driver.py 1680941 
>   trunk/qpid/python/qpid/messaging/endpoints.py 1680941 
> 
> Diff: https://reviews.apache.org/r/34560/diff/
> 
> 
> Testing
> -------
> 
> 1. Run this script against a qpidd broker:
> #!/usr/bin/env python
> from qpid.messaging import *
> import datetime
> 
> conn = Connection("localhost:5672", reconnect=10)
> timeout=10
> 
> try:
>   conn.open()
>   sess = conn.session()
> 
>   recv = sess.receiver("testQueue;{create:always}")
>   
>   while (1):
>     print "%s: before fetch, timeout=%s" %(datetime.datetime.now(), timeout)
>     msg = Message()
>     try:
>       msg = recv.fetch(timeout=timeout)
>     except ReceiverError, e:
>       print e
>     except ConnectError, e:
>       print "ConnectError", str(e)
>       break
>     print "%s: after fetch, msg=%s"  (datetime.datetime.now(), msg)
> 
>   print "about to close session"
>   sess.close()
> 
> except ReceiverError, e:
>   print e
> except KeyboardInterrupt:
>   pass
> 
> print "about to close connection"
> conn.close()
> 
> 2. Simulate network outage:
> iptables -A OUTPUT -p tcp --dport 5672 -j REJECT; date
> 
> 3. Once python script writes "No handlers could be found for logger "qpid.messaging"", flush iptables (iptables -F)
> 
> 4. Wait up to 10 seconds
> 
> The ConnectError is received by the client and the loop can be exited.
> 
> 
> Thanks,
> 
> Ernie Allen
> 
>