You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@qpid.apache.org by Alan Conway <ac...@redhat.com> on 2014/01/31 00:13:25 UTC

Review Request 17563: QPID-5528: HA Clean up error messages around rolled-back transactions.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17563/
-----------------------------------------------------------

Review request for qpid and Gordon Sim.


Repository: qpid


Description
-------

QPID-5528: HA Clean up error messages around rolled-back transactions.

A simple transaction test generates a lot of errors and rollback notices in the
broker logs even though the test code never rolls back a transaction. E.g.

  qpid-cluster-benchmark -b 20.0.20.200 -n1 -m 1000 -q3 -s2 -r2 --send-arg=--tx --send-arg=10 --receive-arg=--tx --receive-arg=10

The errors are caused by queues being deleted while backup brokers are using
them. This happens a lot in the transaction test because a transactional session
must create a new transaction when the previous one closes. When the session
closes the open transaction is rolled back automatically. Thus there is almost
always an empty transaction that is created then immediately rolled back at the
end of the session. Backup brokers may still be in the process of subscribing to
the transaction's replication queue at this point, causing (harmlesss) errors.

We'd like to clean this up completely but unfortunately there are some error
messages that are not easy to remove. This commit does the following:

- Remove misleading "backup disconnected" message for cancelled transactions.
- Include TxReplicator destroy in QueueReplicator mutex. Idempotence check before any destroy.
- Remove spurious warning about ignored unreplicated dequeues.
- Remove log messages about rolling back/destroying empty transactions.
- Log incoming execution exceptions as errors to explain subsequent session detached errors.

There should be no more unexpected log messages about rollbacks since we no
longer log rollback of an empty transaction.

The following errors remain, they are harmless and can be ignored. They are not
necessarily related to transactions and can occur in other situations where
queues are deleted while still replicating.

Queue deleted, replication in progress (happens in the test, not related to TX)

On primary, the following errors:
- [Protocol] error Incoming Execution exception: not-found: Exchange not found: qpid.ha-q:...
- [Protocol] error Channel exception: not-attached: Channel xxx is not attached

On backup:
- [Protocol] error Execution exception: not-found: Exchange not found: qpid.ha-q:...

Queue deleted while backup is in the process of subscribing (in the test happens
because of empty tx)

On primary:
 [Protocol] error Execution exception: not-found: Queue not found: qpid.ha-tx:...

On backup:
[Broker] error Incoming execution exception: not-found: Queue not found: qpid.ha-tx:...

Note the "Incoming exception exception" error was actually added by this commit,
since otherwise there's no way to tell the cause of the "channel not attached"
errors.

This is not a satisfactory situation. The "harmless" error messages might
sometimes indicate real error situations and the clutter makes debugging
harder. I will leave this JIRA open for now and try to come up with a better
solution.


Diffs
-----

  /trunk/qpid/cpp/src/qpid/Exception.cpp 1562539 
  /trunk/qpid/cpp/src/qpid/broker/SessionAdapter.cpp 1562539 
  /trunk/qpid/cpp/src/qpid/ha/PrimaryTxObserver.h 1562539 
  /trunk/qpid/cpp/src/qpid/ha/PrimaryTxObserver.cpp 1562539 
  /trunk/qpid/cpp/src/qpid/ha/QueueReplicator.h 1562539 
  /trunk/qpid/cpp/src/qpid/ha/QueueReplicator.cpp 1562539 
  /trunk/qpid/cpp/src/qpid/ha/TxReplicator.h 1562539 
  /trunk/qpid/cpp/src/qpid/ha/TxReplicator.cpp 1562539 

Diff: https://reviews.apache.org/r/17563/diff/


Testing
-------


Thanks,

Alan Conway


Re: Review Request 17563: QPID-5528: HA Clean up error messages around rolled-back transactions.

Posted by Gordon Sim <gs...@redhat.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17563/#review33414
-----------------------------------------------------------

Ship it!


Very nice!

- Gordon Sim


On Jan. 31, 2014, 10:23 p.m., Alan Conway wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17563/
> -----------------------------------------------------------
> 
> (Updated Jan. 31, 2014, 10:23 p.m.)
> 
> 
> Review request for qpid and Gordon Sim.
> 
> 
> Bugs: QPID-5528
>     https://issues.apache.org/jira/browse/QPID-5528
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> QPID-5528: HA Clean up error messages around rolled-back transactions.
> 
> A simple transaction test on a 3 node cluster generates a lot of errors and
> rollback notices in the broker logs even though the test code never rolls back a
> transaction. E.g.
> 
>   qpid-cluster-benchmark -b 20.0.20.200 -n1 -m 1000 -q3 -s2 -r2 --send-arg=--tx --send-arg=10 --receive-arg=--tx --receive-arg=10
> 
> The errors are caused by queues being deleted while backup brokers are using
> them. This happens a lot in the transaction test because a transactional session
> must create a new transaction when the previous one closes. When the session
> closes the open transaction is rolled back automatically. Thus there is almost
> always an empty transaction that is created then immediately rolled back at the
> end of the session. Backup brokers may still be in the process of subscribing to
> the transaction's replication queue at this point, causing (harmlesss) errors.
> 
> This commit takes the following steps to clean up the unwanted error and rollback messages:
> 
> HA TX messages cleaned up:
> - Remove misleading "backup disconnected" message for cancelled transactions.
> - Include TxReplicator destroy in QueueReplicator mutex. Idempotence check before any destroy.
> - Remove spurious warning about ignored unreplicated dequeues.
> - Remove log messages about rolling back/destroying empty transactions.
> 
> Allow HA to suppress/modify broker exception logging:
> - Move broker exception logging into ErrorListener
>   - Every SessionHandler has DefaultErrorListener that does the same logging as before.
>   - Added SessionHandlerObserver to allow plugins to change the error listener.
> - HA backup sets ErrorListeners on replication session bridges
>   - Log exceptions that are not errors as HA debug messages, not broker errors.
> - HA primary sets error listener on backup sessions via SessionHandlerObserver
>   - Log exceptions as HA debug messages, not broker errors.
> 
> Unrelated cleanup:
> - Remove unused DetachedCallback in Link.cpp
> - Broker now logs "incoming execution exceptions" as debug messages rather than ignoring.
> 
> Backps are still logging this error message, not sure why:
>   [Broker] error Channel exception: not-attached: session.detached from peer.
> Will continue to investigate this, it looks like this should be suppressed by backup error listeners.
> 
> 
> Diffs
> -----
> 
>   /trunk/qpid/cpp/src/qpid/Exception.cpp 1562539 
>   /trunk/qpid/cpp/src/qpid/amqp_0_10/SessionHandler.cpp 1562539 
>   /trunk/qpid/cpp/src/qpid/broker/Broker.h 1562539 
>   /trunk/qpid/cpp/src/qpid/broker/Link.cpp 1562539 
>   /trunk/qpid/cpp/src/qpid/broker/SessionHandler.cpp 1562539 
>   /trunk/qpid/cpp/src/qpid/ha/BrokerReplicator.cpp 1562539 
>   /trunk/qpid/cpp/src/qpid/ha/Primary.h 1562539 
>   /trunk/qpid/cpp/src/qpid/ha/Primary.cpp 1562539 
>   /trunk/qpid/cpp/src/qpid/ha/PrimaryTxObserver.h 1562539 
>   /trunk/qpid/cpp/src/qpid/ha/PrimaryTxObserver.cpp 1562539 
>   /trunk/qpid/cpp/src/qpid/ha/QueueReplicator.h 1562539 
>   /trunk/qpid/cpp/src/qpid/ha/QueueReplicator.cpp 1562539 
>   /trunk/qpid/cpp/src/qpid/ha/TxReplicator.h 1562539 
>   /trunk/qpid/cpp/src/qpid/ha/TxReplicator.cpp 1562539 
> 
> Diff: https://reviews.apache.org/r/17563/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Alan Conway
> 
>


Re: Review Request 17563: QPID-5528: HA Clean up error messages around rolled-back transactions.

Posted by Alan Conway <ac...@redhat.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17563/
-----------------------------------------------------------

(Updated Jan. 31, 2014, 10:23 p.m.)


Review request for qpid and Gordon Sim.


Changes
-------

This is MUCH better. It cleans up (almost) all the bogus errors on HA backup sessions without any effects on non-HA sessions. It does not introduce any new error messages, although it does now have the broker _debug_ logging incoming execution exceptions in case this is ever useful. There is one exception still showing up on the backup that I don't understand - I don't remember seeing it before so I may have introduced it somehow. I'll track it down & sort that out as well.


Bugs: QPID-5528
    https://issues.apache.org/jira/browse/QPID-5528


Repository: qpid


Description (updated)
-------

QPID-5528: HA Clean up error messages around rolled-back transactions.

A simple transaction test on a 3 node cluster generates a lot of errors and
rollback notices in the broker logs even though the test code never rolls back a
transaction. E.g.

  qpid-cluster-benchmark -b 20.0.20.200 -n1 -m 1000 -q3 -s2 -r2 --send-arg=--tx --send-arg=10 --receive-arg=--tx --receive-arg=10

The errors are caused by queues being deleted while backup brokers are using
them. This happens a lot in the transaction test because a transactional session
must create a new transaction when the previous one closes. When the session
closes the open transaction is rolled back automatically. Thus there is almost
always an empty transaction that is created then immediately rolled back at the
end of the session. Backup brokers may still be in the process of subscribing to
the transaction's replication queue at this point, causing (harmlesss) errors.

This commit takes the following steps to clean up the unwanted error and rollback messages:

HA TX messages cleaned up:
- Remove misleading "backup disconnected" message for cancelled transactions.
- Include TxReplicator destroy in QueueReplicator mutex. Idempotence check before any destroy.
- Remove spurious warning about ignored unreplicated dequeues.
- Remove log messages about rolling back/destroying empty transactions.

Allow HA to suppress/modify broker exception logging:
- Move broker exception logging into ErrorListener
  - Every SessionHandler has DefaultErrorListener that does the same logging as before.
  - Added SessionHandlerObserver to allow plugins to change the error listener.
- HA backup sets ErrorListeners on replication session bridges
  - Log exceptions that are not errors as HA debug messages, not broker errors.
- HA primary sets error listener on backup sessions via SessionHandlerObserver
  - Log exceptions as HA debug messages, not broker errors.

Unrelated cleanup:
- Remove unused DetachedCallback in Link.cpp
- Broker now logs "incoming execution exceptions" as debug messages rather than ignoring.

Backps are still logging this error message, not sure why:
  [Broker] error Channel exception: not-attached: session.detached from peer.
Will continue to investigate this, it looks like this should be suppressed by backup error listeners.


Diffs (updated)
-----

  /trunk/qpid/cpp/src/qpid/Exception.cpp 1562539 
  /trunk/qpid/cpp/src/qpid/amqp_0_10/SessionHandler.cpp 1562539 
  /trunk/qpid/cpp/src/qpid/broker/Broker.h 1562539 
  /trunk/qpid/cpp/src/qpid/broker/Link.cpp 1562539 
  /trunk/qpid/cpp/src/qpid/broker/SessionHandler.cpp 1562539 
  /trunk/qpid/cpp/src/qpid/ha/BrokerReplicator.cpp 1562539 
  /trunk/qpid/cpp/src/qpid/ha/Primary.h 1562539 
  /trunk/qpid/cpp/src/qpid/ha/Primary.cpp 1562539 
  /trunk/qpid/cpp/src/qpid/ha/PrimaryTxObserver.h 1562539 
  /trunk/qpid/cpp/src/qpid/ha/PrimaryTxObserver.cpp 1562539 
  /trunk/qpid/cpp/src/qpid/ha/QueueReplicator.h 1562539 
  /trunk/qpid/cpp/src/qpid/ha/QueueReplicator.cpp 1562539 
  /trunk/qpid/cpp/src/qpid/ha/TxReplicator.h 1562539 
  /trunk/qpid/cpp/src/qpid/ha/TxReplicator.cpp 1562539 

Diff: https://reviews.apache.org/r/17563/diff/


Testing
-------


Thanks,

Alan Conway


Re: Review Request 17563: QPID-5528: HA Clean up error messages around rolled-back transactions.

Posted by Alan Conway <ac...@redhat.com>.

> On Jan. 31, 2014, 10:58 a.m., Gordon Sim wrote:
> > /trunk/qpid/cpp/src/qpid/broker/SessionAdapter.cpp, line 548
> > <https://reviews.apache.org/r/17563/diff/1/?file=457081#file457081line548>
> >
> >     Is this always an 'error'? What about with federation when a failover exchange doesn't exist? Not a big deal, but if we can avoid logging those sorts of expected things at error its better. However if this is the cleanest way to log some important error that can't be logged elsewhere, then feel free to ignore this.

No its not an error, in the HA case either. The reason I added it is that you get "channel not attached" which is logged as an error, but there's nothing to explain *why* the channel isn't attached - so it's hard to tell if it is a real error or something benign (as in the HA or federation case). Perhaps I should make both "incoming execution exception" _and_ "channel not attached" into info messages. Sound reasonable?


- Alan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17563/#review33312
-----------------------------------------------------------


On Jan. 30, 2014, 11:14 p.m., Alan Conway wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17563/
> -----------------------------------------------------------
> 
> (Updated Jan. 30, 2014, 11:14 p.m.)
> 
> 
> Review request for qpid and Gordon Sim.
> 
> 
> Bugs: QPID-5528
>     https://issues.apache.org/jira/browse/QPID-5528
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> QPID-5528: HA Clean up error messages around rolled-back transactions.
> 
> A simple transaction test on a 3 node cluster generates a lot of errors and rollback notices in the
> broker logs even though the test code never rolls back a transaction. E.g.
> 
>   qpid-cluster-benchmark -b 20.0.20.200 -n1 -m 1000 -q3 -s2 -r2 --send-arg=--tx --send-arg=10 --receive-arg=--tx --receive-arg=10
> 
> The errors are caused by queues being deleted while backup brokers are using
> them. This happens a lot in the transaction test because a transactional session
> must create a new transaction when the previous one closes. When the session
> closes the open transaction is rolled back automatically. Thus there is almost
> always an empty transaction that is created then immediately rolled back at the
> end of the session. Backup brokers may still be in the process of subscribing to
> the transaction's replication queue at this point, causing (harmlesss) errors.
> 
> We'd like to clean this up completely but unfortunately there are some error
> messages that are not easy to remove. This commit does the following:
> 
> - Remove misleading "backup disconnected" message for cancelled transactions.
> - Include TxReplicator destroy in QueueReplicator mutex. Idempotence check before any destroy.
> - Remove spurious warning about ignored unreplicated dequeues.
> - Remove log messages about rolling back/destroying empty transactions.
> - Log incoming execution exceptions as errors to explain subsequent session detached errors.
> 
> There should be no more unexpected log messages about rollbacks since we no
> longer log rollback of an empty transaction.
> 
> The following errors remain, they are harmless and can be ignored. They are not
> necessarily related to transactions and can occur in other situations where
> queues are deleted while still replicating.
> 
> Queue deleted, replication in progress (happens in the test, not related to TX)
> 
> On primary, the following errors:
> - [Protocol] error Incoming Execution exception: not-found: Exchange not found: qpid.ha-q:...
> - [Protocol] error Channel exception: not-attached: Channel xxx is not attached
> 
> On backup:
> - [Protocol] error Execution exception: not-found: Exchange not found: qpid.ha-q:...
> 
> Queue deleted while backup is in the process of subscribing (in the test happens
> because of empty tx)
> 
> On primary:
>  [Protocol] error Execution exception: not-found: Queue not found: qpid.ha-tx:...
> 
> On backup:
> [Broker] error Incoming execution exception: not-found: Queue not found: qpid.ha-tx:...
> 
> Note the "Incoming exception exception" error was actually added by this commit,
> since otherwise there's no way to tell the cause of the "channel not attached"
> errors.
> 
> This is not a satisfactory situation. The "harmless" error messages might
> sometimes indicate real error situations and the clutter makes debugging
> harder. I will leave this JIRA open for now and try to come up with a better
> solution.
> 
> 
> Diffs
> -----
> 
>   /trunk/qpid/cpp/src/qpid/Exception.cpp 1562539 
>   /trunk/qpid/cpp/src/qpid/broker/SessionAdapter.cpp 1562539 
>   /trunk/qpid/cpp/src/qpid/ha/PrimaryTxObserver.h 1562539 
>   /trunk/qpid/cpp/src/qpid/ha/PrimaryTxObserver.cpp 1562539 
>   /trunk/qpid/cpp/src/qpid/ha/QueueReplicator.h 1562539 
>   /trunk/qpid/cpp/src/qpid/ha/QueueReplicator.cpp 1562539 
>   /trunk/qpid/cpp/src/qpid/ha/TxReplicator.h 1562539 
>   /trunk/qpid/cpp/src/qpid/ha/TxReplicator.cpp 1562539 
> 
> Diff: https://reviews.apache.org/r/17563/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Alan Conway
> 
>


Re: Review Request 17563: QPID-5528: HA Clean up error messages around rolled-back transactions.

Posted by Gordon Sim <gs...@redhat.com>.

> On Jan. 31, 2014, 10:58 a.m., Gordon Sim wrote:
> > /trunk/qpid/cpp/src/qpid/broker/SessionAdapter.cpp, line 548
> > <https://reviews.apache.org/r/17563/diff/1/?file=457081#file457081line548>
> >
> >     Is this always an 'error'? What about with federation when a failover exchange doesn't exist? Not a big deal, but if we can avoid logging those sorts of expected things at error its better. However if this is the cleanest way to log some important error that can't be logged elsewhere, then feel free to ignore this.
> 
> Alan Conway wrote:
>     No its not an error, in the HA case either. The reason I added it is that you get "channel not attached" which is logged as an error, but there's nothing to explain *why* the channel isn't attached - so it's hard to tell if it is a real error or something benign (as in the HA or federation case). Perhaps I should make both "incoming execution exception" _and_ "channel not attached" into info messages. Sound reasonable?

Sorry, I see you had explained that in the description which I hadn't read properly! I understand the motivation of the log entry now. Though the channel-not-attached is nearly always a result of some preceding proble, I'm slightly hesitant to lower it all the way to info (which is off by default) just in case...

I would be inclined to log the execution-exception as notice and leave the channel not attached as is. Even as you have it is fine though.


- Gordon


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17563/#review33312
-----------------------------------------------------------


On Jan. 30, 2014, 11:14 p.m., Alan Conway wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17563/
> -----------------------------------------------------------
> 
> (Updated Jan. 30, 2014, 11:14 p.m.)
> 
> 
> Review request for qpid and Gordon Sim.
> 
> 
> Bugs: QPID-5528
>     https://issues.apache.org/jira/browse/QPID-5528
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> QPID-5528: HA Clean up error messages around rolled-back transactions.
> 
> A simple transaction test on a 3 node cluster generates a lot of errors and rollback notices in the
> broker logs even though the test code never rolls back a transaction. E.g.
> 
>   qpid-cluster-benchmark -b 20.0.20.200 -n1 -m 1000 -q3 -s2 -r2 --send-arg=--tx --send-arg=10 --receive-arg=--tx --receive-arg=10
> 
> The errors are caused by queues being deleted while backup brokers are using
> them. This happens a lot in the transaction test because a transactional session
> must create a new transaction when the previous one closes. When the session
> closes the open transaction is rolled back automatically. Thus there is almost
> always an empty transaction that is created then immediately rolled back at the
> end of the session. Backup brokers may still be in the process of subscribing to
> the transaction's replication queue at this point, causing (harmlesss) errors.
> 
> We'd like to clean this up completely but unfortunately there are some error
> messages that are not easy to remove. This commit does the following:
> 
> - Remove misleading "backup disconnected" message for cancelled transactions.
> - Include TxReplicator destroy in QueueReplicator mutex. Idempotence check before any destroy.
> - Remove spurious warning about ignored unreplicated dequeues.
> - Remove log messages about rolling back/destroying empty transactions.
> - Log incoming execution exceptions as errors to explain subsequent session detached errors.
> 
> There should be no more unexpected log messages about rollbacks since we no
> longer log rollback of an empty transaction.
> 
> The following errors remain, they are harmless and can be ignored. They are not
> necessarily related to transactions and can occur in other situations where
> queues are deleted while still replicating.
> 
> Queue deleted, replication in progress (happens in the test, not related to TX)
> 
> On primary, the following errors:
> - [Protocol] error Incoming Execution exception: not-found: Exchange not found: qpid.ha-q:...
> - [Protocol] error Channel exception: not-attached: Channel xxx is not attached
> 
> On backup:
> - [Protocol] error Execution exception: not-found: Exchange not found: qpid.ha-q:...
> 
> Queue deleted while backup is in the process of subscribing (in the test happens
> because of empty tx)
> 
> On primary:
>  [Protocol] error Execution exception: not-found: Queue not found: qpid.ha-tx:...
> 
> On backup:
> [Broker] error Incoming execution exception: not-found: Queue not found: qpid.ha-tx:...
> 
> Note the "Incoming exception exception" error was actually added by this commit,
> since otherwise there's no way to tell the cause of the "channel not attached"
> errors.
> 
> This is not a satisfactory situation. The "harmless" error messages might
> sometimes indicate real error situations and the clutter makes debugging
> harder. I will leave this JIRA open for now and try to come up with a better
> solution.
> 
> 
> Diffs
> -----
> 
>   /trunk/qpid/cpp/src/qpid/Exception.cpp 1562539 
>   /trunk/qpid/cpp/src/qpid/broker/SessionAdapter.cpp 1562539 
>   /trunk/qpid/cpp/src/qpid/ha/PrimaryTxObserver.h 1562539 
>   /trunk/qpid/cpp/src/qpid/ha/PrimaryTxObserver.cpp 1562539 
>   /trunk/qpid/cpp/src/qpid/ha/QueueReplicator.h 1562539 
>   /trunk/qpid/cpp/src/qpid/ha/QueueReplicator.cpp 1562539 
>   /trunk/qpid/cpp/src/qpid/ha/TxReplicator.h 1562539 
>   /trunk/qpid/cpp/src/qpid/ha/TxReplicator.cpp 1562539 
> 
> Diff: https://reviews.apache.org/r/17563/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Alan Conway
> 
>


Re: Review Request 17563: QPID-5528: HA Clean up error messages around rolled-back transactions.

Posted by Gordon Sim <gs...@redhat.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17563/#review33312
-----------------------------------------------------------

Ship it!



/trunk/qpid/cpp/src/qpid/broker/SessionAdapter.cpp
<https://reviews.apache.org/r/17563/#comment62757>

    Is this always an 'error'? What about with federation when a failover exchange doesn't exist? Not a big deal, but if we can avoid logging those sorts of expected things at error its better. However if this is the cleanest way to log some important error that can't be logged elsewhere, then feel free to ignore this.


- Gordon Sim


On Jan. 30, 2014, 11:14 p.m., Alan Conway wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17563/
> -----------------------------------------------------------
> 
> (Updated Jan. 30, 2014, 11:14 p.m.)
> 
> 
> Review request for qpid and Gordon Sim.
> 
> 
> Bugs: QPID-5528
>     https://issues.apache.org/jira/browse/QPID-5528
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> QPID-5528: HA Clean up error messages around rolled-back transactions.
> 
> A simple transaction test on a 3 node cluster generates a lot of errors and rollback notices in the
> broker logs even though the test code never rolls back a transaction. E.g.
> 
>   qpid-cluster-benchmark -b 20.0.20.200 -n1 -m 1000 -q3 -s2 -r2 --send-arg=--tx --send-arg=10 --receive-arg=--tx --receive-arg=10
> 
> The errors are caused by queues being deleted while backup brokers are using
> them. This happens a lot in the transaction test because a transactional session
> must create a new transaction when the previous one closes. When the session
> closes the open transaction is rolled back automatically. Thus there is almost
> always an empty transaction that is created then immediately rolled back at the
> end of the session. Backup brokers may still be in the process of subscribing to
> the transaction's replication queue at this point, causing (harmlesss) errors.
> 
> We'd like to clean this up completely but unfortunately there are some error
> messages that are not easy to remove. This commit does the following:
> 
> - Remove misleading "backup disconnected" message for cancelled transactions.
> - Include TxReplicator destroy in QueueReplicator mutex. Idempotence check before any destroy.
> - Remove spurious warning about ignored unreplicated dequeues.
> - Remove log messages about rolling back/destroying empty transactions.
> - Log incoming execution exceptions as errors to explain subsequent session detached errors.
> 
> There should be no more unexpected log messages about rollbacks since we no
> longer log rollback of an empty transaction.
> 
> The following errors remain, they are harmless and can be ignored. They are not
> necessarily related to transactions and can occur in other situations where
> queues are deleted while still replicating.
> 
> Queue deleted, replication in progress (happens in the test, not related to TX)
> 
> On primary, the following errors:
> - [Protocol] error Incoming Execution exception: not-found: Exchange not found: qpid.ha-q:...
> - [Protocol] error Channel exception: not-attached: Channel xxx is not attached
> 
> On backup:
> - [Protocol] error Execution exception: not-found: Exchange not found: qpid.ha-q:...
> 
> Queue deleted while backup is in the process of subscribing (in the test happens
> because of empty tx)
> 
> On primary:
>  [Protocol] error Execution exception: not-found: Queue not found: qpid.ha-tx:...
> 
> On backup:
> [Broker] error Incoming execution exception: not-found: Queue not found: qpid.ha-tx:...
> 
> Note the "Incoming exception exception" error was actually added by this commit,
> since otherwise there's no way to tell the cause of the "channel not attached"
> errors.
> 
> This is not a satisfactory situation. The "harmless" error messages might
> sometimes indicate real error situations and the clutter makes debugging
> harder. I will leave this JIRA open for now and try to come up with a better
> solution.
> 
> 
> Diffs
> -----
> 
>   /trunk/qpid/cpp/src/qpid/Exception.cpp 1562539 
>   /trunk/qpid/cpp/src/qpid/broker/SessionAdapter.cpp 1562539 
>   /trunk/qpid/cpp/src/qpid/ha/PrimaryTxObserver.h 1562539 
>   /trunk/qpid/cpp/src/qpid/ha/PrimaryTxObserver.cpp 1562539 
>   /trunk/qpid/cpp/src/qpid/ha/QueueReplicator.h 1562539 
>   /trunk/qpid/cpp/src/qpid/ha/QueueReplicator.cpp 1562539 
>   /trunk/qpid/cpp/src/qpid/ha/TxReplicator.h 1562539 
>   /trunk/qpid/cpp/src/qpid/ha/TxReplicator.cpp 1562539 
> 
> Diff: https://reviews.apache.org/r/17563/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Alan Conway
> 
>


Re: Review Request 17563: QPID-5528: HA Clean up error messages around rolled-back transactions.

Posted by Alan Conway <ac...@redhat.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17563/
-----------------------------------------------------------

(Updated Jan. 30, 2014, 11:14 p.m.)


Review request for qpid and Gordon Sim.


Bugs: QPID-5528
    https://issues.apache.org/jira/browse/QPID-5528


Repository: qpid


Description (updated)
-------

QPID-5528: HA Clean up error messages around rolled-back transactions.

A simple transaction test on a 3 node cluster generates a lot of errors and rollback notices in the
broker logs even though the test code never rolls back a transaction. E.g.

  qpid-cluster-benchmark -b 20.0.20.200 -n1 -m 1000 -q3 -s2 -r2 --send-arg=--tx --send-arg=10 --receive-arg=--tx --receive-arg=10

The errors are caused by queues being deleted while backup brokers are using
them. This happens a lot in the transaction test because a transactional session
must create a new transaction when the previous one closes. When the session
closes the open transaction is rolled back automatically. Thus there is almost
always an empty transaction that is created then immediately rolled back at the
end of the session. Backup brokers may still be in the process of subscribing to
the transaction's replication queue at this point, causing (harmlesss) errors.

We'd like to clean this up completely but unfortunately there are some error
messages that are not easy to remove. This commit does the following:

- Remove misleading "backup disconnected" message for cancelled transactions.
- Include TxReplicator destroy in QueueReplicator mutex. Idempotence check before any destroy.
- Remove spurious warning about ignored unreplicated dequeues.
- Remove log messages about rolling back/destroying empty transactions.
- Log incoming execution exceptions as errors to explain subsequent session detached errors.

There should be no more unexpected log messages about rollbacks since we no
longer log rollback of an empty transaction.

The following errors remain, they are harmless and can be ignored. They are not
necessarily related to transactions and can occur in other situations where
queues are deleted while still replicating.

Queue deleted, replication in progress (happens in the test, not related to TX)

On primary, the following errors:
- [Protocol] error Incoming Execution exception: not-found: Exchange not found: qpid.ha-q:...
- [Protocol] error Channel exception: not-attached: Channel xxx is not attached

On backup:
- [Protocol] error Execution exception: not-found: Exchange not found: qpid.ha-q:...

Queue deleted while backup is in the process of subscribing (in the test happens
because of empty tx)

On primary:
 [Protocol] error Execution exception: not-found: Queue not found: qpid.ha-tx:...

On backup:
[Broker] error Incoming execution exception: not-found: Queue not found: qpid.ha-tx:...

Note the "Incoming exception exception" error was actually added by this commit,
since otherwise there's no way to tell the cause of the "channel not attached"
errors.

This is not a satisfactory situation. The "harmless" error messages might
sometimes indicate real error situations and the clutter makes debugging
harder. I will leave this JIRA open for now and try to come up with a better
solution.


Diffs
-----

  /trunk/qpid/cpp/src/qpid/Exception.cpp 1562539 
  /trunk/qpid/cpp/src/qpid/broker/SessionAdapter.cpp 1562539 
  /trunk/qpid/cpp/src/qpid/ha/PrimaryTxObserver.h 1562539 
  /trunk/qpid/cpp/src/qpid/ha/PrimaryTxObserver.cpp 1562539 
  /trunk/qpid/cpp/src/qpid/ha/QueueReplicator.h 1562539 
  /trunk/qpid/cpp/src/qpid/ha/QueueReplicator.cpp 1562539 
  /trunk/qpid/cpp/src/qpid/ha/TxReplicator.h 1562539 
  /trunk/qpid/cpp/src/qpid/ha/TxReplicator.cpp 1562539 

Diff: https://reviews.apache.org/r/17563/diff/


Testing
-------


Thanks,

Alan Conway


Re: Review Request 17563: QPID-5528: HA Clean up error messages around rolled-back transactions.

Posted by Alan Conway <ac...@redhat.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17563/
-----------------------------------------------------------

(Updated Jan. 30, 2014, 11:14 p.m.)


Review request for qpid and Gordon Sim.


Bugs: QPID-5528
    https://issues.apache.org/jira/browse/QPID-5528


Repository: qpid


Description
-------

QPID-5528: HA Clean up error messages around rolled-back transactions.

A simple transaction test generates a lot of errors and rollback notices in the
broker logs even though the test code never rolls back a transaction. E.g.

  qpid-cluster-benchmark -b 20.0.20.200 -n1 -m 1000 -q3 -s2 -r2 --send-arg=--tx --send-arg=10 --receive-arg=--tx --receive-arg=10

The errors are caused by queues being deleted while backup brokers are using
them. This happens a lot in the transaction test because a transactional session
must create a new transaction when the previous one closes. When the session
closes the open transaction is rolled back automatically. Thus there is almost
always an empty transaction that is created then immediately rolled back at the
end of the session. Backup brokers may still be in the process of subscribing to
the transaction's replication queue at this point, causing (harmlesss) errors.

We'd like to clean this up completely but unfortunately there are some error
messages that are not easy to remove. This commit does the following:

- Remove misleading "backup disconnected" message for cancelled transactions.
- Include TxReplicator destroy in QueueReplicator mutex. Idempotence check before any destroy.
- Remove spurious warning about ignored unreplicated dequeues.
- Remove log messages about rolling back/destroying empty transactions.
- Log incoming execution exceptions as errors to explain subsequent session detached errors.

There should be no more unexpected log messages about rollbacks since we no
longer log rollback of an empty transaction.

The following errors remain, they are harmless and can be ignored. They are not
necessarily related to transactions and can occur in other situations where
queues are deleted while still replicating.

Queue deleted, replication in progress (happens in the test, not related to TX)

On primary, the following errors:
- [Protocol] error Incoming Execution exception: not-found: Exchange not found: qpid.ha-q:...
- [Protocol] error Channel exception: not-attached: Channel xxx is not attached

On backup:
- [Protocol] error Execution exception: not-found: Exchange not found: qpid.ha-q:...

Queue deleted while backup is in the process of subscribing (in the test happens
because of empty tx)

On primary:
 [Protocol] error Execution exception: not-found: Queue not found: qpid.ha-tx:...

On backup:
[Broker] error Incoming execution exception: not-found: Queue not found: qpid.ha-tx:...

Note the "Incoming exception exception" error was actually added by this commit,
since otherwise there's no way to tell the cause of the "channel not attached"
errors.

This is not a satisfactory situation. The "harmless" error messages might
sometimes indicate real error situations and the clutter makes debugging
harder. I will leave this JIRA open for now and try to come up with a better
solution.


Diffs
-----

  /trunk/qpid/cpp/src/qpid/Exception.cpp 1562539 
  /trunk/qpid/cpp/src/qpid/broker/SessionAdapter.cpp 1562539 
  /trunk/qpid/cpp/src/qpid/ha/PrimaryTxObserver.h 1562539 
  /trunk/qpid/cpp/src/qpid/ha/PrimaryTxObserver.cpp 1562539 
  /trunk/qpid/cpp/src/qpid/ha/QueueReplicator.h 1562539 
  /trunk/qpid/cpp/src/qpid/ha/QueueReplicator.cpp 1562539 
  /trunk/qpid/cpp/src/qpid/ha/TxReplicator.h 1562539 
  /trunk/qpid/cpp/src/qpid/ha/TxReplicator.cpp 1562539 

Diff: https://reviews.apache.org/r/17563/diff/


Testing
-------


Thanks,

Alan Conway