Posted to dev@qpid.apache.org by "Rajith Attapattu (JIRA)" <ji...@apache.org> on 2011/04/04 15:46:05 UTC

[jira] [Updated] (QPID-3177) JMS Failover Not Working

     [ https://issues.apache.org/jira/browse/QPID-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajith Attapattu updated QPID-3177:
-----------------------------------

      Description: 
The JMS failover is not working due to a regression introduced in rev 1071631.
More specifically the reason is the following line being removed.

@@ -257,12 +256,14 @@
         ConnectionClose close = exc.getClose();
         if (close == null)
         {
+            _conn.getProtocolHandler().setFailoverLatch(new CountDownLatch(1));
+            
             try
             {
                 if (_conn.firePreFailover(false) && _conn.attemptReconnection())
                 {
                     _conn.failoverPrep();
-                    _qpidConnection.resume();   ---- > This line is removed in this commit.
+                    _conn.resubscribeSessions();
                     _conn.fireFailoverComplete();
                     return;
                 }

On the surface this seems like an unintended omission. However, I'd like to investigate it further and find out whether it was intentional.
If it was, then what was the reason behind it?

The obvious fix is to re-introduce this line. I have done so, and preliminary tests indicate that it resolves the issue. I haven't seen any other issues.
However I'd like to look at more information regarding the context of the above change before making a final decision.

  was:
The JMS failover is not working due to a regression introduced in rev 1071631.
More specifically the reason is the following line being removed.

@@ -257,12 +256,14 @@
         ConnectionClose close = exc.getClose();
         if (close == null)
         {
+            _conn.getProtocolHandler().setFailoverLatch(new CountDownLatch(1));
+            
             try
             {
                 if (_conn.firePreFailover(false) && _conn.attemptReconnection())
                 {
                     _conn.failoverPrep();
-                    _qpidConnection.resume();   ---- > This line is removed in this comment.
+                    _conn.resubscribeSessions();
                     _conn.fireFailoverComplete();
                     return;
                 }

On the surface this seems like an unintended omission. However, I'd like to investigate it further and find out whether it was intentional.
If it was, then what was the reason behind it?

The obvious fix is to re-introduce this line. I have done so, and preliminary tests indicate that it resolves the issue. I haven't seen any other issues.
However I'd like to look at more information regarding the context of the above change before making a final decision.

    Fix Version/s:     (was: 0.11)
                   0.10

The failover issue was noticed before (due to test failures) and it was on my todo list to investigate.
But there was a steady stream of issues that I had to deal with, hence the delay in getting to it.

I have spent a lot of time working on the failover/transport code for several bugs, e.g. QPID-3043, QPID-3042, QPID-2876, QPID-2861, QPID-2808, QPID-2994, etc.
The failover code is probably one of the more fragile areas in the code base, to say the least.
Some of these issues cannot be tested with automated test cases due to various limitations, which makes certain regressions difficult to catch.
I need to get the python testkit working properly, as it's a lot more flexible than the ant tests and seems to be able to reproduce a few of the client and broker bugs we had in the past.

I plan to retest the failover area manually for some of the above issues. Post release I am planning to get the testkit running more frequently.
I have tested this particular fix, but also plan more comprehensive tests around this area.
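
For readers following the diff in the description, the sequence with the resume() call put back can be sketched roughly as below. This is only a toy model and not the actual Qpid code: FailoverSketch, LowLevelConnection, JmsConnection, the shared log list and the callResume flag are all hypothetical stand-ins. Placing resume() right after failoverPrep() (its position before rev 1071631) and releasing the latch in a finally block are assumptions made for illustration, as is treating "resumed" as a simple boolean.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Toy model of the failover sequence from the diff. None of these types are
// the real Qpid classes; they only record which steps were executed.
class FailoverSketch {

    /** Hypothetical stand-in for the object the diff calls _qpidConnection. */
    static class LowLevelConnection {
        private final List<String> log;
        private boolean resumed;

        LowLevelConnection(List<String> log) { this.log = log; }

        void resume() { resumed = true; log.add("resume"); }
        boolean isResumed() { return resumed; }
    }

    /** Hypothetical stand-in for the object the diff calls _conn. */
    static class JmsConnection {
        private final List<String> log;
        private CountDownLatch failoverLatch;

        JmsConnection(List<String> log) { this.log = log; }

        void setFailoverLatch(CountDownLatch latch) { failoverLatch = latch; }
        CountDownLatch getFailoverLatch() { return failoverLatch; }

        boolean firePreFailover(boolean redirect) { log.add("firePreFailover"); return true; }
        boolean attemptReconnection() { log.add("attemptReconnection"); return true; }
        void failoverPrep() { log.add("failoverPrep"); }
        void resubscribeSessions() { log.add("resubscribeSessions"); }
        void fireFailoverComplete() { log.add("fireFailoverComplete"); }
    }

    /**
     * Runs the sequence from the diff. callResume selects between the
     * regressed behaviour (false) and the proposed fix (true).
     * Returns true only if the low-level connection ended up resumed.
     */
    static boolean failover(JmsConnection conn, LowLevelConnection qpidConnection,
                            boolean callResume) {
        conn.setFailoverLatch(new CountDownLatch(1));
        try {
            if (conn.firePreFailover(false) && conn.attemptReconnection()) {
                conn.failoverPrep();
                if (callResume) {
                    // Assumed position: where the line sat before rev 1071631.
                    qpidConnection.resume();
                }
                conn.resubscribeSessions();
                conn.fireFailoverComplete();
                return qpidConnection.isResumed();
            }
            return false;
        } finally {
            // Assumption: the sketch releases the latch here; the real
            // handler may release it elsewhere.
            conn.getFailoverLatch().countDown();
        }
    }
}
```

In this model, calling failover(conn, qpidConnection, false) runs every other step but leaves the low-level connection un-resumed, which is the shape of the regression described above. Passing true restores the call.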


> JMS Failover Not Working
> ------------------------
>
>                 Key: QPID-3177
>                 URL: https://issues.apache.org/jira/browse/QPID-3177
>             Project: Qpid
>          Issue Type: Bug
>    Affects Versions: 0.10
>            Reporter: Rajith Attapattu
>            Assignee: Robbie Gemmell
>            Priority: Blocker
>             Fix For: 0.10
>
>
> The JMS failover is not working due to a regression introduced in rev 1071631.
> More specifically the reason is the following line being removed.
> @@ -257,12 +256,14 @@
>          ConnectionClose close = exc.getClose();
>          if (close == null)
>          {
> +            _conn.getProtocolHandler().setFailoverLatch(new CountDownLatch(1));
> +            
>              try
>              {
>                  if (_conn.firePreFailover(false) && _conn.attemptReconnection())
>                  {
>                      _conn.failoverPrep();
> -                    _qpidConnection.resume();   ---- > This line is removed in this commit.
> +                    _conn.resubscribeSessions();
>                      _conn.fireFailoverComplete();
>                      return;
>                  }
> On the surface this seems like an unintended omission. However, I'd like to investigate it further and find out whether it was intentional.
> If it was, then what was the reason behind it?
> The obvious fix is to re-introduce this line. I have done so, and preliminary tests indicate that it resolves the issue. I haven't seen any other issues.
> However I'd like to look at more information regarding the context of the above change before making a final decision.

