Posted to dev@qpid.apache.org by Rajith Attapattu <ra...@gmail.com> on 2010/08/03 15:24:00 UTC

Re: 0-10 Session Close and Failover

Andrew,

The change has actually caused issues with the JMS client failover
against the C++ cluster.
I would appreciate it if you could revert the change until we figure out
a proper fix.
Rafi has suggested a few alternatives; perhaps it's best if one of
those approaches is investigated.

Rajith

On Mon, Jul 26, 2010 at 12:18 PM, Andrew Kennedy
<an...@gmail.com> wrote:
> Ok,
>
> So, the result of what I changed is that the 0-10 session expiry is
> correctly propagated via sessionRequestTimeout and sessionTimeout
> messages, however you are saying that this will not work correctly? If
> I understand things correctly, when the client creates a session and
> negotiates an N > 0 second expiry (or timeout) with the broker, this
> successful negotiation will mislead a well-behaved client into
> thinking it can resume a session in a way that is currently not
> possible?
>
> I can certainly revert the change if it is likely to cause any issues;
> I think I was misled by there not being a handy test case that I could
> look at, and I'm not sure it would be possible to create one using the
> Java broker. As people have mentioned, more CI would certainly make
> this kind of thing easier - It took me longer than I would have
> expected just to get the Java client tests running against the C++
> broker...
>
> Cheers,
> Andrew.
> --
> -- andrew d kennedy ? edinburgh : +44 7941 197 134
>
>
>
> On 26 July 2010 16:03, Rafael Schloming <ra...@redhat.com> wrote:
>> Andrew Kennedy wrote:
>>>
>>> Hi.
>>>
>>> I have been looking at the 0-10 session close semantics, and been
>>> meaning to ask this for a while...
>>>
>>> There is no explicit close message in 0-10, rather the session timeout
>>> is supposed to be set to 0 seconds and then a session detach message
>>> is sent. I have implemented this, since it requires simply creating a
>>> handler for sessionRequestTimeout messages that actually sets expiry
>>> to 0, as discussed on QPID-2586 and in this message on the dev list:
>>>
>>> http://apache-qpid-developers.2158895.n2.nabble.com/Java-0-10-Session-closure-expiry-timeout-td4865744.html#a4865744
>>>
>>> Also, see comments in the code (in o.a.q.t.Session.java):
>>>
>>> // XXX: when the broker and client support full session
>>> // recovery we should use expiry as the requested timeout
>>>
>>> // XXX: we manually set the expiry to zero here to
>>> // simulate full session recovery in brokers that don't
>>> // support it, we should remove this line when there is
>>> // broker support for full session resume:
>>> expiry = 0;
>>>
>>> which is exactly what I have done, I use the expiry field (therefore
>>> on session create the timeout is set to 1 on both sides now) and (in
>>> o.a.q.t.SessionDelegate.java) we have:
>>>
>>> // XXX: we ignore this right now, we should uncomment this
>>> // when full session resume is supported:
>>>
>>> where I have uncommented the following line, which just calls
>>> ssn.setExpiry(t.getTimeout()), and done the same in the new handler
>>> for sessionRequestTimeout.
>>>
>>> I believe that this is a fairly simple and non-contentious change. One
>>> thing that I noticed from other dev list messages is that this could
>>> affect the failover mechanism used by the C++ clients. I ran the cpp
>>> test suite with my changes and noticed no difference, but then I
>>> couldn't see what tests would actually be affected. Can anyone shed
>>> more light on the problems I might expect or be missing?
>>
>> Sorry for not commenting on this sooner, but until being prompted by a few
>> conversations, I only had a very vague recollection of this area.
>>
>> I think this change is going to be problematic for the C++ broker (and
>> possibly for the Java broker too).
>>
>> The reason the 0-10 session only ever advertises zero as the timeout is
>> because it does not do frame level replay. It does a semantic resume. If it
>> advertises a non-zero timeout then it will end up doing a semantic resume
>> against a session that is expecting frame level replay, and that won't be
>> happy.
>>
>> --Rafael
>>
>> ---------------------------------------------------------------------
>> Apache Qpid - AMQP Messaging Implementation
>> Project:      http://qpid.apache.org
>> Use/Interact: mailto:dev-subscribe@qpid.apache.org
>>
>>



-- 
Regards,

Rajith Attapattu
Red Hat
http://rajith.2rlabs.com/



Re: 0-10 Session Close and Failover

Posted by Alan Conway <ac...@redhat.com>.
On 08/03/2010 11:37 AM, Andrew Kennedy wrote:
> On 3 August 2010 14:24, Rajith Attapattu<ra...@gmail.com>  wrote:
>> Andrew,
>>
>> The change has actually caused issues with the JMS client failover against the C++ cluster.
>> I would appreciate it if you could revert the change until we figure out a proper fix.
>> Rafi has suggested a few alternatives; perhaps it's best if one of those approaches is investigated.
>>
>> Rajith
>
> Hi.
>
> Can you elaborate on what the issue is, as I didn't see any of the cpp
> profile tests failing? If you had a test that illustrated the problem
> it would make it easier to understand your issue. As it is, it will be
> a simple matter to revert the change, so I will do this as soon as I
> have time.
>

The issue is that the C++ broker had incomplete support for session resume, and
we don't support it at all on the client side. If you opened a session with a
timeout, then killed the node, the session would continue to live on the other
nodes, which caused conflicts when the client attempted to fail over and
re-create a session with the same name.

I've fixed this on the broker side: the broker now sets a timeout of 0
regardless of what the client requests. If/when we support session resume fully
we can re-enable timeouts. It might still be worth changing the Java code for
clarity, as the timeout isn't really being set.
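
To make the conflict described above concrete, here is a minimal toy model in Java. It is not Qpid broker code: the class SessionTimeoutSketch, its methods, and the single map standing in for the session state held by the surviving cluster nodes are all assumptions made for illustration, and the expiry timer itself is left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the behaviour described above; not Qpid broker code.
final class SessionTimeoutSketch {
    // Live sessions by name, mapped to the timeout (in seconds) the broker granted.
    private final Map<String, Long> sessions = new HashMap<>();
    // true models the fixed broker, which grants 0 whatever the client asks for.
    private final boolean clampToZero;

    SessionTimeoutSketch(boolean clampToZero) {
        this.clampToZero = clampToZero;
    }

    /** Attaches a session; fails if a session with that name is still alive. */
    long attach(String name, long requestedTimeoutSeconds) {
        if (sessions.containsKey(name)) {
            throw new IllegalStateException("session name in use: " + name);
        }
        long granted = clampToZero ? 0L : requestedTimeoutSeconds;
        sessions.put(name, granted);
        return granted;
    }

    /** Models the connection dropping: only sessions granted timeout 0 are discarded at once. */
    void connectionLost(String name) {
        Long granted = sessions.get(name);
        if (granted != null && granted == 0L) {
            sessions.remove(name);
        }
    }

    public static void main(String[] args) {
        SessionTimeoutSketch fixed = new SessionTimeoutSketch(true);
        fixed.attach("ssn-1", 1);
        fixed.connectionLost("ssn-1");
        // The name is free again, so failover can re-create the session.
        System.out.println("re-attach granted timeout: " + fixed.attach("ssn-1", 1));
    }
}
```

Constructing the model with clampToZero set to false reproduces the stale-session case: after connectionLost the name is still registered, so the second attach throws.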



Re: 0-10 Session Close and Failover

Posted by Rajith Attapattu <ra...@gmail.com>.
On Fri, Aug 13, 2010 at 11:05 AM, Andrew Kennedy
<an...@gmail.com> wrote:
> On 3 August 2010 21:52, Rajith Attapattu <ra...@gmail.com> wrote:
>> The java-cpp-cluster test profile was hanging due to your checkin,
>> since a non zero timeout causes stale sessions to interfere with
>> proper failover.
>> [...]
>>  The test is "testFailoverInALoop" in FailoverTest.java
>
> Hi,
>
> Actually, even after building everything in the cpp folder on a
> machine I have access to (where such things work) I was unable to run
> the cpp.cluster test profile. Is there some special thing I ought to
> be doing, over and above the obvious?

Are you using RHEL 4/5 or Fedora 10 or higher?
For RHEL 4/5 you need to configure openais.
For Fedora 10 and higher you need to configure corosync.

You also need the proper devel packages at build time, or else the
cluster module will not be built.
You can check <cpp-folder>/src/.libs to see if you have
cluster.so; if not, that means you don't have the packages installed.
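
The check above can also be scripted. The following is a small sketch only: the class and method names are made up for illustration, and the checkout location is whatever you pass in place of <cpp-folder>.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; not part of the Qpid build.
final class ClusterModuleCheck {
    /** Returns true if cluster.so exists under <cppFolder>/src/.libs. */
    static boolean clusterModuleBuilt(Path cppFolder) {
        Path module = cppFolder.resolve("src").resolve(".libs").resolve("cluster.so");
        return Files.isRegularFile(module);
    }

    public static void main(String[] args) {
        // Pass your own <cpp-folder> as the first argument; "." is only a fallback.
        Path cppFolder = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(clusterModuleBuilt(cppFolder)
                ? "cluster.so found: the cluster module was built"
                : "cluster.so missing: check the openais/corosync devel packages");
    }
}
```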

Let me know which OS you are using and I can help you get the env going.
Btw the cpp.cluster profile is still failing.

Rajith

> Thanks,
> Andrew.
> --
> -- andrew d kennedy ? edinburgh : +44 7941 197 134
>



-- 
Regards,

Rajith Attapattu
Red Hat
http://rajith.2rlabs.com/



Re: 0-10 Session Close and Failover

Posted by Andrew Kennedy <an...@gmail.com>.
On 3 August 2010 21:52, Rajith Attapattu <ra...@gmail.com> wrote:
> The java-cpp-cluster test profile was hanging due to your checkin,
> since a non zero timeout causes stale sessions to interfere with
> proper failover.
> [...]
>  The test is "testFailoverInALoop" in FailoverTest.java

Hi,

Actually, even after building everything in the cpp folder on a
machine I have access to (where such things work) I was unable to run
the cpp.cluster test profile. Is there some special thing I ought to
be doing, over and above the obvious?

Thanks,
Andrew.
--
-- andrew d kennedy ? edinburgh : +44 7941 197 134



Re: 0-10 Session Close and Failover

Posted by Gordon Sim <gs...@redhat.com>.
Thanks! Change committed.

On 08/05/2010 10:06 AM, Robbie Gemmell wrote:
> That's one of the changes I'd do when making any updates to the client's
> use of the 'expiry' variable based on Rafi's comments, so no
> objections from me to the commit.
>
> Robbie
>
> On 5 August 2010 09:41, Robert Godfrey<ro...@gmail.com>  wrote:
>> [snip]
>>
>>>
>>> Any objections if I commit the patch below?
>>>
>>> As far as I can see it will not result in any difference on the client side
>>> (it doesn't alter the value of the expiry variable) and given the recent
>>> change to the broker won't affect the broker on trunk either (which will
>>> assume a timeout of 0 regardless of the requested value).
>>>
>>> Index: java/common/src/main/java/org/apache/qpid/transport/Session.java
>>> ===================================================================
>>> --- java/common/src/main/java/org/apache/qpid/transport/Session.java (revision 982137)
>>> +++ java/common/src/main/java/org/apache/qpid/transport/Session.java (working copy)
>>> @@ -237,7 +237,7 @@
>>>      {
>>>          initReceiver();
>>>          sessionAttach(name.getBytes());
>>> -        sessionRequestTimeout(expiry);
>>> +        sessionRequestTimeout(0);//use expiry here only if/when session resume is supported
>>>      }
>>>
>>>      void resume()
>>>
>>
>> I think this patch makes perfect sense, so no objections from my side
>>
>> -- Rob
>>




Re: 0-10 Session Close and Failover

Posted by Robbie Gemmell <ro...@gmail.com>.
That's one of the changes I'd do when making any updates to the client's
use of the 'expiry' variable based on Rafi's comments, so no
objections from me to the commit.

Robbie

On 5 August 2010 09:41, Robert Godfrey <ro...@gmail.com> wrote:
> [snip]
>
>>
>> Any objections if I commit the patch below?
>>
>> As far as I can see it will not result in any difference on the client side
>> (it doesn't alter the value of the expiry variable) and given the recent
>> change to the broker won't affect the broker on trunk either (which will
>> assume a timeout of 0 regardless of the requested value).
>>
>> Index: java/common/src/main/java/org/apache/qpid/transport/Session.java
>> ===================================================================
>> --- java/common/src/main/java/org/apache/qpid/transport/Session.java (revision 982137)
>> +++ java/common/src/main/java/org/apache/qpid/transport/Session.java (working copy)
>> @@ -237,7 +237,7 @@
>>     {
>>         initReceiver();
>>         sessionAttach(name.getBytes());
>> -        sessionRequestTimeout(expiry);
>> +        sessionRequestTimeout(0);//use expiry here only if/when session resume is supported
>>     }
>>
>>     void resume()
>>
>
> I think this patch makes perfect sense, so no objections from my side
>
> -- Rob
>



Re: 0-10 Session Close and Failover

Posted by Robert Godfrey <ro...@gmail.com>.
[snip]

>
> Any objections if I commit the patch below?
>
> As far as I can see it will not result in any difference on the client side
> (it doesn't alter the value of the expiry variable) and given the recent
> change to the broker won't affect the broker on trunk either (which will
> assume a timeout of 0 regardless of the requested value).
>
> Index: java/common/src/main/java/org/apache/qpid/transport/Session.java
> ===================================================================
> --- java/common/src/main/java/org/apache/qpid/transport/Session.java (revision 982137)
> +++ java/common/src/main/java/org/apache/qpid/transport/Session.java (working copy)
> @@ -237,7 +237,7 @@
>     {
>         initReceiver();
>         sessionAttach(name.getBytes());
> -        sessionRequestTimeout(expiry);
> +        sessionRequestTimeout(0);//use expiry here only if/when session resume is supported
>     }
>
>     void resume()
>

I think this patch makes perfect sense, so no objections from my side

-- Rob



Re: 0-10 Session Close and Failover

Posted by Gordon Sim <gs...@redhat.com>.
On 08/04/2010 10:25 PM, Andrew Kennedy wrote:
> On 3 Aug 2010, at 21:52, Rajith Attapattu wrote:
>> On Tue, Aug 3, 2010 at 11:37 AM, Andrew Kennedy
>> <an...@gmail.com> wrote:
>>> Can you elaborate on what the issue is, as I didn't see any of the cpp
>>> profile tests failing?
>>
>> The java-cpp-cluster test profile was hanging due to your checkin,
>> since a non zero timeout causes stale sessions to interfere with
>> proper failover.
>> Alan Conway made a commit at rev 981933 to ignore non zero timeouts,
>> so that issue is now gone.
>> So now there is no real hurry to backout the change.

What's there is actually wrong from the client's perspective as well,
however, as my understanding is that the client does not attempt to
correctly resume the session either. (From Rafi's earlier comment: "I'm
just saying that the client never does frame replay and so should never
advertise a non-zero session timeout.").

The fix to the broker to refuse to accept the non-zero timeout doesn't 
help anyone trying to use the client from trunk against an older broker. 
I know of at least one person trying to do this, so I'd like to get a 
fix promptly.

Any objections if I commit the patch below?

As far as I can see it will not result in any difference on the client 
side (it doesn't alter the value of the expiry variable) and given the 
recent change to the broker won't affect the broker on trunk either 
(which will assume a timeout of 0 regardless of the requested value).

Index: java/common/src/main/java/org/apache/qpid/transport/Session.java
===================================================================
--- java/common/src/main/java/org/apache/qpid/transport/Session.java (revision 982137)
+++ java/common/src/main/java/org/apache/qpid/transport/Session.java (working copy)
@@ -237,7 +237,7 @@
     {
         initReceiver();
         sessionAttach(name.getBytes());
-        sessionRequestTimeout(expiry);
+        sessionRequestTimeout(0);//use expiry here only if/when session resume is supported
     }
 
     void resume()
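
As a rough illustration of what the patch above changes, here is a self-contained sketch. It is not the real org.apache.qpid.transport.Session: the class AttachSketch, its list of recorded commands, and the close method (modelled on the description earlier in this thread of setting the timeout to 0 and then detaching) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in only; the real class is org.apache.qpid.transport.Session.
final class AttachSketch {
    private final String name;
    private final long expiry;  // client-side value, left untouched by the patch
    private final List<String> sent = new ArrayList<>();  // stands in for the wire

    AttachSketch(String name, long expiry) {
        this.name = name;
        this.expiry = expiry;
    }

    long getExpiry() {
        return expiry;
    }

    List<String> sent() {
        return sent;
    }

    void attach() {
        sent.add("session.attach " + name);
        // Request 0 rather than expiry: without frame-level replay a non-zero
        // timeout would advertise a resume that the client cannot perform.
        sent.add("session.request-timeout 0");
    }

    void close() {
        // 0-10 has no explicit close: request a zero timeout, then detach.
        sent.add("session.request-timeout 0");
        sent.add("session.detach " + name);
    }

    public static void main(String[] args) {
        AttachSketch ssn = new AttachSketch("ssn-1", 60);
        ssn.attach();
        ssn.close();
        ssn.sent().forEach(System.out::println);
    }
}
```

The point of the sketch is that the expiry field keeps its value while the timeout requested from the broker is always 0, so a trunk client behaves the same against an older broker that would otherwise honour a non-zero request.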




Re: 0-10 Session Close and Failover

Posted by Andrew Kennedy <an...@gmail.com>.
On 3 Aug 2010, at 21:52, Rajith Attapattu wrote:
> On Tue, Aug 3, 2010 at 11:37 AM, Andrew Kennedy  
> <an...@gmail.com> wrote:
>> Can you elaborate on what the issue is, as I didn't see any of the cpp
>> profile tests failing?
>
> The java-cpp-cluster test profile was hanging due to your checkin,
> since a non zero timeout causes stale sessions to interfere with
> proper failover.
> Alan Conway made a commit at rev 981933 to ignore non zero timeouts,
> so that issue is now gone.
> So now there is no real hurry to backout the change.

Understood.

In that case, I'll leave it in at the moment. I will be discussing  
Rafi's suggestions with the other developers in my team tomorrow, so  
I'll post a summary to the list for everyone, to make sure we're  
going in the right direction.

> However as Rafi pointed out, there is no point in having non zero
> timeouts as neither the c++ broker nor the java broker (it still
> doesn't even have clustering) implements full session resume.
> So perhaps it's best to consider the alternative Rafi suggested.
>
>> If you had a test that illustrated the problem
>> it would make it easier to understand your issue.
>
> The test is "testFailoverInALoop" in FailoverTest.java


Thanks,

I didn't notice and/or run that profile, grrr... I had *thought* I
ran all the available test profiles, but I missed that one.

There are five cpp profiles, plus another five (or seven, depending
on the backing store I use) java profiles, making up to twelve possible
profiles, each one taking from fifteen minutes to an hour to
run on my PC, so it's easy to miss one! I think that means automated
continuous-integration-based testing is *really* needed, like Rob and
Rafi suggested. Depending on the speed of the CI server, I suppose
this would probably have to be run overnight, just once a day?
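
As a back-of-the-envelope check on the overnight estimate, the figures above can be multiplied out. This sketch assumes the profiles run one after another on a single machine; the class name is made up for illustration.

```java
// Hypothetical estimate helper; the figures are the ones quoted in this message.
final class ProfileRunEstimate {
    /** Total minutes for a sequential run of all profiles. */
    static int totalMinutes(int profiles, int minutesPerProfile) {
        return profiles * minutesPerProfile;
    }

    public static void main(String[] args) {
        int profiles = 12;                      // five cpp plus up to seven java
        int best = totalMinutes(profiles, 15);  // every profile takes 15 minutes
        int worst = totalMinutes(profiles, 60); // every profile takes an hour
        System.out.printf("sequential run: %d to %d hours%n", best / 60, worst / 60);
    }
}
```

That gives somewhere between 3 and 12 hours for a full sequential pass, which is consistent with running the whole set once a day overnight unless the profiles are run in parallel.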

Actually, another problem I had was just *building* the cpp broker  
from trunk. I got stuck in package dependency hell with umpteen  
versions of 'boost' and 'cmake' on my RHEL4 box and couldn't (still  
can't) build the broker. On a 64bit system with RHEL5 things were  
easier, it just wasn't my box ;( I do have a working RHEL4 (32bit)  
build now, which is what I tested with, but I don't think it should  
have taken me so much effort to set up. Is there anything I'm doing  
wrong or I should know about? I can try again and supply error  
messages if that would help, as I'd like to test against the most
up-to-date binaries.

I'll obviously make sure *all* the cpp profiles' tests are passing ok  
before any major 0-10 client-side check-ins, next time ;)

Cheers,
Andrew.
-- 
-- andrew d kennedy ? do not fold, bend, spindle, or mutilate ;
-- http://grkvlt.blogspot.com/ ? edinburgh : +44 7941 197 134 ;



Re: 0-10 Session Close and Failover

Posted by Andrew Kennedy <an...@gmail.com>.
On 3 August 2010 14:24, Rajith Attapattu <ra...@gmail.com> wrote:
> Andrew,
>
> The change has actually caused issues with the JMS client failover against the C++ cluster.
> I would appreciate it if you could revert the change until we figure out a proper fix.
> Rafi has suggested a few alternatives; perhaps it's best if one of those approaches is investigated.
>
> Rajith

Hi.

Can you elaborate on what the issue is, as I didn't see any of the cpp
profile tests failing? If you had a test that illustrated the problem
it would make it easier to understand your issue. As it is, it will be
a simple matter to revert the change, so I will do this as soon as I
have time.

Andrew.
--
-- andrew d kennedy ? edinburgh : +44 7941 197 134
