Posted to ojb-user@db.apache.org by OJB Dev <oj...@vidyah.com> on 2005/10/10 23:35:29 UTC

OSCache/JGroups: node can't join if coordinator is lost

Hi All,
  I am using the OSCache clustered cache impl and it's working fine, with
one exception: if I restart/redeploy the instance of Tomcat that first
created the cluster (the JGroups "coordinator"), it and any other new nodes
can't join the cluster.

Versions: Tomcat 5.5.9, OJB 1.0.3, OSCache 2.1.1, JGroups 2.2.8.
OJB:
<object-cache class="org.apache.ojb.broker.cache.ObjectCacheTwoLevelImpl">
  <attribute attribute-name="cacheExcludes" attribute-value=""/>
  <attribute attribute-name="applicationCache" 
    attribute-value="org.apache.ojb.broker.cache.ObjectCacheOSCacheImpl"/>
  <attribute attribute-name="copyStrategy" 
 
attribute-value="org.apache.ojb.broker.cache.ObjectCacheTwoLevelImpl$CopyStr
ategyImpl"/>
</object-cache>

OSCache: using default multicast config for JGroups

Reproduction procedure:
1. start webapp 1
   - this instance becomes the JGroups "coordinator"
2. start webapp 2
3. check cluster: logs show both nodes are talking, cache notifications are
working.
4. restart webapp 2
    - OJB cache will leave and rejoin the cluster with no trouble
5. restart webapp 1
    - OJB cache leaves the cluster, but when it tries to rejoin, I get the
following error, repeated over and over.

"WARN orj.jgroups.protocols.pbcast.ClientGmsImpl join(<my ip>:<my port>)
failed, retrying"
"WARN orj.jgroups.protocols.pbcast.ClientGmsImpl join(<my ip>:<my port>)
failed, retrying"
"WARN orj.jgroups.protocols.pbcast.ClientGmsImpl join(<my ip>:<my port>)
failed, retrying"
...

    - the other node holds the "channel", but there is no "coordinator" to
accept the new node.

At this point, if I restart webapp 2, the cluster goes away, and a new one
is created and both nodes join and start talking.

Question: what am I missing so that I don't have to restart all of my other
nodes if the "coordinator" node is lost?  I wrote to the JGroups list and
was told that I need to do something like an OSCache.close() when the
webapp closes, but how do I make OJB do that?  From what I could understand
from the guys in the JGroups forum, OSCache wasn't cleaning up when it was
closing down, so I figured I had better check: does OJB have any way to
notify OSCache when it's about to close?  Is there something I should be
doing in the onDestroyContext of my webapp to make a node clean up and
leave the cluster properly?  I have been unable to find any OJB docs on the
subject.

Anyways, I would love to hear from someone else who is using the OSCache
cluster model and does or doesn't have this issue.  Thanks for any
suggestions.
 
Rick Gavin


---------------------------------------------------------------------
To unsubscribe, e-mail: ojb-user-unsubscribe@db.apache.org
For additional commands, e-mail: ojb-user-help@db.apache.org


RE: OSCache/JGroups: node can't join if coordinator is lost

Posted by OJB Dev <oj...@vidyah.com>.
Hi Armin,
  I tried what you suggested.  Now when I close the webapp, I see this in
the log:
[INFO ] com.opensymphony.oscache.plugins.clustersupport.JavaGroupsBroadcastingListener JavaGroups shutting down...
[WARN ] org.jgroups.blocks.PullPushAdapter [<null>] channel closed, exception is ChannelClosedException
[INFO ] com.opensymphony.oscache.plugins.clustersupport.JavaGroupsBroadcastingListener JavaGroups shutdown complete.

Not sure if that WARNing is a problem; I will post to the JGroups list as
well to ask.

I'm still doing some testing to verify, since I have noticed that the
actual problem doesn't always present itself with just a simple start and
stop. It seems to happen more after the servers have been running and there
has been some activity in the cache, like every week when I do updates to
my production servers, which is when I really don't want it to happen.
I'll keep testing and check with the JGroups and maybe the OSCache boards
to see if they have any comments as well.  Thanks so much for your help.

As a note, is there a way in code to check what the current cache type is
for the default broker?  I want to put this test hack in my code base to
get it tested out, but some of the nodes don't use clustering, and I would
like to be selective about calling that new method when a node is not using
the OSCache type.  Otherwise, calling it will try to fire up the clustered
cache, only to then destroy it.
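
In the meantime, as a stopgap I am thinking of gating the cleanup on a
context-param in web.xml instead of asking OJB. A rough, untested sketch
(the param name and listener class are my own invention; it assumes the
public getGeneralCacheAdministrator() accessor from your hack has been
added):

import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import org.apache.ojb.broker.cache.ObjectCacheOSCacheImpl;

// Assumes a web.xml context-param such as:
//   <context-param>
//     <param-name>useClusteredCache</param-name>
//     <param-value>true</param-value>
//   </context-param>
public class SelectiveCacheShutdownListener implements ServletContextListener
{
    public void contextInitialized(ServletContextEvent sce)
    {
        // nothing to do on startup
    }

    public void contextDestroyed(ServletContextEvent sce)
    {
        String flag =
            sce.getServletContext().getInitParameter("useClusteredCache");
        // Only clustered nodes should touch the OSCache/JGroups channel;
        // on the other nodes this would fire up the clustered cache just
        // to destroy it again.
        if (Boolean.valueOf(flag).booleanValue())
        {
            new ObjectCacheOSCacheImpl(null, null)
                    .getGeneralCacheAdministrator().destroy();
        }
    }
}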

Thanks again,
Rick


---------------------------------------------------------------------
To unsubscribe, e-mail: ojb-user-unsubscribe@db.apache.org
For additional commands, e-mail: ojb-user-help@db.apache.org


Re: OSCache/JGroups: node can't join if coordinator is lost

Posted by Armin Waibel <ar...@apache.org>.
Hi Rick,

in OSCache I can find a #destroy() method. Is this the call needed to
force OSCache to do a node cleanup? If so, it needs to happen in the
onDestroyContext of your webapp.

Thus you need access to the application cache (ObjectCacheOSCacheImpl)
instance used by ObjectCacheTwoLevelImpl, or we should provide a shutdown
method. Currently this is not possible, but we can fix it for the next
release. Internally OJB uses an ObjectCacheInternal interface, so we can
add a shutdown/destroy method there; in ObjectCacheOSCacheImpl this method
will call GeneralCacheAdministrator#destroy().
The next release will also provide a PBF#shutdown() method, so OJB can
internally clean up resources (and this will call shutdown on all caches
too).
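
To make the idea more concrete, the addition could look something like
this (only a sketch of the plan; names are not final):

// Sketch only - proposed extension of OJB's internal cache interface
// in org.apache.ojb.broker.cache.
public interface ObjectCacheInternal extends ObjectCache
{
    // ...the existing methods stay unchanged...

    /**
     * Free all resources used by the cache implementation.
     * ObjectCacheOSCacheImpl would delegate to
     * GeneralCacheAdministrator#destroy() here, which closes the
     * JGroups channel so the node leaves the cluster cleanly.
     */
    void destroy();
}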

To check this, please test this dirty hack in the webapp's
#onDestroyContext:

- Add a public #getGeneralCacheAdministrator() method to
ObjectCacheOSCacheImpl.
- Since we use a static variable for the GeneralCacheAdministrator
instance, you can do

new ObjectCacheOSCacheImpl(null, null).getGeneralCacheAdministrator().destroy();
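
Wired into a servlet context listener, the whole hack would look
something like this (untested sketch; assumes the new accessor described
above has been added):

import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import org.apache.ojb.broker.cache.ObjectCacheOSCacheImpl;

public class OSCacheShutdownListener implements ServletContextListener
{
    public void contextInitialized(ServletContextEvent sce)
    {
        // nothing to do on startup
    }

    public void contextDestroyed(ServletContextEvent sce)
    {
        // GeneralCacheAdministrator#destroy() closes the JGroups channel,
        // so the node leaves the cluster cleanly instead of leaving a
        // dead coordinator behind.
        new ObjectCacheOSCacheImpl(null, null)
                .getGeneralCacheAdministrator().destroy();
    }
}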

HTH
regards,
Armin



---------------------------------------------------------------------
To unsubscribe, e-mail: ojb-user-unsubscribe@db.apache.org
For additional commands, e-mail: ojb-user-help@db.apache.org