You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@river.apache.org by "Chris Dolan (JIRA)" <ji...@apache.org> on 2010/04/20 18:04:50 UTC

[jira] Created: (RIVER-337) Attempted discard of unknown registrar kills LookupLocatorDiscovery thread

Attempted discard of unknown registrar kills LookupLocatorDiscovery thread
--------------------------------------------------------------------------

                 Key: RIVER-337
                 URL: https://issues.apache.org/jira/browse/RIVER-337
             Project: River
          Issue Type: Bug
          Components: net_jini_discovery, net_jini_lookup
    Affects Versions: AR1, jtsk_2.1
            Reporter: Chris Dolan


The method
   net.jini.lookup.ServiceDiscoveryManager$DiscMgrListener.discarded(DiscoveryEvent)
has the following code that throws a RuntimeException (the code comment suggests that it is supposed to be impossible, but it's not).

        ProxyReg reg = findReg(proxys[i]);
        if(reg != null ) { // this check can be removed.
            proxyRegSet.remove(proxyRegSet.indexOf(reg));
            drops.add(reg);
        } else {
            throw new RuntimeException("discard error");
        }//endif

Our QA does failover testing with two servers, each with a Reggie, where we deliberately crash and reboot server 1 then server 2 every 30 minutes continuously.  In one case, we hit that RuntimeException.  I don't know why we got a null reg (that's a problem for another defect, maybe an undiagnosed race of two discards put on a task queue?  Maybe related to RIVER-37?).  But it caused a catastrophic chain of events because the RuntimeException is not caught anywhere up the stack.  In our case, it killed the LookupLocatorDiscovery$Notifier thread.

java.lang.RuntimeException: discard error
	at net.jini.lookup.ServiceDiscoveryManager$DiscMgrListener.discarded(2639)
	at net.jini.discovery.LookupDiscoveryManager.notifyListener(1375)
	at net.jini.discovery.LookupDiscoveryManager.notifyListener(1356)
	at net.jini.discovery.LookupDiscoveryManager.access$500(92)
	at net.jini.discovery.LookupDiscoveryManager$LocatorDiscoveryListener.discarded(543)
	at net.jini.discovery.LookupLocatorDiscovery$Notifier.run(650)

I propose three changes:

  1) change the discarded() method above to simply warn instead of throwing
  2) put a try/catch(Throwable) around the listener invocation in 
     LookupLocatorDiscovery$Notifier.run()
  3) put a similar try/catch around listener invocation in LookupDiscoveryManager.notifyListener

The idea behind #2 and #3 is that misbehaving listeners should not be allowed to derail the discovery process.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


RE: [jira] Created: (RIVER-337) Attempted discard of unknown registrar kills LookupLocatorDiscovery thread

Posted by Christopher Dolan <ch...@avid.com>.
Tom,

I agree with your conclusion that I'm getting too many discard events.
To counter this, I think the ServiceDiscoveryManager code should be more
robust in the face of multiple events, because those events are
certainly asynchronous and unpredictable.  The goal of RIVER-337 is a
small increase in robustness.

No, we're not using initialLookupLocators in RegistrarImpl, nor are we
using add/setLookupLocator.  We only use locators in
LookupDiscoveryManager (and thus ServiceDiscoveryManager) for client
code.  Our Reggies are generally unaware of each other.

Chris

-----Original Message-----
From: Tom Hobbs [mailto:tvhobbs@googlemail.com] 
Sent: Wednesday, April 21, 2010 9:32 AM
To: river-dev@incubator.apache.org
Subject: Re: [jira] Created: (RIVER-337) Attempted discard of unknown
registrar kills LookupLocatorDiscovery thread

Yes, you're right.  I got the wrong end of the stick - wouldn't be the
first
time.

Your problem is that you're getting too many events.  Whereas River-52
is
the opposite, a disappearing reggie (or slow-to respond listener)
results in
all lookup events timing out.  Is that correct?

In your reggie configs, are you setting the initialLookupLocators array
for
instance to point to the other?  I've experienced problems when doing
this,
in this situation it seems like each reggie chats to the other which can
cause problems when one of them disappears.

Maybe someone who knows more about the ins and outs of reggie can
comment?



On Wed, Apr 21, 2010 at 3:18 PM, Christopher Dolan <
christopher.dolan@avid.com> wrote:

> Unless I've misunderstood RIVER-52, I can't see how it could be
related
> to my bug because that's on the Reggie side.
>
> My scenario is that we have a Reggie on Server#1 and a Reggie on
> Server#2 and when we shut down Server#1, then a *client* also running
on
> Server#2 gets this error.  That client is still in contact with the
> Reggie on Server#2, but appears to be executing two discards for
> Reggie#1.
>
> I just had a new thought: could it be that we're getting one discard
> from LookupDiscovery and one from LookupLocatorDiscovery that are
> coincidentally firing at the same time when Reggie#1 shuts down?  I
know
> for a fact that my two servers are on the same subnet and that they
also
> have locators pointing at each other, so both discovery mechanisms are
> active in the client.
>
> Thanks for your attention.
> Chris
>
> -----Original Message-----
> From: Tom Hobbs [mailto:tvhobbs@googlemail.com]
> Sent: Wednesday, April 21, 2010 6:22 AM
> To: river-dev@incubator.apache.org
> Subject: Re: [jira] Created: (RIVER-337) Attempted discard of unknown
> registrar kills LookupLocatorDiscovery thread
>
> There's a related problem on busy networks when reggies disappear; see
> RIVER-52.  Although (probably) not the cause of Chris' trouble it's in
a
> related area... I think.
>
> I sure that the problem described in RIVER-52 exists because I've
> encountered it in the wild, but I'm still having trouble reproducing
it
> at
> will.  I can take a look at both of these when time allows.
>
> Peter; I don't know about DiscoveryEvent's fields.  I can't think of
any
> reason off the top of my head as to why they can't be made protected.
I
> remember a little while ago you were talking about MarshalledInstance
> for
> some reason.  (Or did I just make that up?)  What are your thoughts on
> RIVER-29?
>
> Cheers,
>
> Tom
>
>
>
> On Wed, Apr 21, 2010 at 11:52 AM, Peter Firmstone <ji...@zeus.net.au>
> wrote:
>
> > Thanks Chris, I'll action your recommendations.  It would be nice to
> try to
> > track down where the problem is, it's a shame DiscoveryEvent isn't
> > immutable.
> >
> > Does anyone need access to the protected fields in DiscoveryEvent?
> >
> > What are the ramifications of making DiscoveryEvent immutable? How
> much
> > breakage of application code?.
> >
> > Or all Event's for that matter.
> >
> > Cheers,
> >
> > Peter.
> >
> >
> > Chris Dolan (JIRA) wrote:
> >
> >> Attempted discard of unknown registrar kills LookupLocatorDiscovery
> thread
> >>
>
------------------------------------------------------------------------
> --
> >>
> >>                 Key: RIVER-337
> >>                 URL:
https://issues.apache.org/jira/browse/RIVER-337
> >>             Project: River
> >>          Issue Type: Bug
> >>          Components: net_jini_discovery, net_jini_lookup
> >>    Affects Versions: AR1, jtsk_2.1
> >>            Reporter: Chris Dolan
> >>
> >>
> >> The method
> >>
> >>
>
net.jini.lookup.ServiceDiscoveryManager$DiscMgrListener.discarded(Discov
> eryEvent)
> >> has the following code that throws a RuntimeException (the code
> comment
> >> suggests that it is supposed to be impossible, but it's not).
> >>
> >>        ProxyReg reg = findReg(proxys[i]);
> >>        if(reg != null ) { // this check can be removed.
> >>            proxyRegSet.remove(proxyRegSet.indexOf(reg));
> >>            drops.add(reg);
> >>        } else {
> >>            throw new RuntimeException("discard error");
> >>        }//endif
> >>
> >> Our QA does failover testing with two servers, each with a Reggie,
> where
> >> we deliberately crash and reboot server 1 then server 2 every 30
> minutes
> >> continuously.  In one case, we hit that RuntimeException.  I don't
> know why
> >> we got a null reg (that's a problem for another defect, maybe an
> undiagnosed
> >> race of two discards put on a task queue?  Maybe related to
> RIVER-37?).  But
> >> it caused a catastrophic chain of events because the
RuntimeException
> is not
> >> caught anywhere up the stack.  In our case, it killed the
> >> LookupLocatorDiscovery$Notifier thread.
> >>
> >> java.lang.RuntimeException: discard error
> >>        at
> >>
>
net.jini.lookup.ServiceDiscoveryManager$DiscMgrListener.discarded(2639)
> >>        at
> net.jini.discovery.LookupDiscoveryManager.notifyListener(1375)
> >>        at
> net.jini.discovery.LookupDiscoveryManager.notifyListener(1356)
> >>        at net.jini.discovery.LookupDiscoveryManager.access$500(92)
> >>        at
> >>
>
net.jini.discovery.LookupDiscoveryManager$LocatorDiscoveryListener.disca
> rded(543)
> >>        at
net.jini.discovery.LookupLocatorDiscovery$Notifier.run(650)
> >>
> >> I propose three changes:
> >>
> >>  1) change the discarded() method above to simply warn instead of
> throwing
> >>  2) put a try/catch(Throwable) around the listener invocation in
> >>  LookupLocatorDiscovery$Notifier.run()
> >>  3) put a similar try/catch around listener invocation in
> >> LookupDiscoveryManager.notifyListener
> >>
> >> The idea behind #2 and #3 is that misbehaving listeners should not
be
> >> allowed to derail the discovery process.
> >>
> >>
> >>
> >>
> >
> >
>

Re: [jira] Created: (RIVER-337) Attempted discard of unknown registrar kills LookupLocatorDiscovery thread

Posted by Tom Hobbs <tv...@googlemail.com>.
Yes, you're right.  I got the wrong end of the stick - wouldn't be the first
time.

Your problem is that you're getting too many events.  Whereas River-52 is
the opposite, a disappearing reggie (or slow-to respond listener) results in
all lookup events timing out.  Is that correct?

In your reggie configs, are you setting the initialLookupLocators array for
instance to point to the other?  I've experienced problems when doing this,
in this situation it seems like each reggie chats to the other which can
cause problems when one of them disappears.

Maybe someone who knows more about the ins and outs of reggie can comment?



On Wed, Apr 21, 2010 at 3:18 PM, Christopher Dolan <
christopher.dolan@avid.com> wrote:

> Unless I've misunderstood RIVER-52, I can't see how it could be related
> to my bug because that's on the Reggie side.
>
> My scenario is that we have a Reggie on Server#1 and a Reggie on
> Server#2 and when we shut down Server#1, then a *client* also running on
> Server#2 gets this error.  That client is still in contact with the
> Reggie on Server#2, but appears to be executing two discards for
> Reggie#1.
>
> I just had a new thought: could it be that we're getting one discard
> from LookupDiscovery and one from LookupLocatorDiscovery that are
> coincidentally firing at the same time when Reggie#1 shuts down?  I know
> for a fact that my two servers are on the same subnet and that they also
> have locators pointing at each other, so both discovery mechanisms are
> active in the client.
>
> Thanks for your attention.
> Chris
>
> -----Original Message-----
> From: Tom Hobbs [mailto:tvhobbs@googlemail.com]
> Sent: Wednesday, April 21, 2010 6:22 AM
> To: river-dev@incubator.apache.org
> Subject: Re: [jira] Created: (RIVER-337) Attempted discard of unknown
> registrar kills LookupLocatorDiscovery thread
>
> There's a related problem on busy networks when reggies disappear; see
> RIVER-52.  Although (probably) not the cause of Chris' trouble it's in a
> related area... I think.
>
> I sure that the problem described in RIVER-52 exists because I've
> encountered it in the wild, but I'm still having trouble reproducing it
> at
> will.  I can take a look at both of these when time allows.
>
> Peter; I don't know about DiscoveryEvent's fields.  I can't think of any
> reason off the top of my head as to why they can't be made protected.  I
> remember a little while ago you were talking about MarshalledInstance
> for
> some reason.  (Or did I just make that up?)  What are your thoughts on
> RIVER-29?
>
> Cheers,
>
> Tom
>
>
>
> On Wed, Apr 21, 2010 at 11:52 AM, Peter Firmstone <ji...@zeus.net.au>
> wrote:
>
> > Thanks Chris, I'll action your recommendations.  It would be nice to
> try to
> > track down where the problem is, it's a shame DiscoveryEvent isn't
> > immutable.
> >
> > Does anyone need access to the protected fields in DiscoveryEvent?
> >
> > What are the ramifications of making DiscoveryEvent immutable? How
> much
> > breakage of application code?.
> >
> > Or all Event's for that matter.
> >
> > Cheers,
> >
> > Peter.
> >
> >
> > Chris Dolan (JIRA) wrote:
> >
> >> Attempted discard of unknown registrar kills LookupLocatorDiscovery
> thread
> >>
> ------------------------------------------------------------------------
> --
> >>
> >>                 Key: RIVER-337
> >>                 URL: https://issues.apache.org/jira/browse/RIVER-337
> >>             Project: River
> >>          Issue Type: Bug
> >>          Components: net_jini_discovery, net_jini_lookup
> >>    Affects Versions: AR1, jtsk_2.1
> >>            Reporter: Chris Dolan
> >>
> >>
> >> The method
> >>
> >>
> net.jini.lookup.ServiceDiscoveryManager$DiscMgrListener.discarded(Discov
> eryEvent)
> >> has the following code that throws a RuntimeException (the code
> comment
> >> suggests that it is supposed to be impossible, but it's not).
> >>
> >>        ProxyReg reg = findReg(proxys[i]);
> >>        if(reg != null ) { // this check can be removed.
> >>            proxyRegSet.remove(proxyRegSet.indexOf(reg));
> >>            drops.add(reg);
> >>        } else {
> >>            throw new RuntimeException("discard error");
> >>        }//endif
> >>
> >> Our QA does failover testing with two servers, each with a Reggie,
> where
> >> we deliberately crash and reboot server 1 then server 2 every 30
> minutes
> >> continuously.  In one case, we hit that RuntimeException.  I don't
> know why
> >> we got a null reg (that's a problem for another defect, maybe an
> undiagnosed
> >> race of two discards put on a task queue?  Maybe related to
> RIVER-37?).  But
> >> it caused a catastrophic chain of events because the RuntimeException
> is not
> >> caught anywhere up the stack.  In our case, it killed the
> >> LookupLocatorDiscovery$Notifier thread.
> >>
> >> java.lang.RuntimeException: discard error
> >>        at
> >>
> net.jini.lookup.ServiceDiscoveryManager$DiscMgrListener.discarded(2639)
> >>        at
> net.jini.discovery.LookupDiscoveryManager.notifyListener(1375)
> >>        at
> net.jini.discovery.LookupDiscoveryManager.notifyListener(1356)
> >>        at net.jini.discovery.LookupDiscoveryManager.access$500(92)
> >>        at
> >>
> net.jini.discovery.LookupDiscoveryManager$LocatorDiscoveryListener.disca
> rded(543)
> >>        at net.jini.discovery.LookupLocatorDiscovery$Notifier.run(650)
> >>
> >> I propose three changes:
> >>
> >>  1) change the discarded() method above to simply warn instead of
> throwing
> >>  2) put a try/catch(Throwable) around the listener invocation in
> >>  LookupLocatorDiscovery$Notifier.run()
> >>  3) put a similar try/catch around listener invocation in
> >> LookupDiscoveryManager.notifyListener
> >>
> >> The idea behind #2 and #3 is that misbehaving listeners should not be
> >> allowed to derail the discovery process.
> >>
> >>
> >>
> >>
> >
> >
>

RE: [jira] Created: (RIVER-337) Attempted discard of unknown registrar kills LookupLocatorDiscovery thread

Posted by Christopher Dolan <ch...@avid.com>.
Unless I've misunderstood RIVER-52, I can't see how it could be related
to my bug because that's on the Reggie side.

My scenario is that we have a Reggie on Server#1 and a Reggie on
Server#2 and when we shut down Server#1, then a *client* also running on
Server#2 gets this error.  That client is still in contact with the
Reggie on Server#2, but appears to be executing two discards for
Reggie#1.

I just had a new thought: could it be that we're getting one discard
from LookupDiscovery and one from LookupLocatorDiscovery that are
coincidentally firing at the same time when Reggie#1 shuts down?  I know
for a fact that my two servers are on the same subnet and that they also
have locators pointing at each other, so both discovery mechanisms are
active in the client.

Thanks for your attention.
Chris

-----Original Message-----
From: Tom Hobbs [mailto:tvhobbs@googlemail.com] 
Sent: Wednesday, April 21, 2010 6:22 AM
To: river-dev@incubator.apache.org
Subject: Re: [jira] Created: (RIVER-337) Attempted discard of unknown
registrar kills LookupLocatorDiscovery thread

There's a related problem on busy networks when reggies disappear; see
RIVER-52.  Although (probably) not the cause of Chris' trouble it's in a
related area... I think.

I sure that the problem described in RIVER-52 exists because I've
encountered it in the wild, but I'm still having trouble reproducing it
at
will.  I can take a look at both of these when time allows.

Peter; I don't know about DiscoveryEvent's fields.  I can't think of any
reason off the top of my head as to why they can't be made protected.  I
remember a little while ago you were talking about MarshalledInstance
for
some reason.  (Or did I just make that up?)  What are your thoughts on
RIVER-29?

Cheers,

Tom



On Wed, Apr 21, 2010 at 11:52 AM, Peter Firmstone <ji...@zeus.net.au>
wrote:

> Thanks Chris, I'll action your recommendations.  It would be nice to
try to
> track down where the problem is, it's a shame DiscoveryEvent isn't
> immutable.
>
> Does anyone need access to the protected fields in DiscoveryEvent?
>
> What are the ramifications of making DiscoveryEvent immutable? How
much
> breakage of application code?.
>
> Or all Event's for that matter.
>
> Cheers,
>
> Peter.
>
>
> Chris Dolan (JIRA) wrote:
>
>> Attempted discard of unknown registrar kills LookupLocatorDiscovery
thread
>>
------------------------------------------------------------------------
--
>>
>>                 Key: RIVER-337
>>                 URL: https://issues.apache.org/jira/browse/RIVER-337
>>             Project: River
>>          Issue Type: Bug
>>          Components: net_jini_discovery, net_jini_lookup
>>    Affects Versions: AR1, jtsk_2.1
>>            Reporter: Chris Dolan
>>
>>
>> The method
>>
>>
net.jini.lookup.ServiceDiscoveryManager$DiscMgrListener.discarded(Discov
eryEvent)
>> has the following code that throws a RuntimeException (the code
comment
>> suggests that it is supposed to be impossible, but it's not).
>>
>>        ProxyReg reg = findReg(proxys[i]);
>>        if(reg != null ) { // this check can be removed.
>>            proxyRegSet.remove(proxyRegSet.indexOf(reg));
>>            drops.add(reg);
>>        } else {
>>            throw new RuntimeException("discard error");
>>        }//endif
>>
>> Our QA does failover testing with two servers, each with a Reggie,
where
>> we deliberately crash and reboot server 1 then server 2 every 30
minutes
>> continuously.  In one case, we hit that RuntimeException.  I don't
know why
>> we got a null reg (that's a problem for another defect, maybe an
undiagnosed
>> race of two discards put on a task queue?  Maybe related to
RIVER-37?).  But
>> it caused a catastrophic chain of events because the RuntimeException
is not
>> caught anywhere up the stack.  In our case, it killed the
>> LookupLocatorDiscovery$Notifier thread.
>>
>> java.lang.RuntimeException: discard error
>>        at
>>
net.jini.lookup.ServiceDiscoveryManager$DiscMgrListener.discarded(2639)
>>        at
net.jini.discovery.LookupDiscoveryManager.notifyListener(1375)
>>        at
net.jini.discovery.LookupDiscoveryManager.notifyListener(1356)
>>        at net.jini.discovery.LookupDiscoveryManager.access$500(92)
>>        at
>>
net.jini.discovery.LookupDiscoveryManager$LocatorDiscoveryListener.disca
rded(543)
>>        at net.jini.discovery.LookupLocatorDiscovery$Notifier.run(650)
>>
>> I propose three changes:
>>
>>  1) change the discarded() method above to simply warn instead of
throwing
>>  2) put a try/catch(Throwable) around the listener invocation in
>>  LookupLocatorDiscovery$Notifier.run()
>>  3) put a similar try/catch around listener invocation in
>> LookupDiscoveryManager.notifyListener
>>
>> The idea behind #2 and #3 is that misbehaving listeners should not be
>> allowed to derail the discovery process.
>>
>>
>>
>>
>
>

Re: River-29

Posted by Peter Firmstone <ji...@zeus.net.au>.
Hi Tom,

Hmm... My Thoughts: Rather than make those fields protected, we could 
make protected get methods return them?

Although the reference to the array is final, the contents of the array 
are still mutable.

On one hand, MarshalledInstance implements Serializable, which means the 
internal structure is publicly published, however if we want to access 
MarshalledInstance from multiple threads, we want it to be immutable.

Really we should use defensive copying in MarshalledInstance's 
constructor, as Mark has correctly highlighted, the array is still 
mutable as it isn't defensively copied during construction. 

However the original author's intent was for it to be immutable by 
making the field final.

We'd also need to defensively copy the field in writeObject() and 
readObject(), so an array reference cannot be stolen.

We can give a subclass access to the byte array by providing a protected 
method that returns a copy of the array, subclasses can then utilise it 
for overriding superclass methods and cache a transient copy.

We can't afford to ignore the march toward multiprocessor chips and 
inevitable concurrency, the sooner we come to grips with it the better.

Immutable classes (eg String) are the simplest form of concurrency, I'd 
like to utilise this pattern as much as possible.

Cheers,

Peter.

Tom Hobbs wrote:
> Peter; I don't know about DiscoveryEvent's fields.  I can't think of any
> reason off the top of my head as to why they can't be made protected.  I
> remember a little while ago you were talking about MarshalledInstance for
> some reason.  (Or did I just make that up?)  What are your thoughts on
> RIVER-29?
>
> Cheers,
>
> Tom
>
>   

Re: [jira] Created: (RIVER-337) Attempted discard of unknown registrar kills LookupLocatorDiscovery thread

Posted by Tom Hobbs <tv...@googlemail.com>.
There's a related problem on busy networks when reggies disappear; see
RIVER-52.  Although (probably) not the cause of Chris' trouble it's in a
related area... I think.

I sure that the problem described in RIVER-52 exists because I've
encountered it in the wild, but I'm still having trouble reproducing it at
will.  I can take a look at both of these when time allows.

Peter; I don't know about DiscoveryEvent's fields.  I can't think of any
reason off the top of my head as to why they can't be made protected.  I
remember a little while ago you were talking about MarshalledInstance for
some reason.  (Or did I just make that up?)  What are your thoughts on
RIVER-29?

Cheers,

Tom



On Wed, Apr 21, 2010 at 11:52 AM, Peter Firmstone <ji...@zeus.net.au> wrote:

> Thanks Chris, I'll action your recommendations.  It would be nice to try to
> track down where the problem is, it's a shame DiscoveryEvent isn't
> immutable.
>
> Does anyone need access to the protected fields in DiscoveryEvent?
>
> What are the ramifications of making DiscoveryEvent immutable? How much
> breakage of application code?.
>
> Or all Event's for that matter.
>
> Cheers,
>
> Peter.
>
>
> Chris Dolan (JIRA) wrote:
>
>> Attempted discard of unknown registrar kills LookupLocatorDiscovery thread
>> --------------------------------------------------------------------------
>>
>>                 Key: RIVER-337
>>                 URL: https://issues.apache.org/jira/browse/RIVER-337
>>             Project: River
>>          Issue Type: Bug
>>          Components: net_jini_discovery, net_jini_lookup
>>    Affects Versions: AR1, jtsk_2.1
>>            Reporter: Chris Dolan
>>
>>
>> The method
>>
>> net.jini.lookup.ServiceDiscoveryManager$DiscMgrListener.discarded(DiscoveryEvent)
>> has the following code that throws a RuntimeException (the code comment
>> suggests that it is supposed to be impossible, but it's not).
>>
>>        ProxyReg reg = findReg(proxys[i]);
>>        if(reg != null ) { // this check can be removed.
>>            proxyRegSet.remove(proxyRegSet.indexOf(reg));
>>            drops.add(reg);
>>        } else {
>>            throw new RuntimeException("discard error");
>>        }//endif
>>
>> Our QA does failover testing with two servers, each with a Reggie, where
>> we deliberately crash and reboot server 1 then server 2 every 30 minutes
>> continuously.  In one case, we hit that RuntimeException.  I don't know why
>> we got a null reg (that's a problem for another defect, maybe an undiagnosed
>> race of two discards put on a task queue?  Maybe related to RIVER-37?).  But
>> it caused a catastrophic chain of events because the RuntimeException is not
>> caught anywhere up the stack.  In our case, it killed the
>> LookupLocatorDiscovery$Notifier thread.
>>
>> java.lang.RuntimeException: discard error
>>        at
>> net.jini.lookup.ServiceDiscoveryManager$DiscMgrListener.discarded(2639)
>>        at net.jini.discovery.LookupDiscoveryManager.notifyListener(1375)
>>        at net.jini.discovery.LookupDiscoveryManager.notifyListener(1356)
>>        at net.jini.discovery.LookupDiscoveryManager.access$500(92)
>>        at
>> net.jini.discovery.LookupDiscoveryManager$LocatorDiscoveryListener.discarded(543)
>>        at net.jini.discovery.LookupLocatorDiscovery$Notifier.run(650)
>>
>> I propose three changes:
>>
>>  1) change the discarded() method above to simply warn instead of throwing
>>  2) put a try/catch(Throwable) around the listener invocation in
>>  LookupLocatorDiscovery$Notifier.run()
>>  3) put a similar try/catch around listener invocation in
>> LookupDiscoveryManager.notifyListener
>>
>> The idea behind #2 and #3 is that misbehaving listeners should not be
>> allowed to derail the discovery process.
>>
>>
>>
>>
>
>

Re: [jira] Created: (RIVER-337) Attempted discard of unknown registrar kills LookupLocatorDiscovery thread

Posted by Peter Firmstone <ji...@zeus.net.au>.
Thanks Chris, I'll action your recommendations.  It would be nice to try 
to track down where the problem is, it's a shame DiscoveryEvent isn't 
immutable.

Does anyone need access to the protected fields in DiscoveryEvent?

What are the ramifications of making DiscoveryEvent immutable? How much 
breakage of application code?.

Or all Event's for that matter.

Cheers,

Peter.

Chris Dolan (JIRA) wrote:
> Attempted discard of unknown registrar kills LookupLocatorDiscovery thread
> --------------------------------------------------------------------------
>
>                  Key: RIVER-337
>                  URL: https://issues.apache.org/jira/browse/RIVER-337
>              Project: River
>           Issue Type: Bug
>           Components: net_jini_discovery, net_jini_lookup
>     Affects Versions: AR1, jtsk_2.1
>             Reporter: Chris Dolan
>
>
> The method
>    net.jini.lookup.ServiceDiscoveryManager$DiscMgrListener.discarded(DiscoveryEvent)
> has the following code that throws a RuntimeException (the code comment suggests that it is supposed to be impossible, but it's not).
>
>         ProxyReg reg = findReg(proxys[i]);
>         if(reg != null ) { // this check can be removed.
>             proxyRegSet.remove(proxyRegSet.indexOf(reg));
>             drops.add(reg);
>         } else {
>             throw new RuntimeException("discard error");
>         }//endif
>
> Our QA does failover testing with two servers, each with a Reggie, where we deliberately crash and reboot server 1 then server 2 every 30 minutes continuously.  In one case, we hit that RuntimeException.  I don't know why we got a null reg (that's a problem for another defect, maybe an undiagnosed race of two discards put on a task queue?  Maybe related to RIVER-37?).  But it caused a catastrophic chain of events because the RuntimeException is not caught anywhere up the stack.  In our case, it killed the LookupLocatorDiscovery$Notifier thread.
>
> java.lang.RuntimeException: discard error
> 	at net.jini.lookup.ServiceDiscoveryManager$DiscMgrListener.discarded(2639)
> 	at net.jini.discovery.LookupDiscoveryManager.notifyListener(1375)
> 	at net.jini.discovery.LookupDiscoveryManager.notifyListener(1356)
> 	at net.jini.discovery.LookupDiscoveryManager.access$500(92)
> 	at net.jini.discovery.LookupDiscoveryManager$LocatorDiscoveryListener.discarded(543)
> 	at net.jini.discovery.LookupLocatorDiscovery$Notifier.run(650)
>
> I propose three changes:
>
>   1) change the discarded() method above to simply warn instead of throwing
>   2) put a try/catch(Throwable) around the listener invocation in 
>      LookupLocatorDiscovery$Notifier.run()
>   3) put a similar try/catch around listener invocation in LookupDiscoveryManager.notifyListener
>
> The idea behind #2 and #3 is that misbehaving listeners should not be allowed to derail the discovery process.
>
>
>