Posted to user@river.apache.org by Christopher Dolan <ch...@avid.com> on 2011/08/09 17:29:05 UTC

NPE in Jeri NIO SelectionManager

Has anyone else ever seen this exception?

select loop throws
java.lang.NullPointerException
                at com.sun.jini.jeri.internal.runtime.SelectionManager.waitForReadyKey(419)
                at com.sun.jini.jeri.internal.runtime.SelectionManager.access$600(80)
                at com.sun.jini.jeri.internal.runtime.SelectionManager$SelectLoop.run(287)
                at com.sun.jini.thread.ThreadPool$Worker.run(150)
                at java.lang.Thread.run(619) 

We recently switched from blocking Jeri to async NIO to save some RAM via fewer threads. We had only one known occurrence of this problem so I have no debugging information, but when it happened it kept happening for hours and the service was effectively dead to the outside world.

I studied the SelectionManager code pretty closely, and I can't see how a null can be attached to any selection key that matters. The only two places where keys are registered are in processRenewQueue() where the attachment is guaranteed non-null and in the SelectionManager constructor, where the wakeup key indeed has a null attachment. But the wakeup key is filtered out in waitForReadyKey line 391, so that can't be the cause of the NPE.

There were many other exceptions in the process logs, but I thought I'd start with this one because the others looked more normal (like failed unicast calls via LookupLocators).

Any ideas?

Thanks,
Chris

RE: NPE in Jeri NIO SelectionManager

Posted by Christopher Dolan <ch...@avid.com>.
On Tuesday, August 09, 2011 Gregg Wonderly wrote:
> On 8/9/2011 10:29 AM, Christopher Dolan wrote:
>> Has anyone else ever seen this exception?
>>
>> select loop throws
>> java.lang.NullPointerException
>>                  at com.sun.jini.jeri.internal.runtime.SelectionManager.waitForReadyKey(419)
>>                  at com.sun.jini.jeri.internal.runtime.SelectionManager.access$600(80)
>>                  at com.sun.jini.jeri.internal.runtime.SelectionManager$SelectLoop.run(287)
>>                  at com.sun.jini.thread.ThreadPool$Worker.run(150)
>>                  at java.lang.Thread.run(619)
>>
>
> Looking at this code too Chris, I don't see anything instantly obvious.  But, I
> do wonder about the use of 'lock' for synchronization given this comment in the
> Javadoc:
>
> "* A selector's key and selected-key sets are not, in general, safe for use
>   * by multiple concurrent threads.  If such a thread might modify one of these
>   * sets directly then access should be controlled by synchronizing on the set
>   * itself."

Interesting, I didn't know that about selectors. In this case, I think
we may be safe because keys are only added in the constructor and in
processRenewQueue(). The latter is only called from the SelectLoop
thread, and SelectionManager.concurrency is hard-coded to 1 so there's
only one SelectLoop thread. So, the only thread that can attach is the
same thread that calls select().

As a short-term solution, I'm going to recommend that my team add a
null check, but I'm not sure what to do if the attachment does turn out
to be null. And I'll have to wait and see whether this error happens again.
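
For illustration only, here is a minimal sketch of what such a null check could look like. It is not the actual SelectionManager code: the class NullAttachmentGuard, the Handler interface, and the drain() method are hypothetical names, and cancelling the key is just one possible answer to the "what to do" question. It does at least stop the same bad key from being selected over and over.

```java
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class NullAttachmentGuard {
    /** Hypothetical stand-in for whatever handler type is attached to each key. */
    public interface Handler {
        void ready(SelectionKey key);
    }

    /**
     * Drains the selector's selected-key set. Keys carrying a Handler are
     * dispatched; keys with a null (or unexpected) attachment are cancelled
     * instead of being dereferenced. The cancelled keys are returned so the
     * caller can log them.
     */
    public static List<SelectionKey> drain(Selector selector, SelectionKey wakeupKey) {
        List<SelectionKey> orphaned = new ArrayList<>();
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove();
            if (key == wakeupKey) {
                continue; // the wakeup key legitimately has no attachment
            }
            Object attachment = key.attachment();
            if (attachment instanceof Handler) {
                ((Handler) attachment).ready(key);
            } else {
                // Assumption: dropping the key is acceptable. Without this,
                // the key stays ready and the loop fails on every pass.
                key.cancel();
                orphaned.add(key);
            }
        }
        return orphaned;
    }
}
```

Logging the channel of each returned key (key.channel()) would also give some of the debugging information that was missing the first time.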

Chris

Re: NPE in Jeri NIO SelectionManager

Posted by Gregg Wonderly <gr...@wonderly.org>.
On 8/9/2011 10:29 AM, Christopher Dolan wrote:
> Has anyone else ever seen this exception?
>
> select loop throws
> java.lang.NullPointerException
>                  at com.sun.jini.jeri.internal.runtime.SelectionManager.waitForReadyKey(419)
>                  at com.sun.jini.jeri.internal.runtime.SelectionManager.access$600(80)
>                  at com.sun.jini.jeri.internal.runtime.SelectionManager$SelectLoop.run(287)
>                  at com.sun.jini.thread.ThreadPool$Worker.run(150)
>                  at java.lang.Thread.run(619)
>
> We recently switched from blocking Jeri to async NIO to save some RAM via fewer threads. We had only
> one known occurrence of this problem so I have no debugging information, but when it happened it
> kept happening for hours and the service was effectively dead to the outside world.
>
> I studied the SelectionManager code pretty closely, and I can't see how a null can be attached to
> any selection key that matters. The only two places where keys are registered are in
> processRenewQueue() where the attachment is guaranteed non-null and in the SelectionManager
> constructor, where the wakeup key indeed has a null attachment. But the wakeup key is filtered
> out in waitForReadyKey line 391, so that can't be the cause of the NPE.

Looking at this code too Chris, I don't see anything instantly obvious.  But, I 
do wonder about the use of 'lock' for synchronization given this comment in the 
Javadoc:

"* A selector's key and selected-key sets are not, in general, safe for use
  * by multiple concurrent threads.  If such a thread might modify one of these
  * sets directly then access should be controlled by synchronizing on the set
  * itself."

I see that "attachment" is volatile, so it can be modified from any thread and
the change should be visible.  But without staring at the code more to work out
which threads touch it, it will be difficult to tell whether a "rogue" thread is
involved.  Unfortunately, there isn't a simple way to plug in a SelectionKey
implementation that would bark about SelectionKey.attach() being called with a
null value.  You might try a bootclasspath change to a custom version of the
class as a check, to see when the attachment is unexpectedly set to null.  I've
done this on other occasions: I write new Throwable().printStackTrace( Writer )
into a string buffer, add the resulting String to a set, and print it only when
the add changes the number of elements in the set.  That keeps the output from
being too spammy, but it will still show you where attach() is being called
with null.
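
A rough sketch of that trace-deduplication trick, assuming a hypothetical helper class NullAttachTracer whose record() method would be called from the patched attach(). The names are illustrative and not part of River or the JDK. Set.add() returning true is used here as the "number of elements changed" test.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class NullAttachTracer {
    // Distinct stack traces seen so far; a concurrent set because attach()
    // may be called from any thread.
    private static final Set<String> SEEN = ConcurrentHashMap.newKeySet();

    /**
     * Call with the value passed to attach(). For a null value, captures the
     * current stack trace as a String and prints it only the first time that
     * exact trace is seen. Returns true if a new trace was printed.
     */
    public static boolean record(Object attachment) {
        if (attachment != null) {
            return false; // only null attachments are interesting
        }
        StringWriter buffer = new StringWriter();
        new Throwable("attach(null)").printStackTrace(new PrintWriter(buffer, true));
        String trace = buffer.toString();
        boolean isNew = SEEN.add(trace); // true only if the set grew
        if (isNew) {
            System.err.println(trace);
        }
        return isNew;
    }
}
```

Since the trace text includes line numbers, each distinct call site is printed once, however often it fires.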

Gregg

> There were many other exceptions in the process logs, but I thought I'd start with this one
> because the others looked more normal (like failed unicast calls via LookupLocators).
>
> Any ideas?
>
> Thanks,
> Chris
>

