Posted to dev@tomcat.apache.org by Jim Carlson <jc...@jnous.com> on 2003/05/21 01:57:45 UTC

deadlock in cluster module's TcpReplicationThread.java?

I have a concern about some logic in 
org/apache/catalina/cluster/tcp/TcpReplicationThread.java.  Specifically, this code:

     void drainChannel (SelectionKey key)
         throws Exception
     {
	[... snip ...]
         // resume interest in OP_READ
         key.interestOps (key.interestOps() | SelectionKey.OP_READ);
         // cycle the selector so this key is active again
         key.selector().wakeup();
     }


key.interestOps() is called to update the interest set on the SelectionKey, and 
then the selector is woken up.  This looks good, and it comes straight out of 
O'Reilly's "Java NIO" book (section 4.5).  However, the Java API docs for 
SelectionKey state that:

"The operations of reading and writing the interest set will, in general, be 
synchronized with certain operations of the selector. Exactly how this 
synchronization is performed is implementation-dependent: In a naive 
implementation, reading or writing the interest set MAY BLOCK INDEFINITELY if a 
selection operation is already in progress" [all-caps mine]

Doesn't this mean that key.interestOps() could block for the duration of the 
ongoing select() call on certain NIO implementations, essentially resulting in 
deadlock?
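
One common way to sidestep the question is to make sure that only the thread 
calling select() ever touches the interest set.  The following is a minimal 
sketch of that idea, not the Tomcat code: the class InterestOpsQueue and its 
methods resumeRead() and applyPending() are hypothetical names, and it assumes 
a modern JDK (lambdas, java.util.concurrent).  Worker threads queue the change 
and call wakeup(); the selector thread applies it between select() calls.

```java
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical helper, not part of Tomcat: only the selector thread
// ever calls interestOps(); other threads just enqueue a request.
final class InterestOpsQueue {
    private final Selector selector;
    private final Queue<Runnable> pending = new ConcurrentLinkedQueue<>();

    InterestOpsQueue(Selector selector) {
        this.selector = selector;
    }

    // Safe to call from any worker thread: it never touches the key directly.
    void resumeRead(SelectionKey key) {
        pending.add(() -> {
            if (key.isValid()) {
                key.interestOps(key.interestOps() | SelectionKey.OP_READ);
            }
        });
        // Ask a blocked select() to return so the request gets applied.
        selector.wakeup();
    }

    // Called by the selector thread between select() calls.
    // Returns the number of queued requests that were run.
    int applyPending() {
        int applied = 0;
        Runnable change;
        while ((change = pending.poll()) != null) {
            change.run();
            applied++;
        }
        return applied;
    }
}
```

With this arrangement drainChannel() would call resumeRead(key) instead of 
key.interestOps(...), and the selector loop would call applyPending() just 
before each select().  It still relies on wakeup() (or a select timeout) to 
get the selector thread back around the loop.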

To be honest, I have little idea how the cluster module works or how important 
it is for Tomcat -- my interest is in the Java NIO pattern of having multiple 
threads servicing a single Selector.  I too would like to borrow this code from 
the O'Reilly book, but as I say, I'm worried that it isn't robust.

Hopefully I'm wrong!  Any feedback is welcome.

Thanks,

Jim


---------------------------------------------------------------------
To unsubscribe, e-mail: tomcat-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: tomcat-dev-help@jakarta.apache.org


RE: deadlock in cluster module's TcpReplicationThread.java?

Posted by Filip Hanik <ma...@filip.net>.
Yes. Funny that the Solaris implementation of this (Sun's own) totally sucks
on the select: it doesn't work according to the spec, hence the timeout is a
necessity.

Filip

> -----Original Message-----
> From: Jim Carlson [mailto:jcarlson@jnous.com]
> Sent: Tuesday, May 20, 2003 8:06 PM
> To: Tomcat Developers List
> Subject: Re: deadlock in cluster module's TcpReplicationThread.java?
>
>
> Also...
>
> Filip Hanik wrote:
> > [...] most of the implementations are
> > having problem with the wakeup method, hence there is a flag for timing out
> > the select() statement.
> >
>
> Interesting.  Being able to set a timeout for the select() statement solves my
> problem too.  (A work-around, anyway.)



Re: deadlock in cluster module's TcpReplicationThread.java?

Posted by Jim Carlson <jc...@jnous.com>.
Also...

Filip Hanik wrote:
> [...] most of the implementations are
> having problem with the wakeup method, hence there is a flag for timing out
> the select() statement.
> 

Interesting.  Being able to set a timeout for the select() statement solves my 
problem too.  (A work-around, anyway.)
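
For illustration, here is a rough sketch of what a selector loop with a 
select() timeout can look like.  It is not the Tomcat implementation: the 
class name TimedSelectLoop, the timeoutMillis setting and the passes counter 
are made up for the example.  The point is only that select(long) returns 
after the timeout even if wakeup() is lost, so a thread blocked behind the 
selection operation waits for at most roughly one timeout period.

```java
import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical selector loop that never blocks in select() for longer
// than timeoutMillis.
final class TimedSelectLoop implements Runnable {
    private final Selector selector;
    private final long timeoutMillis;
    private final AtomicInteger passes = new AtomicInteger();
    private volatile boolean running = true;

    TimedSelectLoop(Selector selector, long timeoutMillis) {
        this.selector = selector;
        this.timeoutMillis = timeoutMillis;
    }

    @Override
    public void run() {
        try {
            while (running) {
                // Returns when a key is ready, when wakeup() is called,
                // or when the timeout expires, whichever comes first.
                selector.select(timeoutMillis);
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    // A real loop would hand the key to a worker thread here.
                }
                passes.incrementAndGet();
            }
        } catch (IOException e) {
            running = false;
        }
    }

    // Number of completed trips around the loop (for observation only).
    int passes() {
        return passes.get();
    }

    void stop() {
        running = false;
        selector.wakeup();
    }
}
```

The trade-off is latency against idle CPU: a short timeout bounds how long an 
interest-set change can go unnoticed, but the loop spins more often when 
nothing is happening.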



Re: deadlock in cluster module's TcpReplicationThread.java?

Posted by Jim Carlson <jc...@jnous.com>.
Filip Hanik wrote:
> also, if you look at the code, the key is removed from the iterator when it
> is being operated on,
> so I am not sure where your deadlock would occur :)
> 

The deadlock would occur between interestOps() and select().  I see 
that the SelectionKey is removed from the selected set in 
ReplicationListener.java, but that doesn't cancel the key, so it shouldn't 
affect the blocking behaviour of the next select() call (which runs 
concurrently with drainChannel()).  TcpReplicationThread does turn off the 
OP_READ flag, like so:

         key.interestOps (key.interestOps() & (~SelectionKey.OP_READ));

until the channel is drained.  But that doesn't cancel the key either.  Perhaps 
clearing the interest set prevents the Selector from synchronizing on the key, 
but we have no guarantee that this is the case.
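
To make the distinction concrete, here is a small self-contained check.  It 
is a sketch only: the class KeyStateDemo is hypothetical and uses a 
java.nio.channels.Pipe rather than the cluster's sockets.  It shows that 
removing a key from the selected set, and masking OP_READ out of its interest 
set, both leave the key valid and registered.  It says nothing about what 
locking a given Selector implementation does internally.

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical demo class; a Pipe stands in for a real socket channel.
final class KeyStateDemo {
    // Returns { key still valid (1/0), interest ops after clearing OP_READ,
    //           interest ops after restoring OP_READ }.
    static int[] run() throws Exception {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            pipe.source().configureBlocking(false);
            SelectionKey key = pipe.source().register(selector, SelectionKey.OP_READ);

            // Make the source end readable, then run one selection operation.
            pipe.sink().write(ByteBuffer.wrap(new byte[] {1}));
            selector.select(1000L);

            // Removing the key from the selected set does not cancel it.
            selector.selectedKeys().remove(key);
            int valid = (key.isValid() && selector.keys().contains(key)) ? 1 : 0;

            // Suspend interest in OP_READ while the channel is being drained.
            key.interestOps(key.interestOps() & ~SelectionKey.OP_READ);
            int cleared = key.interestOps();

            // Resume interest in OP_READ afterwards.
            key.interestOps(key.interestOps() | SelectionKey.OP_READ);
            int restored = key.interestOps();

            pipe.sink().close();
            pipe.source().close();
            return new int[] {valid, cleared, restored};
        }
    }
}
```

Only key.cancel(), closing the channel, or closing the selector would 
actually deregister the key.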

Nevertheless, with respect to your previous email, it does seem unlikely that a 
real JDK implementation would lock all those keys during an entire call to 
select().  Perhaps my deadlock is merely theoretical, and would never happen in 
the real world.

Jim



RE: deadlock in cluster module's TcpReplicationThread.java?

Posted by Filip Hanik <ma...@filip.net>.
Also, if you look at the code, the key is removed from the iterator when it
is being operated on,
so I am not sure where your deadlock would occur :)

Filip

> -----Original Message-----
> From: Filip Hanik [mailto:mail@filip.net]
> Sent: Tuesday, May 20, 2003 5:51 PM
> To: Tomcat Developers List
> Subject: RE: deadlock in cluster module's TcpReplicationThread.java?
>
>
> let me know when you find a JDK implementation that doesn't handle this
> well, and I will rewrite it :)
> so far I've had great results with it, most of the implementations are
> having problem with the wakeup method, hence there is a flag for timing out
> the select() statement.
>
> filip



RE: deadlock in cluster module's TcpReplicationThread.java?

Posted by Filip Hanik <ma...@filip.net>.
Let me know when you find a JDK implementation that doesn't handle this
well, and I will rewrite it :)
So far I've had great results with it. Most of the implementations are
having problems with the wakeup method, hence there is a flag for timing out
the select() statement.

filip


