Posted to dev@directory.apache.org by Alex Karasulu <ao...@bellsouth.net> on 2004/09/16 16:56:31 UTC

[seda] Race condition between disconnect and output events

On Thu, 2004-09-16 at 04:38, jira@apache.org wrote:
> The following comment has been added to this issue:
> 
>      Author: Trustin Lee
>     Created: Thu, 16 Sep 2004 1:37 AM
>        Body:

> If it is guaranteed that all Subscribers know the list
> of managed channels, we will be able to resolve the 

For the record (I forgot to include it), here is the first case:

1). the channel was not put into the output manager before first output
event was processed

Are you suggesting that there be a single list of channels which all
Subscribers can see?  Right now only the Input/OutputManagers have
access to the channel.  I'm just trying to understand how this will make
sure the OutputEvent is processed after the ConnectEvent. 

> first case.  Subscribing and unsubscribing rarely happens, 
> so it should be okay.  The performance problem exists 
> when connections are established very often and 
> closed soon afterwards.

Right, that's when you need the synchronization constructs, which are
expensive.

> Using a priority queue will be helpful here.  This solution brings 
> up another synchronization issue that can slow down overall 
> performance per channel although the synchronization block (getting nextval) 
> is small enough.  Plus a customized high-performance priority queue 
> implementation is required.  Each event type will have its own 
> priority, and the sequence of an event will be the secondary priority, 
> which can be disabled by the user.  This solution will solve issue 
> DIRSEDA-5, too.

OK, bear with me.  What you suggest is a priority queue based on two
different kinds of priorities.  The first kind is an Event type priority
and the second kind is a priority that orders events of the same type. 
Is this correct?
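
A minimal sketch of that two-level ordering, assuming modern Java and the
stock java.util.concurrent.PriorityBlockingQueue rather than the customized
high-performance queue mentioned above.  The names EventKind, SequencedEvent
and EventOrdering are hypothetical and not part of the seda framework, and
the relative order of the event kinds is an assumption made for illustration:

```java
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical event kinds; a lower ordinal is dequeued first (assumed order).
enum EventKind { CONNECT, OUTPUT, DISCONNECT }

// Hypothetical event wrapper: an event kind plus a global sequence number.
final class SequencedEvent {
    // Shared counter; taking the next value is the small synchronized step.
    private static final AtomicLong NEXT_SEQUENCE = new AtomicLong();

    final EventKind kind;
    final String clientKey;
    final long sequence;

    SequencedEvent(EventKind kind, String clientKey) {
        this.kind = kind;
        this.clientKey = clientKey;
        this.sequence = NEXT_SEQUENCE.getAndIncrement();
    }
}

final class EventOrdering {
    // First priority: the event kind.  Second priority: creation order.
    static final Comparator<SequencedEvent> BY_KIND_THEN_SEQUENCE =
        Comparator.comparing((SequencedEvent e) -> e.kind)
                  .thenComparingLong(e -> e.sequence);

    static PriorityBlockingQueue<SequencedEvent> newQueue() {
        return new PriorityBlockingQueue<>(16, BY_KIND_THEN_SEQUENCE);
    }
}
```

A stage would add events to the queue returned by EventOrdering.newQueue()
and take them with poll() or take().  The queue can only order the events
it already holds at the moment of the dequeue.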

> The second case is more complicated.  In the worst case, we can receive 
> an output event one or two seconds after the disconnection event has arrived.  
> There is no easy solution because the EventRouter cannot predict whether 
> more output events are scheduled.  Notifying the user that the 
> event was not processed due to unexpected disconnection would be the 
> best we can do; the user will choose whether to retry it later or just 
> to drop it.

Here's the second case for continuity's sake:

2). the channel was removed from the output manager before all the
output events could be processed

Yes this is a bit more complex.  Obviously there is nothing you can do
if the client falls off the face of the earth.  Even if you had the
reference to the channel it does no good to write to a closed channel. 
Something's going to fail regardless of what we do.  
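
The notification idea quoted above could look roughly like the sketch below.
Everything in it is an assumption for illustration: SketchOutputManager and
UndeliveredOutputListener are hypothetical names, not the real
DefaultOutputManager API, and an in-memory list stands in for the actual
socket channel:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical callback: tells the user that an output was not delivered.
interface UndeliveredOutputListener {
    void outputUndelivered(String clientKey, byte[] payload);
}

// Hypothetical stand-in for an output manager; a list plays the channel.
final class SketchOutputManager {
    private final Map<String, List<byte[]>> channels = new ConcurrentHashMap<>();
    private final UndeliveredOutputListener listener;

    SketchOutputManager(UndeliveredOutputListener listener) {
        this.listener = listener;
    }

    void connect(String clientKey) {
        channels.put(clientKey, new CopyOnWriteArrayList<>());
    }

    void disconnect(String clientKey) {
        channels.remove(clientKey);
    }

    // Returns true if the payload was written, false if the channel was gone.
    boolean write(String clientKey, byte[] payload) {
        List<byte[]> channel = channels.get(clientKey);
        if (channel == null) {
            // Channel missing: hand the payload back so the user can decide
            // whether to retry later or drop it.
            listener.outputUndelivered(clientKey, payload);
            return false;
        }
        channel.add(payload);
        return true;
    }
}
```

This does not remove the race; it only makes the failure visible to the
caller instead of logging a warning and silently losing the output.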

What perplexes me about this situation is the fact that the client is
synchronous in the connect->write->read->disconnect sequence.  Here's
the client code out of the test case:

   1     EchoTCPClient client = new EchoTCPClient();
   2     client.connect( "localhost", 7 );
   3     byte[] toSend = "Hello world!".getBytes();
   4     byte[] recieved = new byte[toSend.length];
   5     client.getOutputStream().write( toSend );
   6     client.getInputStream().read( recieved );
   7     client.disconnect();
   8     assertEquals( new String( toSend ), new String( recieved ) );

So in lines 6 & 7 the client must read all the input before a disconnect
occurs to trigger a Disconnect event.  The server is not going to
disconnect unless there is a specific protocol message that triggers
that like an LDAP UnbindRequest - here we have nothing like that.

The question then is how the heck is a DisconnectEvent outrunning an
OutputEvent when all OutputEvents should have been processed already
before the DisconnectEvent is even created?  Can you see the ugliness of
staged event driven archs when it comes to debugging them?

BTW the fact that this client is synchronous (read first, then
disconnect) does not mean case 2 is out of the question every time. 
Other protocols can still have a DisconnectEvent outrun other
OutputEvents as in LDAP with the UnbindRequest.  Other requests like
SearchRequests whose responses are being processed can be terminated by
a disconnect due to an UnbindRequest.

I guess this is more food for thought.  I still want to think about this
priority queue approach.  I guess the PQ works if you have received all
the events you need to order when you're looking at it to dequeue.  If
some events just have not arrived yet but should be the next to be
processed then the PQ I'm afraid will fail us.  We need something more
is what I'm thinking.  Something where there is centralized accounting
going on for events and those other events they generate.  This way
stages can determine event processing order and even use Barriers to
synchronize or join multiple threads across stages.  This however scares
me because of the cost to synchronize.
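
To make the accounting idea concrete, here is a minimal sketch, assuming a
single shared object that counts the output events still pending for each
client and defers the close until that count drains to zero.
ChannelAccounting and its method names are hypothetical, and refusing new
output once a disconnect has been requested is a design choice made here
for illustration, not something decided in this thread:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical central accounting of pending output events per client.
final class ChannelAccounting {
    private static final class Entry {
        int pending;             // output events scheduled but not yet flushed
        boolean closeRequested;  // a disconnect has been seen for this client
    }

    private final Map<String, Entry> entries = new HashMap<>();

    synchronized void channelOpened(String clientKey) {
        entries.put(clientKey, new Entry());
    }

    // Returns false if the client is unknown or already closing.
    synchronized boolean outputScheduled(String clientKey) {
        Entry e = entries.get(clientKey);
        if (e == null || e.closeRequested) {
            return false;
        }
        e.pending++;
        return true;
    }

    // Returns true if this completion drained the channel and it may close.
    synchronized boolean outputCompleted(String clientKey) {
        Entry e = entries.get(clientKey);
        if (e == null || e.pending == 0) {
            return false;
        }
        e.pending--;
        return closeIfDrained(clientKey, e);
    }

    // Returns true if the channel may close now, false if the close is deferred.
    synchronized boolean disconnectRequested(String clientKey) {
        Entry e = entries.get(clientKey);
        if (e == null) {
            return false;
        }
        e.closeRequested = true;
        return closeIfDrained(clientKey, e);
    }

    private boolean closeIfDrained(String clientKey, Entry e) {
        if (e.closeRequested && e.pending == 0) {
            entries.remove(clientKey);
            return true;
        }
        return false;
    }
}
```

Every call goes through one lock, which is exactly the synchronization cost
in question; a per-client lock or counter would be the obvious refinement.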

Alex

> ---------------------------------------------------------------------
> View this comment:
>   http://issues.apache.org/jira/browse/DIRSEDA-6?page=comments#action_53128
> 
> ---------------------------------------------------------------------
> View the issue:
>   http://issues.apache.org/jira/browse/DIRSEDA-6
> 
> Here is an overview of the issue:
> ---------------------------------------------------------------------
>         Key: DIRSEDA-6
>     Summary: Race condition between disconnect and output events
>        Type: Bug
> 
>      Status: Open
>    Priority: Major
> 
>     Project: Seda Framework
> 
>    Assignee: Alex Karasulu
>    Reporter: Alex Karasulu
> 
>     Created: Thu, 16 Sep 2004 12:23 AM
>     Updated: Thu, 16 Sep 2004 1:37 AM
> 
> Description:
> On occasion I get the following failure from the echo server test:
> 
> -- o error message o --
> 
> Sep 16, 2004 2:50:52 AM org.apache.seda.output.LoggingOutputMonitor channelMissing WARNING: org.apache.seda.output.DefaultOutputManager@2d9c06 could not find channel for client 127.0.0.1:7<-127.0.0.1:2402
> 
> -- o error message o --
> 
> Now this means a channel for the client was expected in the output manager but was not found.  This can be caused by two possible conditions:
> 
>  1). the channel was not put into the output manager before first output event was processed
>  2). the channel was removed from the output manager before all the output events could be processed
> 
> In the first case we have a race condition between the thread processing a ConnectEvent and a thread processing an OutputEvent.  The ConnectEvent processing is really slow in this case because all the stages were traversed via input->decode->reqproc->output before the ConnectEvent was handled.  That's a little far fetched so I'm going to presume that the second case is more likely.
> 
> In the second case the race condition is between the thread processing a DisconnectEvent and a thread processing an OutputEvent.  The DisconnectEvent in this case is outrunning the processing of the OutputEvent.  Before the OutputEvent can flush out data to the client the channel to the client is removed from the output manager by the DisconnectEvent.
> 
> 
> 


Re: [seda] Race condition between disconnect and output events

Posted by Trustin Lee <tr...@gmail.com>.
> > If it is guaranteed that all Subscribers know the list
> > of managed channels, we will be able to resolve the
> 
> For the record (I forgot to include it), here is the first case:
> 
> 1). the channel was not put into the output manager before first output
> event was processed
> 
> Are you suggesting that there be a single list of channels which all
> Subscribers can see?  Right now only the Input/OutputManagers have
> access to the channel.  I'm just trying to understand how this will make
> sure the OutputEvent is processed after the ConnectEvent.

It does not guarantee the order of events, but it does guarantee that the
channel is known to the OutputManager (strictly speaking, the channel is
not put into it; the OutputManager gets the global ClientKey list from
the EventRouter).
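
As a rough sketch of what such a shared list might look like, assuming the
router owns a thread-safe set of client keys that any subscriber can query.
ClientKeyRegistry is a hypothetical name, and plain strings stand in for
the real ClientKey type:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry of connected clients, owned by the event router.
final class ClientKeyRegistry {
    private final Set<String> keys = ConcurrentHashMap.newKeySet();

    // Called by the router before it publishes the connect event.
    boolean register(String clientKey) {
        return keys.add(clientKey);
    }

    // Called by the router when the client disconnects.
    boolean unregister(String clientKey) {
        return keys.remove(clientKey);
    }

    // Any subscriber can ask whether the client is currently connected.
    boolean isKnown(String clientKey) {
        return keys.contains(clientKey);
    }
}
```

Because the router registers the key before any stage sees the connect
event, an output stage that checks isKnown() no longer depends on having
processed its own ConnectEvent first.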

> > Using a priority queue will be helpful here.  This solution brings
> > up another synchronization issue that can slow down overall
> > performance per channel although the synchronization block (getting nextval)
> > is small enough.  Plus a customized high-performance priority queue
> > implementation is required.  Each event type will have its own
> > priority, and the sequence of an event will be the secondary priority,
> > which can be disabled by the user.  This solution will solve issue
> > DIRSEDA-5, too.
> 
> OK, bear with me.  What you suggest is a priority queue based on two
> different kinds of priorities.  The first kind is an Event type priority
> and the second kind is a priority that orders events of the same type.
> Is this correct?

Correct!

> > The second case is more complicated.  In the worst case, we can receive
> > an output event one or two seconds after the disconnection event has arrived.
> > There is no easy solution because the EventRouter cannot predict whether
> > more output events are scheduled.  Notifying the user that the
> > event was not processed due to unexpected disconnection would be the
> > best we can do; the user will choose whether to retry it later or just
> > to drop it.
> 
> Here's the second case for continuity's sake:
> 
> 2). the channel was removed from the output manager before all the
> output events could be processed
> 
> Yes this is a bit more complex.  Obviously there is nothing you can do
> if the client falls off the face of the earth.  Even if you had the
> reference to the channel it does no good to write to a closed channel.
> Something's going to fail regardless of what we do.
> 
> What perplexes me about this situation is the fact that the client is
> synchronous in the connect->write->read->disconnect sequence.  Here's
> the client code out of the test case:
> 
>    1     EchoTCPClient client = new EchoTCPClient();
>    2     client.connect( "localhost", 7 );
>    3     byte[] toSend = "Hello world!".getBytes();
>    4     byte[] recieved = new byte[toSend.length];
>    5     client.getOutputStream().write( toSend );
>    6     client.getInputStream().read( recieved );
>    7     client.disconnect();
>    8     assertEquals( new String( toSend ), new String( recieved ) );
> 
> So in lines 6 & 7 the client must read all the input before a disconnect
> occurs to trigger a Disconnect event.  The server is not going to
> disconnect unless there is a specific protocol message that triggers
> that like an LDAP UnbindRequest - here we have nothing like that.
> 
> The question then is how the heck is a DisconnectEvent outrunning an
> OutputEvent when all OutputEvents should have been processed already
> before the DisconnectEvent is even created?  Can you see the ugliness of
> staged event driven archs when it comes to debugging them?

Processing an OutputEvent seems to take longer than processing a
DisconnectEvent, so the DisconnectEvent often outruns it.  A priority
queue cannot handle this situation.  We could wait until all
OutputEvents are processed if we had a global event search mechanism,
but that brings up the synchronization overhead issue again.

> I guess this is more food for thought.  I still want to think about this
> priority queue approach.  I guess the PQ works if you have received all
> the events you need to order when you're looking at it to dequeue.  If
> some events just have not arrived yet but should be the next to be
> processed then the PQ I'm afraid will fail us.  We need something more
> is what I'm thinking.  Something where there is centralized accounting
> going on for events and those other events they generate.  This way
> stages can determine event processing order and even use Barriers to
> synchronize or join multiple threads across stages.  This however scares
> me because of the cost to synchronize.

Yes, it costs a lot, although it is a very attractive solution.  Imagine
'an intelligent EventRouter implementation' which controls the number
of threads assigned to each stage intelligently. :)  IMHO we cannot
control the order of events if we don't have control over all stages
and events.  The best we can do is apply this technique selectively,
and it is better to implement it later.

Trustin
-- 
what we call human nature in actually is human habit
--
http://gleamynode.net/