Posted to dev@synapse.apache.org by Oleg Kalnichevski <ol...@apache.org> on 2007/03/25 17:10:45 UTC

Re: Transport appears to be hanging because an unchecked exception caused the I/O dispatch thread to terminate

On Sun, 2007-03-25 at 09:43 +0000, ant elder wrote:
> The symptoms I get do seem to match what you describe. There are still
> two problems with that, though, which I'd like to understand better.
> 
> 1) Why don't I see this with the non-NIO transport? For example, I can
> run the Synapse server samples in either the Synapse sample server,
> which uses the NIO transport, or a separate axis2-1.1.1 distro with
> the non-NIO transport. When using JMeter against axis2-1.1.1 it works
> fine and I can send tens of thousands of requests without any errors.
> What's different here? The underlying TCP stack and config is the
> same, isn't it?

Anthony, Asankha, et al.

The problem appears to be caused by Synapse opening an I/O pipe for
*every* incoming and outgoing HTTP message. On some platforms this can
be a very expensive operation, both in terms of performance and system
resources. On Windows, opening an I/O pipe apparently requires a local
IP port to be allocated. No wonder Synapse chokes after only a few
thousand requests.

I see absolutely no reason why Synapse should make use of I/O pipes.
Essentially, pipes are being used to bridge event-driven NIO and
stream-based classic I/O. There are other ways to get the job done. A
trivial shared buffer with synchronized access should suffice. I'll
happily lend you a helping hand if necessary.

> 2) Synapse often hangs after the IO error and needs to be restarted.
> Is there any way we can make it recover from this without requiring a
> restart? By handling the exception differently or something?
> 

Please let me know if you see any unchecked exceptions thrown by I/O
reactors, as those exceptions cause I/O dispatch threads to terminate,
effectively locking up the I/O reactor.
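
To illustrate why an unchecked exception is fatal here, the following is
a minimal sketch of a dispatch loop that catches RuntimeException per
event, so one failing handler does not end the loop. It is not HttpCore
code: GuardedDispatchLoop, EventHandler and the int "events" are
hypothetical names made up for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical dispatch loop (not an HttpCore class). It shows how a
 * per-event catch keeps the loop alive when a handler throws.
 */
class GuardedDispatchLoop {

    /** Hypothetical callback standing in for a real I/O event handler. */
    interface EventHandler {
        void handle(int event);
    }

    private final AtomicInteger failures = new AtomicInteger();

    /** Processes every event; a failing handler is counted, not propagated. */
    void processEvents(int[] events, EventHandler handler) {
        for (int event : events) {
            try {
                handler.handle(event);
            } catch (RuntimeException ex) {
                // Without this catch the exception would unwind the loop
                // (and the thread running it), so the remaining events
                // would never be processed.
                failures.incrementAndGet();
            }
        }
    }

    /** Number of events whose handler threw an unchecked exception. */
    int failureCount() {
        return failures.get();
    }
}
```

A real reactor would also log the exception and probably close the
affected connection; the sketch only counts failures.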

Oleg


>    ...ant 
> 
> On 3/24/07, Asankha C. Perera < asankha@wso2.com> wrote:
>         Ant
>         
>         This is the same error seen by Indika on Windows, and I think
>         my analysis is correct. If you run the test for the first time,
>         or a few minutes after the last run, you should be able to get
>         to around 1000 iterations. Once you start to hit this issue,
>         even 200 iterations would give you the error. At that point,
>         running netstat -na should show you that most of the TCP ports
>         are in the TIME_WAIT state. It can usually take at least one
>         minute until a port is cleared up by the OS. The tuning
>         parameters I specified for Linux tell the OS to use the full
>         port range for applications, and set the TCP FIN timeout to
>         30 secs, to clear up the ports as quickly as possible. Without
>         *any* OS tuning, and on a Windows XP system, you will
>         definitely encounter this issue.
>         
>         
>         asankha
>         
>         ant elder wrote: 
>         > I've tried again with the latest Synapse and HTTP components
>         > code and several JVMs. The results feel slightly different
>         > than before, but the end result is still always the root
>         > exception included below. Sometimes it doesn't occur until
>         > around 1000 requests, but sometimes it happens after not
>         > many requests at all.
>         > 
>         >    ...ant
>         > 
>         > java.io.IOException: Unable to establish loopback connection
>         >         at sun.nio.ch.PipeImpl$Initializer.run(Unknown Source)
>         >         at java.security.AccessController.doPrivileged(Native Method)
>         >         at sun.nio.ch.PipeImpl.<init>(Unknown Source)
>         >         at sun.nio.ch.SelectorProviderImpl.openPipe(Unknown Source)
>         >         at java.nio.channels.Pipe.open(Unknown Source)
>         >         at org.apache.axis2.transport.nhttp.ServerHandler.requestReceived(ServerHandler.java:108)
>         >         at org.apache.axis2.transport.nhttp.LoggingNHttpServiceHandler.requestReceived(LoggingNHttpServiceHandler.java:83)
>         >         at org.apache.http.impl.nio.DefaultNHttpServerConnection.consumeInput(DefaultNHttpServerConnection.java:96)
>         >         at org.apache.axis2.transport.nhttp.PlainServerIOEventDispatch.inputReady(PlainServerIOEventDispatch.java:67)
>         >         at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:68)
>         >         at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:160)
>         >         at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:145)
>         >         at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:127)
>         >         at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:153)
>         >         at java.lang.Thread.run(Unknown Source)
>         > Caused by: java.net.BindException: Address already in use: connect
>         >         at sun.nio.ch.Net.connect(Native Method)
>         >         at sun.nio.ch.SocketChannelImpl.connect(Unknown Source)
>         >         at java.nio.channels.SocketChannel.open(Unknown Source)
>         > 
>         > On 3/23/07, Asankha C. Perera <as...@wso2.com> wrote: 
>         >         Ant
>         >         
>         >         I am quite sure that the problem seen by Indika now
>         >         was related to the ports being exhausted. See the
>         >         following articles, and especially the "MaxUserPort"
>         >         and "TcpTimedWaitDelay" parameters that could be
>         >         tweaked to be consistent with what I am using before
>         >         running a load test on Linux. I will ask Indika to
>         >         check these on Monday, but you may try this in the
>         >         meantime if you get a chance.
>         >         
>         >         http://www.microsoft.com/technet/network/deploy/depovg/tcpip2k.mspx 
>         >         http://www.microsoft.com/technet/community/columns/cableguy/cg1205.mspx 
>         >         http://www.psc.edu/networking/projects/tcptune/OStune/winxp/winxp_stepbystep.html  
>         >         
>         >         
>         >         asankha
>         >         
>         >         Asankha C. Perera wrote: 
>         >         > Hi Ant
>         >         > 
>         >         > I fixed this for Linux and JDK 1.5. I am
>         >         > confident of this fix, as I was able to first
>         >         > recreate the issue consistently and then see the
>         >         > fix in action using 5 concurrent users sending a
>         >         > total of 5000 messages multiple times. However,
>         >         > Indika is still seeing a 'similar' issue on
>         >         > Windows using JDK 1.4. We will try to see if it's
>         >         > related to JDK 1.4 or Windows. If you get the
>         >         > latest nhttp code and build the nhttp JAR, you
>         >         > could verify this fix and let me know.
>         >         > 
>         >         > I am listing some of the Linux commands that came
>         >         > in handy for the resolution in case someone wants
>         >         > to check this.
>         >         > 
>         >         > lsof -p 7426 => lists the open files for the pid
>         >         > given after the -p option
>         >         > 
>         >         > ls -l /proc/9976/fd | wc -l => for each process
>         >         > the /proc filesystem lists the files used and thus
>         >         > you could count the open files with this command
>         >         > 
>         >         > asankha
>         >         > 
>         >         > Asankha C. Perera wrote: 
>         >         > > Ant / Oleg
>         >         > > 
>         >         > > I can recreate this issue on both Windows and
>         >         > > Linux, and I think it's caused by my code related
>         >         > > to the use of Pipes. I am actively looking into
>         >         > > this right now and will get back to you on what I
>         >         > > find.
>         >         > > 
>         >         > > asankha
>         >         > > 
>         >         > > ant elder wrote: 
>         >         > > > I've tried on several JDKs now and _always_
>         >         > > > get similar intermittent I/O related errors. I
>         >         > > > can use JMeter directly against Axis2-1.1.1
>         >         > > > without any problems at all, so this does look
>         >         > > > like some issue with the NIO transport. It would
>         >         > > > be really good to hear from other Windows users
>         >         > > > to see if this is just my specific environment
>         >         > > > or a more general problem.
>         >         > > > 
>         >         > > > To recreate:
>         >         > > > 
>         >         > > > 1) build the Synapse server sample by running
>         >         > > > 'ant' in the
>         >         > > > samples\axis2Server\src\SimpleStockQuoteService
>         >         > > > directory
>         >         > > > 2) start the sample service by running
>         >         > > > samples\axis2Server\axis2server.bat
>         >         > > > 3) get the Synapse config (either 8 or 501)
>         >         > > > from http://people.apache.org/~antelder/temp/,
>         >         > > > put it in repository\conf\sample and start
>         >         > > > Synapse: bin\synapse.bat -sample=8
>         >         > > > 4) get the JMeter config test1.jmx from
>         >         > > > http://people.apache.org/~antelder/temp/,
>         >         > > > start JMeter, then File -> Open and point to the
>         >         > > > test1.jmx file
>         >         > > > 5) JMeter Run -> Start, and after not too long
>         >         > > > I/O errors should appear in the Synapse
>         >         > > > console
>         >         > > > 
>         >         > > >    ...ant 
>         >         > > > 
>         >         > > > ---------- Forwarded message ----------
>         >         > > > From: Asankha C. Perera <as...@wso2.com>
>         >         > > > Date: Mar 22, 2007 4:58 PM 
>         >         > > > Subject: Re: [jira] Resolved: (HTTPCORE-60)
>         >         > > > Transport appears to be hanging because an
>         >         > > > unchecked exception caused the I/O dispatch
>         >         > > > thread to terminate
>         >         > > > To: HttpComponents Project
>         >         > > > <ht...@jakarta.apache.org>
>         >         > > > 
>         >         > > > Oleg/Ant 
>         >         > > > 
>         >         > > > I am guessing this has something to do with
>         >         > > > Windows or the JDK you use. I am unable to test
>         >         > > > this week, but will do my best to try this
>         >         > > > sometime next week. As I said, on Linux I have
>         >         > > > run the system through thousands of messages
>         >         > > > and multiple threads concurrently and have fixed
>         >         > > > all the issues I came across.
>         >         > > > 
>         >         > > > So Oleg, I do not see this as a blocker for
>         >         > > > the HttpCore release, but I will use your
>         >         > > > latest snapshots in Synapse to check whether
>         >         > > > this occurs again in future.
>         >         > > > 
>         >         > > > thanks
>         >         > > > asankha
>         >         > > > 
>         >         > > > Oleg Kalnichevski (JIRA) wrote: 
>         >         > > > >      [ https://issues.apache.org/jira/browse/HTTPCORE-60?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>         >         > > > > 
>         >         > > > > Oleg Kalnichevski resolved HTTPCORE-60.
>         >         > > > > ---------------------------------------
>         >         > > > > 
>         >         > > > > 
>         >         > > > > 
>         >         > > > >     Resolution: Fixed
>         >         > > > > 
>         >         > > > > Anthony
>         >         > > > > It turned out ClosedChannelException is a checked I/O exception, so it cannot kill the I/O dispatch thread. So, apparently I was wrong in my initial assertion about the cause of the Synapse I/O transport lockup. I tweaked the HttpCore code a little and changed the IOSessionImpl to catch all ClosedChannelException-s thrown by the underlying byte channel, just in case.
>         >         > > > > 
>         >         > > > > 
>         >         > > > > 
>         >         > > > > 
>         >         > > > > 
>         >         > > > > Please review the changes and let me know if it is okay to proceed with the release
>         >         > > > > 
>         >         > > > > Oleg
>         >         > > > > 
>         >         > > > >   
>         >         > > > > > Transport appears to be hanging because an unchecked exception caused the I/O dispatch thread to terminate
>         >         > > > > > ----------------------------------------------------------------------------------------------------------
>         >         > > > > > 
>         >         > > > > > 
>         >         > > > > > 
>         >         > > > > > 
>         >         > > > > > 
>         >         > > > > >                 Key: HTTPCORE-60
>         >         > > > > >                 URL: https://issues.apache.org/jira/browse/HTTPCORE-60
>         >         > > > > > 
>         >         > > > > > 
>         >         > > > > > 
>         >         > > > > > 
>         >         > > > > >             Project: HttpComponents Core
>         >         > > > > >          Issue Type: Bug
>         >         > > > > >    Affects Versions: 4.0-alpha4
>         >         > > > > >            Reporter: ant elder
>         >         > > > > >         Assigned To: Oleg Kalnichevski
>         >         > > > > >             Fix For: 4.0-alpha4
>         >         > > > > > 
>         >         > > > > > 
>         >         > > > > > 
>         >         > > > > > See discussion on synapse-dev mailing list: http://www.nabble.com/Intermittent-IO-Errors-using-Synapse-tf3439957.html
>         >         > > > > > 
>         >         > > > > > 
>         >         > > > > > 
>         >         > > > > > 
>         >         > > > > > The transport appears to be hanging because an unchecked exception
>         >         > > > > > caused the I/O dispatch thread to terminate. I believe there are several
>         >         > > > > > different types of problems (at least two) that we are seeing here.
>         >         > > > > > 
>         >         > > > > > [I/O reactor worker thread 5] ERROR ServerHandler - I/O Error : null
>         >         > > > > >     
>         >         > > > > > > java.nio.channels.ClosedChannelException
>         >         > > > > > >         at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:112)
>         >         > > > > > >         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:139)
>         >         > > > > > > 
>         >         > > > > > >       
>         >         > > > >   
>         >         > > > --------------------------------------------------------------------- To unsubscribe, e-mail: httpcomponents-dev-unsubscribe@jakarta.apache.org For additional commands, e-mail: httpcomponents-dev-help@jakarta.apache.org 
>         >         > > --------------------------------------------------------------------- To unsubscribe, e-mail: synapse-dev-unsubscribe@ws.apache.org For additional commands, e-mail: synapse-dev-help@ws.apache.org 
>         > 
> 




Re: Transport appears to be hanging because an unchecked exception caused the I/O dispatch thread to terminate

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Mon, 2007-03-26 at 13:16 +0530, Asankha C. Perera wrote:
> Hi Oleg
> 
> Oleg Kalnichevski wrote:
> > The problem appears to be caused by Synapse opening an I/O pipe for
> > *every* incoming and outgoing HTTP message. On some platforms this can
> > be a very expensive operation, both in terms of performance and system
> > resources. On Windows, opening an I/O pipe apparently requires a local
> > IP port to be allocated. No wonder Synapse chokes after only a few
> > thousand requests.
> >
> > I see absolutely no reason why Synapse should make use of I/O pipes.
> > Essentially, pipes are being used to bridge event-driven NIO and
> > stream-based classic I/O. There are other ways to get the job done. A
> > trivial shared buffer with synchronized access should suffice. I'll
> > happily lend you a helping hand if necessary.
> >
> Could you help me a bit here? The Pipe class seemed to let us do what
> we wanted, i.e. bridge streams to channels, without having to write
> our own code. Could you elaborate on how you propose to get around
> using Pipes? Or would you have a pointer to any code?
> 

Hi Asankha,

This does require writing some custom code, but I think it is well
worth the trouble.

Basically all you need is an object with synchronized access to its
internal buffer, so it could be used by the I/O dispatch thread to
produce data and by the worker thread to consume data or the other way
around. 
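
As a rough illustration of such an object, here is a minimal sketch of
a fixed-size buffer with synchronized access. It is not one of the
HttpCore classes: SimpleSharedBuffer and its methods are hypothetical
names, and the choice of a non-blocking write() with a blocking read()
is an assumption (the producer is taken to be an I/O dispatch thread
that must not block, the consumer a worker thread that may wait).

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical fixed-size buffer shared by a producer thread and a
 * consumer thread; all access goes through synchronized methods.
 */
class SimpleSharedBuffer {
    // Kept in "write mode" between calls: position = bytes currently stored.
    private final ByteBuffer buffer;
    private boolean closed;

    SimpleSharedBuffer(int capacity) {
        this.buffer = ByteBuffer.allocate(capacity);
    }

    /**
     * Producer side. Never blocks: copies as many bytes as fit and
     * returns that count, which may be less than len (or 0 when full).
     */
    synchronized int write(byte[] src, int off, int len) {
        int chunk = Math.min(len, buffer.remaining());
        buffer.put(src, off, chunk);
        if (chunk > 0) {
            notifyAll(); // wake a consumer waiting in read()
        }
        return chunk;
    }

    /**
     * Consumer side. Blocks until data is available; returns the number
     * of bytes copied, or -1 once the buffer is closed and drained.
     */
    synchronized int read(byte[] dst, int off, int len) throws InterruptedException {
        while (buffer.position() == 0) {
            if (closed) {
                return -1;
            }
            wait();
        }
        buffer.flip();                       // switch to "read mode"
        int chunk = Math.min(len, buffer.remaining());
        buffer.get(dst, off, chunk);
        buffer.compact();                    // back to "write mode", keeping unread bytes
        return chunk;
    }

    /** Marks the end of the stream and wakes any waiting consumer. */
    synchronized void close() {
        closed = true;
        notifyAll();
    }
}
```

When write() accepts fewer bytes than offered, the producer has to hold
on to the rest and retry later; the sketch leaves that to the caller.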

HttpCore NIO provides two interfaces to that end, which you may want to
take as a starting point:

http://svn.apache.org/repos/asf/jakarta/httpcomponents/httpcore/trunk/module-nio/src/main/java/org/apache/http/nio/util/ContentInputBuffer.java
http://svn.apache.org/repos/asf/jakarta/httpcomponents/httpcore/trunk/module-nio/src/main/java/org/apache/http/nio/util/ContentOutputBuffer.java

There are also some concrete implementations of those interfaces
provided out of the box by HttpCore NIO.   

http://svn.apache.org/repos/asf/jakarta/httpcomponents/httpcore/trunk/module-nio/src/main/java/org/apache/http/nio/util/SharedInputBuffer.java
http://svn.apache.org/repos/asf/jakarta/httpcomponents/httpcore/trunk/module-nio/src/main/java/org/apache/http/nio/util/SharedOutputBuffer.java

These shared buffer classes are pretty advanced, as they are capable of
throttling the frequency of I/O events to make sure the internal buffer
does not overflow. That is, the worker thread can temporarily suspend
data input / output on the socket channel if it cannot keep up with the
I/O rate, and take the time needed to do data processing and free up
more space in the shared buffers. This can help ensure that the
transport operates with a nearly constant memory footprint, so once the
connection is established and fully initialized (content buffers
allocated and all) it will never go down due to an out-of-memory
condition. There is hardly anything worse for an HTTP transport than
dropping connections while streaming out a response body after already
having sent HTTP 200 OK back to the client.

You can take a look at the throttling version of the HTTP service
handler for an example of shared buffers in action.

http://svn.apache.org/repos/asf/jakarta/httpcomponents/httpcore/trunk/module-nio/src/main/java/org/apache/http/nio/protocol/ThrottlingHttpServiceHandler.java

BUT the bad news is that there is almost no test coverage for these
classes yet, as I wanted to spend more time working on them during
ALPHA5. This code can certainly benefit from more testing.

So, you might want to start with a somewhat simpler custom
implementation for Synapse 1.0 that always expands the buffers whenever
more input / output is made available, and then consider making it more
sophisticated for 1.1. Just an idea.
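
One possible shape for such a simpler buffer, as a sketch only:
ExpandingSharedBuffer is a hypothetical name, the initial size of 16
bytes and the doubling policy are arbitrary assumptions, and nothing
here is taken from HttpCore. The write side always accepts its input
and grows the backing array when needed, so memory use is unbounded if
the reader falls behind.

```java
/**
 * Hypothetical buffer that expands instead of throttling: write() never
 * rejects data, read() never blocks. All access is synchronized.
 */
class ExpandingSharedBuffer {
    private byte[] data = new byte[16]; // arbitrary initial capacity
    private int start;                  // index of the next unread byte
    private int end;                    // index one past the last written byte

    /** Appends len bytes, growing the backing array if they do not fit. */
    synchronized void write(byte[] src, int off, int len) {
        if (start == end) {
            // Everything has been consumed: reuse the array from the front.
            start = 0;
            end = 0;
        }
        if (end + len > data.length) {
            int pending = end - start;
            int capacity = data.length;
            while (capacity < pending + len) {
                capacity *= 2;
            }
            // Copy the unread bytes to the front of the (possibly larger) array.
            byte[] grown = new byte[capacity];
            System.arraycopy(data, start, grown, 0, pending);
            data = grown;
            start = 0;
            end = pending;
        }
        System.arraycopy(src, off, data, end, len);
        end += len;
    }

    /** Number of bytes written but not yet read. */
    synchronized int available() {
        return end - start;
    }

    /** Copies up to len unread bytes into dst and returns how many were copied. */
    synchronized int read(byte[] dst, int off, int len) {
        int chunk = Math.min(len, end - start);
        System.arraycopy(data, start, dst, off, chunk);
        start += chunk;
        return chunk;
    }
}
```

The trade-off against the throttling buffers is exactly the memory
footprint: this version is easy to reason about but gives up the
constant-memory property.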

Oleg 


> thanks
> asankha
> 




Re: Transport appears to be hanging because an unchecked exception caused the I/O dispatch thread to terminate

Posted by "Asankha C. Perera" <as...@wso2.com>.
Hi Oleg

Oleg Kalnichevski wrote:
> The problem appears to be caused by Synapse opening an I/O pipe for
> *every* incoming and outgoing HTTP message. On some platforms this can
> be a very expensive operation, both in terms of performance and system
> resources. On Windows, opening an I/O pipe apparently requires a local
> IP port to be allocated. No wonder Synapse chokes after only a few
> thousand requests.
>
> I see absolutely no reason why Synapse should make use of I/O pipes.
> Essentially, pipes are being used to bridge event-driven NIO and
> stream-based classic I/O. There are other ways to get the job done. A
> trivial shared buffer with synchronized access should suffice. I'll
> happily lend you a helping hand if necessary.
>
Could you help me a bit here? The Pipe class seemed to let us do what
we wanted, i.e. bridge streams to channels, without having to write
our own code. Could you elaborate on how you propose to get around
using Pipes? Or would you have a pointer to any code?

thanks
asankha
