Posted to dev@synapse.apache.org by "Hubert, Eric" <er...@jamba.net> on 2008/06/25 18:13:45 UTC

Possible Causes for "Connection reset by peer" when using NIO

Hi devs!

First of all, I'd like to apologize for posting a "user problem" to two dev lists. I only did this because I do not have much background knowledge of the NIO implementation, and I think a solid understanding of NIO is necessary to help tackle our problem.

We are using the WSO2 ESB which is based on Apache Synapse, Apache Axis2 and the HTTP Core NIO module. As the stacktrace only contains http-nio details, I cc'ed the http components dev list. Hopefully someone can help out.

When sending about 3000 Hessian-requests per hour from clients (Tomcat) over the ESB (Synapse 1.2 running on JDK 1.5.15, Linux 2.6.23.1-amd64-75) to a Bea Weblogic 8.1 we see about 1 to 10 exceptions of type "java.io.IOException: Connection reset by peer" in the ESB-log. 

If I understand it right the ESB then executes a failover to the next service node as we are using a load balancing group. So the client is not affected, but the endpoint with the failure will be marked as inactive.

The problem is I don't understand the cause of this exception. It occurs during the read on a Socket-Channel. So I think the server might close the connection while the ESB is reading. But maybe internally some kind of pool is used and a connection can change to some abnormal state?

We have seen such Exceptions before when we were using HTTP 1.1 in combination with the Bea Weblogic server. Very likely an issue with HTTP keepalive (persistent connections). So for any connection to a Bea service we use the property mediator of Synapse to change the connection ESB <-> Bea to use HTTP 1.0:
<syn:property name="FORCE_HTTP_1.0" value="true" scope="axis2" />

Since then we had not seen this exception. But now, after switching to another environment, we see it again, though only for Hessian services.
I have no clue what else could cause this exception. How can we detect the cause? How can we narrow down the possible causes if there are several? I don't expect network outages to be the reason, as other (SOAP-based) services are working pretty well.

The exact exception we are getting is:

java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
        at sun.nio.ch.IOUtil.read(IOUtil.java:206)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:207)
        at org.apache.http.impl.nio.reactor.SessionInputBufferImpl.fill(SessionInputBufferImpl.java:85)
        at org.apache.http.impl.nio.codecs.AbstractMessageParser.fillBuffer(AbstractMessageParser.java:97)
        at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:113)
        at org.apache.http.impl.nio.DefaultClientIOEventDispatch.inputReady(DefaultClientIOEventDispatch.java:99)
        at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:98)
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:195)
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:180)
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:142)
        at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:70)
        at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:318) 


This exception occurs consistently a few times per hour on every possible combination of client node, ESB node and service endpoint node.

Any pointer or idea is greatly appreciated. Thanks a lot in advance!


Regards,
   Eric

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@synapse.apache.org
For additional commands, e-mail: dev-help@synapse.apache.org


Re: AW: Possible Causes for "Connection reset by peer" when using NIO

Posted by "Asankha C. Perera" <as...@wso2.com>.
Hi Eric
> "However, a persistent connection with an HTTP/1.0 client cannot make use of the chunked transfer-coding, and therefore MUST use a Content-Length for marking the ending boundary of each message"
>  
> a new idea came to my mind. Could this exception also occur if the server closes the connection before reading the whole request? What if the content length would be wrong for some reason? So maybe it would be too small just for certain messages (depending on the binary content)? Could this cause the same exception? If so, how do you calculate the content length?
>   
I do not think the content length could be wrong, as that would be a clear 
violation of the HTTP spec. However, I have seen ApacheBench 
report incorrect-length errors, and I think that was with HTTP 1.0 
without keepalive, in which case closing the stream signals the 
end of the message, even if the Content-Length header stated something 
larger. If the Content-Length stated less than the actual 
payload, then as I understand it this should result in a serious error.
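
To illustrate the point about Content-Length and binary payloads, here is a minimal sketch of an HTTP/1.0 request whose Content-Length is taken from the byte count of the body. This is not how Synapse or HttpCore computes the header; the class name, the path and the content type below are placeholders chosen for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class ContentLengthSketch {

    /** Builds an HTTP/1.0 POST whose Content-Length is the exact byte count of the body. */
    static byte[] buildRequest(String path, byte[] body) throws IOException {
        // Placeholder content type; the real one depends on the service.
        String head = "POST " + path + " HTTP/1.0\r\n"
                + "Content-Type: application/octet-stream\r\n"
                + "Content-Length: " + body.length + "\r\n"
                + "\r\n";
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(head.getBytes(StandardCharsets.US_ASCII));
        out.write(body); // binary body is written untouched, byte for byte
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Arbitrary binary payload, used only for illustration.
        byte[] body = {0x63, 0x01, 0x00, (byte) 0xC3, (byte) 0xA9};
        byte[] request = buildRequest("/service", body);
        // Print only the header part; the body is not text.
        System.out.println(new String(request, 0, request.length - body.length,
                StandardCharsets.US_ASCII));
    }
}
```

If the length were derived from a character count after decoding the payload as text, it could differ from the number of bytes actually sent, which is the kind of mismatch asked about above.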
> PS: Sorry for the wild guesses. I have no idea how to further analyze the issue. Any help is greatly appreciated.
>   
No problem.. one option would be for you to develop a small scenario 
which I could deploy on WebLogic, and try to recreate.. generally I like 
bugs which I can re-create, as I can then find the cause and the 
solutions as well.. so if you can get this done from your end, I can 
take over the task of finding the underlying reason and a workaround

asankha
-- 
Asankha C. Perera

WSO2 - http://wso2.org
http://esbmagic.blogspot.com


AW: Possible Causes for "Connection reset by peer" when using NIO

Posted by "Hubert, Eric" <er...@jamba.net>.
Hi all,
 
Unfortunately, decreasing the Synapse (nhttp) socket timeout to 25000 ms did not change the situation. I'm also thinking about what the difference between Hessian and SOAP might be in this regard.
 
Reading RFC 2068 Appendix 19.7:
 
"However, a persistent connection with an HTTP/1.0 client cannot make use of the chunked transfer-coding, and therefore MUST use a Content-Length for marking the ending boundary of each message"
 
a new idea came to my mind. Could this exception also occur if the server closes the connection before reading the whole request? What if the content length were wrong for some reason, for example too small for certain messages only (depending on the binary content)? Could this cause the same exception? If so, how do you calculate the content length?
 
Regards,
  Eric
 
PS: Sorry for the wild guesses. I have no idea how to further analyze the issue. Any help is greatly appreciated.
 




Re: Possible Causes for "Connection reset by peer" when using NIO

Posted by "Asankha C. Perera" <as...@wso2.com>.
Hi Eric
>> I think this is a good idea.. as we will close the session on our own
>> without an exception, and then BEA can close it from that side
>>     
>
> Ok, then I will go ahead and try this out. Is there a way to check
> whether this property has been applied properly by Synapse? Some
> JMX-monitoring possibility or so?
>   
Unfortunately no.. not unless you set the 
org.apache.synapse.transport.nhttp.NHttpConfiguration log level to 
debug, in which case you will see something like "Using nhttp tuning 
parameter : ..." at startup. You will need to restart the system for 
changes to take effect.
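
As a rough sketch, and assuming the ESB logs through a log4j 1.x properties file (an assumption; adjust to whatever logging configuration your installation uses), the logger named above could be switched to debug like this:

```properties
# log4j 1.x syntax; the logger name is the class mentioned above
log4j.logger.org.apache.synapse.transport.nhttp.NHttpConfiguration=DEBUG
```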

asankha

-- 
Asankha C. Perera

WSO2 - http://wso2.org
http://esbmagic.blogspot.com


RE: Possible Causes for "Connection reset by peer" when using NIO

Posted by "Hubert, Eric" <er...@jamba.net>.
Hi Asankha,
  
> I think this is a good idea.. as we will close the session on our own
> without an exception, and then BEA can close it from that side

Ok, then I will go ahead and try this out. Is there a way to check
whether this property has been applied properly by Synapse? Some
JMX-monitoring possibility or so?

Regards,
   Eric 





Re: Possible Causes for "Connection reset by peer" when using NIO

Posted by "Asankha C. Perera" <as...@wso2.com>.
Hi Eric
> This brings me to a new idea and I would like to hear what you think
> about it.
> What if I would decrease the value for http.socket.timeout to 20000 for
> Synapse, so to be definitely lower than the one on the server side. What
> would be the expected result? Would I see another exception, if the
> timeout on the Synapse side is reached? Maybe I'm wrong and there are
> requests which take longer, even if they are neither listed in our
> statistics nor in the http access logs of the Bea server.
>   
I think this is a good idea.. as we will close the session on our own 
without an exception, and then BEA can close it from that side

asankha

-- 
Asankha C. Perera

WSO2 - http://wso2.org
http://esbmagic.blogspot.com


RE: Possible Causes for "Connection reset by peer" when using NIO

Posted by "Hubert, Eric" <er...@jamba.net>.
Hi Asankha,

thanks for your reply as well! Please see my comments below!
> One thing you could analyze is the TCP socket timeout times in the
> different environments.. 
Hmm, the TCP socket timeout should be the same, but I will investigate.
Also, our response times are pretty low: an average of 10-15 ms with a
maximum of less than 7 seconds so far (which has happened only a few
times in three days).

> "nhttp.properties" into the ESB classpath, and add the following line
> into it, you can change the Synapse socket timeout
> "http.socket.timeout=60000" <- this is in ms. Maybe you can do a
> similar thing with the BEA server..
This brings me to a new idea, and I would like to hear what you think
about it.
What if I decreased the value of http.socket.timeout to 20000 for
Synapse, so that it is definitely lower than the one on the server side? What
would be the expected result? Would I see a different exception if the
timeout on the Synapse side is reached? Maybe I'm wrong and there are
requests which take longer, even if they are listed neither in our
statistics nor in the HTTP access logs of the Bea server.

Regards,
   Eric 



Re: Possible Causes for "Connection reset by peer" when using NIO

Posted by "Asankha C. Perera" <as...@wso2.com>.
Eric
>> Since then we hadn't seen this exception again. But now switching to another environment we see this exception again, but only for Hessian services.
>> I have no clue what else could cause this exception. How can we detect the cause? How to narrow down possible causes, if there are different possibilities. I don't expect any network outages to be the reason, as other services (SOAP)-based are working pretty well.
>>     
One thing you could analyze is the TCP socket timeout times in the 
different environments.. If you drop a file with the name 
"nhttp.properties" into the ESB classpath, and add the following line 
into it, you can change the Synapse socket timeout 
"http.socket.timeout=60000" <- this is in ms. Maybe you can do a similar 
thing with the BEA server..
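
For reference, the file described above would look roughly like this. The property name and the value are the ones given in this message; where exactly the file goes is an assumption beyond "somewhere on the ESB classpath".

```properties
# nhttp.properties, placed on the ESB classpath
# Socket timeout in milliseconds
http.socket.timeout=60000
```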

asankha

-- 
Asankha C. Perera

WSO2 - http://wso2.org
http://esbmagic.blogspot.com


Re: Possible Causes for "Connection reset by peer" when using NIO

Posted by Ruwan Linton <ru...@gmail.com>.
Hi Eric,

>
>
>
> > I do not know the Hessian protocol well enough to comment on it.
> I also can't see any logical relation to the application protocol used on
> top of the transport protocol.


Yes, for Oleg you can assume it is just a byte stream; the protocol does
not matter, I guess.


>
>
> If we send HTTP 1.0 requests with Synapse can it still happen that the
> server uses persistent connections? I think this should not be the case. But
> if the server does not use persistent connections and for each request a new
> connection will be created I don't understand how this error might occur.
> One idea which came into my mind was a timeout. But our response times are
> pretty low (about 10 ms on average). The longest ever running request took
> 6500 ms according to our statistic data. The default nhttp timeout should be
> 60000 ms. I'll try to see if there might be some connection timeout on the
> server side.


OK, you may increase the socket timeout of Synapse to verify this: you can
add an nhttp.properties file containing the entry http.socket.timeout with
70000 as the value.

Thanks,
Ruwan


>
>
> Any other idea?
>
>
> Regards,
>    Eric
>



-- 
Ruwan Linton
http://wso2.org - "Oxygenating the Web Services Platform"
http://ruwansblog.blogspot.com/

RE: Possible Causes for "Connection reset by peer" when using NIO

Posted by "Hubert, Eric" <er...@jamba.net>.
Oleg,

thanks a lot for your reply! Please see my comments inline!

> > The problem is I don't understand the cause of this exception. It occurs
> > during the read on a Socket-Channel. So I think the server might close the
> > connection while the ESB is reading. But maybe internally some kind of
> > pool is used and a connection can change to some abnormal state?
> 
> Unlikely. This kind of I/O exception occurs when the connection is
> closed by the _remote_ side.

Yes, this has also been my understanding. So our Bea Application Server seems to close the connection which is still in use by Synapse.

> > <syn:property name="FORCE_HTTP_1.0" value="true" scope="axis2" />
> >
> 
> Connection reset by peer I/O exceptions are perfectly normal with
> persistent HTTP connections. 
By the way, just another question: if those exceptions are "perfectly normal", why do we see a stack trace in the log and not a short warning? The complete stack doesn't help much and just bloats the server logs.
Is there a particular reason for this? How should this exception normally be handled by an application using the HTTP Core NIO module? Just by retrying the request?



> The most likely cause of it is that the
> connection was closed on the server side due to the timeout (maximum
> period of inactivity) about the same moment the client started sending
> data to the server. Situations like that can happen.
OK, I understand this now for HTTP 1.1 (with persistent connections).


> I do not know the Hessian protocol well enough to comment on it.
I also can't see any logical relation to the application protocol used on top of the transport protocol.

If we send HTTP 1.0 requests with Synapse, can it still happen that the server uses persistent connections? I think this should not be the case. But if the server does not use persistent connections and a new connection is created for each request, I don't understand how this error might occur.
One idea which came to my mind was a timeout. But our response times are pretty low (about 10 ms on average). The longest-running request so far took 6500 ms according to our statistics. The default nhttp timeout should be 60000 ms. I'll try to see if there might be some connection timeout on the server side.

Any other idea?


Regards,
   Eric

Re: Possible Causes for "Connection reset by peer" when using NIO

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Wed, 2008-06-25 at 18:13 +0200, Hubert, Eric wrote:
> Hi devs!
> 
> first of all I'd like to apologize for posting a "user-problem" to two dev-lists. I only did this as have not much background knowledge of the NIO implementation and think a solid understanding of NIO is necessary to help tackling our problem.
> 
> We are using the WSO2 ESB which is based on Apache Synapse, Apache Axis2 and the HTTP Core NIO module. As the stacktrace only contains http-nio details, I cc'ed the http components dev list. Hopefully someone can help out.
> 
> When sending about 3000 Hessian-requests per hour from clients (Tomcat) over the ESB (Synapse 1.2 running on JDK 1.5.15, Linux 2.6.23.1-amd64-75) to a Bea Weblogic 8.1 we see about 1 to 10 exceptions of type "java.io.IOException: Connection reset by peer" in the ESB-log. 
> 
> If I understand it right the ESB then executes a failover to the next service node as we are using a load balancing group. So the client is not affected, but the endpoint with the failure will be marked as inactive.
> 
> The problem is I don't understand the cause of this exception. It occurs during the read on a Socket-Channel. So I think the server might close the connection while the ESB is reading. But maybe internally some kind of pool is used and a connection can change to some abnormal state?

Unlikely. This kind of I/O exception occurs when the connection is
closed by the _remote_ side.   

> 
> We have seen such Exceptions before when we were using HTTP 1.1 in combination with the Bea Weblogic server. Very likely an issue with HTTP keepalive (persistent connections). So for any connection to a Bea service we use the property mediator of Synapse to change the connection ESB <-> Bea to use HTTP 1.0:
> <syn:property name="FORCE_HTTP_1.0" value="true" scope="axis2" />
> 

Connection reset by peer I/O exceptions are perfectly normal with
persistent HTTP connections. The most likely cause of it is that the
connection was closed on the server side due to the timeout (maximum
period of inactivity) about the same moment the client started sending
data to the server. Situations like that can happen.
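
The symptom itself, an IOException thrown from a read on a SocketChannel after the remote side has dropped the connection, can be reproduced locally. The following is only a sketch under stated assumptions: it runs over loopback, and it uses SO_LINGER with a timeout of 0 to force the "server" to reset the connection, which is a stand-in for whatever makes the real server do so and says nothing about WebLogic's actual behaviour. The class and method names are made up for the example, and the exact exception message depends on the platform and JDK.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

public class ResetDemo {

    /** Returns the IOException seen by the client read, or null if none occurred. */
    static IOException provokeReset() throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             SocketChannel client = SocketChannel.open(
                     new InetSocketAddress(loopback, server.getLocalPort()))) {

            // "Remote side": accept the connection, then close it abortively.
            Socket accepted = server.accept();
            accepted.setSoLinger(true, 0); // linger 0: close() sends RST instead of FIN
            accepted.close();

            Thread.sleep(200); // give the reset time to reach the client socket

            try {
                client.read(ByteBuffer.allocate(64)); // same kind of call as in the stack trace
                return null;
            } catch (IOException e) {
                return e;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        IOException e = provokeReset();
        System.out.println(e == null
                ? "no exception"
                : e.getClass().getName() + ": " + e.getMessage());
    }
}
```

The point of the sketch is only that the client cannot tell why the peer reset the connection; from the client side an idle timeout, an overloaded server and a deliberate abort all look the same.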

> Since then we hadn't seen this exception again. But now switching to another environment we see this exception again, but only for Hessian services.
> I have no clue what else could cause this exception. How can we detect the cause? How to narrow down possible causes, if there are different possibilities. I don't expect any network outages to be the reason, as other services (SOAP)-based are working pretty well.
> 

I do not know the Hessian protocol well enough to comment on it.

Hope this helps

Oleg

> The exact exception we are getting is:
> 
> java.io.IOException: Connection reset by peer
>         at sun.nio.ch.FileDispatcher.read0(Native Method)
>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
>         at sun.nio.ch.IOUtil.read(IOUtil.java:206)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:207)
>         at org.apache.http.impl.nio.reactor.SessionInputBufferImpl.fill(SessionInputBufferImpl.java:85)
>         at org.apache.http.impl.nio.codecs.AbstractMessageParser.fillBuffer(AbstractMessageParser.java:97)
>         at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:113)
>         at org.apache.http.impl.nio.DefaultClientIOEventDispatch.inputReady(DefaultClientIOEventDispatch.java:99)
>         at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:98)
>         at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:195)
>         at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:180)
>         at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:142)
>         at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:70)
>         at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:318) 
> 
> 
> This exception occurs consistently a few time per hour on every possible combination of client node, esb node and service endpoint node.
> 
> Any pointer or idea is greatly appreciated. Thanks a lot in advance!
> 
> 
> Regards,
>    Eric
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@hc.apache.org
> For additional commands, e-mail: dev-help@hc.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@hc.apache.org
For additional commands, e-mail: dev-help@hc.apache.org

