Posted to user@synapse.apache.org by Gregory Van seghbroeck <gr...@intec.ugent.be> on 2007/11/06 10:47:20 UTC
Synapse loses messages under high load
Dear All,
We have configured and extended synapse to work as a simple message
forwarder. The incoming messages get forwarded in a round-robin way to 2
external Tomcat-axis2 servers.
This works extremely well with a few messages (even if we use several
simultaneous clients). But when we stress test our setup, we notice
something peculiar. In one particular test run we sent 1110 requests
to Synapse, but the client only got 878 responses back. Via
monitoring and logging we noticed that our Tomcat-Axis2 servers handled
all the incoming messages correctly, but somewhere between receiving
the response in synapse and sending it to the clients 232 messages were
lost.
Some more monitoring and logging showed us that the Synapse server
machine receives all responses back from our Tomcat-Axis2 servers (this
was done via a network sniffer). So something internally in Synapse
causes the lost messages.
The debug logs of the org.apache.axis2.transport classes showed a
discrepancy between the content encoder entries (/[DEBUG] 06 nov 10:51:27.828 AM
I/O reactor worker thread 2
[org.apache.axis2.transport.nhttp.ClientHandler]HTTP connection
[/157.193.215.56:9090]: Content encoder [chunk-coded; completed: true]/)
and the content decoder entries (/[DEBUG] 06 nov 10:51:28.718 AM I/O reactor worker
thread 2 [org.apache.axis2.transport.nhttp.ClientHandler]HTTP connection
[/157.193.215.56:9090]: Content decoder [chunk-coded; completed: true]/)
for both Tomcat-axis2 servers. We counted 1110 content encoder
entries and only 878 content decoder entries. These numbers fit
too perfectly to be a coincidence.
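The count described above can be reproduced with a short script. This is a minimal sketch in Python and not part of the original setup: the function name count_codec_entries and the sample lines are assumptions for illustration, and only the two log phrases are taken from the entries quoted above. It assumes each log entry sits on a single line in the actual log file.

```python
import re
from collections import Counter

# Matches the two ClientHandler debug phrases quoted in the message above.
CODEC_PATTERN = re.compile(
    r"Content (encoder|decoder) \[chunk-coded; completed: true\]"
)


def count_codec_entries(lines):
    """Count completed content encoder/decoder entries in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = CODEC_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample; in practice pass an open log file instead.
    sample = [
        "[/157.193.215.56:9090]: Content encoder [chunk-coded; completed: true]",
        "[/157.193.215.56:9090]: Content decoder [chunk-coded; completed: true]",
        "[/157.193.215.56:9090]: Content encoder [chunk-coded; completed: true]",
    ]
    counts = count_codec_entries(sample)
    # Requests written to the backend without a fully decoded response.
    missing = counts["encoder"] - counts["decoder"]
    print("encoder:", counts["encoder"], "decoder:", counts["decoder"], "missing:", missing)
```

Applied to the run above, a difference of 1110 - 878 = 232 between the two counts would match the number of responses the client never received.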
Is there anybody on this mailing list who has an idea what may cause
this problem and how to resolve it?
Many thanks,
Gregory
Re: Synapse loses messages under high load
Posted by "Asankha C. Perera" <as...@wso2.com>.
Hi Gregory
> You might have pinpointed the problem. One run of the DummyService
> takes about 900 ms. But since we really calculate something it will
> keep the processor busy when there are simultaneous runs of the
> DummyService. We notice that the service time grows semi-linear.
> With our new tests in our Linux environment, we noticed dropped
> message from 35 simultaneous users. So your 30 s might also be pretty
> accurate, when our Tomcat servers receive 35 simultaneous requests,
> the DummyService will run approximately 31 s.
> Can you please help us fine tuning our Linux environment, but I have
> to mention one more thing: I'm a Linux novice.
>
So this seems like a problem we can solve with proper tuning. From what
I understand, it seems like your client times out the connection from
its end, and I will need to know your exact OS and its version and the
Client software that you use (i.e. is it Axis2, .Net etc..). If you are
going to host Synapse on a Linux machine, the basic tuning I would
recommend is the following. You will need to edit these as "root" and
then reboot the system.
Edit /etc/sysctl.conf and add the following:
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_fin_timeout = 30
fs.file-max = 2097152
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
Edit /etc/security/limits.conf and add the following:
* soft nofile 4096
* hard nofile 65535
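To check whether a sysctl.conf file already carries the values listed above, something like the following Python sketch could be used. The function names parse_sysctl and missing_settings are assumptions for illustration; the keys and values are copied from the list above. It only reads the file text and does not query or change the running kernel.

```python
# Recommended values, copied from the sysctl.conf lines above.
RECOMMENDED = {
    "net.ipv4.ip_local_port_range": "1024 65535",
    "net.ipv4.tcp_fin_timeout": "30",
    "fs.file-max": "2097152",
    "net.ipv4.tcp_tw_recycle": "1",
    "net.ipv4.tcp_tw_reuse": "1",
}


def parse_sysctl(text):
    """Parse sysctl.conf-style 'key = value' lines, skipping blanks and comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Collapse runs of whitespace so "1024   65535" equals "1024 65535".
        settings[key.strip()] = " ".join(value.split())
    return settings


def missing_settings(text, recommended=RECOMMENDED):
    """Return the recommended keys that are absent or set to a different value."""
    current = parse_sysctl(text)
    return {k: v for k, v in recommended.items() if current.get(k) != v}
```

For example, missing_settings(open("/etc/sysctl.conf").read()) returns a dict of the entries that still need to be added or changed.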
Although I am not specialized in tuning Windows etc. I will try to find
the relevant information for you once I know exactly what you use on
your client side.
asankha
---------------------------------------------------------------------
To unsubscribe, e-mail: synapse-user-unsubscribe@ws.apache.org
For additional commands, e-mail: synapse-user-help@ws.apache.org
Re: Synapse loses messages under high load
Posted by Gregory Van seghbroeck <gr...@intec.ugent.be>.
Asankha,
You might have pinpointed the problem. One run of the DummyService takes
about 900 ms. But since we really calculate something, it will keep the
processor busy when there are simultaneous runs of the DummyService. We
notice that the service time grows semi-linearly.
With our new tests in our Linux environment, we noticed dropped messages
starting at 35 simultaneous users. So your 30 s might also be pretty accurate:
when our Tomcat servers receive 35 simultaneous requests, the
DummyService will run for approximately 31 s.
Could you please help us fine-tune our Linux environment? I have to
mention one more thing, though: I'm a Linux novice.
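The arithmetic behind these numbers can be written out as a small Python sketch. It assumes a purely linear slowdown (each simultaneous run adds one full single-run time), which is a simplification of the "semi-linear" growth reported above; the function names are made up for illustration, and the 30 s timeout is the estimate from the earlier reply, not a measured value.

```python
def estimated_service_time_ms(single_run_ms, concurrent_requests):
    """Estimate per-request time if CPU-bound runs slow each other down linearly."""
    return single_run_ms * concurrent_requests


def max_concurrency_within(timeout_ms, single_run_ms):
    """Largest number of simultaneous requests whose estimate stays within the timeout."""
    return timeout_ms // single_run_ms


if __name__ == "__main__":
    single_run_ms = 900      # one DummyService run, from the message above
    timeout_ms = 30_000      # assumed default socket timeout of about 30 s
    print(estimated_service_time_ms(single_run_ms, 35))       # 31500
    print(max_concurrency_within(timeout_ms, single_run_ms))  # 33
```

Under this simple model, 35 simultaneous requests take about 31.5 s each, which exceeds a 30 s timeout, while up to 33 simultaneous requests would stay below it.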
Regards and many thanks,
Gregory
Asankha C. Perera wrote:
> Hi Gregory
>
> I finally got the time to look into your code after being almost done
> building the final Synapse 1.1 artifacts..
>
> I think I found your problem, do you see the time being printed on the
> System.out by the DummyService being more than 30 seconds or so for
> some reason before the errors happen?
>
> e.g.
> time: 18.555
> time: 17.479
> time: 17.062
> time: 6.689
> time: 3.633
> time: 0.946
> time: 0.939
>
> Even if what you see is ~25 seconds.. there is a good possibility that
> some of the requests exceed the default socket timeout of your system
> which usually could be ~30 seconds. I load tested Synapse with our
> SimpleStockQuoteService and another faster version of that I typically
> use, and we still can do many thousands of connections without any
> issues as before.
>
> So first, can you try to make your DummyService reply faster.. say
> within a second or so and try your load test again? If this works, you
> could play around with the OS tuning to support larger timeouts and if
> you use Linux by then I could help you :-)
>
> asankha
Re: Synapse loses messages under high load
Posted by "Asankha C. Perera" <as...@wso2.com>.
Hi Gregory
I finally got the time to look into your code after being almost done
building the final Synapse 1.1 artifacts..
I think I found your problem, do you see the time being printed on the
System.out by the DummyService being more than 30 seconds or so for some
reason before the errors happen?
e.g.
time: 18.555
time: 17.479
time: 17.062
time: 6.689
time: 3.633
time: 0.946
time: 0.939
Even if what you see is ~25 seconds.. there is a good possibility that
some of the requests exceed the default socket timeout of your system
which usually could be ~30 seconds. I load tested Synapse with our
SimpleStockQuoteService and another faster version of that I typically
use, and we still can do many thousands of connections without any
issues as before.
So first, can you try to make your DummyService reply faster.. say
within a second or so and try your load test again? If this works, you
could play around with the OS tuning to support larger timeouts and if
you use Linux by then I could help you :-)
asankha
Re: Synapse loses messages under high load
Posted by Dan Retzlaff <dr...@gmail.com>.
I'm new to Synapse, but without HTTP keep-alive I'd suspect ephemeral port
exhaustion. (Each client connection uses a port between 1025 and 5000, which
I believe the OS won't re-use within the TCP Maximum Segment Lifetime of
~120 seconds.) It sounds like Asankha may be thinking along the same lines.
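As a rough back-of-the-envelope check of this hypothesis, the following Python sketch computes an upper bound on the rate of new connections. It assumes that every request opens a new connection and that a closed connection keeps its port unavailable for the full wait period; the function name is made up for illustration, and the port range and the ~120 s figure are the ones stated above.

```python
def max_sustained_connection_rate(first_port, last_port, wait_s):
    """Upper bound on new connections per second before the port range is used up."""
    available_ports = last_port - first_port + 1
    return available_ports / wait_s


if __name__ == "__main__":
    # Port range and wait period as stated in the message above.
    print(round(max_sustained_connection_rate(1025, 5000, 120), 2))
    # Wider range from the Linux tuning suggested elsewhere in this thread.
    print(round(max_sustained_connection_rate(1024, 65535, 120), 2))
```

With 3976 usable ports and a 120 s wait, the bound is roughly 33 new connections per second; a sustained load above that would eventually run out of ports under these assumptions.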
On Nov 7, 2007 3:26 AM, Gregory Van seghbroeck <
gregory.vanseghbroeck@intec.ugent.be> wrote:
> Hey Asankha,
>
> Thanks a lot for the quick response.
>
> > You do not state your client environment, I am specifically interested
> > to see if it sends HTTP 1.0 or 1.1 requests to Synapse, and if
> > Keepalives are used
> Our client application also uses the HTTP/1.1 protocol, but when working
> with the HTTP/1.1 protocol we cannot choose to set the Keepalives
> parameter. The persistence parameter is false.
> > So are you saying that a direct client to Tomcat test has issues?
> No, we left Synapse in the middle. But the only thing Synapse does is
> forwarding the incoming message to the Tomcat server. Here is the simple
> synapse.xml file:
> <definitions xmlns="http://ws.apache.org/ns/synapse">
> <in>
> <send>
> <endpoint>
> <address
> uri="http://157.193.215.56:9090/axis2/services/DummyService2" />
> </endpoint>
> </send>
> </in>
> <out>
> <send />
> </out>
> </definitions>
> > This is great and I will look forward to your results. Meanwhile, if
> > you can help me reproduce this test with the Apache Bench Java clone
> > (that supports chunking, SSL etc - check the link I gave earlier for
> > this) and Synapse, I can help you find the issue. You may send any
> > confidential information privately to me if you please
> I,ve attached 3 files: our roundrobin mediator, the Web Service
> (DummyService2) and the synapse.xml of the first tests. As you can see
> in the attached files roundrobin is just part of our tests. But
> currently the client is calling Synapse with the following command: POST
> http://<synapse's ip-address>:8000/roundrobin and the following
> SOAP-message:
> <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
> <soapenv:Body />
> </soapenv:Envelope>
>
>
> If there is anything else you need to clone our environment, do not
> hesitate to send me an e-mail.
>
> Kind regards and thanks a lot,
> Gregory
Re: Synapse loses messages under high load
Posted by Gregory Van seghbroeck <gr...@intec.ugent.be>.
Hey Asankha,
Thanks a lot for the quick response.
> You do not state your client environment, I am specifically interested
> to see if it sends HTTP 1.0 or 1.1 requests to Synapse, and if
> Keepalives are used
Our client application also uses the HTTP/1.1 protocol, but when working
with the HTTP/1.1 protocol we cannot choose to set the Keepalives
parameter. The persistence parameter is false.
> So are you saying that a direct client to Tomcat test has issues?
No, we left Synapse in the middle. But the only thing Synapse does is
forward the incoming message to the Tomcat server. Here is the simple
synapse.xml file:
<definitions xmlns="http://ws.apache.org/ns/synapse">
  <in>
    <send>
      <endpoint>
        <address uri="http://157.193.215.56:9090/axis2/services/DummyService2" />
      </endpoint>
    </send>
  </in>
  <out>
    <send />
  </out>
</definitions>
> This is great and I will look forward to your results. Meanwhile, if
> you can help me reproduce this test with the Apache Bench Java clone
> (that supports chunking, SSL etc - check the link I gave earlier for
> this) and Synapse, I can help you find the issue. You may send any
> confidential information privately to me if you please
I've attached 3 files: our roundrobin mediator, the Web Service
(DummyService2) and the synapse.xml of the first tests. As you can see
in the attached files roundrobin is just part of our tests. But
currently the client is calling Synapse with the following command: POST
http://<synapse's ip-address>:8000/roundrobin and the following
SOAP-message:
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<soapenv:Body />
</soapenv:Envelope>
If there is anything else you need to clone our environment, do not
hesitate to send me an e-mail.
Kind regards and thanks a lot,
Gregory
Re: Synapse loses messages under high load
Posted by "Asankha C. Perera" <as...@wso2.com>.
Hi Gregory
> We are currently using the Synapse 1.0 release. Our Synapse Machine's
> OS is Windows Server 2003 R2 (Enterprise Edition) SP1 with a JDK
> 1.6.0_02-b06.
We haven't tested the NIO code with JDK 1.6.x, only with 1.5.x. From
testing I carried out on Windows environments, the NIO
implementation on Windows has some issues, and I would strongly suggest
using Linux as an alternative. You could also perform some of the tuning
suggested at the bottom of http://wso2.org/library/1721.
> Both Tomcat Servers run on a Windows 2000 (5.00.2195) OS with JDK
> 1.5.0_06-b05 and JDK 1.5.0_07-b03. The version of Axis2 is 1.1.1 and
> of Tomcat is 5.5.20. As you already could have guessed, these servers
> run on different physical machines, but they are directly connected to
> each other (our Synapse server machine has multiple network cards).
You do not state your client environment. I am specifically interested
to see if it sends HTTP 1.0 or 1.1 requests to Synapse, and if
Keepalives are used.
> We performed the new requested tests (simple message forwarding to one
> of our Tomcat machines). We noticed the same disturbing behaviour
> (even with a dropped load - our Tomcat server couldn't cope with the
> same throughput as with the other tests). The client made 607
> invocations and only received 304 correct answers returned to the client.
So are you saying that a direct client to Tomcat test has issues?
> We are going to test our Synapse broker on a Linux machine (I've read
> somewhere that there might be some concurrency related problems with
> Windows and Java's NIO). I let you know what the outcome is.
This is great and I will look forward to your results. Meanwhile, if you
can help me reproduce this test with the Apache Bench Java clone (that
supports chunking, SSL etc - check the link I gave earlier for this) and
Synapse, I can help you find the issue. You may send any confidential
information privately to me if you please
asankha
Re: Synapse loses messages under high load
Posted by Gregory Van seghbroeck <gr...@intec.ugent.be>.
Hi Asankha,
We are currently using the Synapse 1.0 release. Our Synapse Machine's OS
is Windows Server 2003 R2 (Enterprise Edition) SP1 with a JDK
1.6.0_02-b06. Both Tomcat Servers run on a Windows 2000 (5.00.2195) OS
with JDK 1.5.0_06-b05 and JDK 1.5.0_07-b03. The version of Axis2 is
1.1.1 and of Tomcat is 5.5.20. As you already could have guessed, these
servers run on different physical machines, but they are directly
connected to each other (our Synapse server machine has multiple network
cards).
Regarding your question about our extensions: we just implemented a new
mediator that sets the "To" attribute of the address via
"MessageContext.setTo()". When the new address is set, we just use the
default send-mediator to forward the entire message (with the changed
To-address of course) to the respective Tomcat Server. So we are not
using Synapse's load balancing mechanism (mediators seemed the fastest
way to implement our goal).
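The attached mediator is not reproduced in this thread, so the following is only a sketch of the round-robin selection step, written in Python rather than Java. The class name RoundRobinSelector and the second endpoint URL are hypothetical. In a real mediator the selected address would be the value passed to "MessageContext.setTo()" as described above.

```python
import itertools
import threading


class RoundRobinSelector:
    """Hand out endpoint addresses in strict rotation, safely across threads."""

    def __init__(self, endpoints):
        endpoints = list(endpoints)
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        self._cycle = itertools.cycle(endpoints)
        self._lock = threading.Lock()

    def next_endpoint(self):
        # The lock keeps concurrent callers from advancing the iterator at the same time.
        with self._lock:
            return next(self._cycle)


if __name__ == "__main__":
    selector = RoundRobinSelector([
        "http://157.193.215.56:9090/axis2/services/DummyService2",
        "http://backend-2.example:9090/axis2/services/DummyService2",  # hypothetical
    ])
    for _ in range(4):
        print(selector.next_endpoint())
```

The lock matters because requests are handled by several worker threads at once; without it, two threads could race on the shared iterator.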
We performed the newly requested tests (simple message forwarding to one
of our Tomcat machines). We noticed the same disturbing behaviour (even
with a reduced load, since our Tomcat server couldn't cope with the same
throughput as in the other tests). The client made 607 invocations and
only received 304 correct answers. We also
noticed the same entries in our logs: 607 times /[/157.193.215.56:9090]:
Content encoder [chunk-coded; completed: true]/ and 304 times
/[/157.193.215.56:9090]: Content decoder [chunk-coded; completed: true]/.
We are going to test our Synapse broker on a Linux machine (I've read
somewhere that there might be some concurrency related problems with
Windows and Java's NIO). I'll let you know what the outcome is.
Thanks in advance,
Gregory
Asankha C. Perera wrote:
> Hi Gregory
>
> Could you let me know the version of HTTP (1.0 or 1.1) being used
> between your client and Synapse, and between Synapse and Tomcat. Are
> you using SSL by any means. Also let me know the OS version, and your
> JDK version (including the minor version). Do you see anything in the
> log files? Are the client, Synapse and Tomcat on different physical
> hosts, virtual machines or same host? Also what do you mean by "we
> have extended Synapse.." have you done any code changes? Also are you
> using the Synapse 1.0 release or the 1.1 RC?
>
>> This works extremely well with a few messages (even if we use several
>> simultaneous clients). But when we stress test our setup, we notice
>> something peculiar. We have sent in a particular test run 1110
>> requests to synapse, but at the client we only got 878 responses back.
> Are you using a load balanced endpoint? Do you think you loose
> messages when load balancing is not used? Could you verify with a
> simple test in this same environment?
>> The debug-logs on the org.apache.axis2.transport classes showed a
>> discrepancy between the content encoder (/[DEBUG] 06 nov 10:51:27.828
>> AM I/O reactor worker thread 2
>> [org.apache.axis2.transport.nhttp.ClientHandler]HTTP connection
>> [/157.193.215.56:9090]: Content encoder [chunk-coded; completed:
>> true]/) en content decoder (/BUG] 06 nov 10:51:28.718 AM I/O reactor
>> worker thread 2 [org.apache.axis2.transport.nhttp.ClientHandler]HTTP
>> connection [/157.193.215.56:9090]: Content decoder [chunk-coded;
>> completed: true]/) for both Tomcat-axis2 servers. We noticed 1110 of
>> the content encoder entries and only 878 entries for the content
>> decoder. These numbers fit to perfectly to be coincidence.
> Again.. we just renamed our packaging of the NIO based http/s
> transports to org.apache.synapse.transport.nhttp .. so either you are
> using Synapse 1.0 or the 1.1 RC with a possibly older transport?
>
> asankha
Re: Synapse loses messages under high load
Posted by "Asankha C. Perera" <as...@wso2.com>.
Hi Gregory
Could you let me know the version of HTTP (1.0 or 1.1) being used
between your client and Synapse, and between Synapse and Tomcat? Are you
using SSL by any means? Also let me know the OS version and your JDK
version (including the minor version). Do you see anything in the log
files? Are the client, Synapse and Tomcat on different physical hosts,
virtual machines or the same host? Also, what do you mean by "we have
extended Synapse"? Have you done any code changes? And are you using
the Synapse 1.0 release or the 1.1 RC?
> This works extremely well with a few messages (even if we use several
> simultaneous clients). But when we stress test our setup, we notice
> something peculiar. We have sent in a particular test run 1110
> requests to synapse, but at the client we only got 878 responses back.
Are you using a load balanced endpoint? Do you think you lose messages
when load balancing is not used? Could you verify with a simple test in
this same environment?
> The debug-logs on the org.apache.axis2.transport classes showed a
> discrepancy between the content encoder (/[DEBUG] 06 nov 10:51:27.828
> AM I/O reactor worker thread 2
> [org.apache.axis2.transport.nhttp.ClientHandler]HTTP connection
> [/157.193.215.56:9090]: Content encoder [chunk-coded; completed:
> true]/) en content decoder (/BUG] 06 nov 10:51:28.718 AM I/O reactor
> worker thread 2 [org.apache.axis2.transport.nhttp.ClientHandler]HTTP
> connection [/157.193.215.56:9090]: Content decoder [chunk-coded;
> completed: true]/) for both Tomcat-axis2 servers. We noticed 1110 of
> the content encoder entries and only 878 entries for the content
> decoder. These numbers fit to perfectly to be coincidence.
Again.. we just renamed our packaging of the NIO based http/s transports
to org.apache.synapse.transport.nhttp .. so either you are using Synapse
1.0 or the 1.1 RC with a possibly older transport?
asankha