Posted to user@synapse.apache.org by Gregory Van seghbroeck <gr...@intec.ugent.be> on 2007/11/06 10:47:20 UTC

Synapse loses messages under high load

Dear All,

We have configured and extended Synapse to work as a simple message 
forwarder. Incoming messages are forwarded round-robin to two 
external Tomcat-Axis2 servers.
This works extremely well with a few messages (even with several 
simultaneous clients). But when we stress test our setup, we notice 
something peculiar. In one particular test run we sent 1110 requests 
to Synapse, but the client got only 878 responses back. Through 
monitoring and logging we found that our Tomcat-Axis2 servers handled 
all the incoming messages correctly, but somewhere between Synapse 
receiving the response and sending it to the clients, 232 messages were 
lost.
Further monitoring with a network sniffer showed that the Synapse server 
machine receives all responses back from our Tomcat-Axis2 servers. So 
something inside Synapse causes the lost messages.
The debug logs for the org.apache.axis2.transport classes showed a 
discrepancy between the content encoder (/[DEBUG] 06 nov 10:51:27.828 AM 
I/O reactor worker thread 2 
[org.apache.axis2.transport.nhttp.ClientHandler]HTTP connection 
[/157.193.215.56:9090]: Content encoder [chunk-coded; completed: true]/) 
and the content decoder (/[DEBUG] 06 nov 10:51:28.718 AM I/O reactor 
worker thread 2 [org.apache.axis2.transport.nhttp.ClientHandler]HTTP 
connection [/157.193.215.56:9090]: Content decoder [chunk-coded; 
completed: true]/) for both Tomcat-Axis2 servers. We counted 1110 
content encoder entries and only 878 content decoder entries. These 
numbers fit too perfectly to be a coincidence.
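
As a rough illustration of the counting described above, here is a 
minimal Python sketch. The two marker strings are taken from the log 
excerpts in this message; the function name and the three-line sample 
log are hypothetical, and a real run would read the lines from the 
Synapse log file instead.

```python
# Count completed "Content encoder" vs. "Content decoder" debug entries.
# The marker strings come from the log excerpts quoted in the message;
# everything else (names, sample data) is assumed for illustration.

ENCODER = "Content encoder [chunk-coded; completed: true]"
DECODER = "Content decoder [chunk-coded; completed: true]"


def count_completed(lines):
    """Return how many encoder and decoder completion entries appear."""
    counts = {"encoder": 0, "decoder": 0}
    for line in lines:
        if ENCODER in line:
            counts["encoder"] += 1
        elif DECODER in line:
            counts["decoder"] += 1
    return counts


# Hypothetical sample: two requests sent out, only one response decoded.
sample_log = [
    "[DEBUG] HTTP connection [/157.193.215.56:9090]: " + ENCODER,
    "[DEBUG] HTTP connection [/157.193.215.56:9090]: " + DECODER,
    "[DEBUG] HTTP connection [/157.193.215.56:9090]: " + ENCODER,
]

counts = count_completed(sample_log)
missing = counts["encoder"] - counts["decoder"]
print(f"encoder={counts['encoder']} decoder={counts['decoder']} missing={missing}")
```

With the figures from the test run, the same subtraction gives 
1110 - 878 = 232, which matches the number of lost responses.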

Does anybody on this mailing list have an idea what may cause 
this problem and how to resolve it?

Many thanks,
Gregory


Re: Synapse loses messages under high load

Posted by "Asankha C. Perera" <as...@wso2.com>.
Hi Gregory
> You might have pinpointed the problem. One run of the DummyService 
> takes about 900 ms. But since it performs a real calculation, it 
> keeps the processor busy when several runs of the DummyService 
> execute simultaneously. We notice that the service time grows roughly 
> linearly with the number of simultaneous runs.
> In our new tests in our Linux environment, we noticed dropped 
> messages starting at 35 simultaneous users. So your 30 s estimate 
> might be pretty accurate: when our Tomcat servers receive 35 
> simultaneous requests, the DummyService runs for approximately 31 s.
> Could you please help us fine-tune our Linux environment? I have 
> to mention one more thing: I'm a Linux novice.
>
So this seems like a problem we can solve with proper tuning. From what 
I understand, your client times out the connection from its end. I will 
need to know your exact OS and its version, and the client software 
that you use (e.g. Axis2, .NET, etc.). If you are going to host Synapse 
on a Linux machine, the basic tuning I would recommend is the 
following. You will need to edit these files as "root" and then reboot 
the system.

Edit /etc/sysctl.conf and add the following:

net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_fin_timeout = 30
fs.file-max = 2097152
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1

Edit /etc/security/limits.conf and add the following:

* soft nofile 4096
* hard nofile 65535
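
To see which of the sysctl values above differ from what a machine 
currently uses, a small Python sketch along the following lines could 
help. It assumes the usual Linux layout in which a key such as 
net.ipv4.tcp_fin_timeout is exposed as the file 
/proc/sys/net/ipv4/tcp_fin_timeout; the function names are hypothetical. 
Note that net.ipv4.tcp_tw_recycle was removed in Linux 4.12, so on newer 
kernels that key is reported as missing (None).

```python
# Compare the recommended sysctl settings with the values currently in
# effect. Reads /proc/sys only; it does not change any setting.
from pathlib import Path

# Values copied from the /etc/sysctl.conf recommendation above.
RECOMMENDED = {
    "net.ipv4.ip_local_port_range": "1024 65535",
    "net.ipv4.tcp_fin_timeout": "30",
    "fs.file-max": "2097152",
    "net.ipv4.tcp_tw_recycle": "1",
    "net.ipv4.tcp_tw_reuse": "1",
}


def read_sysctl(key, root="/proc/sys"):
    """Return the current value of a sysctl key, or None if unavailable."""
    path = Path(root).joinpath(*key.split("."))
    try:
        # Normalise tabs/newlines so "1024\t65535\n" becomes "1024 65535".
        return " ".join(path.read_text().split())
    except OSError:
        return None


def differing(recommended, reader=read_sysctl):
    """Return {key: (current, wanted)} for keys whose value differs."""
    result = {}
    for key, wanted in recommended.items():
        current = reader(key)
        if current != wanted:
            result[key] = (current, wanted)
    return result


if __name__ == "__main__":
    for key, (current, wanted) in differing(RECOMMENDED).items():
        print(f"{key}: current={current!r} recommended={wanted!r}")
```

The reader is passed in as a parameter so the comparison can be tried 
with made-up values on a machine that has no /proc/sys.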

Although I am no specialist in tuning Windows, I will try to find 
the relevant information for you once I know exactly what you use on 
your client side.

asankha

---------------------------------------------------------------------
To unsubscribe, e-mail: synapse-user-unsubscribe@ws.apache.org
For additional commands, e-mail: synapse-user-help@ws.apache.org


Re: Synapse loses messages under high load

Posted by Gregory Van seghbroeck <gr...@intec.ugent.be>.
Asankha,

You might have pinpointed the problem. One run of the DummyService takes 
about 900 ms. But since it performs a real calculation, it keeps the 
processor busy when several runs of the DummyService execute 
simultaneously. We notice that the service time grows roughly linearly 
with the number of simultaneous runs.
In our new tests in our Linux environment, we noticed dropped messages 
starting at 35 simultaneous users. So your 30 s estimate might be pretty 
accurate: when our Tomcat servers receive 35 simultaneous requests, the 
DummyService runs for approximately 31 s.
Could you please help us fine-tune our Linux environment? I have to 
mention one more thing: I'm a Linux novice.
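
The arithmetic behind these numbers can be sketched in Python as 
follows. This is only a back-of-the-envelope model: it assumes the runs 
share a single processor and that service time scales exactly linearly 
with the number of simultaneous runs, which the message only describes 
as approximate. The function names are hypothetical.

```python
# Back-of-the-envelope model of DummyService response time under load.
# Assumption (not from the thread): all simultaneous runs share one CPU,
# so N simultaneous runs each take about N times the single-run time.
import math


def estimated_service_time(single_run_s, concurrent_runs):
    """Estimated seconds per request when runs execute simultaneously."""
    return single_run_s * concurrent_runs


def max_concurrency_within(timeout_s, single_run_s):
    """Largest number of simultaneous runs that still meets the timeout."""
    return math.floor(timeout_s / single_run_s)


# 900 ms per run and 35 simultaneous users, as reported above.
print(round(estimated_service_time(0.9, 35), 1))
# Simultaneous runs that fit into an assumed 30 s socket timeout.
print(max_concurrency_within(30.0, 0.9))
```

Under these assumptions 35 simultaneous runs take about 31.5 s each, and 
at most 33 simultaneous runs stay within 30 s, which is consistent with 
messages being dropped from roughly 35 users onward.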

Regards and many thanks,
Gregory

Asankha C. Perera wrote:
> Hi Gregory
>
> I finally got the time to look into your code, now that I am almost 
> done building the final Synapse 1.1 artifacts.
>
> I think I found your problem. Do the times printed to System.out by 
> the DummyService exceed 30 seconds or so before the errors happen?
>
> e.g.
> time: 18.555
> time: 17.479
> time: 17.062
> time: 6.689
> time: 3.633
> time: 0.946
> time: 0.939
>
> Even if what you see is ~25 seconds, there is a good possibility that 
> some of the requests exceed the default socket timeout of your system, 
> which could typically be ~30 seconds. I load tested Synapse with our 
> SimpleStockQuoteService and with a faster version of it that I 
> typically use, and we can still handle many thousands of connections 
> without any issues, as before.
>
> So first, can you try to make your DummyService reply faster, say 
> within a second or so, and try your load test again? If this works, 
> you could play around with the OS tuning to support larger timeouts, 
> and if you are using Linux by then, I could help you :-)
>
> asankha



Re: Synapse loses messages under high load

Posted by "Asankha C. Perera" <as...@wso2.com>.
Hi Gregory

I finally got the time to look into your code, now that I am almost done 
building the final Synapse 1.1 artifacts.

I think I found your problem. Do the times printed to System.out by the 
DummyService exceed 30 seconds or so before the errors happen?

e.g.
time: 18.555
time: 17.479
time: 17.062
time: 6.689
time: 3.633
time: 0.946
time: 0.939

Even if what you see is ~25 seconds, there is a good possibility that 
some of the requests exceed the default socket timeout of your system, 
which could typically be ~30 seconds. I load tested Synapse with our 
SimpleStockQuoteService and with a faster version of it that I 
typically use, and we can still handle many thousands of connections 
without any issues, as before.

So first, can you try to make your DummyService reply faster, say 
within a second or so, and try your load test again? If this works, you 
could play around with the OS tuning to support larger timeouts, and if 
you are using Linux by then, I could help you :-)

asankha



Re: Synapse loses messages under high load

Posted by Dan Retzlaff <dr...@gmail.com>.
I'm new to Synapse, but without HTTP keep-alive I'd suspect ephemeral port
exhaustion. (Each client connection uses a port between 1025 and 5000, which
I believe the OS won't re-use within the TCP Maximum Segment Lifetime of
~120 seconds.) It sounds like Asankha may be thinking along the same lines.
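
The port-exhaustion argument can be put into numbers with a short Python 
sketch. It uses the figures from this message (ports 1025 to 5000, ports 
held for about 120 seconds) purely as inputs; the real hold time depends 
on the operating system's TIME_WAIT setting, so it is left as a 
parameter. The function names are hypothetical.

```python
# Rough upper bound on new connections per second before the ephemeral
# port range runs out, assuming no keep-alive (one port per request) and
# that each closed connection keeps its port reserved for hold_seconds.


def ephemeral_ports(low, high):
    """Number of usable ports in the inclusive range [low, high]."""
    return high - low + 1


def max_new_connections_per_second(low, high, hold_seconds):
    """Sustained connection rate at which the port range is exhausted."""
    return ephemeral_ports(low, high) / hold_seconds


ports = ephemeral_ports(1025, 5000)
rate = max_new_connections_per_second(1025, 5000, 120.0)
print(f"{ports} ports, about {rate:.1f} new connections per second")
```

With these inputs there are 3976 ports, so a sustained rate above 
roughly 33 new connections per second would eventually leave no free 
port. Widening the range or shortening the hold time raises that bound 
proportionally.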

On Nov 7, 2007 3:26 AM, Gregory Van seghbroeck <
gregory.vanseghbroeck@intec.ugent.be> wrote:

> Hey Asankha,
>
> Thanks a lot for the quick response.
>
> > You do not state your client environment, I am specifically interested
> > to see if it sends HTTP 1.0 or 1.1 requests to Synapse, and if
> > Keepalives are used
> Our client application also uses the HTTP/1.1 protocol, but when working
> with the HTTP/1.1 protocol we cannot choose to set the Keepalives
> parameter. The persistence parameter is false.
> > So are you saying that a direct client to Tomcat test has issues?
> No, we left Synapse in the middle. But the only thing Synapse does is
> forward the incoming message to the Tomcat server. Here is the simple
> synapse.xml file:
> <definitions xmlns="http://ws.apache.org/ns/synapse">
>    <in>
>        <send>
>            <endpoint>
>                <address
> uri="http://157.193.215.56:9090/axis2/services/DummyService2" />
>            </endpoint>
>        </send>
>    </in>
>    <out>
>        <send />
>    </out>
> </definitions>
> > This is great and I will look forward to your results. Meanwhile, if
> > you can help me reproduce this test with the Apache Bench Java clone
> > (that supports chunking, SSL etc - check the link I gave earlier for
> > this) and Synapse, I can help you find the issue. You may send any
> > confidential information privately to me if you please
> I've attached 3 files: our roundrobin mediator, the Web Service
> (DummyService2) and the synapse.xml of the first tests. As you can see
> in the attached files roundrobin is just part of our tests. But
> currently the client is calling Synapse with the following command: POST
> http://<synapse's ip-address>:8000/roundrobin and the following
> SOAP-message:
> <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
>    <soapenv:Body />
> </soapenv:Envelope>
>
>
> If there is anything else you need to clone our environment, do not
> hesitate to send me an e-mail.
>
> Kind regards and thanks a lot,
> Gregory

Re: Synapse loses messages under high load

Posted by Gregory Van seghbroeck <gr...@intec.ugent.be>.
Hey Asankha,

Thanks a lot for the quick response.

> You do not state your client environment, I am specifically interested 
> to see if it sends HTTP 1.0 or 1.1 requests to Synapse, and if 
> Keepalives are used
Our client application also uses the HTTP/1.1 protocol, but when working 
with the HTTP/1.1 protocol we cannot choose to set the Keepalives 
parameter. The persistence parameter is false.
> So are you saying that a direct client to Tomcat test has issues?
No, we left Synapse in the middle. But the only thing Synapse does is 
forward the incoming message to the Tomcat server. Here is the simple 
synapse.xml file:
<definitions xmlns="http://ws.apache.org/ns/synapse">
    <in>
        <send>
            <endpoint>
                <address uri="http://157.193.215.56:9090/axis2/services/DummyService2" />
            </endpoint>
        </send>
    </in>
    <out>
        <send />
    </out>
</definitions>
> This is great and I will look forward to your results. Meanwhile, if 
> you can help me reproduce this test with the Apache Bench Java clone 
> (that supports chunking, SSL etc - check the link I gave earlier for 
> this) and Synapse, I can help you find the issue. You may send any 
> confidential information privately to me if you please
I've attached 3 files: our roundrobin mediator, the Web Service 
(DummyService2) and the synapse.xml of the first tests. As you can see 
in the attached files roundrobin is just part of our tests. But 
currently the client is calling Synapse with the following command: POST 
http://<synapse's ip-address>:8000/roundrobin and the following 
SOAP-message:
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
    <soapenv:Body />
</soapenv:Envelope>


If there is anything else you need to clone our environment, do not 
hesitate to send me an e-mail.

Kind regards and thanks a lot,
Gregory

Re: Synapse loses messages under high load

Posted by "Asankha C. Perera" <as...@wso2.com>.
Hi Gregory
> We are currently using the Synapse 1.0 release. Our Synapse Machine's 
> OS is Windows Server 2003 R2 (Enterprise Edition) SP1 with a JDK 
> 1.6.0_02-b06. 
We have tested the NIO code only with JDK 1.5.x, not with JDK 1.6.x. In 
my own testing on Windows environments, the NIO implementation on 
Windows showed some issues, so I would strongly suggest using Linux as 
an alternative. The bottom of http://wso2.org/library/1721 also 
suggests some tuning that you could perform.
> Both Tomcat Servers run on a Windows 2000 (5.00.2195) OS with JDK 
> 1.5.0_06-b05 and JDK 1.5.0_07-b03. The version of Axis2 is 1.1.1 and 
> of Tomcat is 5.5.20. As you already could have guessed, these servers 
> run on different physical machines, but they are directly connected to 
> each other (our Synapse server machine has multiple network cards).
You do not state your client environment. I am specifically interested 
to see whether it sends HTTP 1.0 or 1.1 requests to Synapse, and whether 
Keepalives are used.
> We performed the newly requested tests (simple message forwarding to 
> one of our Tomcat machines). We noticed the same disturbing behaviour 
> (even with a reduced load - our Tomcat server couldn't cope with the 
> same throughput as in the other tests). The client made 607 
> invocations and received only 304 correct answers. 
So are you saying that a direct client to Tomcat test has issues?
> We are going to test our Synapse broker on a Linux machine (I've read 
> somewhere that there might be some concurrency-related problems with 
> Windows and Java's NIO). I will let you know what the outcome is.
This is great and I will look forward to your results. Meanwhile, if you 
can help me reproduce this test with the Apache Bench Java clone (that 
supports chunking, SSL etc - check the link I gave earlier for this) and 
Synapse, I can help you find the issue. You may send any confidential 
information privately to me if you please

asankha



Re: Synapse loses messages under high load

Posted by Gregory Van seghbroeck <gr...@intec.ugent.be>.
Hi Asankha,

We are currently using the Synapse 1.0 release. Our Synapse Machine's OS 
is Windows Server 2003 R2 (Enterprise Edition) SP1 with a JDK 
1.6.0_02-b06. Both Tomcat Servers run on a Windows 2000 (5.00.2195) OS 
with JDK 1.5.0_06-b05 and JDK 1.5.0_07-b03. The version of Axis2 is 
1.1.1 and of Tomcat is 5.5.20. As you already could have guessed, these 
servers run on different physical machines, but they are directly 
connected to each other (our Synapse server machine has multiple network 
cards).
Regarding your question about our extensions: we just implemented a new 
mediator that sets the "To" address of the message via 
"MessageContext.setTo()". Once the new address is set, we use the 
default send mediator to forward the entire message (with the changed 
To address, of course) to the respective Tomcat server. So we are not 
using Synapse's load balancing mechanism (a mediator seemed the fastest 
way to reach our goal).
We performed the newly requested tests (simple message forwarding to 
one of our Tomcat machines). We noticed the same disturbing behaviour 
(even with a reduced load - our Tomcat server couldn't cope with the 
same throughput as in the other tests). The client made 607 invocations 
and received only 304 correct answers. We also noticed the same entries 
in our logs: 607 of /[/157.193.215.56:9090]: Content encoder 
[chunk-coded; completed: true]/ and 304 of /[/157.193.215.56:9090]: 
Content decoder [chunk-coded; completed: true]/.

We are going to test our Synapse broker on a Linux machine (I've read 
somewhere that there might be some concurrency-related problems with 
Windows and Java's NIO). I will let you know what the outcome is.

Thanks in advance,
Gregory

Asankha C. Perera wrote:
> Hi Gregory
>
> Could you let me know the version of HTTP (1.0 or 1.1) being used 
> between your client and Synapse, and between Synapse and Tomcat? Are 
> you using SSL by any chance? Also let me know the OS version and your 
> JDK version (including the minor version). Do you see anything in the 
> log files? Are the client, Synapse and Tomcat on different physical 
> hosts, virtual machines or the same host? Also, what do you mean by 
> "we have extended Synapse"? Have you made any code changes? And are 
> you using the Synapse 1.0 release or the 1.1 RC?
>
>> This works extremely well with a few messages (even with several 
>> simultaneous clients). But when we stress test our setup, we notice 
>> something peculiar. In one particular test run we sent 1110 
>> requests to Synapse, but the client got only 878 responses back.
> Are you using a load balanced endpoint? Do you think you lose 
> messages when load balancing is not used? Could you verify with a 
> simple test in this same environment?
>> The debug logs for the org.apache.axis2.transport classes showed a 
>> discrepancy between the content encoder (/[DEBUG] 06 nov 10:51:27.828 
>> AM I/O reactor worker thread 2 
>> [org.apache.axis2.transport.nhttp.ClientHandler]HTTP connection 
>> [/157.193.215.56:9090]: Content encoder [chunk-coded; completed: 
>> true]/) and the content decoder (/[DEBUG] 06 nov 10:51:28.718 AM I/O 
>> reactor worker thread 2 
>> [org.apache.axis2.transport.nhttp.ClientHandler]HTTP connection 
>> [/157.193.215.56:9090]: Content decoder [chunk-coded; completed: 
>> true]/) for both Tomcat-Axis2 servers. We counted 1110 content 
>> encoder entries and only 878 content decoder entries. These numbers 
>> fit too perfectly to be a coincidence.
> Again, we just renamed the package of our NIO based http/s 
> transports to org.apache.synapse.transport.nhttp, so are you using 
> Synapse 1.0, or the 1.1 RC with a possibly older transport?
>
> asankha


Re: Synapse loses messages under high load

Posted by "Asankha C. Perera" <as...@wso2.com>.
Hi Gregory

Could you let me know the version of HTTP (1.0 or 1.1) being used 
between your client and Synapse, and between Synapse and Tomcat? Are you 
using SSL by any chance? Also let me know the OS version and your JDK 
version (including the minor version). Do you see anything in the log 
files? Are the client, Synapse and Tomcat on different physical hosts, 
virtual machines or the same host? Also, what do you mean by "we have 
extended Synapse"? Have you made any code changes? And are you using 
the Synapse 1.0 release or the 1.1 RC?

> This works extremely well with a few messages (even with several 
> simultaneous clients). But when we stress test our setup, we notice 
> something peculiar. In one particular test run we sent 1110 
> requests to Synapse, but the client got only 878 responses back.
Are you using a load balanced endpoint? Do you think you lose messages 
when load balancing is not used? Could you verify with a simple test in 
this same environment?
> The debug logs for the org.apache.axis2.transport classes showed a 
> discrepancy between the content encoder (/[DEBUG] 06 nov 10:51:27.828 
> AM I/O reactor worker thread 2 
> [org.apache.axis2.transport.nhttp.ClientHandler]HTTP connection 
> [/157.193.215.56:9090]: Content encoder [chunk-coded; completed: 
> true]/) and the content decoder (/[DEBUG] 06 nov 10:51:28.718 AM I/O 
> reactor worker thread 2 
> [org.apache.axis2.transport.nhttp.ClientHandler]HTTP connection 
> [/157.193.215.56:9090]: Content decoder [chunk-coded; completed: 
> true]/) for both Tomcat-Axis2 servers. We counted 1110 content 
> encoder entries and only 878 content decoder entries. These numbers 
> fit too perfectly to be a coincidence.
Again, we just renamed the package of our NIO based http/s transports 
to org.apache.synapse.transport.nhttp, so are you using Synapse 1.0, or 
the 1.1 RC with a possibly older transport?

asankha
