Posted to dev@mina.apache.org by Emmanuel Lécharny <el...@gmail.com> on 2013/01/04 17:51:37 UTC

[MINA 3] Performances

Hi !

I conducted some profiling sessions today to see where we were spending
some spurious CPU in MINA 3. When I first ran the tests, I was able to
process 1 million 10-byte messages in 75 seconds (the message is
written by the client and read by the server, which returns a 1-byte message
to the client). This is the very same test as the one Jeff wrote for
MINA 2 and Netty.

After a bit of analysis, I was able to lower this number to 57 seconds.

Now, here are the numbers for MINA 3, MINA 2 and Netty, for 1M messages:

Client / server               10-byte msgs   1k msgs     10k msgs
Mina3 client / Mina3 server   57.8 secs      53.2 secs   66.1 secs
Mina2 client / Mina2 server   53.4 secs      52.4 secs   75.6 secs
Netty client / Mina2 server   51.4 secs      49.6 secs   74.7 secs

(we currently don't have a Netty server)

So, bottom line: MINA 3 is slower than any other combination, despite the
minimal features we have added, except for big messages. Is this a
problem? Well, yes and no.

There are some very good reasons for MINA 3 to be slower: we call the
selector.select() method for every message we have to send. This is the
most expensive part of the code, and it's not something we can improve:
we don't have any way to make select() go faster.

OTOH, we could call select() less often. Right now, every time we
exit from a select(), we process all the selected SelectionKeys we get.
This is done by calling the ready() method, with flags indicating which
events we have to process (mainly OP_READ and OP_WRITE).

The ready() method processes the connect, read, write and accept events,
one after the other. The thing is that if a read results in some
writes, the writes will be processed in the next select() loop, whereas in
Netty and MINA 2 they are potentially processed right afterward, in the same
select() loop.
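A minimal, hypothetical model of that behaviour may help (`ReadyLoopSketch` and its fields are illustrative names, not MINA 3's actual classes): ready ops are handled in the order described above, and a reply produced while handling a read is only queued, so it waits until a later select() reports the key as writable.

```java
import java.nio.channels.SelectionKey;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of ready(); names are illustrative, not MINA 3's API. */
class ReadyLoopSketch {
    final ArrayDeque<String> writeQueue = new ArrayDeque<>();
    final List<String> processed = new ArrayList<>();

    /** Handles the ready ops of one selected key, one event type after the other. */
    void ready(int readyOps) {
        if ((readyOps & SelectionKey.OP_CONNECT) != 0) {
            processed.add("connect");
        }
        if ((readyOps & SelectionKey.OP_READ) != 0) {
            processed.add("read");
            // The handler answers the request, but the answer is only enqueued here.
            writeQueue.add("reply");
        }
        if ((readyOps & SelectionKey.OP_WRITE) != 0) {
            // Queued messages are flushed only when the selector reports OP_WRITE.
            while (!writeQueue.isEmpty()) {
                processed.add("write " + writeQueue.poll());
            }
        }
        if ((readyOps & SelectionKey.OP_ACCEPT) != 0) {
            processed.add("accept");
        }
    }
}
```

Calling `ready(SelectionKey.OP_READ)` leaves the reply in `writeQueue`; it only goes out on a second pass with `OP_WRITE`, which is the extra select() round discussed here.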

We can avoid this extra loop most of the time, as soon as the channel
is ready. The idea is to push the message directly into the channel if we
don't have anything in the writeQueue, and we are done. If the writeQueue is
not empty, we simply push the message into the queue. Last but not least, if
we weren't able to write the whole message, we push the remaining bytes
onto the writeQueue, set OP_WRITE on the SelectionKey, and wake up the
selector. That would save us a lot of CPU for small messages.
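A minimal sketch of that write path, assuming a single selector thread. `ImmediateWriter` and its `writeInterest` flag are hypothetical; in real NIO code, setting the flag would correspond to `key.interestOps(key.interestOps() | SelectionKey.OP_WRITE)` followed by `selector.wakeup()`.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical write path: write immediately when nothing is pending, queue otherwise. */
class ImmediateWriter {
    final WritableByteChannel channel;
    final Queue<ByteBuffer> writeQueue = new ArrayDeque<>();
    /** Stands in for OP_WRITE being set on the SelectionKey (plus a selector wakeup). */
    boolean writeInterest;

    ImmediateWriter(WritableByteChannel channel) {
        this.channel = channel;
    }

    void write(ByteBuffer message) throws IOException {
        if (!writeQueue.isEmpty()) {
            // Older messages are still pending: keep ordering and let the selector flush them.
            writeQueue.add(message);
            return;
        }
        // Fast path: no select() round-trip needed if the socket takes everything.
        channel.write(message);
        if (message.hasRemaining()) {
            // Partial write: keep the rest and ask to be told when the socket is writable again.
            writeQueue.add(message);
            writeInterest = true;
        }
    }
}
```

With a socket that accepts everything, a small message never touches the queue or the selector; only a partial write costs an extra select() round.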

I will try to do that.

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com

Re: [MINA 3] Performances

Posted by Emmanuel Lécharny <el...@gmail.com>.

On 1/14/13 10:47 AM, Julien Vermillard wrote:
> Are these only loopback tests? I wonder if the results would be the same on a
> real network.

I don't think it would be any different, except that you will exhaust
your 100 Mb/s bandwidth with 1Kb messages before you reach the peak we get
on the loopback, and a 1 Gb/s bandwidth with 10Kb messages.
>
> BTW, what would be the use case for writing buffers bigger than 64K? That
> sounds like a waste of memory to me.
This is what I told Norman: with buffers that size, it makes complete
sense to use transferTo() instead of going through buffers. In LDAP, we
may deal with messages that big, as we may transfer images, but this is
rare. It's probably more an application issue than a MINA issue anyway...
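When the payload already lives in a file, `FileChannel.transferTo()` lets the JDK move the bytes to the target channel without staging them in a Java heap buffer. A minimal sketch, assuming a file-backed payload; `FileSender.sendFile` is a hypothetical helper. The loop is there because `transferTo()` may transfer fewer bytes than requested.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: sends a whole file with transferTo() instead of user-space buffers. */
class FileSender {
    /** Returns the number of bytes transferred to the target channel. */
    static long sendFile(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            long position = 0;
            long size = in.size();
            while (position < size) {
                // transferTo() may move fewer bytes than asked, so resume from the new position.
                long sent = in.transferTo(position, size - position, target);
                if (sent == 0) {
                    // With a non-blocking socket this usually means "not writable now";
                    // real code would wait for OP_WRITE. This sketch just reports progress.
                    break;
                }
                position += sent;
            }
            return position;
        }
    }
}
```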


-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com

Re: [MINA 3] Performances

Posted by Julien Vermillard <jv...@gmail.com>.

Are these only loopback tests? I wonder if the results would be the same on a
real network.

BTW, what would be the use case for writing buffers bigger than 64K? That
sounds like a waste of memory to me.


On Mon, Jan 14, 2013 at 8:36 AM, Emmanuel Lécharny <el...@gmail.com> wrote:

> [...]

Re: [MINA 3] Performances

Posted by Emmanuel Lécharny <el...@gmail.com>.

Hi !

I did some more tests with different buffer sizes. Here are the results
(message size, number of messages sent, then throughput in msg/s for each
framework):

Size    Messages    MINA 3    MINA 2   Netty 3
10b     1M          28,341    19,085    26,010
1kb     1M          28,884    19,409    25,297
10kb    1M          25,603    13,141    19,883
20kb    1M          21,238     9,679    12,420
50kb    500k         9,425     7,025     7,058
100kb   200k         7,637     4,444     4,719
200kb   100k         2,181     2,861     3,078
500kb   50k            980     1,550     1,290
1Mb     20k            462       753       477
10Mb    2k              37        76        17
64Mb    500              7        11         4


I have run those tests more than once, and I haven't seen much
variation in the results.

It's interesting to see that all the frameworks have their bright spots
and dark ones. Typically, MINA 3 is faster for message sizes below 200kb,
then Netty 3 takes the lead at 200kb, and, interestingly, MINA 2 is faster
above 200kb.

For messages < 200kb, MINA 3 is faster than Netty 3 by 9% up to 71%. It's
also faster than MINA 2 by 34% up to 120%.

Why do we get those differences? I can tell for MINA 3 vs MINA 2: MINA 2
writes the data after having enqueued it, so you get more select()
calls, whereas MINA 3 tries its best to write the messages immediately.
For messages bigger than the socket SendBufferSize, MINA 3 copies them into
a DirectBuffer which gets allocated on the fly. That's a costly operation
when the message is big. MINA 2 tries to copy only SendBufferSize bytes
into the socket in each round.
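The MINA 2 behaviour described above can be approximated by temporarily lowering the buffer's limit, so each `write()` call only sees at most `maxChunk` bytes (and so, if the JDK copies the whole remaining heap buffer into a DirectBuffer on each call, as discussed later in this thread, that copy is bounded too). A minimal sketch; `ChunkedWrite.writeChunk` is a hypothetical helper, and `maxChunk` would typically be the socket's SendBufferSize.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

/** Hypothetical helper, not MINA 2's actual code. */
class ChunkedWrite {
    /**
     * Writes at most maxChunk bytes from src, advancing its position by the amount written.
     * The original limit is restored afterwards, so the caller still sees the full remainder.
     */
    static int writeChunk(WritableByteChannel channel, ByteBuffer src, int maxChunk) throws IOException {
        int originalLimit = src.limit();
        src.limit(Math.min(originalLimit, src.position() + maxChunk));
        try {
            return channel.write(src);
        } finally {
            src.limit(originalLimit);
        }
    }
}
```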

All in all, it's assumed that there is a Heap -> Direct buffer
conversion done inside the Channel, and that the Channel caches some
DirectBuffers.

I guess we should be able to offer some configuration to let users tune
performance for the kind of messages they send.

One more thing: whatever processing is done in the chain or in the
IoHandler, it's quite likely that it will not be a bottleneck compared
to the time it takes to process a select(). During my profiling
sessions, I saw that the select() operation accounted for 95% of the global
CPU, and when we weren't frantically spending CPU cycles copying
HeapBuffers into DirectBuffers, system CPU represented around 85% of
all the CPU.

-- 
Regards, Cordialement,
Emmanuel Lécharny
www.iktek.com

Re: [MINA 3] Performances

Posted by Emmanuel Lécharny <el...@gmail.com>.

On 1/11/13 9:09 AM, Jeff MAURY wrote:
> Two remarks:
>
> 1) If the user buffer is a DirectBuffer, I suppose you use it instead of
> your allocated DirectBuffer.
Yes.

> 2) Why don't you always use your DirectBuffer when you need to write to the
> channel?
This is what is done. Either I use the pre-allocated DirectBuffer, or I
allocate one and copy the HeapBuffer into it.
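The two cases can be summarised in a small helper. This is a hedged sketch, not the committed MINA 3 code: `DirectBufferChooser.toDirect` and the `preallocated` session buffer are hypothetical names. The copy goes through `duplicate()`, so the caller's buffer position is left untouched and can later be advanced by the number of bytes actually written.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper illustrating the buffer choice described above. */
class DirectBufferChooser {
    /** Returns a direct buffer holding src's remaining bytes, copying only when necessary. */
    static ByteBuffer toDirect(ByteBuffer src, ByteBuffer preallocated) {
        if (src.isDirect()) {
            return src; // already usable by the channel without an extra copy
        }
        ByteBuffer target;
        if (src.remaining() <= preallocated.capacity()) {
            target = preallocated; // reuse the pre-allocated direct buffer
            target.clear();
        } else {
            target = ByteBuffer.allocateDirect(src.remaining()); // too big: allocate on the fly
        }
        target.put(src.duplicate()); // duplicate() leaves src's own position unchanged
        target.flip();
        return target;
    }
}
```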

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com

Re: [MINA 3] Performances

Posted by Jeff MAURY <je...@jeffmaury.com>.

Two remarks:

1) If the user buffer is a DirectBuffer, I suppose you use it instead of
your allocated DirectBuffer.
2) Why don't you always use your DirectBuffer when you need to write to the
channel?

Regards
Jeff



On Fri, Jan 11, 2013 at 6:41 AM, Emmanuel Lécharny <el...@gmail.com> wrote:

> Now, I create a DirectBuffer when the session is created, its size being
> equal to the SendBufferSize, and I reuse this buffer if I can write the
> message immediately. If not, or if I can't write it completely, I copy
> the HeapBuffer into a DirectBuffer.
>
> [...]


-- 
Jeff MAURY


"Legacy code" often differs from its suggested alternative by actually
working and scaling.
 - Bjarne Stroustrup

http://www.jeffmaury.com
http://riadiscuss.jeffmaury.com
http://www.twitter.com/jeffmaury

Re: [MINA 3] Performances

Posted by Emmanuel Lécharny <el...@gmail.com>.

I have played a bit more...

Now, I create a DirectBuffer when the session is created, its size being
equal to the SendBufferSize, and I reuse this buffer if I can write the
message immediately. If not, or if I can't write it completely, I copy
the HeapBuffer into a DirectBuffer.

Such a strategy keeps small messages fast; intermediate messages above
the SendBufferSize also benefit from this change, and big messages are
sped up as well.

It's all committed on trunk.
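A sketch of that strategy under stated assumptions: `sessionBuffer` is a direct buffer allocated once per session with a capacity of SendBufferSize, and `SessionWriter.writeOrCopy` is a hypothetical name, not the code committed on trunk. A `null` result means the message went out entirely; otherwise the caller would queue the returned direct copy of the unsent bytes and set OP_WRITE.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

/** Hypothetical per-session write step using a reused direct buffer. */
class SessionWriter {
    /**
     * Tries to send msg through the per-session direct buffer.
     * Returns null if everything was written, or a direct copy of the unsent bytes to be queued.
     */
    static ByteBuffer writeOrCopy(WritableByteChannel channel, ByteBuffer msg, ByteBuffer sessionBuffer)
            throws IOException {
        int start = msg.position();
        // Stage at most one SendBufferSize worth of bytes in the reused direct buffer.
        ByteBuffer view = msg.duplicate();
        view.limit(start + Math.min(msg.remaining(), sessionBuffer.capacity()));
        sessionBuffer.clear();
        sessionBuffer.put(view);
        sessionBuffer.flip();
        int written = channel.write(sessionBuffer);
        msg.position(start + written);
        if (!msg.hasRemaining()) {
            return null;
        }
        // Not everything went out: copy the rest once into its own direct buffer for the queue.
        ByteBuffer pending = ByteBuffer.allocateDirect(msg.remaining());
        pending.put(msg);
        pending.flip();
        return pending;
    }
}
```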




On 1/10/13 5:22 PM, Emmanuel Lécharny wrote:
> [...]


-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com

Re: [MINA 3] Performances

Posted by Emmanuel Lécharny <el...@gmail.com>.

Ok, I have conducted some experiments:

1) The HeapBuffer is copied into a DirectBuffer before being written

We see a dramatic performance improvement for the fourth test (64Mb
messages), which now takes 14 seconds to complete instead of timing
out, but the first three tests are 15% slower.

2) The HeapBuffer is copied only if its size is above XXX (XXX to be
defined)

The performance for tests 1 and 2 is the same, and the 4th test is as
good as with scenario (1), but test 2 (10Kb messages) is slowed down.

So the clear winner is scenario 2 so far.

I think I have to study a third scenario, where we only copy the buffer
if its size is above the SendBufferSize.

Also note that if the socket is not ready to accept a direct write
(for instance, if there is a queue), I think the best option is to convert
the HeapBuffer to a DirectBuffer.
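The scenarios compared above boil down to one decision per write. A hedged sketch of that decision: the `CopyPolicy` and `CopyDecision` names are hypothetical, and the threshold stays a parameter since the XXX value is not defined in the thread.

```java
/** Hypothetical names for the copy strategies discussed in the thread. */
enum CopyPolicy { ALWAYS, ABOVE_THRESHOLD, ABOVE_SEND_BUFFER_SIZE }

class CopyDecision {
    /**
     * Decides whether a heap buffer should be copied into a DirectBuffer before writing.
     * A message that cannot be written right away (queued) is always converted.
     */
    static boolean copyToDirect(CopyPolicy policy, int size, int threshold, int sendBufferSize, boolean queued) {
        if (queued) {
            return true;
        }
        switch (policy) {
            case ALWAYS:
                return true; // scenario 1
            case ABOVE_THRESHOLD:
                return size > threshold; // scenario 2, threshold still to be defined
            case ABOVE_SEND_BUFFER_SIZE:
                return size > sendBufferSize; // the third scenario to study
            default:
                throw new IllegalArgumentException("unknown policy " + policy);
        }
    }
}
```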

On 1/10/13 2:00 PM, Jeff MAURY wrote:
> [...]

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com

Re: [MINA 3] Performances

Posted by Jeff MAURY <je...@jeffmaury.com>.

To be a little bit more precise, you will copy 64Mb the first time, then
64Mb - 64kb the second time, and so on.
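This refinement makes the cost easy to estimate. Assuming 64Mb = 64 × 1024 × 1024 bytes, a 64Kb SendBufferSize, and that each round copies the whole unsent remainder while the socket accepts only SendBufferSize bytes, the message needs 1024 rounds and the copies add up to 64Kb × (1024 + 1023 + ... + 1), about 32 GiB, versus 64Mb when each round copies only what can be sent. `CopyCost` is a hypothetical helper that checks this arithmetic:

```java
/** Hypothetical helper comparing the two copy strategies for one large message. */
class CopyCost {
    /** Bytes copied if each round copies everything that is still unsent. */
    static long copyWholeRemainder(long messageSize, long sendBufferSize) {
        long copied = 0;
        for (long remaining = messageSize; remaining > 0; remaining -= Math.min(remaining, sendBufferSize)) {
            copied += remaining;
        }
        return copied;
    }

    /** Bytes copied if each round copies at most sendBufferSize bytes. */
    static long copyOneChunk(long messageSize, long sendBufferSize) {
        long copied = 0;
        for (long remaining = messageSize; remaining > 0; remaining -= Math.min(remaining, sendBufferSize)) {
            copied += Math.min(remaining, sendBufferSize);
        }
        return copied;
    }

    public static void main(String[] args) {
        long mb64 = 64L * 1024 * 1024;
        long kb64 = 64L * 1024;
        System.out.println(copyWholeRemainder(mb64, kb64)); // 34393292800 bytes, about 32 GiB
        System.out.println(copyOneChunk(mb64, kb64));       // 67108864 bytes, i.e. 64Mb
    }
}
```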

Jeff



On Thu, Jan 10, 2013 at 12:11 PM, Emmanuel Lécharny <el...@gmail.com> wrote:

> [...]
> You can imagine how costly this is when the HeapBuffer is 64 Mb big, and
> we send only a fragment of it for each round... If the SendBufferSize is
> 64Kb, in this case, you will just copy the 64Mb 1000 times.


-- 
Jeff MAURY

Re: [MINA 3] Performances

Posted by Emmanuel Lécharny <el...@gmail.com>.

On 1/10/13 10:56 AM, Jeff MAURY wrote:
> The performance gain is not related to the nature of the buffer (I mean
> writing to a HeapBuffer vs writing to a DirectBuffer) but to writing
> the buffer to the socket: if you write a HeapBuffer to a socket, my
> guess is that it will be copied to a DirectBuffer before it gets written
> to the socket, which is not the case for a DirectBuffer.

This is exactly what happens. Internally, for each call to
channel.write(HeapBuffer), there is a call to DirectBuffer.put(HeapBuffer).

You can imagine how costly this is when the HeapBuffer is 64Mb big and
we send only a fragment of it in each round... If the SendBufferSize is
64Kb, you will copy the 64Mb about 1000 times.

Good catch btw, Jeff !


-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com

Re: [MINA 3] Performances

Posted by Jeff MAURY <je...@jeffmaury.com>.

The performance gain is not related to the nature of the buffer (I mean
writing to a HeapBuffer vs writing to a DirectBuffer) but to writing the
buffer to the socket: if you write a HeapBuffer to a socket, my guess is
that it will be copied to a DirectBuffer before it gets written to the
socket, which is not the case for a DirectBuffer.

Jeff



On Thu, Jan 10, 2013 at 10:47 AM, Steve Ulrich <st...@proemion.com> wrote:

> I saw some benchmarks of direct vs. heap buffers - but I can't remember a
> single one where direct buffers were a *big* performance gain.
>
> [...]


-- 
Jeff MAURY

RE: [MINA 3] Performances

Posted by Steve Ulrich <st...@proemion.com>.

Hi!

I saw some benchmarks of direct vs. heap buffers, but I can't remember a single one where direct buffers were a *big* performance gain. If you're copying the buffers just to make it perform better, you'll probably get a huge performance penalty caused by the copy logic itself.

Maybe it's possible to remove the copy logic by using a "duplicate()" of the original buffer. This copies only the ByteBuffer wrapper, not the underlying array.
There is still a tradeoff: application logic must make sure not to change the buffer's content anymore! If it does, it gets really ugly.
That shouldn't be a problem for encoders, but it may be if the application reuses the buffer (IMHO a bad idea, anyway).
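A minimal illustration of the tradeoff described here: `duplicate()` returns a new ByteBuffer with its own position and limit but sharing the same content, so nothing is copied, and a later change made through the original buffer is visible in the duplicate still waiting to be written. `DuplicateDemo.viewForWrite` is a hypothetical name.

```java
import java.nio.ByteBuffer;

class DuplicateDemo {
    /** Hypothetical zero-copy view for the write path: own position/limit, same content as msg. */
    static ByteBuffer viewForWrite(ByteBuffer msg) {
        return msg.duplicate();
    }

    public static void main(String[] args) {
        ByteBuffer original = ByteBuffer.wrap(new byte[] {'a', 'b', 'c'});
        ByteBuffer pending = viewForWrite(original);

        pending.get(); // the writer consumes one byte from its view...
        System.out.println(original.position()); // ...but the original's position is still 0

        original.put(1, (byte) 'X'); // the application reuses its buffer too early
        System.out.println((char) pending.get(1)); // prints X: the queued message changed as well
    }
}
```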

regards

Steve Ulrich


> Emmanuel Lécharny [mailto:elecharny@gmail.com] wrote:
>
> [...]



--------------------------------------------------------------------------
PROEMION GmbH

Steve Ulrich

IT Development (IT/DEV)

Donaustrasse 14
D-36043 Fulda, Germany
Phone +49 (0) 661 9490-601
Fax +49 (0) 661 9490-333

http://www.proemion.com

Managing Director (Geschäftsführer): Dipl. Ing. Robert Michaelides
Register court (Amtsgericht-Registergericht) Fulda: 5 HRB 1867
--------------------------------------------------------------------------
E-mail and any attachments may be confidential. If you have received this
E-mail and you are not a named addressee, please inform the sender immediately by E-mail and then delete this E-mail from your system. If you are not a named addressee, you may not use, disclose, distribute, copy or print this E-mail. Addressees should scan this E-mail and any attachments for viruses. No representation or warranty is made as to the absence of viruses in this E-mail or any of its attachments.

NEWS (German):
http://www.proemion.de

NEWS:
http://www.proemion.com

Re: [MINA 3] Performances

Posted by Emmanuel Lécharny <el...@gmail.com>.

On 1/9/13 11:54 AM, Jeff MAURY wrote:
> The problem I see if you choose to copy the user buffer into a DirectBuffer
> is that your memory consumption will double; even if the DirectBuffer is not
> allocated on the heap, it may be problematic.
It will double only for the time necessary to copy the buffer. Then you can
discard the HeapBuffer...

All in all, this is what currently happens behind the curtain, as NIO
copies the HeapBuffer into a DirectBuffer. Doing it in our layer gives us
some control.

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com

Re: [MINA 3] Performances

Posted by Jeff MAURY <je...@jeffmaury.com>.

The problem I see if you choose to copy the user buffer into a DirectBuffer
is that your memory consumption will double; even if the DirectBuffer is not
allocated on the heap, it may be problematic.

Regards
Jeff



On Wed, Jan 9, 2013 at 11:44 AM, Emmanuel Lécharny <el...@gmail.com> wrote:

> [...] I'm also wondering if it would not
> be better to copy the HeapBuffer into a DirectBuffer before trying to
> write the message, as we may avoid copying the data from the Heap buffer
> into the DirectBuffer many times.


-- 
Jeff MAURY

Re: [MINA 3] Performances

Posted by Emmanuel Lécharny <el...@gmail.com>.

On 1/5/13 1:27 PM, Jeff MAURY wrote:
> Regarding the message size, I noticed that Mina2 writes only a sub-part of
> the message (the size is computed from the receive buffer size, if I remember
> correctly), whereas Netty tries to write the whole buffer to the socket. So
> if you did the same in Mina3, it may explain why it is slow.

This may well be the reason for the horrible performance we
get when dealing with huge messages: as Jeff suggested, the HeapBuffer
might be copied entirely into a DirectBuffer when trying to write the
message, regardless of the SendBufferSize. If so (ouch), we will do that
1000 times for a 64Mb message and a SendBufferSize of 64Kb...

I will experiment with another solution for MINA 3: allocating a
DirectBuffer once for each thread, with a size of SendBufferSize, and using
this buffer to write into the socket. I'm also wondering if it would not
be better to copy the HeapBuffer into a DirectBuffer before trying to
write the message, as we may avoid copying the data from the HeapBuffer
into the DirectBuffer many times.

Interesting thread !
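A minimal sketch of the per-thread DirectBuffer idea, for illustration only. In OpenJDK, as we understand it, writing a heap ByteBuffer to a channel first copies all of its remaining bytes into a temporary direct buffer, which is what would make the repeated-copy scenario above possible. The class `ChunkedDirectWriter`, its method, and the fixed 64 KB size are hypothetical assumptions, not MINA code: each call copies at most one buffer's worth of the heap message into a thread-local direct buffer and writes that.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

/**
 * Hypothetical sketch (not MINA code): one direct buffer per thread, sized like the
 * socket's SendBufferSize, used as a staging area when writing heap buffers.
 */
final class ChunkedDirectWriter {
    // Assumption: 64 KB stands in for the SendBufferSize discussed in the thread.
    static final int SEND_BUFFER_SIZE = 64 * 1024;

    // Allocated lazily, once per I/O thread, then reused for every write on that thread.
    private static final ThreadLocal<ByteBuffer> DIRECT =
            ThreadLocal.withInitial(() -> ByteBuffer.allocateDirect(SEND_BUFFER_SIZE));

    private ChunkedDirectWriter() {}

    /**
     * Copies at most SEND_BUFFER_SIZE bytes of {@code heap} into the thread's direct
     * buffer and writes them. The position of {@code heap} only advances by the number
     * of bytes the channel accepted; on a partial write, at most one chunk is re-copied
     * on the next call instead of the whole message.
     */
    static int writeChunk(ByteBuffer heap, WritableByteChannel channel) throws IOException {
        ByteBuffer direct = DIRECT.get();
        direct.clear();
        int start = heap.position();
        int chunk = Math.min(direct.remaining(), heap.remaining());

        // Copy a single chunk through a duplicate, leaving heap's own position/limit alone.
        ByteBuffer view = heap.duplicate();
        view.limit(start + chunk);
        direct.put(view);
        direct.flip();

        int written = channel.write(direct);
        heap.position(start + written);
        return written;
    }
}
```

With a 64 MB message and a 64 KB buffer, a socket that accepts one buffer's worth per write() needs 64 MB / 64 KB = 1024 calls. If each call instead copied the whole remaining message, the total copied would be about 64 KB × (1024 × 1025 / 2), roughly 32 GB, versus 64 MB with bounded chunks.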


Re: [MINA 3] Performances

Posted by Julien Vermillard <jv...@gmail.com>.

Perhaps we should try reading/writing direct buffers.
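A minimal sketch of the reading side of that suggestion, assuming a hypothetical `DirectReader` helper (not MINA API): one direct buffer is allocated once and reused, and the bytes read are handed to the handler in place rather than copied to the heap. Whether this actually saves time depends on what the handler does with the data; that is something to benchmark, not a measured result.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.util.function.Consumer;

/** Hypothetical helper: one reusable direct buffer, handed to the handler in place. */
final class DirectReader {
    private final ByteBuffer readBuffer;

    DirectReader(int size) {
        // allocateDirect is comparatively expensive, so it is done once, not per read
        this.readBuffer = ByteBuffer.allocateDirect(size);
    }

    /**
     * Performs one read into the direct buffer and passes the bytes read to the handler
     * as a read-only view, without copying them to the heap. The handler must consume the
     * view before returning, since the buffer is reused by the next read.
     * Returns the number of bytes read, or -1 at end of stream.
     */
    int readOnce(ReadableByteChannel channel, Consumer<ByteBuffer> handler) throws IOException {
        readBuffer.clear();
        int n = channel.read(readBuffer);
        if (n > 0) {
            readBuffer.flip();
            handler.accept(readBuffer.asReadOnlyBuffer());
        }
        return n;
    }
}
```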


On Sat, Jan 5, 2013 at 8:23 PM, Jeff MAURY <je...@gmail.com> wrote:

> I'm working on the Netty server. It should be OK by the end of the
> weekend.
> Regarding the performance for large messages, shouldn't it be related to JNI?
> I mean the conversion from the Java ByteBuffer to the memory array that is
> expected by the OS socket layer?
>
> Jeff

Re: [MINA 3] Performances

Posted by Emmanuel Lécharny <el...@gmail.com>.

On 1/5/13 8:23 PM, Jeff MAURY wrote:
> I'm working on the Netty server. It should be OK by the end of the weekend.
I just ran the NettyClient/NettyServer test you provided, using Java 7:

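NettyClient/NettyServer (1M messages)
msg size (bytes) : time (s) - throughput
10               : 38.308   - 26104 msgs/s
1024             : 39.383   - 25392 msgs/s
10240            : 49.132   - 20353 msgs/s

Compared to the same test on MINA 3:

msg size (bytes) : time (s) - throughput   - vs Netty
10               : 34.006   - 29407 msgs/s - 12.65% faster
1024             : 33.568   - 29790 msgs/s - 17.32% faster
10240            : 37.706   - 26521 msgs/s - 30.31% faster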

Definitely good...

Many thanks to Jean-François!
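The derived columns in the tables above appear to follow from the raw timings: throughput is 1,000,000 messages divided by the elapsed seconds, and the "faster" percentage is the ratio of the two throughputs minus one. The small sketch below (class name hypothetical) reproduces them under that assumption. Computing the percentage from the timings instead gives 30.30% rather than 30.31% for the 10240-byte case, because the published throughputs are rounded.

```java
/** Hypothetical helper reproducing the derived benchmark columns. */
final class BenchmarkMath {
    // Assumption: each run sends the 1M messages described at the start of the thread.
    static final int MESSAGES = 1_000_000;

    private BenchmarkMath() {}

    /** Messages per second, rounded to the nearest integer. */
    static long throughput(double seconds) {
        return Math.round(MESSAGES / seconds);
    }

    /** How much higher the faster run's throughput is, in percent. */
    static double speedupPercent(long fasterRate, long slowerRate) {
        return ((double) fasterRate / slowerRate - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        int[] sizes = {10, 1024, 10240};
        double[] netty = {38.308, 39.383, 49.132};
        double[] mina3 = {34.006, 33.568, 37.706};
        for (int i = 0; i < sizes.length; i++) {
            long n = throughput(netty[i]);
            long m = throughput(mina3[i]);
            System.out.printf("%5d bytes: Netty %d/s, MINA 3 %d/s, %.2f%% faster%n",
                    sizes[i], n, m, speedupPercent(m, n));
        }
    }
}
```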


Re: [MINA 3] Performances

Posted by Jeff MAURY <je...@gmail.com>.

I'm working on the Netty server. It should be OK by the end of the weekend.
Regarding the performance for large messages, shouldn't it be related to JNI?
I mean the conversion from the Java ByteBuffer to the memory array that is
expected by the OS socket layer?

Jeff

Re: [MINA 3] Performances

Posted by Emmanuel Lécharny <el...@gmail.com>.

On 1/5/13 7:21 PM, Jeff MAURY wrote:
> No, I did not mean there's a bug. What I meant is that when Mina2 has to
> write a large message, it splits the message into small parts when
> writing to the socket, whereas Netty tries to write the full message to the
> socket (like Mina3, from what you said). This may explain why Netty becomes
> slower for large messages, like Mina3.
Ah, OK.

However, the way it *should* work, in any case, is that you should
always try to send as much data as you can, assuming also that the
send/receive buffers are correctly sized initially.

What I don't get is how it can make any difference to write the whole
message into the socket, because the socket won't accept more than what
it can store. I was expecting that you would loop for as long as the
socket can absorb data, and be woken up by select() as soon as the
socket is ready to accept more data.
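A minimal sketch of that behaviour with plain java.nio: keep calling write() until the buffer is drained or the channel accepts nothing (a non-blocking channel's write() returns 0 when its send buffer is full), and only then register OP_WRITE so that select() reports when the socket can take more. `WriteLoop` and its method are hypothetical names, not MINA API; the sketch assumes it is called on the selector thread.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.WritableByteChannel;

/** Hypothetical sketch of "write as much as the socket absorbs, then wait for OP_WRITE". */
final class WriteLoop {
    private WriteLoop() {}

    /**
     * Writes as much of {@code buf} as the key's channel accepts right now.
     * Returns true if the buffer was fully written (OP_WRITE interest cleared),
     * false if the send buffer filled up (OP_WRITE interest set).
     */
    static boolean writeOrWaitForWritable(SelectionKey key, ByteBuffer buf) throws IOException {
        WritableByteChannel channel = (WritableByteChannel) key.channel();
        while (buf.hasRemaining()) {
            if (channel.write(buf) == 0) {
                // Send buffer is full: ask select() to tell us when it drains.
                key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
                return false;
            }
        }
        // Nothing left to send: stop being woken up for writability.
        key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE);
        return true;
    }
}
```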

Anyway, I have to investigate why MINA 3 is so damn slow when it comes
to sending big messages, compared to MINA 2. There is no reason for such a
gap in performance. This is also true for Netty, btw.

Last, but not least: the Netty test only covers
NettyClientMina2Server. We need a test with NettyClientNettyServer (and
probably with the two latest Netty versions).


Re: [MINA 3] Performances

Posted by Jeff MAURY <je...@jeffmaury.com>.

No, I did not mean there's a bug. What I meant is that when Mina2 has to
write a large message, it splits the message into small parts when
writing to the socket, whereas Netty tries to write the full message to the
socket (like Mina3, from what you said). This may explain why Netty becomes
slower for large messages, like Mina3.
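To make the "split into small parts" strategy concrete, here is a sketch that exposes at most `maxChunk` bytes of the message to each write() call by temporarily narrowing the buffer's limit. Jeff's recollection is that MINA 2 derives the chunk size from a socket buffer size; the class `BoundedWrite` and its parameter are hypothetical and not taken from MINA. If the JDK copies a heap buffer's remaining bytes into a direct buffer on each write, as suspected earlier in the thread, narrowing the limit also bounds that copy.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

/** Hypothetical sketch: never hand more than maxChunk bytes to a single write(). */
final class BoundedWrite {
    private BoundedWrite() {}

    static int writeAtMost(ByteBuffer buf, WritableByteChannel channel, int maxChunk) throws IOException {
        if (buf.remaining() <= maxChunk) {
            return channel.write(buf);
        }
        int oldLimit = buf.limit();
        // Expose only one chunk to the channel; write() advances the position as usual.
        buf.limit(buf.position() + maxChunk);
        try {
            return channel.write(buf);
        } finally {
            // Restore the full message so the next call continues where this one stopped.
            buf.limit(oldLimit);
        }
    }
}
```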

Jeff


Re: [MINA 3] Performances

Posted by Emmanuel Lécharny <el...@gmail.com>.

On 1/5/13 1:27 PM, Jeff MAURY wrote:
> Regarding the message size, I noticed that Mina2 writes only part of
> the message

Do you mean the full message doesn't get written? I would be extremely
surprised if that were the case, and it would be worth a JIRA and a fix...

> (size is computed from the receive buffer size if I remember
> correctly) whereas Netty tries to write the full buffer to the socket. So
> if you did the same in Mina3, it may explain why it is slow.
I have to investigate what's going on in MINA3. We can certainly improve
the performance here.



Re: [MINA 3] Performances

Posted by Jeff MAURY <je...@jeffmaury.com>.

Regarding the message size, I noticed that Mina2 writes only part of
the message (size is computed from the receive buffer size if I remember
correctly) whereas Netty tries to write the full buffer to the socket. So
if you did the same in Mina3, it may explain why it is slow.

Jeff




Re: [MINA 3] Performances

Posted by Julien Vermillard <jv...@gmail.com>.

Hooray :-)
Is the default socket buffer size changed by MINA 2?
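One way to answer this empirically is to print the SO_SNDBUF and SO_RCVBUF values a channel actually gets, with and without the framework's configuration. The helper below is a hypothetical sketch using the standard java.nio socket options (Java 7+); it does not show what MINA 2 itself does. Note that the OS may round the requested value (Linux, for instance, reports double the requested size).

```java
import java.io.IOException;
import java.net.StandardSocketOptions;
import java.nio.channels.SocketChannel;

/** Hypothetical helper for inspecting the socket buffer sizes a channel really uses. */
final class SocketBufferSizes {
    private SocketBufferSizes() {}

    /** Returns {SO_SNDBUF, SO_RCVBUF} as reported by the OS for this channel. */
    static int[] current(SocketChannel channel) throws IOException {
        return new int[] {
            channel.getOption(StandardSocketOptions.SO_SNDBUF),
            channel.getOption(StandardSocketOptions.SO_RCVBUF)
        };
    }

    /** Requests a send buffer size; the OS is free to adjust the value it applies. */
    static void requestSendBuffer(SocketChannel channel, int bytes) throws IOException {
        channel.setOption(StandardSocketOptions.SO_SNDBUF, bytes);
    }
}
```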

Re: [MINA 3] Performances

Posted by Emmanuel Lécharny <el...@gmail.com>.

Some more results after the modification I suggested at the end of my last
mail (i.e., do the write immediately, instead of using a queue, when we can):

Mina3 client/ Mina3 server : 10bytes msgs in 38.6 secs | 1k msgs in 42.2 secs | 10k msgs in 45.7 secs

For 10-byte messages, this is a clear 50% speedup compared to my previous MINA 3 numbers (57.8 secs), and a 33% improvement compared to the Netty client/MINA 2 server scenario (51.4 secs).

I still have some more code to fix to get this working for UDP: I
added a writeDirect() method to the IoSession interface, which I now have to
implement in each class.
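A minimal sketch of the "write immediately when we can" path: if nothing is queued, write straight to the channel; if something is already queued, enqueue behind it to keep ordering; if only part of the message went out, queue the remainder, set OP_WRITE and wake the selector. `DirectWriteSession` is a hypothetical class, not MINA 3's actual IoSession or writeDirect(), and it assumes all calls happen on the selector thread, so it does no locking.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.WritableByteChannel;
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical session sketch; single-threaded (selector thread) use assumed. */
final class DirectWriteSession {
    private final SelectionKey key;
    private final Queue<ByteBuffer> writeQueue = new ArrayDeque<>();

    DirectWriteSession(SelectionKey key) {
        this.key = key;
    }

    /** Tries to write immediately; falls back to the queue plus OP_WRITE only when needed. */
    void write(ByteBuffer message) throws IOException {
        if (!writeQueue.isEmpty()) {
            // Something is already pending: queue behind it to preserve ordering.
            writeQueue.add(message);
            return;
        }
        WritableByteChannel channel = (WritableByteChannel) key.channel();
        channel.write(message);
        if (message.hasRemaining()) {
            // Partial write: park the remainder and let select() tell us when to resume.
            writeQueue.add(message);
            key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
            // Only strictly needed when write() is called from another thread.
            key.selector().wakeup();
        }
    }

    /** Called when select() reports OP_WRITE; drains as much of the queue as possible. */
    void flush() throws IOException {
        WritableByteChannel channel = (WritableByteChannel) key.channel();
        while (!writeQueue.isEmpty()) {
            ByteBuffer head = writeQueue.peek();
            channel.write(head);
            if (head.hasRemaining()) {
                return; // socket is full again; keep OP_WRITE set
            }
            writeQueue.poll();
        }
        key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE);
    }

    int pendingWrites() {
        return writeQueue.size();
    }
}
```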

Also note that, compared to MINA 2, the fourth test (i.e., transferring
large 64 MB messages) is extremely slow, as it is for Netty. There is most
certainly something to do regarding the buffer size we use to transfer
such big messages.

