Posted to dev@mina.apache.org by 이희승 (Trustin Lee) <t...@gmail.com> on 2008/04/06 19:52:48 UTC

Interesting direct buffer vs heap buffer micro benchmark

Hi,

I found that a direct buffer outperforms a heap buffer in most cases on
a modern SUN JVM (1.6.0.05 and 1.6.0.10-beta), but I am not sure whether
the result I got is consistent across other architectures.

The test is very simple; it just reads and writes an integer from/to the
buffer.  What makes this test interesting is that the figures change
dramatically depending on which test method is called first.  If the
heap buffer test runs first, the heap buffer performs very well.
However, if the direct buffer test runs first, the direct buffer comes
out ahead, often by a wide margin.  What's obvious though is that the
direct buffer outperforms the heap buffer in most cases.

If this test result holds for other platforms such as SunOS and
Windows, what could we do?  I know that allocating a lot of direct
buffers easily leads to OOM, but I found that I can avoid that problem
by implementing simple JNI functions which bypass the JVM's direct
buffer management and use stdlib malloc/free instead.
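
To make that concrete, here is a minimal sketch of the Java side of such
an extension.  The class and method names are hypothetical, and the
native side is only described in comments; it would call stdlib
malloc/free and wrap the raw pointer with JNI's NewDirectByteBuffer:

---- SKETCH: JAVA SIDE OF A MALLOC-BACKED ALLOCATOR (hypothetical) ----
import java.nio.ByteBuffer;

public final class NativeBuffers {

    static {
        // Assumes a small native library built from one C file; the name is made up.
        System.loadLibrary("minanative");
    }

    // Native impl (C): void *p = malloc(capacity);
    //                  return (*env)->NewDirectByteBuffer(env, p, capacity);
    public static native ByteBuffer allocate(int capacity);

    // Native impl (C): free((*env)->GetDirectBufferAddress(env, buf));
    // Must be called exactly once per allocated buffer, and the buffer must
    // not be touched afterwards -- this is manual memory management.
    public static native void free(ByteBuffer buf);

    private NativeBuffers() {}
}

Buffers created this way are invisible to the JVM's direct memory
accounting, so pairing every allocate() with a free() becomes the
caller's responsibility.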

I was also able to observe a similar test result in JDK 1.5.0.15, but
running the heap test first always resulted in extremely slow direct
buffer performance (a 3x slowdown), which is also interesting.  Why is
HotSpot so inconsistent like this?

All times below are in milliseconds.

---- DIRECT FIRST put/getLong() ----
direct: 1237
heap: 2729

---- HEAP FIRST put/getLong() ----
heap: 2830
direct: 1428

---- DIRECT FIRST put/getInt() ----
direct: 2172
heap: 10657

---- HEAP FIRST put/getInt() ----
heap: 5561
direct: 3371

---- DIRECT FIRST put/getShort() ----
direct: 3911
heap: 9517

---- HEAP FIRST put/getShort() ----
heap: 10282
direct: 5133

---- DIRECT FIRST put/get() ----
direct: 6222
heap: 13447

---- HEAP FIRST put/get() ----
heap: 8946
direct: 12235

The test was performed on my laptop (Core 2 Duo T9300, 2.5 GHz) and I'd
like to know what other people get from the following test code.  I
modified the get/put methods and the order of the test methods manually.
Probably just running the get/putInt() variant in both orders will be fine.

---- TEST CODE ----
import java.nio.ByteBuffer;

import org.junit.Test;


public class ByteBufferTest {
    @Test
    public void direct() {
        ByteBuffer buf = ByteBuffer.allocateDirect(2048);
        test("direct", buf);
    }

    @Test
    public void heap() {
        ByteBuffer buf = ByteBuffer.allocate(2048);
        test("heap", buf);
    }

    private void test(String name, ByteBuffer buf) {
        long startTime = System.currentTimeMillis();
        for (int i = 1048576; i > 0; i --) {
            buf.clear();
            while (buf.hasRemaining()) {
                // Absolute get: reads the int at the current position
                // without advancing the position.
                buf.getInt(buf.position());
                // Relative put: writes four bytes and advances the position
                // (the (byte) cast is redundant; the value is widened to int).
                buf.putInt((byte) 0);
            }
        }
        long endTime = System.currentTimeMillis();
        System.out.println(name + ": " + (endTime - startTime));
    }
}


-- 
Trustin Lee - Principal Software Engineer, JBoss, Red Hat
--
what we call human nature is actually human habit
--
http://gleamynode.net/


Re: Interesting direct buffer vs heap buffer micro benchmark

Posted by "David M. Lloyd" <da...@redhat.com>.
On 04/07/2008 02:05 AM, "이희승 (Trustin Lee) <tr...@gmail.com>" wrote:
> I have implemented a quick prototype of a MINA JNI extension, and a
> ByteBuffer allocated by glibc malloc seems to show the same performance
> characteristics, as expected.

I also suggested (in IRC) trying to map anonymous pages via mmap.  Transcript:

<dmlloyd> trustin: another JNI experiment that might be interesting to try, 
would be to try using mmap/munmap to allocate memory (if you're doing big 
chunks).  This would work better on 64-bit systems though.
<trustin> dmlloyd: how big?
<vrm> big enough pool ;)
<dmlloyd> hm, I dunno :)  let me look at the man page
<dmlloyd> I guess use your judgement - the granularity is one page usually, 
but you don't want to go that small
<dmlloyd> I'd say 512k to 1MB
<dmlloyd> that's a gut guess though :)
<dmlloyd> 1MB would yield 64 16k buffers, and even a 16k user buffer is 
pretty big imo
<dmlloyd> 1024 1k buffers
<dmlloyd> so I guess you could allocate maybe 10 1MB buffers and be set for 
life, for most applications :)
<dmlloyd> I'll post the idea to mina-dev too
<trustin> sounds great
<vrm> chunks bigger than 1MB shouldn't appear (except transferring a big 
file with aweb)
<dmlloyd> would you really want to buffer all 1MB in RAM at once though?
<dmlloyd> for transferring big files, I'd spool it to disk once it gets 
bigger than about 16k I guess
<vrm> never tried but I can bet it'll scale badly
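
To put the arithmetic from the transcript into code: a single large
region can be carved into fixed-size buffers with plain ByteBuffer
slicing.  This is only a sketch; the 1MB/16k figures are the gut guesses
above, and a JNI or mmap-backed allocator would obtain the region from
native code instead of allocateDirect():

---- SKETCH: CARVING A LARGE REGION INTO SMALL BUFFERS ----
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class RegionSliceSketch {
    public static void main(String[] args) {
        final int regionSize = 1024 * 1024; // 1MB region
        final int bufferSize = 16 * 1024;   // 16k user buffers -> 64 per region

        ByteBuffer region = ByteBuffer.allocateDirect(regionSize);

        List<ByteBuffer> buffers = new ArrayList<ByteBuffer>(regionSize / bufferSize);
        for (int offset = 0; offset < regionSize; offset += bufferSize) {
            region.position(offset);
            region.limit(offset + bufferSize);
            buffers.add(region.slice()); // each slice sees exactly one 16k window
        }

        System.out.println(buffers.size() + " buffers of " + bufferSize + " bytes");
    }
}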

- DML

Re: Interesting direct buffer vs heap buffer micro benchmark

Posted by 이희승 (Trustin Lee) <t...@gmail.com>.
David M. Lloyd wrote:
> On 04/07/2008 09:11 AM, "이희승 (Trustin Lee) <tr...@gmail.com>" wrote:
>> David M. Lloyd wrote:
>>> On 04/07/2008 02:05 AM, "이희승 (Trustin Lee) <tr...@gmail.com>"
>>> wrote:
>>>> We could create a big read buffer and fire a messageReceived event with
>>>> its sliced part, but we still have an issue with figuring out what part
>>>> of the read buffer is being referenced by the user.  We can be
>>>> notified when the slice is garbage collected using a PhantomReference,
>>>> but its
>>>> performance is poor according to my test.  Of course, again, we can ask
>>>> a user to notify the I/O processor when he or she doesn't need it
>>>> anymore, but it's inconvenient and error-prone.
>>> Trustin, what if you only allocate large buffers, and then hand out
>>> slices of them?  Rather than using PhantomReferences to track each
>>> slice, you could track the original buffer itself.  Once the original
>>> buffer is no longer referenced, you could then create new slices and
>>> hand them out.
>>
>> How do you determine if the original buffer is no longer referenced?  If
>> we know that, we can simply reuse the original buffer and that would be
>> the most efficient implementation.
> 
> The same way you already were - with a PhantomReference or similar.  I'm
> guessing that with one PhantomReference covering many user buffers, the
> performance impact may not be so significant.

Ahh, that's a nice idea.  Let me write some prototype and repost the
test result.  :)

Thanks,
Trustin
-- 
Trustin Lee - Principal Software Engineer, JBoss, Red Hat
--
what we call human nature is actually human habit
--
http://gleamynode.net/


Re: Interesting direct buffer vs heap buffer micro benchmark

Posted by "David M. Lloyd" <da...@redhat.com>.
On 04/07/2008 09:11 AM, "이희승 (Trustin Lee) <tr...@gmail.com>" wrote:
> David M. Lloyd wrote:
>> On 04/07/2008 02:05 AM, "이희승 (Trustin Lee) <tr...@gmail.com>" wrote:
>>> We could create a big read buffer and fire a messageReceived event with
>>> its sliced part, but we still have an issue with figuring out what part
>>> of the read buffer is being referenced by the user.  We can be notified
>>> when the slice is garbage collected using a PhantomReference, but its
>>> performance is poor according to my test.  Of course, again, we can ask
>>> a user to notify the I/O processor when he or she doesn't need it
>>> anymore, but it's inconvenient and error-prone.
>> Trustin, what if you only allocate large buffers, and then hand out
>> slices of them?  Rather than using PhantomReferences to track each
>> slice, you could track the original buffer itself.  Once the original
>> buffer is no longer referenced, you could then create new slices and
>> hand them out.
> 
> How do you determine if the original buffer is no longer referenced?  If
> we know that, we can simply reuse the original buffer and that would be
> the most efficient implementation.

The same way you already were - with a PhantomReference or similar.  I'm
guessing that with one PhantomReference covering many user buffers, the
performance impact may not be so significant.
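
A minimal sketch of that scheme, with made-up names.  It assumes that a
slice keeps its parent buffer reachable (which is the case for HotSpot's
direct buffers, where a slice carries a reference to its parent), so the
phantom reference is enqueued only after every slice handed out from
that parent has been collected:

---- SKETCH: ONE PHANTOM REFERENCE PER PARENT BUFFER (hypothetical) ----
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.nio.ByteBuffer;
import java.util.HashSet;
import java.util.Set;

public class SlicingAllocatorSketch {

    private final ReferenceQueue<ByteBuffer> queue = new ReferenceQueue<ByteBuffer>();
    // The PhantomReferences themselves must stay strongly reachable,
    // otherwise they would never be enqueued.
    private final Set<PhantomReference<ByteBuffer>> watched =
            new HashSet<PhantomReference<ByteBuffer>>();

    private final int regionSize;
    private final int sliceSize;
    private ByteBuffer region;

    public SlicingAllocatorSketch(int regionSize, int sliceSize) {
        this.regionSize = regionSize;
        this.sliceSize = sliceSize;
        this.region = ByteBuffer.allocateDirect(regionSize);
    }

    public synchronized ByteBuffer allocate() {
        drainReclaimed();
        if (region.remaining() < sliceSize) {
            // Region exhausted: stop referencing it and start watching it instead.
            watched.add(new PhantomReference<ByteBuffer>(region, queue));
            region = ByteBuffer.allocateDirect(regionSize);
        }
        int end = region.position() + sliceSize;
        region.limit(end);
        ByteBuffer slice = region.slice();   // handed out to the user
        region.position(end);
        region.limit(region.capacity());
        return slice;
    }

    private void drainReclaimed() {
        Reference<? extends ByteBuffer> ref;
        while ((ref = queue.poll()) != null) {
            watched.remove(ref);
            // The parent and every slice carved from it are now unreachable;
            // a JNI-backed allocator could free or recycle the native memory here.
        }
    }
}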

- DML

Re: Interesting direct buffer vs heap buffer micro benchmark

Posted by 이희승 (Trustin Lee) <t...@gmail.com>.
David M. Lloyd wrote:
> On 04/07/2008 02:05 AM, "이희승 (Trustin Lee) <tr...@gmail.com>" wrote:
>> We could create a big read buffer and fire a messageReceived event with
>> its sliced part, but we still have an issue with figuring out what part
>> of the read buffer is being referenced by the user.  We can be notified
>> when the slice is garbage collected using a PhantomReference, but its
>> performance is poor according to my test.  Of course, again, we can ask
>> a user to notify the I/O processor when he or she doesn't need it
>> anymore, but it's inconvenient and error-prone.
> 
> Trustin, what if you only allocate large buffers, and then hand out
> slices of them?  Rather than using PhantomReferences to track each
> slice, you could track the original buffer itself.  Once the original
> buffer is no longer referenced, you could then create new slices and
> hand them out.

How do you determine if the original buffer is no longer referenced?  If
we know that, we can simply reuse the original buffer and that would be
the most efficient implementation.

Thanks,
-- 
Trustin Lee - Principal Software Engineer, JBoss, Red Hat
--
what we call human nature is actually human habit
--
http://gleamynode.net/


Re: Interesting direct buffer vs heap buffer micro benchmark

Posted by "David M. Lloyd" <da...@redhat.com>.
On 04/07/2008 02:05 AM, "이희승 (Trustin Lee) <tr...@gmail.com>" wrote:
> We could create a big read buffer and fire a messageReceived event with
> its sliced part, but we still have an issue with figuring out what part
> of the read buffer is being referenced by the user.  We can be notified
> when the slice is garbage collected using a PhantomReference, but its
> performance is poor according to my test.  Of course, again, we can ask
> a user to notify the I/O processor when he or she doesn't need it
> anymore, but it's inconvenient and error-prone.

Trustin, what if you only allocate large buffers, and then hand out slices
of them?  Rather than using PhantomReferences to track each slice, you
could track the original buffer itself.  Once the original buffer is no 
longer referenced, you could then create new slices and hand them out.

- DML

Re: Interesting direct buffer vs heap buffer micro benchmark

Posted by 이희승 (Trustin Lee) <t...@gmail.com>.
I have implemented a quick prototype of a MINA JNI extension, and a
ByteBuffer allocated by glibc malloc seems to show the same performance
characteristics, as expected.

However, I was not able to overcome the cost of direct buffer
allocation, which is extremely high (approximately 10 times that of a
heap buffer).  Pooling yields much better performance, but it's still
very slow (2-5 times, depending on the situation).  I think the
allocation could also outperform heap buffer allocation if we forced a
user to return the buffer to the pool explicitly, but that's an
expressway to OOM.
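
For reference, the explicit-return variant is roughly the following
sketch (names are made up).  It is fast because a released buffer is
simply cleared and handed out again, but a single caller that forgets to
call release() keeps that buffer out of circulation forever, which is
the expressway to OOM mentioned above:

---- SKETCH: EXPLICIT-RETURN DIRECT BUFFER POOL (hypothetical) ----
import java.nio.ByteBuffer;
import java.util.LinkedList;
import java.util.Queue;

public class ExplicitDirectBufferPool {

    private final int bufferSize;
    private final Queue<ByteBuffer> free = new LinkedList<ByteBuffer>();

    public ExplicitDirectBufferPool(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    public synchronized ByteBuffer acquire() {
        ByteBuffer buf = free.poll();
        if (buf == null) {
            // Pool miss: pay the expensive allocateDirect() only here.
            buf = ByteBuffer.allocateDirect(bufferSize);
        }
        buf.clear();
        return buf;
    }

    // Must be called exactly once per acquire(); a forgotten release means
    // the buffer is never reused and the pool keeps allocating new memory.
    public synchronized void release(ByteBuffer buf) {
        free.offer(buf);
    }
}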

So... I guess we need to stick to heap buffers, unfortunately, although
I still want to find a way to reduce the buffer allocation which occurs
in the I/O processor for each read operation.  It certainly takes up
memory bandwidth, because the newly allocated buffer is inevitably
filled with NULs.  AFAIK, there's no workaround in JNI for this issue as
long as we use heap buffers.

We could create a big read buffer and fire a messageReceived event with
its sliced part, but we still have an issue with figuring out what part
of the read buffer is being referenced by the user.  We can be notified
when the slice is garbage collected using a PhantomReference, but its
performance is poor according to my test.  Of course, again, we can ask
a user to notify the I/O processor when he or she doesn't need it
anymore, but it's inconvenient and error-prone.

I don't think the JVM will ever provide us with a way to create a dirty
(uninitialized) byte array, so I guess I've hit a dead end for now.

"이희승 (Trustin Lee) <tr...@gmail.com>" wrote:
> Emmanuel Lecharny wrote:
>> "이희승 (Trustin Lee) <tr...@gmail.com>" wrote:
>>> Hi,
>>>
>>>   
>> <snip/>
>>> I was also able to observe a similar test result in JDK 1.5.0.15, but
>>> running the heap test first always resulted in extremely slow direct
>>> buffer performance (a 3x slowdown), which is also interesting.  Why is
>>> HotSpot so inconsistent like this?
>>>   
>> I have tested it with IBM JVM, and I got the very same strange
>> inconsistency... It would be interesting to profile the test to
>> understand what really happens internally.
> 
> It seems like the HotSpot engine optimizes the test(...) method to work
> better with a certain type of byte buffer.  I inlined the test(...)
> method into the two test methods, and now I am getting consistent
> results.  The direct buffer outperforms the heap buffer in all cases in
> Java 6, except for allocation and deallocation.  If we can provide a
> reasonable solution for pooling direct buffers, we will be able to gain
> more performance out of MINA.
> 
> BTW, it is interesting that direct buffer outperforms heap buffer.
> Kudos to the JVM developers!
> 
> ---- REVISED CODE ----
> import java.nio.ByteBuffer;
> 
> import org.junit.Test;
> 
> 
> public class ByteBufferTest {
>     @Test
>     public void heap() {
>         ByteBuffer buf = ByteBuffer.allocate(2048);
>         long startTime = System.currentTimeMillis();
>         for (int i = 1048576; i > 0; i --) {
>             buf.clear();
>             while (buf.hasRemaining()) {
>                 buf.getInt(buf.position());
>                 buf.putInt((byte) 0);
>             }
>         }
>         long endTime = System.currentTimeMillis();
>         System.out.println("heap: " + (endTime - startTime));
>     }
> 
>     @Test
>     public void direct() {
>         ByteBuffer buf = ByteBuffer.allocateDirect(2048);
>         long startTime = System.currentTimeMillis();
>         for (int i = 1048576; i > 0; i --) {
>             buf.clear();
>             while (buf.hasRemaining()) {
>                 buf.getInt(buf.position());
>                 buf.putInt((byte) 0);
>             }
>         }
>         long endTime = System.currentTimeMillis();
>         System.out.println("direct: " + (endTime - startTime));
>     }
> 
> }
> 
> 

-- 
Trustin Lee - Principal Software Engineer, JBoss, Red Hat
--
what we call human nature is actually human habit
--
http://gleamynode.net/


Re: Interesting direct buffer vs heap buffer micro benchmark

Posted by Steve Ulrich <st...@proemion.com>.
"이희승 (Trustin Lee <tr...@...> writes:

> Emmanuel Lecharny wrote:
> It seems like the HotSpot engine optimizes the test(...) method to work
> better with a certain type of byte buffer.  I inlined the test(...)
> method into the two test methods, and now I am getting consistent
> results.  The direct buffer outperforms the heap buffer in all cases in
> Java 6, except for allocation and deallocation.  If we can provide a
> reasonable solution for pooling direct buffers, we will be able to gain
> more performance out of MINA.
> 
> BTW, it is interesting that direct buffer outperforms heap buffer.
> Kudos to the JVM developers!


Hi! I did some testing a while back.  I got similar results and read the
javadoc more carefully, which makes some points about "direct" buffers:
- They are expensive to create.
- They can avoid a Java byte buffer being copied to a native
  array/buffer in an I/O operation.
"It is therefore recommended that direct buffers be allocated primarily for
large, long-lived buffers that are subject to the underlying system's native I/O
operations."

Your test is therefore missing some "I/O action" where a direct/heap
buffer can do its tricks.

MINA is an I/O framework, so there may be some places that will perform
better with direct buffers and some places where a plain Java heap
buffer does.  The question is: how much can you increase performance by
tweaking the buffers without harming performance in some worst-case
scenarios (e.g. small messages vs. big messages)?
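
A sketch of what such an I/O-inclusive comparison could look like,
writing each buffer repeatedly through a FileChannel (the temporary
file, buffer size and iteration count are arbitrary).  The interesting
difference is that the JDK has to copy a heap buffer into a native
buffer on every write, while a direct buffer can be handed to the OS
as-is:

---- SKETCH: COMPARISON INCLUDING CHANNEL I/O ----
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class ChannelWriteBenchSketch {

    public static void main(String[] args) throws Exception {
        run("heap", ByteBuffer.allocate(2048));
        run("direct", ByteBuffer.allocateDirect(2048));
    }

    private static void run(String name, ByteBuffer buf) throws Exception {
        File tmp = File.createTempFile("buf-bench", ".tmp");
        tmp.deleteOnExit();
        RandomAccessFile raf = new RandomAccessFile(tmp, "rw");
        FileChannel ch = raf.getChannel();

        long start = System.currentTimeMillis();
        for (int i = 0; i < 100000; i ++) {
            buf.clear();
            while (buf.hasRemaining()) {
                ch.write(buf);      // a heap buffer is copied to a native buffer here
            }
            ch.position(0);         // overwrite the same 2k region to keep the file small
        }
        long end = System.currentTimeMillis();
        ch.close();
        raf.close();
        System.out.println(name + ": " + (end - start) + " ms");
    }
}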



Re: Interesting direct buffer vs heap buffer micro benchmark

Posted by 이희승 (Trustin Lee) <t...@gmail.com>.
Emmanuel Lecharny wrote:
> "이희승 (Trustin Lee) <tr...@gmail.com>" wrote:
>> Hi,
>>
>>   
> <snip/>
>> I was also able to observe a similar test result in JDK 1.5.0.15, but
>> running the heap test first always resulted in extremely slow direct
>> buffer performance (a 3x slowdown), which is also interesting.  Why is
>> HotSpot so inconsistent like this?
>>   
> 
> I have tested it with IBM JVM, and I got the very same strange
> inconsistency... It would be interesting to profile the test to
> understand what really happens internally.

It seems like the HotSpot engine optimizes the test(...) method to work
better with a certain type of byte buffer.  I inlined the test(...)
method into the two test methods, and now I am getting consistent
results.  The direct buffer outperforms the heap buffer in all cases in
Java 6, except for allocation and deallocation.  If we can provide a
reasonable solution for pooling direct buffers, we will be able to gain
more performance out of MINA.

BTW, it is interesting that direct buffer outperforms heap buffer.
Kudos to the JVM developers!

---- REVISED CODE ----
import java.nio.ByteBuffer;

import org.junit.Test;


public class ByteBufferTest {
    @Test
    public void heap() {
        ByteBuffer buf = ByteBuffer.allocate(2048);
        long startTime = System.currentTimeMillis();
        for (int i = 1048576; i > 0; i --) {
            buf.clear();
            while (buf.hasRemaining()) {
                buf.getInt(buf.position());
                buf.putInt((byte) 0);
            }
        }
        long endTime = System.currentTimeMillis();
        System.out.println("heap: " + (endTime - startTime));
    }

    @Test
    public void direct() {
        ByteBuffer buf = ByteBuffer.allocateDirect(2048);
        long startTime = System.currentTimeMillis();
        for (int i = 1048576; i > 0; i --) {
            buf.clear();
            while (buf.hasRemaining()) {
                buf.getInt(buf.position());
                buf.putInt((byte) 0);
            }
        }
        long endTime = System.currentTimeMillis();
        System.out.println("direct: " + (endTime - startTime));
    }

}


-- 
Trustin Lee - Principal Software Engineer, JBoss, Red Hat
--
what we call human nature is actually human habit
--
http://gleamynode.net/


Re: Interesting direct buffer vs heap buffer micro benchmark

Posted by Emmanuel Lecharny <el...@gmail.com>.
"이희승 (Trustin Lee) <tr...@gmail.com>" wrote:
> Hi,
>
>   
<snip/>
> I was also able to observe a similar test result in JDK 1.5.0.15, but
> running the heap test first always resulted in extremely slow direct
> buffer performance (a 3x slowdown), which is also interesting.  Why is
> HotSpot so inconsistent like this?
>   

I have tested it with the IBM JVM, and I got the very same strange
inconsistency...  It would be interesting to profile the test to
understand what really happens internally.

More results later ...


-- 
--
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org