
Posted to common-issues@hadoop.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2013/06/05 17:52:20 UTC

[jira] [Commented] (HADOOP-9601) Support native CRC on byte arrays

    [ https://issues.apache.org/jira/browse/HADOOP-9601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676055#comment-13676055 ] 

Todd Lipcon commented on HADOOP-9601:
-------------------------------------

Looks like a good start. A few thoughts on performance:

- Rather than using JNI to call back into ByteBuffer.hasArray(), I think it would be better to introduce a new native call which just takes the array directly. Callbacks from JNI into Java methods are going to be much slower since they don't get inlined, etc., whereas the hasArray() checks made from Java will be JITted nicely.

- In the case of arrays, we should "chunk" the GetPrimitiveArrayCritical calls to not grab more than maybe 256KB at a time. Otherwise you can run into issues where CRC calculation in one thread blocks all other threads at a pre-GC safepoint. That was one of the reasons we switched to Pure Java CRC a couple years back.
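
To make the two suggestions above concrete, here is a rough Java-side sketch, not the actual patch. It assumes a hypothetical native entry point that takes the byte[] directly; since no such method is available here, java.util.zip.CRC32 stands in for it. The class name ChunkedCrcSketch, the stand-in method names, and the MAX_CHUNK_BYTES constant are all made up for illustration. The hasArray() dispatch happens in Java, and the array path is fed in slices of at most 256KB, so that each native call (and therefore each GetPrimitiveArrayCritical/Release pair in a real implementation) would cover only a bounded range.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

final class ChunkedCrcSketch {
    // Assumed cap per native critical section, following the "maybe 256KB" idea.
    static final int MAX_CHUNK_BYTES = 256 * 1024;

    // Stand-in for a hypothetical native method that takes the array directly.
    // A real version would enter GetPrimitiveArrayCritical, checksum
    // [off, off + len), and release the array before returning.
    private static void updateArrayStandIn(CRC32 crc, byte[] data, int off, int len) {
        crc.update(data, off, len);
    }

    // Stand-in for the existing direct-buffer native path.
    private static void updateDirectStandIn(CRC32 crc, ByteBuffer buf) {
        crc.update(buf); // consumes the buffer up to its limit
    }

    // Checksums data[off, off + len) in bounded slices. CRC32 is incremental,
    // so slicing does not change the result.
    static long checksumArray(byte[] data, int off, int len) {
        if (off < 0 || len < 0 || len > data.length - off) {
            throw new IndexOutOfBoundsException("off=" + off + " len=" + len);
        }
        CRC32 crc = new CRC32();
        int end = off + len;
        while (off < end) {
            int n = Math.min(MAX_CHUNK_BYTES, end - off);
            updateArrayStandIn(crc, data, off, n); // one short critical section per slice
            off += n;
        }
        return crc.getValue();
    }

    // Dispatch decided in Java, where hasArray() can be JIT-compiled,
    // rather than by calling back into Java from native code.
    static long checksum(ByteBuffer buf) {
        if (buf.hasArray()) {
            long value = checksumArray(buf.array(),
                    buf.arrayOffset() + buf.position(), buf.remaining());
            buf.position(buf.limit()); // consume, matching the direct path
            return value;
        }
        CRC32 crc = new CRC32();
        updateDirectStandIn(crc, buf);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] data = new byte[1024 * 1024];
        System.out.println(Long.toHexString(checksum(ByteBuffer.wrap(data))));
    }
}
```

The chunk size is a tunable guess: smaller slices mean shorter pauses for threads waiting on a safepoint but more JNI transitions, so it would be worth measuring with the benchmarks.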

Did you try running the CRC benchmark tests? I think there are some floating around that compare direct buffer CRC performance vs array, etc.
                
> Support native CRC on byte arrays
> ---------------------------------
>
>                 Key: HADOOP-9601
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9601
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: performance, util
>    Affects Versions: 3.0.0
>            Reporter: Todd Lipcon
>         Attachments: HADOOP-9601-WIP-01.patch
>
>
> When we first implemented the Native CRC code, we only did so for direct byte buffers, because these correspond directly to native heap memory and thus make it easy to access via JNI. We'd generally assumed that accessing byte[] arrays from JNI was not efficient enough, but now that I know more about JNI I don't think that's true -- we just need to make sure that the critical sections where we lock the buffers are short.
