Posted to mapreduce-issues@hadoop.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2014/07/17 20:21:05 UTC

[jira] [Updated] (MAPREDUCE-5962) Support CRC32C in IFile

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated MAPREDUCE-5962:
-----------------------------------

    Attachment: mapreduce-5962.txt

Attached patch adds a new configuration to set the IFile checksum type. I changed the default to CRC32C since it's much faster if you have the native libraries available.

I don't believe this is an incompatible change, since IFiles are only used internally within a single job (written by the map side, read by the reduce side), so the reader and the writer never run different versions. That said, if anyone has issues with this, they can set the default back to CRC32 cluster-wide.
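
For anyone less familiar with the two checksum types, here is a minimal sketch of how they differ. It is not code from the attached patch: it uses the JDK's java.util.zip.CRC32 and java.util.zip.CRC32C (the latter assumes Java 9 or later) as stand-ins for Hadoop's own checksum classes, and the class name ChecksumCompare is made up for illustration. The two algorithms use different polynomials, so they yield different values for the same bytes, which is why the reader and writer must agree on the type.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CRC32C;
import java.util.zip.Checksum;

// Hypothetical illustration class; not part of Hadoop or the patch.
public class ChecksumCompare {

    // Compute a checksum over the whole buffer with the given algorithm.
    static long checksum(Checksum algorithm, byte[] data) {
        algorithm.reset();
        algorithm.update(data, 0, data.length);
        return algorithm.getValue();
    }

    public static void main(String[] args) {
        // "123456789" is the conventional check input for CRC algorithms.
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);

        // zlib-style CRC32 (what IFile uses today, per the issue description).
        System.out.printf("CRC32  = %08x%n", checksum(new CRC32(), data));  // cbf43926

        // CRC32C (Castagnoli polynomial), the proposed new default.
        System.out.printf("CRC32C = %08x%n", checksum(new CRC32C(), data)); // e3069283
    }
}
```

Because both classes implement java.util.zip.Checksum, the calling code is the same for either type; only the choice of algorithm object changes.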

> Support CRC32C in IFile
> -----------------------
>
>                 Key: MAPREDUCE-5962
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5962
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: performance, task
>    Affects Versions: 2.5.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: mapreduce-5962.txt
>
>
> Currently, the IFile format used by the MR shuffle checksums all data using the zlib CRC32 polynomial. If we allow use of CRC32C instead, we can get a large reduction in CPU usage by leveraging the native hardware CRC32C implementation (approx half a second of CPU time savings per GB checksummed).



--
This message was sent by Atlassian JIRA
(v6.2#6252)