Posted to dev@hbase.apache.org by "Andrew Kyle Purtell (Jira)" <ji...@apache.org> on 2022/06/12 18:26:00 UTC

[jira] [Resolved] (HBASE-14738) Backport HBASE-11927 (Use Native Hadoop Library for HFile checksum) to 0.98

     [ https://issues.apache.org/jira/browse/HBASE-14738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Kyle Purtell resolved HBASE-14738.
-----------------------------------------
    Release Note:   (was: Checksumming is CPU intensive. HBase computes additional checksums for HFiles (HDFS computes checksums too) and stores them inline with the file data. During reads, these checksums are verified to ensure the data is not corrupted. This patch uses the Hadoop Native Library (NHL) for checksum computation if it is available, and otherwise falls back to the standard Java libraries. Instructions for loading the NHL in HBase can be found here (http://hbase.apache.org/book.html#hadoop.native.lib).

The default checksum algorithm remains CRC32. CRC32C is preferable for two reasons: 1) CRC32C has better error detection properties, and 2) newer Intel processors have a dedicated instruction for CRC32C computation (part of the SSE4.2 instruction set)*. This change is fully backward compatible, and users should not see any difference other than decreased CPU usage. To use CRC32C, set the configuration property ‘hbase.hstore.checksum.algorithm’ to ‘CRC32C’.

* On Linux, run 'cat /proc/cpuinfo' and look for sse4_2 in the list of flags to see whether your processor supports SSE4.2. )
      Resolution: Won't Fix
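
The footnote in the removed release note suggests looking for sse4_2 in /proc/cpuinfo. The same check can be sketched in Java as below. This is only an illustration: the class name CpuFlagCheck and the method hasCpuFlag are hypothetical and are not part of HBase or Hadoop, and the sketch assumes the Linux x86 /proc/cpuinfo layout, in which each "flags" line is followed by a colon and a whitespace-separated list.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase or Hadoop.
public class CpuFlagCheck {

    // Returns true if any "flags" line lists the given flag as a whole word.
    static boolean hasCpuFlag(List<String> cpuinfoLines, String flag) {
        for (String line : cpuinfoLines) {
            if (!line.startsWith("flags")) {
                continue; // only the "flags" entries carry CPU feature names
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // unexpected layout, skip the line
            }
            String[] flags = line.substring(colon + 1).trim().split("\\s+");
            if (Arrays.asList(flags).contains(flag)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        Path cpuinfo = Paths.get("/proc/cpuinfo");
        if (!Files.isReadable(cpuinfo)) {
            System.out.println("/proc/cpuinfo is not readable (probably not Linux)");
            return;
        }
        boolean sse42 = hasCpuFlag(Files.readAllLines(cpuinfo), "sse4_2");
        System.out.println("sse4_2 " + (sse42 ? "is listed" : "is not listed"));
    }
}
```

The flag is matched as a whole word so that a longer flag name containing the same prefix is not mistaken for sse4_2.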

> Backport HBASE-11927 (Use Native Hadoop Library for HFile checksum) to 0.98
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-14738
>                 URL: https://issues.apache.org/jira/browse/HBASE-14738
>             Project: HBase
>          Issue Type: Task
>            Reporter: Andrew Kyle Purtell
>            Priority: Major
>         Attachments: HBASE-14738-0.98.patch
>
>
> Profiling 0.98.15, I see 20-30% of CPU time spent in Hadoop's PureJavaCrc32. This is not surprising given the previous results described on HBASE-11927. Backport.
> There are two issues with the backport:
> # The patch on HBASE-11927 changes the default CRC type from CRC32 to CRC32C. Although the changes are backwards compatible (files with either CRC type will be handled correctly in a transparent manner), we should probably leave the default alone in 0.98 and advise users on a site configuration change to use CRC32C if desired, for potential hardware acceleration.
> # Need a shim for differences in Hadoop's DataChecksum type.
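
The two CRC types named in the first item are different algorithms and give different values for the same bytes. A minimal sketch, assuming Java 9 or later for java.util.zip.CRC32C, makes this concrete. It uses only JDK classes and does not go through Hadoop's PureJavaCrc32, DataChecksum, or the native library; the class name ChecksumSketch and its checksum method are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CRC32C;
import java.util.zip.Checksum;

// Hypothetical illustration using JDK classes only (CRC32C needs Java 9+).
public class ChecksumSketch {

    // Computes the checksum of the whole byte array with the given algorithm.
    static long checksum(Checksum algo, byte[] data) {
        algo.reset();
        algo.update(data, 0, data.length);
        return algo.getValue();
    }

    public static void main(String[] args) {
        // "123456789" is the conventional check input for CRC algorithms.
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("CRC32  = %08x%n", checksum(new CRC32(), data));
        System.out.printf("CRC32C = %08x%n", checksum(new CRC32C(), data));
    }
}
```

For the standard check input "123456789", CRC32 yields cbf43926 and CRC32C yields e3069283, so a reader has to know which type was used when a checksum was written in order to verify it.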



--
This message was sent by Atlassian Jira
(v8.20.7#820007)