Posted to jira@kafka.apache.org by "Viktor Somogyi-Vass (Jira)" <ji...@apache.org> on 2020/10/27 15:22:00 UTC

[jira] [Created] (KAFKA-10650) Use Murmur3 hashing instead of MD5 in SkimpyOffsetMap

Viktor Somogyi-Vass created KAFKA-10650:
-------------------------------------------

             Summary: Use Murmur3 hashing instead of MD5 in SkimpyOffsetMap
                 Key: KAFKA-10650
                 URL: https://issues.apache.org/jira/browse/KAFKA-10650
             Project: Kafka
          Issue Type: Improvement
          Components: core
            Reporter: Viktor Somogyi-Vass
            Assignee: Viktor Somogyi-Vass


The use of MD5 was uncovered while testing Kafka for FIPS (Federal Information Processing Standards) verification.

MD5 isn't a FIPS incompatibility here, since it isn't used for cryptographic purposes, but I spent some time on it because it isn't ideal either. MD5 is a relatively fast cryptographic hash, but for a hash table such as SkimpyOffsetMap there are much better performing algorithms.

By switching to Murmur3 (an implementation already exists in Streams), I measured a roughly 3x faster {{put}} operation and about 30% faster overall segment cleaning, with the collision rate unchanged (both algorithms stayed within 0.0015 - 0.007, with a median of about 0.004).
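
For illustration only, here is a minimal standalone sketch of the two kinds of hash being compared. It is not the SkimpyOffsetMap code and not the Murmur3 class from Streams; the class name HashSketch, the method names, the sample key and the seed of 0 are all made up for the example. It uses the 32-bit Murmur3 variant because it is short; this issue does not say which variant or seed the measurements used.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: hashes one key with MD5 and with 32-bit Murmur3.
public class HashSketch {

    // MD5 through the JDK's MessageDigest; yields a 16-byte digest.
    public static byte[] md5(byte[] key) {
        try {
            return MessageDigest.getInstance("MD5").digest(key);
        } catch (NoSuchAlgorithmException e) {
            // A restricted security provider may not offer MD5 at all.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // 32-bit Murmur3 (x86_32 variant), written out to keep the example self-contained.
    public static int murmur3Hash32(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h = seed;
        int len = data.length;
        int blocks = len / 4;

        // Body: mix in 4 bytes at a time, read as little-endian ints.
        for (int i = 0; i < blocks; i++) {
            int o = i * 4;
            int k = (data[o] & 0xff)
                    | ((data[o + 1] & 0xff) << 8)
                    | ((data[o + 2] & 0xff) << 16)
                    | ((data[o + 3] & 0xff) << 24);
            k *= c1;
            k = Integer.rotateLeft(k, 15);
            k *= c2;
            h ^= k;
            h = Integer.rotateLeft(h, 13);
            h = h * 5 + 0xe6546b64;
        }

        // Tail: the remaining 1 to 3 bytes; the case fallthrough is intentional.
        int tail = blocks * 4;
        int k1 = 0;
        switch (len & 3) {
            case 3:
                k1 ^= (data[tail + 2] & 0xff) << 16;
            case 2:
                k1 ^= (data[tail + 1] & 0xff) << 8;
            case 1:
                k1 ^= (data[tail] & 0xff);
                k1 *= c1;
                k1 = Integer.rotateLeft(k1, 15);
                k1 *= c2;
                h ^= k1;
        }

        // Finalization: fold in the length and avalanche the bits.
        h ^= len;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    public static void main(String[] args) {
        byte[] key = "some-record-key".getBytes(StandardCharsets.UTF_8);
        System.out.println("MD5 digest length in bytes: " + md5(key).length);
        System.out.println("Murmur3 32-bit hash: " + Integer.toHexString(murmur3Hash32(key, 0)));
    }
}
```

The sketch only shows the shape of the two computations. It does not reproduce the benchmark, and a 32-bit hash is not necessarily what a real replacement for a 16-byte MD5 digest would use.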



--
This message was sent by Atlassian Jira
(v8.3.4#803005)