Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2020/11/11 17:01:52 UTC

[GitHub] [kafka] viktorsomogyi edited a comment on pull request #9519: KAFKA-10650: Use Murmur3 instead of MD5 in SkimpyOffsetMap

viktorsomogyi edited a comment on pull request #9519:
URL: https://github.com/apache/kafka/pull/9519#issuecomment-725539926


   @ijuma I don't think I've looked at it yet, but I will try it out soon; its speed seems promising. I haven't found a 128-bit Java implementation, but the 64-bit one in lz4 could be tried as well. I need to check whether the collision rate changes significantly if we reduce the hash size to 64 bits. Theoretically, for 100 million distinct keys the probability of a collision is 1.469367×10^-23 with 128 bits, while for 64 bits it is 2.710138×10^-4. That is significantly larger, but it might be good enough for the log cleaning use case.
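   The two probabilities quoted above are consistent with the standard birthday-bound approximation for an ideal hash whose b-bit outputs are uniformly distributed: p ≈ 1 − exp(−n(n−1) / (2·2^b)). The Python sketch below recomputes them for n = 100 million keys. It assumes real hash functions behave like that ideal uniform hash, and the function name is hypothetical.

```python
import math


def collision_probability(n_keys: int, hash_bits: int) -> float:
    """Probability that at least two of n_keys distinct keys share a hash value.

    Assumes an ideal hash, uniform over 2**hash_bits outputs, and uses the
    birthday approximation p = 1 - exp(-n(n-1) / (2 * 2**b)).
    """
    pairs = n_keys * (n_keys - 1) / 2
    # expm1 keeps precision when the exponent is tiny (~1e-23 for 128 bits)
    return -math.expm1(-pairs / 2.0 ** hash_bits)


if __name__ == "__main__":
    for bits in (128, 64):
        print(f"{bits}-bit: {collision_probability(100_000_000, bits):.6e}")
```

   Running it should print roughly 1.47×10^-23 for 128 bits and 2.71×10^-4 for 64 bits. The `math.expm1` call matters here. Computing `1 - math.exp(-x)` directly would round to exactly 0 for the 128-bit case, because exp(−1.47×10^-23) is indistinguishable from 1.0 in double precision.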
   
   @junrao I can compile statistics for this, but in my tests the collision rate was on the same level as with MD5: sometimes slightly better, sometimes slightly worse. The attached image shows what I collected manually for the largest test cases. I'll do something more elaborate if you think it's needed :) Based on this, I think yes, the uniqueness of the generated hashes is on the same level.
   ![image](https://user-images.githubusercontent.com/1820518/98840553-3aa37180-2447-11eb-87e4-198856261d23.png)
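   One way to compile the kind of collision statistic mentioned above is to hash a fixed key set and truncate the digests to a small width so that collisions actually occur. The observed count can then be compared with what an ideal uniform hash would give. The sketch below is hypothetical and is not how the measurements above were taken. Murmur3 is not in Python's standard library, so it compares MD5 with SHA-1 and BLAKE2b from `hashlib` as stand-ins. The key format, the 20-bit width and the function names are illustrative assumptions.

```python
import hashlib
import math


def truncated_hash(key: bytes, algo: str, bits: int) -> int:
    """Top `bits` bits of the digest produced by hashlib algorithm `algo`."""
    digest = hashlib.new(algo, key).digest()
    return int.from_bytes(digest, "big") >> (len(digest) * 8 - bits)


def observed_collisions(keys, algo: str, bits: int) -> int:
    """Number of keys whose truncated hash was already produced by an earlier key."""
    seen = set()
    collisions = 0
    for key in keys:
        h = truncated_hash(key, algo, bits)
        if h in seen:
            collisions += 1
        else:
            seen.add(h)
    return collisions


def expected_collisions(n_keys: int, bits: int) -> float:
    """Expected (n - distinct values) for an ideal uniform hash with 2**bits outputs."""
    m = 2.0 ** bits
    # E[distinct] = m * (1 - (1 - 1/m)**n), computed with log1p/expm1 for precision
    return n_keys - m * -math.expm1(n_keys * math.log1p(-1.0 / m))


if __name__ == "__main__":
    n, bits = 10_000, 20  # small width so collisions are actually observable
    keys = [f"key-{i}".encode() for i in range(n)]
    print(f"expected (ideal hash): {expected_collisions(n, bits):.1f}")
    for algo in ("md5", "sha1", "blake2b"):
        print(f"{algo}: {observed_collisions(keys, algo, bits)}")
```

   With 10,000 keys and 20 bits, an ideal hash yields about 47.5 repeated values, so each algorithm's count should land near that figure. A Murmur3 function could be compared the same way by adapting `truncated_hash`.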


----------------------------------------------------------------
This is an automated message from the Apache Git Service.