You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by "Andy Seaborne (Jira)" <ji...@apache.org> on 2020/01/17 22:44:00 UTC
[jira] [Commented] (CODEC-264) murmur3.hash128() does not account
for unsigned seed argument
[ https://issues.apache.org/jira/browse/CODEC-264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018374#comment-17018374 ]
Andy Seaborne commented on CODEC-264:
-------------------------------------
The v1.14 version of {{hash128(byte[], , int seed)}} does now apply the seed mask, contrary to the comments.
Line 805
{noformat}
@Deprecated
public static long[] hash128(final byte[] data, final int offset, final int length, final int seed) {
// ************
// Note: This fails to apply masking using 0xffffffffL to the seed.
// ************
return hash128x64(data, offset, length, seed);
}
{noformat}
It calls {{hash128x86(byte[],, int seed)}} (exact signature match), not {hash128x86(byte[],, long seed)}} (type conversion).
{{hash128x86(byte[],, int seed)}} applies the mask (checked by debugger walk through in EclipseIDE).
{{hash128(byte[],, int seed)}} should be a call of {{hash128x86(byte[],, long)}} directly.
I think that casting at the call site will do that:
{noformat}
return hash128x64(data, offset, length, (long)seed);
{noformat}
or for clarity explicitly:
{noformat}
long seedLong = seed; /* unmasked 32->64 bit extension */
return hash128x64(data, offset, length, seedLong);
{noformat}
If the private static work function had a different name, then automatic, unmasked conversion would have applied.
> murmur3.hash128() does not account for unsigned seed argument
> -------------------------------------------------------------
>
> Key: CODEC-264
> URL: https://issues.apache.org/jira/browse/CODEC-264
> Project: Commons Codec
> Issue Type: Bug
> Affects Versions: 1.13
> Reporter: Claude Warren
> Assignee: Alex Herbert
> Priority: Major
> Fix For: 1.14
>
> Attachments: YonikMurmur3Tests.java
>
>
> The original murmur3_x64_128 code used unsigned int for seed arguments. Using the equivalent bit patterns in the commons codec version does not yield the same results.
> I believe this is because the commons version does not account for sign extension etc.
> Yonic Seeley [~yonik] has explains the issue in his implementation https://github.com/yonik/java_util/blob/master/src/util/hash/MurmurHash3.java
> He provides a test case to show that his code returns the same answers as the original C/C++ code. I modified that test to call the codec version to show the error.
> I have attached that test case.
> Given that the original code is in the wild I am uncertain how to fix this issue.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)