You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by "Andy Seaborne (Jira)" <ji...@apache.org> on 2020/01/17 22:44:00 UTC

[jira] [Commented] (CODEC-264) murmur3.hash128() does not account for unsigned seed argument

    [ https://issues.apache.org/jira/browse/CODEC-264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018374#comment-17018374 ] 

Andy Seaborne commented on CODEC-264:
-------------------------------------

The v1.14 version of {{hash128(byte[], , int seed)}} does now apply the seed mask, contrary to the comments.

Line 805
{noformat}
    @Deprecated
    public static long[] hash128(final byte[] data, final int offset, final int length, final int seed) {
        // ************
        // Note: This fails to apply masking using 0xffffffffL to the seed.
        // ************
        return hash128x64(data, offset, length, seed);
    }
{noformat}

It calls {{hash128x86(byte[],, int seed)}} (exact signature match), not {hash128x86(byte[],, long seed)}} (type conversion).

{{hash128x86(byte[],, int seed)}} applies the mask (checked by debugger walk through in EclipseIDE).

{{hash128(byte[],, int seed)}} should be a call of {{hash128x86(byte[],, long)}} directly.

I think that casting at the call site will do that:
{noformat}
        return hash128x64(data, offset, length, (long)seed);
{noformat}
or for clarity explicitly:
{noformat}
        long seedLong = seed; /* unmasked 32->64 bit extension */
        return hash128x64(data, offset, length, seedLong);
{noformat}

If the private static work function had a different name, then automatic, unmasked conversion would have applied.



> murmur3.hash128() does not account for unsigned seed argument
> -------------------------------------------------------------
>
>                 Key: CODEC-264
>                 URL: https://issues.apache.org/jira/browse/CODEC-264
>             Project: Commons Codec
>          Issue Type: Bug
>    Affects Versions: 1.13
>            Reporter: Claude Warren
>            Assignee: Alex Herbert
>            Priority: Major
>             Fix For: 1.14
>
>         Attachments: YonikMurmur3Tests.java
>
>
> The original murmur3_x64_128 code used unsigned int for seed arguments.  Using the equivalent bit patterns in the commons codec version does not yield the same results.
> I believe this is because the commons version does not account for sign extension etc.
> Yonic Seeley [~yonik] has explains the issue in his implementation https://github.com/yonik/java_util/blob/master/src/util/hash/MurmurHash3.java
> He provides a test case to show that his code returns the same answers as the original C/C++ code.  I modified that test to call the codec version to show the error.
> I have attached that test case.
> Given that the original code is in the wild I am uncertain how to fix this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)