You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by "Alex Herbert (Jira)" <ji...@apache.org> on 2019/11/21 12:59:00 UTC

[jira] [Commented] (CODEC-267) MurmurHash3.hash32() does not process trailing bytes as unsigned

    [ https://issues.apache.org/jira/browse/CODEC-267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979251#comment-16979251 ] 

Alex Herbert commented on CODEC-267:
------------------------------------

Another bug related to sign extension is in hash128()

> MurmurHash3.hash32() does not process trailing bytes as unsigned
> ----------------------------------------------------------------
>
>                 Key: CODEC-267
>                 URL: https://issues.apache.org/jira/browse/CODEC-267
>             Project: Commons Codec
>          Issue Type: Bug
>    Affects Versions: 1.13
>            Reporter: Claude Warren
>            Assignee: Alex Herbert
>            Priority: Major
>
> The hash32() algorithm processes blocks of 4 bytes. Trailing bytes of 1, 2 or 3 that are negative are not masked to unsigned leading to an error.
> This test passes using data generated from the Python mmh3 library which calls the MurmurHash3 c++ code (modified for Python):
> {code:java}
> /**
>  * Test to demonstrate the errors in
>  * {@link MurmurHash3#hash32(byte[], int, int, int)}
>  * if the final 1, 2, or 3 bytes are negative.
>  */
> @Test
> public void testHash32With1TrailingSignedByteIsInvalid() {
>     // Generate test data:
>     // import mmh3
>     // import numpy as np
>     // mmh3.hash(np.uint8([-1]))
>     // mmh3.hash(np.uint8([0, -1]))
>     // mmh3.hash(np.uint8([0, 0, -1]))
>     // mmh3.hash(np.uint8([-1, 0]))
>     // mmh3.hash(np.uint8([-1, 0, 0]))
>     // mmh3.hash(np.uint8([0, -1, 0]))
>     Assert.assertNotEquals(-43192051, MurmurHash3.hash32(new byte[] {-1}, 0, 1, 0));
>     Assert.assertNotEquals(-582037868, MurmurHash3.hash32(new byte[] {0, -1}, 0, 1, 0));
>     Assert.assertNotEquals(922088087, MurmurHash3.hash32(new byte[] {0, 0, -1}, 0, 1, 0));
>     Assert.assertNotEquals(-1309567588, MurmurHash3.hash32(new byte[] {-1, 0}, 0, 1, 0));
>     Assert.assertNotEquals(-363779670, MurmurHash3.hash32(new byte[] {-1, 0, 0}, 0, 1, 0));
>     Assert.assertNotEquals(-225068062, MurmurHash3.hash32(new byte[] {0, -1, 0}, 0, 1, 0));
> }
> {code}
> This test passes with {{assertEquals}} when the code is fixed to apply masking to the final 3 bytes:
> {code:java}
>         case 3:
>             k1 ^= (data[index + 2] & 0xff) << 16;
>         case 2:
>             k1 ^= (data[index + 1] & 0xff) << 8;
>         case 1:
>             k1 ^= (data[index] & 0xff);
> {code}
> Fixing this error will be a behavioural change.
> It is recommended to leave this method alone and implement a new hash32x86 method that should match the {{MurmurHash3_x86_32}} method from the c++ source code.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)