Posted to common-dev@hadoop.apache.org by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2007/12/06 16:30:43 UTC
[jira] Created: (HADOOP-2365) Result of HashFunction.hash() contains all identical values
Result of HashFunction.hash() contains all identical values
-----------------------------------------------------------
Key: HADOOP-2365
URL: https://issues.apache.org/jira/browse/HADOOP-2365
Project: Hadoop
Issue Type: Bug
Components: contrib/hbase
Affects Versions: 0.16.0
Reporter: Andrzej Bialecki
Fix For: 0.16.0
There is a small bug in HashFunction (line 112): initvalue should change between loop iterations so that the hash values spread over the whole allowed range. Instead, the current code uses a fixed initvalue of 0, so every element of the result array holds the same hash value. As a result, BloomFilters have an extremely high rate of false positives.
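The effect described above can be reproduced with a minimal sketch, assuming a few things not in this report: `seededHash()` is a hypothetical stand-in for `JenkinsHash.hash(b, initval)` (MurmurHash3-style mixing, not Hadoop's code), seeding with the loop index is just one illustrative way to vary the seed (not necessarily what hash-v1.patch does), and `Math.floorMod` is used instead of `Math.abs(...) %` so indexes are never negative. The filter size, hash count, and key names are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

public class FixedSeedDemo {

    // Hypothetical stand-in for JenkinsHash.hash(byte[], int): a seeded
    // 32-bit hash with MurmurHash3-style mixing. NOT Hadoop's JenkinsHash.
    static int seededHash(byte[] data, int seed) {
        int h = seed ^ 0x9747b28c;
        for (byte x : data) {
            h ^= (x & 0xff);
            h *= 0x5bd1e995;
            h ^= h >>> 15;
        }
        // Final avalanche step (MurmurHash3 fmix32 constants).
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Buggy pattern from the report: the seed stays 0, so every entry is identical.
    static int[] hashFixedSeed(byte[] b, int nbHash, int maxValue) {
        int[] result = new int[nbHash];
        for (int i = 0; i < nbHash; i++) {
            result[i] = Math.floorMod(seededHash(b, 0), maxValue);
        }
        return result;
    }

    // One way to vary the seed between iterations: use the loop index.
    static int[] hashIndexSeed(byte[] b, int nbHash, int maxValue) {
        int[] result = new int[nbHash];
        for (int i = 0; i < nbHash; i++) {
            result[i] = Math.floorMod(seededHash(b, i), maxValue);
        }
        return result;
    }

    interface Hasher {
        int[] hash(byte[] b, int nbHash, int maxValue);
    }

    static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Adds n keys to an m-bit filter using k hash positions per key, then
    // counts how many of n keys that were never added still test positive.
    static int falsePositives(Hasher h, int m, int k, int n) {
        BitSet bits = new BitSet(m);
        for (int i = 0; i < n; i++) {
            for (int pos : h.hash(key("in-" + i), k, m)) {
                bits.set(pos);
            }
        }
        int fp = 0;
        for (int i = 0; i < n; i++) {
            boolean allSet = true;
            for (int pos : h.hash(key("out-" + i), k, m)) {
                allSet &= bits.get(pos);
            }
            if (allSet) {
                fp++;
            }
        }
        return fp;
    }

    public static void main(String[] args) {
        byte[] b = key("row-42");
        System.out.println("fixed seed: " + Arrays.toString(hashFixedSeed(b, 4, 1_000_000)));
        System.out.println("index seed: " + Arrays.toString(hashIndexSeed(b, 4, 1_000_000)));
        int m = 10_000, k = 5, n = 1_000;
        System.out.println("false positives, fixed seed: " + falsePositives(FixedSeedDemo::hashFixedSeed, m, k, n));
        System.out.println("false positives, index seed: " + falsePositives(FixedSeedDemo::hashIndexSeed, m, k, n));
    }
}
```

With identical positions, a k-hash filter behaves like a single-hash filter. Using the standard approximation, that is roughly 1 - e^(-n/m), about 9.5% for these parameters, versus about (1 - e^(-kn/m))^k, about 0.9%, when the k positions are independent.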
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2365) Result of HashFunction.hash() contains all identical values
Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jim Kellerman updated HADOOP-2365:
----------------------------------
Priority: Minor (was: Major)
[jira] Commented: (HADOOP-2365) Result of HashFunction.hash() contains all identical values
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549165 ]
Hadoop QA commented on HADOOP-2365:
-----------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12371163/patch.txt
against trunk revision r601818.
@author +1. The patch does not contain any @author tags.
javadoc +1. The javadoc tool did not generate any warning messages.
javac +1. The applied patch does not generate any new compiler warnings.
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests +1. The patch passed core unit tests.
contrib tests -1. The patch failed contrib unit tests.
Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1283/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1283/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1283/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1283/console
This message is automatically generated.
[jira] Commented: (HADOOP-2365) Result of HashFunction.hash() contains all identical values
Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549498 ]
Jim Kellerman commented on HADOOP-2365:
---------------------------------------
Andrzej,
Do you think it is occurring for the same key?
If you could provide your initialization parameters and a test example, that would be very helpful.
[jira] Commented: (HADOOP-2365) Result of HashFunction.hash() contains all identical values
Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549219 ]
Jim Kellerman commented on HADOOP-2365:
---------------------------------------
The latest build failed on TestTableJoinMapReduce, which does not use Bloom filters, so the failure has no bearing on this patch.
[jira] Updated: (HADOOP-2365) Result of HashFunction.hash() contains all identical values
Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrzej Bialecki updated HADOOP-2365:
--------------------------------------
Attachment: hash-v1.patch
[jira] Commented: (HADOOP-2365) Result of HashFunction.hash() contains all identical values
Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549492 ]
Andrzej Bialecki commented on HADOOP-2365:
-------------------------------------------
There may be other bugs lurking in BloomFilter / HashFunction. This is very hard to reproduce, but once in a while (roughly once per hundred million keys tested) I get something like this:
java.lang.ArrayIndexOutOfBoundsException: -1215998
at org.onelab.filter.BloomFilter.membershipTest(BloomFilter.java:134)
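One plausible source of a negative index, offered as an assumption since this thread does not diagnose it (the exception is tracked separately in HADOOP-2414), is the `Math.abs(hash) % maxValue` pattern in the HashFunction snippet quoted in this thread: `Math.abs(Integer.MIN_VALUE)` is still negative, so the remainder is negative too. A hash equal to `Integer.MIN_VALUE` is rare, which would be consistent with the failure being hard to reproduce. The sketch below uses a hypothetical `maxValue` of 1,000,000; the report does not state the filter size.

```java
public class AbsModPitfall {

    // Pattern from the quoted HashFunction loop: breaks for h == Integer.MIN_VALUE.
    static int absMod(int h, int maxValue) {
        return Math.abs(h) % maxValue;
    }

    // Alternative 1: floorMod returns a value in [0, maxValue) when maxValue > 0.
    static int floorMod(int h, int maxValue) {
        return Math.floorMod(h, maxValue);
    }

    // Alternative 2: clear the sign bit before taking the remainder.
    static int maskMod(int h, int maxValue) {
        return (h & Integer.MAX_VALUE) % maxValue;
    }

    public static void main(String[] args) {
        int maxValue = 1_000_000;                    // hypothetical filter size
        int h = Integer.MIN_VALUE;                   // the one int whose absolute value overflows
        System.out.println(Math.abs(h));             // -2147483648
        System.out.println(absMod(h, maxValue));     // -483648, an invalid array index
        System.out.println(floorMod(h, maxValue));   // 516352
        System.out.println(maskMod(h, maxValue));    // 0
    }
}
```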
[jira] Updated: (HADOOP-2365) Result of HashFunction.hash() contains all identical values
Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jim Kellerman updated HADOOP-2365:
----------------------------------
Status: Patch Available (was: Open)
[jira] Commented: (HADOOP-2365) Result of HashFunction.hash() contains all identical values
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549215 ]
Hadoop QA commented on HADOOP-2365:
-----------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12371179/patch.txt
against trunk revision r601845.
@author +1. The patch does not contain any @author tags.
javadoc +1. The javadoc tool did not generate any warning messages.
javac +1. The applied patch does not generate any new compiler warnings.
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests +1. The patch passed core unit tests.
contrib tests -1. The patch failed contrib unit tests.
Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1285/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1285/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1285/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1285/console
[jira] Assigned: (HADOOP-2365) Result of HashFunction.hash() contains all identical values
Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jim Kellerman reassigned HADOOP-2365:
-------------------------------------
Assignee: Jim Kellerman
[jira] Updated: (HADOOP-2365) Result of HashFunction.hash() contains all identical values
Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jim Kellerman updated HADOOP-2365:
----------------------------------
Attachment: patch.txt
Fixed test case
[jira] Resolved: (HADOOP-2365) Result of HashFunction.hash() contains all identical values
Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jim Kellerman resolved HADOOP-2365.
-----------------------------------
Resolution: Fixed
Closing this issue. Tracking the ArrayIndexOutOfBoundsException in HADOOP-2414.
[jira] Commented: (HADOOP-2365) Result of HashFunction.hash() contains all identical values
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549385 ]
Hudson commented on HADOOP-2365:
--------------------------------
Integrated in Hadoop-Nightly #325 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/325/])
Re: [jira] Commented: (HADOOP-2365) Result of HashFunction.hash() contains all identical values
Posted by Andrzej Bialecki <ab...@getopt.org>.
Jim Kellerman (JIRA) wrote:
> [ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549100 ]
>
> Jim Kellerman commented on HADOOP-2365:
> ---------------------------------------
>
> -1 on patch. This section of code should read:
>
> {code}
> int[] result = new int[nbHash];
> for (int i = 0, initval = 0; i < nbHash; i++) {
> initval = result[i] = Math.abs(JenkinsHash.hash(b, initval)) % maxValue;
> }
> return result;
> {code}
>
Yes, this works too; it shouldn't matter in this specific case.
Jenkins' hash has very good avalanche behavior, so even a 1-bit difference
in the initvalue yields a completely different hash.
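The avalanche claim can be checked empirically by flipping one seed bit and counting how many of the 32 output bits change; an ideal mixer flips about 16 on average. This is a minimal sketch that assumes a hypothetical stand-in hash (MurmurHash3-style mixing), not Hadoop's `JenkinsHash`, so it illustrates the property rather than measuring Jenkins' hash itself.

```java
import java.nio.charset.StandardCharsets;

public class AvalancheDemo {

    // Hypothetical stand-in for JenkinsHash.hash(byte[], int). NOT Hadoop's code.
    static int seededHash(byte[] data, int seed) {
        int h = seed ^ 0x9747b28c;
        for (byte x : data) {
            h ^= (x & 0xff);
            h *= 0x5bd1e995;
            h ^= h >>> 15;
        }
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Average number of output bits that change when a single seed bit is flipped.
    static double averageFlippedBits(byte[] data, int trials) {
        long total = 0;
        for (int seed = 0; seed < trials; seed++) {
            int bit = 1 << (seed % 32);  // cycle through all 32 seed bit positions
            total += Integer.bitCount(seededHash(data, seed) ^ seededHash(data, seed ^ bit));
        }
        return (double) total / trials;
    }

    public static void main(String[] args) {
        byte[] data = "row-42".getBytes(StandardCharsets.UTF_8);
        System.out.println("average flipped bits: " + averageFlippedBits(data, 10_000));
    }
}
```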
> However, thanks for finding my stupid mistake.
You're welcome. I'm using this class in a different Hadoop application,
where the problem became immediately apparent when I switched from my
home-grown BloomFilter implementation to this one.
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
[jira] Commented: (HADOOP-2365) Result of HashFunction.hash() contains all identical values
Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549100 ]
Jim Kellerman commented on HADOOP-2365:
---------------------------------------
-1 on patch. This section of code should read:
{code}
int[] result = new int[nbHash];
for (int i = 0, initval = 0; i < nbHash; i++) {
initval = result[i] = Math.abs(JenkinsHash.hash(b, initval)) % maxValue;
}
return result;
{code}
However, thanks for finding my stupid mistake.
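A self-contained version of the loop in the comment above can be sketched as follows. It keeps the chained-seed structure (each result seeds the next hash) but assumes a hypothetical `seededHash()` in place of `JenkinsHash.hash(b, initval)`, and it deliberately swaps `Math.abs(...) % maxValue` for `Math.floorMod` so the index can never be negative. The class name and constructor are illustrative, not HashFunction's actual API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ChainedSeedHash {
    private final int nbHash;
    private final int maxValue;

    ChainedSeedHash(int nbHash, int maxValue) {
        if (nbHash <= 0 || maxValue <= 0) {
            throw new IllegalArgumentException("nbHash and maxValue must be positive");
        }
        this.nbHash = nbHash;
        this.maxValue = maxValue;
    }

    // Hypothetical stand-in for JenkinsHash.hash(byte[], int). NOT Hadoop's code.
    static int seededHash(byte[] data, int seed) {
        int h = seed ^ 0x9747b28c;
        for (byte x : data) {
            h ^= (x & 0xff);
            h *= 0x5bd1e995;
            h ^= h >>> 15;
        }
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    int[] hash(byte[] b) {
        int[] result = new int[nbHash];
        for (int i = 0, initval = 0; i < nbHash; i++) {
            // The previous result becomes the seed for the next hash.
            initval = result[i] = Math.floorMod(seededHash(b, initval), maxValue);
        }
        return result;
    }

    public static void main(String[] args) {
        ChainedSeedHash hf = new ChainedSeedHash(4, 1 << 20);
        System.out.println(Arrays.toString(hf.hash("row-42".getBytes(StandardCharsets.UTF_8))));
    }
}
```

Chaining makes each value depend on the one before it; seeding with the loop index instead would keep the hashes independent of each other. Both vary the seed between iterations, which is what the original report asks for.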
[jira] Updated: (HADOOP-2365) Result of HashFunction.hash() contains all identical values
Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jim Kellerman updated HADOOP-2365:
----------------------------------
Status: Patch Available (was: Open)
[jira] Updated: (HADOOP-2365) Result of HashFunction.hash() contains all identical values
Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jim Kellerman updated HADOOP-2365:
----------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
Committed change. Recent test failure was unrelated to this change.
[jira] Reopened: (HADOOP-2365) Result of HashFunction.hash() contains all identical values
Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jim Kellerman reopened HADOOP-2365:
-----------------------------------
ArrayIndexOutOfBoundsException
[jira] Updated: (HADOOP-2365) Result of HashFunction.hash() contains all identical values
Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jim Kellerman updated HADOOP-2365:
----------------------------------
Status: Open (was: Patch Available)
Fix test case
[jira] Updated: (HADOOP-2365) Result of HashFunction.hash() contains all identical values
Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jim Kellerman updated HADOOP-2365:
----------------------------------
Attachment: patch.txt
Use the previous result as the next seed.