Posted to common-issues@hadoop.apache.org by "Andrew Wang (JIRA)" <ji...@apache.org> on 2014/09/29 21:12:34 UTC

[jira] [Commented] (HADOOP-11152) Better random number generator

    [ https://issues.apache.org/jira/browse/HADOOP-11152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152097#comment-14152097 ] 

Andrew Wang commented on HADOOP-11152:
--------------------------------------

Something possibly related that [~jcb] mentioned to me: it'd be even better if we used a more uniform block placement method. Even with a perfect random generator, purely random placement is going to show skew, with some DNs ending up with noticeably more blocks than others.

One idea is to use something like Mitzenmacher's Power of Two Choices: pick two candidate DNs at random and place the block on the less loaded of the two. It's interesting to think about how we could determine "load" on a DN:

* total # of blocks
* # of blocks assigned to it in the last n minutes
* # of open blocks

These could of course be combined into a single load metric too.
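
A minimal sketch of what that could look like, assuming nothing about the real HDFS placement code: DnLoad, its fields, the weights in score(), and TwoChoicePicker are all hypothetical names made up for illustration, and the weights are arbitrary placeholders rather than tuned values.

```java
import java.util.List;
import java.util.Random;
import java.util.function.ToLongFunction;

/** Hypothetical per-DN load snapshot; not an existing HDFS class. */
final class DnLoad {
    final long totalBlocks;   // total # of blocks on the DN
    final long recentBlocks;  // # of blocks assigned in the last n minutes
    final long openBlocks;    // # of blocks currently open for write

    DnLoad(long totalBlocks, long recentBlocks, long openBlocks) {
        this.totalBlocks = totalBlocks;
        this.recentBlocks = recentBlocks;
        this.openBlocks = openBlocks;
    }

    /** Mixes the three signals; the weights 1/10/100 are arbitrary placeholders. */
    long score() {
        return totalBlocks + 10 * recentBlocks + 100 * openBlocks;
    }
}

/** Power of Two Choices: sample two candidates, keep the less loaded one. */
final class TwoChoicePicker {
    private TwoChoicePicker() {}

    static <T> T pick(List<T> nodes, ToLongFunction<? super T> load, Random rng) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("no candidate nodes");
        }
        // Sampling with replacement keeps the sketch short; the two
        // candidates may occasionally be the same node.
        T a = nodes.get(rng.nextInt(nodes.size()));
        T b = nodes.get(rng.nextInt(nodes.size()));
        return load.applyAsLong(a) <= load.applyAsLong(b) ? a : b;
    }
}
```

Usage would be something like TwoChoicePicker.pick(candidates, DnLoad::score, rng). The RNG only has to be good enough to sample the two candidates; the load comparison is what evens out the placement.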

> Better random number generator
> ------------------------------
>
>                 Key: HADOOP-11152
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11152
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Luke Lu
>              Labels: newbie++
>
> HDFS-7122 showed that naive ThreadLocal usage of the simple LCG-based j.u.Random produces an unacceptable distribution of random numbers for block placement. Similarly, ThreadLocalRandom in Java 7 (same static thread local with synchronized methods overridden) has the same problem.
> "Better" is defined as better quality and faster than j.u.Random (which is already 20x faster than SecureRandom).
> People (e.g. Numerical Recipes) have shown that by combining LCG and XORShift we can have a better fast RNG. It'd be worthwhile to investigate a thread-local version of these "better" RNGs.
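
To make the LCG + XORShift idea concrete, here is a rough thread-local sketch. It is not the Numerical Recipes generator and has not been tested for statistical quality: the class name LcgXorShiftRandom is hypothetical, the LCG constants are the ones from Knuth's MMIX, the shift triple (13, 7, 17) is a standard 64-bit xorshift choice, and adding the two streams together is just one possible way to combine them. The per-thread seeding is likewise an assumption.

```java
/** Illustrative combined generator; constants and mixing are assumptions. */
final class LcgXorShiftRandom {
    private static final long GOLDEN = 0x9E3779B97F4A7C15L;

    // One independent instance per thread, so no synchronization is needed.
    private static final ThreadLocal<LcgXorShiftRandom> LOCAL =
        ThreadLocal.withInitial(() -> new LcgXorShiftRandom(
            System.nanoTime() ^ (Thread.currentThread().getId() * GOLDEN)));

    private long lcg; // 64-bit LCG state
    private long xs;  // xorshift state, must never be zero

    LcgXorShiftRandom(long seed) {
        this.lcg = seed;
        long x = seed ^ GOLDEN;
        this.xs = (x == 0) ? GOLDEN : x;
    }

    /** Returns the calling thread's generator. */
    static LcgXorShiftRandom current() {
        return LOCAL.get();
    }

    long nextLong() {
        // LCG step (multiplier and increment from Knuth's MMIX).
        lcg = lcg * 6364136223846793005L + 1442695040888963407L;
        // 64-bit xorshift step.
        xs ^= xs << 13;
        xs ^= xs >>> 7;
        xs ^= xs << 17;
        // Combine the two streams.
        return lcg + xs;
    }

    /** Returns a value in [0, bound); has a slight modulo bias for large bounds. */
    int nextInt(int bound) {
        if (bound <= 0) {
            throw new IllegalArgumentException("bound must be positive");
        }
        // Use the high 31 bits, which are the stronger bits of the LCG half.
        return (int) ((nextLong() >>> 33) % bound);
    }
}
```

A caller would use LcgXorShiftRandom.current().nextInt(n) in place of a shared Random. Whether this actually beats j.u.Random on quality and speed would need to be measured.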



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)