Posted to dev@datafu.apache.org by "jian wang (JIRA)" <ji...@apache.org> on 2014/01/25 05:26:38 UTC

[jira] [Created] (DATAFU-21) Probability weighted sampling without reservoir

jian wang created DATAFU-21:
-------------------------------

             Summary: Probability weighted sampling without reservoir
                 Key: DATAFU-21
                 URL: https://issues.apache.org/jira/browse/DATAFU-21
             Project: DataFu
          Issue Type: New Feature
         Environment: Mac OS, Linux
            Reporter: jian wang


This issue tracks the investigation into a weighted sampler that does not use an internal reservoir.

At present, SimpleRandomSample implements a good acceptance-rejection sampling algorithm for probability-based random sampling. The weighted sampler could reuse the simple random sample with a slight modification.

The modification is as follows: the present simple random sample generates a uniform random number in (0, 1) and uses it as the random variable that decides whether to accept or reject an item. The weighted sampler could instead generate this random variable from the item's weight, provided that it still lies in (0, 1) and that the random variables of different items remain independent of each other.
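
As a rough illustration of the idea above, the sketch below shows one possible way to derive a weight-dependent random variable that stays in (0, 1). It is only an assumption about how this could look, not the SimpleRandomSample code and not a settled design: the key u^(1/w) is borrowed from the Efraimidis-Spirakis weighted sampling scheme, a single fixed threshold of 1 - p stands in for the real acceptance-rejection logic, and the names weighted_key and weighted_bernoulli_sample are hypothetical.

```python
import random


def weighted_key(u, weight):
    """Map a uniform draw u in (0, 1) to a weight-dependent key in (0, 1).

    Assumption: uses the Efraimidis-Spirakis style key u ** (1 / weight).
    Larger weights push the key towards 1, so heavy items pass a fixed
    threshold more often.
    """
    if weight <= 0:
        raise ValueError("weight must be positive")
    return u ** (1.0 / weight)


def weighted_bernoulli_sample(items, p, rng=None):
    """Yield values from (value, weight) pairs without keeping a reservoir.

    Each item gets its own independent uniform draw and is accepted when
    its key exceeds 1 - p. An item of weight w is therefore accepted with
    probability 1 - (1 - p) ** w, which equals p when w == 1.
    """
    rng = rng or random.Random()
    threshold = 1.0 - p
    for value, weight in items:
        u = rng.random()  # independent draw per item
        if weighted_key(u, weight) > threshold:
            yield value


if __name__ == "__main__":
    data = [("a", 1.0), ("b", 5.0), ("c", 0.5), ("d", 2.0)]
    print(list(weighted_bernoulli_sample(data, p=0.5, rng=random.Random(42))))
```

With unit weights this reduces to plain Bernoulli sampling at rate p. The sample size is random rather than fixed, and the inclusion probability is not exactly proportional to the weight, so whether this matches the intended semantics is part of the open question.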

The correctness of this solution, and how to implement it efficiently, still need further thought.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)