You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@mahout.apache.org by sam wu <sw...@gmail.com> on 2013/01/12 23:58:47 UTC
MIA code bug?(Resevoir sampling, RandomPointUtil.java)
Hi,
I think in MIA code, ch09, RandomPointUtil.java , chooseRandomPoints()
method is ?????
I guess we'd like to have uniform random sampling (like Resevoir Sampling)
--- old code
public static List<Vector> chooseRandomPoints(Iterable<Vector> vectors,
int k) {
List<Vector> chosenPoints = new ArrayList<Vector>(k);
Random random = RandomUtils.getRandom();
for (Vector value : vectors) {
int currentSize = chosenPoints.size();
if (currentSize < k) {
chosenPoints.add(value);
} else if (random.nextInt(currentSize + 1) == 0) { // with chance
1/(currentSize+1) pick new element
int indexToRemove = random.nextInt(currentSize); // evict one
chosen randomly
chosenPoints.remove(indexToRemove);
chosenPoints.add(value);
}
}
return chosenPoints;
}
--- suggested code
public static List<Vector> chooseRandomPoints(Iterable<Vector> vectors, intk) {
List<Vector> chosenPoints = new ArrayList<Vector>(k);
Random random = RandomUtils.getRandom();
// resevoir sampling
int totalSize = 0;
for (Vector value : vectors) {
totalSize++;
int currentSize = chosenPoints.size();
if (currentSize < k) {
chosenPoints.add(value);
} else if (random.nextInt(totalSize) < currentSize) { // with chance
k/totalSize, pick new element
int indexToRemove = random.nextInt(currentSize); // evict one
chosen randomly
chosenPoints.remove(indexToRemove);
chosenPoints.add(value);
}
}
return chosenPoints;
}
------
Am I right ?
Sam
Re: MIA code bug?(Resevoir sampling, RandomPointUtil.java)
Posted by Ted Dunning <te...@gmail.com>.
Looks like it.
Steps from here include:
1) file a JIRA
2) Can you produce a test case to make the case?
3) one of us commits the fix.
On Sat, Jan 12, 2013 at 2:58 PM, sam wu <sw...@gmail.com> wrote:
> Hi,
>
> I think in MIA code, ch09, RandomPointUtil.java , chooseRandomPoints()
> method is ?????
> I guess we'd like to have uniform random sampling (like Resevoir Sampling)
>
> --- old code
>
> public static List<Vector> chooseRandomPoints(Iterable<Vector> vectors,
> int k) {
>
> List<Vector> chosenPoints = new ArrayList<Vector>(k);
>
> Random random = RandomUtils.getRandom();
>
> for (Vector value : vectors) {
>
> int currentSize = chosenPoints.size();
>
> if (currentSize < k) {
>
> chosenPoints.add(value);
>
> } else if (random.nextInt(currentSize + 1) == 0) { // with chance
> 1/(currentSize+1) pick new element
>
> int indexToRemove = random.nextInt(currentSize); // evict one
> chosen randomly
>
> chosenPoints.remove(indexToRemove);
>
> chosenPoints.add(value);
>
> }
>
> }
>
> return chosenPoints;
>
> }
>
> --- suggested code
>
> public static List<Vector> chooseRandomPoints(Iterable<Vector> vectors,
> intk) {
>
> List<Vector> chosenPoints = new ArrayList<Vector>(k);
>
> Random random = RandomUtils.getRandom();
>
> // resevoir sampling
>
> int totalSize = 0;
>
> for (Vector value : vectors) {
>
> totalSize++;
>
> int currentSize = chosenPoints.size();
>
> if (currentSize < k) {
>
> chosenPoints.add(value);
>
> } else if (random.nextInt(totalSize) < currentSize) { // with chance
> k/totalSize, pick new element
>
> int indexToRemove = random.nextInt(currentSize); // evict one
> chosen randomly
>
> chosenPoints.remove(indexToRemove);
>
> chosenPoints.add(value);
>
> }
>
>
>
> }
>
> return chosenPoints;
>
> }
>
> ------
>
> Am I right ?
>
> Sam
>