You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@mahout.apache.org by sam wu <sw...@gmail.com> on 2013/01/12 23:58:47 UTC

MIA code bug?(Resevoir sampling, RandomPointUtil.java)

Hi,

I think in MIA code, ch09, RandomPointUtil.java , chooseRandomPoints()
method is ?????
I guess we'd like to have uniform random sampling (like Resevoir Sampling)

--- old code

  public static List<Vector> chooseRandomPoints(Iterable<Vector> vectors,
int k) {

    List<Vector> chosenPoints = new ArrayList<Vector>(k);

    Random random = RandomUtils.getRandom();

    for (Vector value : vectors) {

      int currentSize = chosenPoints.size();

      if (currentSize < k) {

        chosenPoints.add(value);

      } else if (random.nextInt(currentSize + 1) == 0) { // with chance
1/(currentSize+1) pick new element

        int indexToRemove = random.nextInt(currentSize); // evict one
chosen randomly

        chosenPoints.remove(indexToRemove);

        chosenPoints.add(value);

      }

    }

    return chosenPoints;

  }

---  suggested code

 public static List<Vector> chooseRandomPoints(Iterable<Vector> vectors, intk) {

    List<Vector> chosenPoints = new ArrayList<Vector>(k);

    Random random = RandomUtils.getRandom();

    // resevoir sampling

    int totalSize = 0;

    for (Vector value : vectors) {

    totalSize++;

      int currentSize = chosenPoints.size();

      if (currentSize < k) {

        chosenPoints.add(value);

      } else if (random.nextInt(totalSize) < currentSize) { // with chance
k/totalSize, pick new element

        int indexToRemove = random.nextInt(currentSize); // evict one
chosen randomly

        chosenPoints.remove(indexToRemove);

        chosenPoints.add(value);

      }



    }

    return chosenPoints;

  }

------

Am I right ?

Sam

Re: MIA code bug?(Resevoir sampling, RandomPointUtil.java)

Posted by Ted Dunning <te...@gmail.com>.
Looks like it.

Steps from here include:

1) file a JIRA

2) Can you produce a test case to make the case?

3) one of us commits the fix.



On Sat, Jan 12, 2013 at 2:58 PM, sam wu <sw...@gmail.com> wrote:

> Hi,
>
> I think in MIA code, ch09, RandomPointUtil.java , chooseRandomPoints()
> method is ?????
> I guess we'd like to have uniform random sampling (like Resevoir Sampling)
>
> --- old code
>
>   public static List<Vector> chooseRandomPoints(Iterable<Vector> vectors,
> int k) {
>
>     List<Vector> chosenPoints = new ArrayList<Vector>(k);
>
>     Random random = RandomUtils.getRandom();
>
>     for (Vector value : vectors) {
>
>       int currentSize = chosenPoints.size();
>
>       if (currentSize < k) {
>
>         chosenPoints.add(value);
>
>       } else if (random.nextInt(currentSize + 1) == 0) { // with chance
> 1/(currentSize+1) pick new element
>
>         int indexToRemove = random.nextInt(currentSize); // evict one
> chosen randomly
>
>         chosenPoints.remove(indexToRemove);
>
>         chosenPoints.add(value);
>
>       }
>
>     }
>
>     return chosenPoints;
>
>   }
>
> ---  suggested code
>
>  public static List<Vector> chooseRandomPoints(Iterable<Vector> vectors,
> intk) {
>
>     List<Vector> chosenPoints = new ArrayList<Vector>(k);
>
>     Random random = RandomUtils.getRandom();
>
>     // resevoir sampling
>
>     int totalSize = 0;
>
>     for (Vector value : vectors) {
>
>     totalSize++;
>
>       int currentSize = chosenPoints.size();
>
>       if (currentSize < k) {
>
>         chosenPoints.add(value);
>
>       } else if (random.nextInt(totalSize) < currentSize) { // with chance
> k/totalSize, pick new element
>
>         int indexToRemove = random.nextInt(currentSize); // evict one
> chosen randomly
>
>         chosenPoints.remove(indexToRemove);
>
>         chosenPoints.add(value);
>
>       }
>
>
>
>     }
>
>     return chosenPoints;
>
>   }
>
> ------
>
> Am I right ?
>
> Sam
>