Posted to dev@mahout.apache.org by "Sean Owen (Updated) (JIRA)" <ji...@apache.org> on 2011/11/28 11:42:40 UTC

[jira] [Updated] (MAHOUT-900) RandomSeedGenerator samples / output k texts incorrectly

     [ https://issues.apache.org/jira/browse/MAHOUT-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated MAHOUT-900:
-----------------------------

    Attachment: MAHOUT-900.patch
    
> RandomSeedGenerator samples / output k texts incorrectly
> --------------------------------------------------------
>
>                 Key: MAHOUT-900
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-900
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.5
>            Reporter: Sean Owen
>            Assignee: Robin Anil
>            Priority: Minor
>             Fix For: 0.6
>
>         Attachments: MAHOUT-900.patch
>
>
>           int currentSize = chosenTexts.size();
>           if (currentSize < k) {
>             chosenTexts.add(newText);
>             chosenClusters.add(newCluster);
>           } else if (random.nextInt(currentSize + 1) == 0) { // with chance 1/(currentSize+1) pick new element
>             int indexToRemove = random.nextInt(currentSize); // evict one chosen randomly
>             chosenTexts.remove(indexToRemove);
>             chosenClusters.remove(indexToRemove);
>             chosenTexts.add(newText);
>             chosenClusters.add(newCluster);
>           }
> The second "if" condition ought to be "!= 0", right? A draw of 0 is the case where the new element itself is the one evicted, so that is the only case in which the body, which removes an existing element, should be skipped.
> Second, this code:
>         for (int i = 0; i < k; i++) {
>           writer.append(chosenTexts.get(i), chosenClusters.get(i));
>         }
> ... assumes that at least k elements existed in the input, and fails otherwise. The loop bound probably needs to be capped at the number of elements actually chosen.
> Patch attached.
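
For illustration only, the two changes described in the issue could be sketched roughly as below. This is not the contents of MAHOUT-900.patch. The class and method names (SeedSamplingSketch, sample, output) are hypothetical, a single generic list stands in for the paired chosenTexts / chosenClusters lists, and a returned list stands in for the writer. It applies the "!= 0" condition exactly as proposed and makes no further claim about how uniform the resulting sample is.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for the sampling and output loops; not Mahout code.
final class SeedSamplingSketch {

  // Keeps at most k items, using the "!= 0" condition proposed in the issue.
  static <T> List<T> sample(Iterable<T> input, int k, Random random) {
    List<T> chosen = new ArrayList<>();
    for (T item : input) {
      int currentSize = chosen.size();
      if (currentSize < k) {
        chosen.add(item);
      } else if (random.nextInt(currentSize + 1) != 0) {
        // A draw of 0 means the new item is skipped;
        // any other draw evicts one randomly chosen existing item.
        chosen.remove(random.nextInt(currentSize));
        chosen.add(item);
      }
    }
    return chosen;
  }

  // Emits at most k items, capped at the number actually chosen.
  static <T> List<T> output(List<T> chosen, int k) {
    int limit = Math.min(k, chosen.size());
    List<T> written = new ArrayList<>(limit);
    for (int i = 0; i < limit; i++) {
      written.add(chosen.get(i)); // stands in for writer.append(...)
    }
    return written;
  }
}
```

With an input of two elements and k = 5, sample returns both elements and output emits two entries instead of failing on the third get(i).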

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira