Posted to dev@mahout.apache.org by "Gaurav Redkar (JIRA)" <ji...@apache.org> on 2012/06/22 17:22:43 UTC

[jira] [Commented] (MAHOUT-966) Mismatch in the number of points given by the clusterDumper and ClusterOutputPostProcessor

    [ https://issues.apache.org/jira/browse/MAHOUT-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399375#comment-13399375 ] 

Gaurav Redkar commented on MAHOUT-966:
--------------------------------------

Yes, I can try to look into this issue. First, I would like clarification on the difference between the variables "numPoints" and "boundPoints", as mentioned in my previous comment above.

Note that the size of "boundPoints" (a list of the points belonging to a cluster), which I printed by tweaking the clusterDumper code, matched the number of points actually printed for each cluster. So could it be that "numPoints" is not properly recalculated at the end of the last iteration, before the algorithm terminates? This is just a guess; I will try to look deeper into it.
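
To make the guess concrete, here is a minimal Java sketch of how such a mismatch could arise. It is a hypothetical toy, not Mahout code: the class StaleCountDemo, the ToyCluster type, and the one-dimensional data are all assumptions made for illustration, and only the field names "numPoints" and "boundPoints" are borrowed from the discussion. The count is recorded while the center is still in its old position, the center then moves, and the final assignment pass uses the moved center.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, NOT Mahout code. It only illustrates how a cached
// count can drift from the size of the list built by a later assignment pass.
public class StaleCountDemo {

    static class ToyCluster {
        double center;
        int numPoints; // cached count, set during the iteration
        final List<Double> boundPoints = new ArrayList<>(); // filled by the final pass

        ToyCluster(double center) {
            this.center = center;
        }
    }

    // Index of the cluster whose center is closest to p (first one wins on a tie).
    static int nearest(List<ToyCluster> clusters, double p) {
        int best = 0;
        for (int i = 1; i < clusters.size(); i++) {
            if (Math.abs(p - clusters.get(i).center) < Math.abs(p - clusters.get(best).center)) {
                best = i;
            }
        }
        return best;
    }

    // One k-means-style iteration: count against the current centers, then move them.
    static void iterate(List<ToyCluster> clusters, double[] points) {
        int[] counts = new int[clusters.size()];
        double[] sums = new double[clusters.size()];
        for (double p : points) {
            int i = nearest(clusters, p);
            counts[i]++;
            sums[i] += p;
        }
        for (int i = 0; i < clusters.size(); i++) {
            ToyCluster c = clusters.get(i);
            c.numPoints = counts[i]; // reflects the OLD center
            if (counts[i] > 0) {
                c.center = sums[i] / counts[i]; // center moves after counting
            }
        }
    }

    // Final pass: bind every point to the nearest (already moved) center.
    static void assignFinal(List<ToyCluster> clusters, double[] points) {
        for (ToyCluster c : clusters) {
            c.boundPoints.clear();
        }
        for (double p : points) {
            clusters.get(nearest(clusters, p)).boundPoints.add(p);
        }
    }

    static List<ToyCluster> run() {
        double[] points = {3, 4, 6, 20};
        List<ToyCluster> clusters = new ArrayList<>();
        clusters.add(new ToyCluster(0));
        clusters.add(new ToyCluster(10));
        iterate(clusters, points);     // the last iteration before termination
        assignFinal(clusters, points); // numPoints is not refreshed here
        return clusters;
    }

    public static void main(String[] args) {
        List<ToyCluster> clusters = run();
        for (int i = 0; i < clusters.size(); i++) {
            ToyCluster c = clusters.get(i);
            System.out.println("cluster " + i + ": numPoints=" + c.numPoints
                    + " boundPoints=" + c.boundPoints.size());
        }
        // prints:
        // cluster 0: numPoints=2 boundPoints=3
        // cluster 1: numPoints=2 boundPoints=1
    }
}
```

In this toy the point 6 is counted for the second cluster while its center is still at 10, but it is bound to the first cluster once the centers have moved to 3.5 and 13. Whether Mahout's k-means driver behaves this way is exactly what still needs to be checked.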
                
> Mismatch in the number of points given by the clusterDumper and ClusterOutputPostProcessor
> ------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-966
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-966
>             Project: Mahout
>          Issue Type: Bug
>          Components: Integration
>    Affects Versions: 0.6
>         Environment: hadoop 0.20.2 mahout 0.6 
>            Reporter: Gaurav Redkar
>            Priority: Minor
>         Attachments: cluster-dumper-output.txt, clusterpp-output.txt, mtestdata.txt, points100dCCNorm.txt
>
>
>  After running the post processor, the number of points that each cluster contains does not match the number of points each cluster should contain as stated by clusterdumper.
>  
> MSV-287{ n=90 c=[0.05195, 0.05675, 0.07151, 0.05713, 0.06946,...}
> MSV-145{ n=90 c=[0.93685, 0.93071, 0.93641, 0.94629, 0.94409,..}
> The n shown in clusters-n-final for each cluster differs from the number of points actually contained in d directory for each cluster. Any idea why this is happening?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira