Posted to dev@mahout.apache.org by "Samee Zahur (JIRA)" <ji...@apache.org> on 2008/04/09 16:54:25 UTC

[jira] Commented: (MAHOUT-20) Migrate Canopy and KMeans Implementations to Vectors

    [ https://issues.apache.org/jira/browse/MAHOUT-20?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587223#action_12587223 ] 

Samee Zahur commented on MAHOUT-20:
-----------------------------------

Some of the functions, like add or distance, seem to iterate through every dimension of the point in a conventional loop, something like this:
for (int i = 0; i < z.cardinality(); i++) ......
With high-dimensional input, this seems to cancel out most of the advantage gained by using SparseVector: we are not exploiting the sparseness of the input data, and we loop through all the elements in every case. One possible alternative might be to add a sort of iterator mechanism to the Vector interface that visits only the non-null (explicitly stored) elements.
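
To make the iterator idea concrete, here is a minimal sketch in Java. Everything in it is an assumption for illustration: SparseVectorSketch, nonZero(), plus() and squaredDistance() are hypothetical names, not the existing Mahout Vector or SparseVector API, and the TreeMap storage is just one possible choice.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a sparse vector; NOT the Mahout Vector API.
final class SparseVectorSketch {
    private final int cardinality;
    // Only non-zero entries are stored, keyed by dimension index.
    private final TreeMap<Integer, Double> values = new TreeMap<Integer, Double>();

    SparseVectorSketch(int cardinality) {
        this.cardinality = cardinality;
    }

    int cardinality() {
        return cardinality;
    }

    void set(int index, double value) {
        if (index < 0 || index >= cardinality) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        if (value == 0.0) {
            values.remove(index); // keep the map free of explicit zeros
        } else {
            values.put(index, value);
        }
    }

    double get(int index) {
        Double v = values.get(index);
        return v == null ? 0.0 : v;
    }

    // The proposed iterator mechanism: visits stored entries only.
    Iterable<Map.Entry<Integer, Double>> nonZero() {
        return values.entrySet();
    }

    // Addition that touches only the stored entries of both operands.
    SparseVectorSketch plus(SparseVectorSketch other) {
        requireSameCardinality(other);
        SparseVectorSketch result = new SparseVectorSketch(cardinality);
        for (Map.Entry<Integer, Double> e : nonZero()) {
            result.set(e.getKey(), e.getValue());
        }
        for (Map.Entry<Integer, Double> e : other.nonZero()) {
            result.set(e.getKey(), result.get(e.getKey()) + e.getValue());
        }
        return result;
    }

    // Squared Euclidean distance without looping over every dimension.
    double squaredDistance(SparseVectorSketch other) {
        requireSameCardinality(other);
        double sum = 0.0;
        // Dimensions stored in this vector (other may or may not store them).
        for (Map.Entry<Integer, Double> e : nonZero()) {
            double d = e.getValue() - other.get(e.getKey());
            sum += d * d;
        }
        // Dimensions stored only in the other vector.
        for (Map.Entry<Integer, Double> e : other.nonZero()) {
            if (!values.containsKey(e.getKey())) {
                sum += e.getValue() * e.getValue();
            }
        }
        return sum;
    }

    private void requireSameCardinality(SparseVectorSketch other) {
        if (other.cardinality != cardinality) {
            throw new IllegalArgumentException("cardinality mismatch");
        }
    }
}
```

With this shape, the cost of add or distance grows with the number of stored entries in the two vectors rather than with cardinality(), which is the point of the suggestion. A dense implementation could expose the same nonZero() method and simply walk all of its elements.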

Samee

> Migrate Canopy and KMeans Implementations to Vectors
> ----------------------------------------------------
>
>                 Key: MAHOUT-20
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-20
>             Project: Mahout
>          Issue Type: Task
>          Components: Clustering
>    Affects Versions: 0.1
>            Reporter: Jeff Eastman
>            Assignee: Isabel Drost
>         Attachments: vectorClustering.txt
>
>
> Canopy and KMeans clustering implementations use Float[] representations instead of the new Vector package. They need to be migrated and the Vector package may need some enhancement to support the notion of payloads. This would be a good project for somebody new to the project who wants to get involved. If somebody wants to implement this, just assign the issue to yourself and I will hold off doing it myself.
