Posted to dev@mahout.apache.org by "Isabel Drost (JIRA)" <ji...@apache.org> on 2008/04/08 16:03:29 UTC

[jira] Updated: (MAHOUT-20) Migrate Canopy and KMeans Implementations to Vectors

     [ https://issues.apache.org/jira/browse/MAHOUT-20?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Isabel Drost updated MAHOUT-20:
-------------------------------

    Attachment: vectorClustering.txt

I have migrated the code from Float[] to Vector. All unit tests are passing again. It would be nice if someone could take a quick look at the patch and point out any hideous mistakes I made or suggest improvements.

> Migrate Canopy and KMeans Implementations to Vectors
> ----------------------------------------------------
>
>                 Key: MAHOUT-20
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-20
>             Project: Mahout
>          Issue Type: Task
>          Components: Clustering
>    Affects Versions: 0.1
>            Reporter: Jeff Eastman
>            Assignee: Isabel Drost
>         Attachments: vectorClustering.txt
>
>
> Canopy and KMeans clustering implementations use Float[] representations instead of the new Vector package. They need to be migrated and the Vector package may need some enhancement to support the notion of payloads. This would be a good project for somebody new to the project who wants to get involved. If somebody wants to implement this, just assign the issue to yourself and I will hold off doing it myself.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Updated: (MAHOUT-20) Migrate Canopy and KMeans Implementations to Vectors

Posted by Jeff Eastman <je...@windwardsolutions.com>.

Hi Isabel,

- You might consider using the Vector.divide(double) operation in the 
computeCentroid() methods, although your version does the same thing 
as that method's implementations.
- I think Point is completely obsolete now and should be removed. 
However, some code still depends on its formatting and decoding 
operations. If those operations were moved somewhere else 
(AbstractVector?) and the Point test were removed as well, then Point 
could be eliminated.
- It would be good to create your patches from the Mahout directory so 
that the paths are relative to it. As it is, your patch applied cleanly 
with -p7, and all the unit tests ran.
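
To illustrate the first point, here is a minimal sketch of a centroid 
computation that sums the points and then makes a single divide(double) 
call. SimpleVector and CentroidSketch are hypothetical stand-ins written 
for this example so that it compiles on its own; they are not Mahout's 
actual Vector classes, and the real computeCentroid() methods in the 
patch may be structured differently.

```java
import java.util.Arrays;
import java.util.List;

public class CentroidSketch {

    /** Hypothetical dense vector for illustration; NOT Mahout's Vector interface. */
    static final class SimpleVector {
        private final double[] values;

        SimpleVector(double... values) {
            this.values = values.clone();
        }

        int size() {
            return values.length;
        }

        double get(int index) {
            return values[index];
        }

        /** Returns a new vector holding the element-wise sum of this and other. */
        SimpleVector plus(SimpleVector other) {
            if (other.size() != size()) {
                throw new IllegalArgumentException("cardinality mismatch");
            }
            double[] out = new double[values.length];
            for (int i = 0; i < values.length; i++) {
                out[i] = values[i] + other.values[i];
            }
            return new SimpleVector(out);
        }

        /** Returns a new vector with every element divided by x. */
        SimpleVector divide(double x) {
            double[] out = new double[values.length];
            for (int i = 0; i < values.length; i++) {
                out[i] = values[i] / x;
            }
            return new SimpleVector(out);
        }
    }

    /** Centroid = (sum of all points) / (number of points). */
    static SimpleVector computeCentroid(List<SimpleVector> points) {
        if (points.isEmpty()) {
            throw new IllegalArgumentException("no points to average");
        }
        SimpleVector sum = points.get(0);
        for (int i = 1; i < points.size(); i++) {
            sum = sum.plus(points.get(i));
        }
        // One divide call replaces a hand-written per-element loop.
        return sum.divide(points.size());
    }

    public static void main(String[] args) {
        SimpleVector centroid = computeCentroid(Arrays.asList(
                new SimpleVector(1.0, 2.0),
                new SimpleVector(3.0, 6.0)));
        System.out.println(centroid.get(0) + ", " + centroid.get(1)); // prints "2.0, 4.0"
    }
}
```

The result is the same as dividing each element by hand; the only gain 
is that the averaging step reads as one vector operation.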

+1 If you commit this patch you can clean up the other odds n ends 
another day.

+2 For staying in the game with Ted on the EM thread <grin>. I found the 
exchanges to be most beneficial to my learning process.

Jeff