Posted to dev@mahout.apache.org by "Grant Ingersoll (JIRA)" <ji...@apache.org> on 2008/12/07 03:01:44 UTC

[jira] Commented: (MAHOUT-99) Improving speed of KMeans

    [ https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654168#action_12654168 ] 

Grant Ingersoll commented on MAHOUT-99:
---------------------------------------

Hi Pallavi,

The core code works, but the change to the KMeansDriver causes a compile error in examples in the Kmeans demo code b/c it now asks for the number of map tasks and the number of centroids.  Could you document these new parameters and put in reasonable defaults and update the patch?

One thing I'm not certain of, though, is why we need to pass in the number of map tasks. Isn't that already a config setting when you set up Hadoop?

> Improving speed of KMeans
> -------------------------
>
>                 Key: MAHOUT-99
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-99
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>            Reporter: Pallavi Palleti
>            Assignee: Grant Ingersoll
>         Attachments: MAHOUT-99.patch
>
>
> Improved the speed of KMeans by passing only the cluster ID from the mapper to the reducer. Previously, the whole cluster info was sent as a formatted string.
> Also removed the implicit assumption that the Combiner runs exactly once, and modified the code accordingly so that it does not introduce a bug when the combiner runs zero times or more than once.
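
To make the two changes described above concrete, here is a minimal, Hadoop-free Java sketch. It is an illustration under stated assumptions, not the code in MAHOUT-99.patch: the class CombinerSafeKMeansSketch, the PartialSum value type, and the combinerPasses parameter are all hypothetical. The map side keys each point by its integer cluster ID and emits a (count, sum) partial. Because merging partials is associative and the combiner's output has the same shape as its input, the reducer computes the same centroids whether the combiner runs zero, one, or several times. A reducer that instead expected each value to be, say, an already-averaged centroid would give wrong results if the combiner ran twice or not at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Hadoop-free sketch; names are illustrative, not Mahout's actual classes.
public class CombinerSafeKMeansSketch {

    // Value emitted per cluster ID: a point count plus a coordinate-wise sum.
    static final class PartialSum {
        final long count;
        final double[] sum;

        PartialSum(long count, double[] sum) {
            this.count = count;
            this.sum = sum;
        }

        // Associative and commutative, so partials can be merged in any grouping.
        PartialSum merge(PartialSum other) {
            double[] merged = sum.clone();
            for (int i = 0; i < merged.length; i++) {
                merged[i] += other.sum[i];
            }
            return new PartialSum(count + other.count, merged);
        }
    }

    static PartialSum mergeAll(List<PartialSum> parts) {
        PartialSum acc = parts.get(0);
        for (int i = 1; i < parts.size(); i++) {
            acc = acc.merge(parts.get(i));
        }
        return acc;
    }

    // Map side: index of the nearest centroid by squared Euclidean distance.
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            double d = 0;
            for (int i = 0; i < point.length; i++) {
                double diff = point[i] - centroids[c][i];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    // Combiner: same (cluster ID -> partial sums) shape in and out, so it may run 0..n times.
    static Map<Integer, List<PartialSum>> combine(Map<Integer, List<PartialSum>> in) {
        Map<Integer, List<PartialSum>> out = new LinkedHashMap<>();
        for (Map.Entry<Integer, List<PartialSum>> e : in.entrySet()) {
            List<PartialSum> single = new ArrayList<>();
            single.add(mergeAll(e.getValue()));
            out.put(e.getKey(), single);
        }
        return out;
    }

    // One iteration; combinerPasses simulates the framework running the combiner 0..n times.
    static double[][] runIteration(double[][] points, double[][] centroids, int combinerPasses) {
        Map<Integer, List<PartialSum>> byCluster = new LinkedHashMap<>();
        for (double[] p : points) {
            // Only the integer cluster ID is used as the key, not a formatted cluster string.
            int id = nearest(p, centroids);
            byCluster.computeIfAbsent(id, k -> new ArrayList<>()).add(new PartialSum(1, p.clone()));
        }
        for (int pass = 0; pass < combinerPasses; pass++) {
            byCluster = combine(byCluster);
        }
        // Reducer: merge whatever partials arrive, then divide the sum by the count.
        double[][] next = new double[centroids.length][];
        for (int c = 0; c < centroids.length; c++) {
            List<PartialSum> parts = byCluster.get(c);
            if (parts == null) {
                next[c] = centroids[c].clone(); // empty cluster keeps its previous centroid
                continue;
            }
            PartialSum total = mergeAll(parts);
            next[c] = new double[total.sum.length];
            for (int i = 0; i < total.sum.length; i++) {
                next[c][i] = total.sum[i] / total.count;
            }
        }
        return next;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {2, 0}, {10, 10}, {12, 10}};
        double[][] initial = {{1, 1}, {11, 11}};
        for (int passes = 0; passes <= 2; passes++) {
            System.out.println(passes + " combiner pass(es): "
                + Arrays.deepToString(runIteration(points, initial, passes)));
        }
    }
}
```

Running main should print [[1.0, 0.0], [11.0, 10.0]] for 0, 1, and 2 combiner passes, which is the property the second change in the issue description is meant to guarantee.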

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


RE: [jira] Commented: (MAHOUT-99) Improving speed of KMeans

Posted by "Uppuluri, Rohini" <ro...@corp.aol.com>.
Hi Grant, 

I am Rohini, and I work on the same team as Pallavi. Pallavi is out of
the office until the end of this month, so I will be taking care of this
issue now.

I will look into the issue you have pointed out and get back to you. 

Thanks, 
-Rohini


-----Original Message-----
From: Grant Ingersoll (JIRA) [mailto:jira@apache.org] 
Sent: Sunday, December 07, 2008 7:32 AM
To: mahout-dev@lucene.apache.org
Subject: [jira] Commented: (MAHOUT-99) Improving speed of KMeans