You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mahout.apache.org by "Isabel Drost (JIRA)" <ji...@apache.org> on 2009/12/04 17:20:20 UTC

[jira] Commented: (MAHOUT-11) Static fields used throughout clustering code (Canopy, K-Means).

    [ https://issues.apache.org/jira/browse/MAHOUT-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785985#action_12785985 ] 

Isabel Drost commented on MAHOUT-11:
------------------------------------

Applies cleanly and builds w/o unit test failures here.

The changes look all good to me. Great work, Drew.

One question though: In the TestMeanShift test (lines 301 and 304) you removed the canopyId adjustments - could you please explain what was the reason this was necessary?

I would like to commit this patch next week if noone objects.

> Static fields used throughout clustering code (Canopy, K-Means).
> ----------------------------------------------------------------
>
>                 Key: MAHOUT-11
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-11
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.1
>            Reporter: Dawid Weiss
>             Fix For: 0.3
>
>         Attachments: MAHOUT-11-all-cleanup-20091128.patch, MAHOUT-11-kmeans-cleanup.patch, MAHOUT-11-RandomSeedGenerator.patch, MAHOUT-11.patch
>
>
> I file this as a bug, even though I'm not 100% sure it is one. In the currect code the information is exchanged via static fields (for example, distance measure and thresholds for Canopies are static field). Is it always true in Hadoop that one job runs inside one JVM with exclusive access? I haven't seen it anywhere in Hadoop documentation and my impression was that everything uses JobConf to pass configuration to jobs, but jobs are configured on a per-object basis (a job is an object, a mapper is an object and everything else is basically an object).
> If it's possible for two jobs to run in parallel inside one JVM then this is a limitation and bug in our code that needs to be addressed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.