Posted to dev@mahout.apache.org by "Paritosh Ranjan (Created) (JIRA)" <ji...@apache.org> on 2012/02/23 08:57:49 UTC

[jira] [Created] (MAHOUT-984) Refactor Fuzzy K Means Clustering into a separate post process with outlier pruning

Refactor Fuzzy K Means Clustering into a separate post process with outlier pruning
-----------------------------------------------------------------------------------

                 Key: MAHOUT-984
                 URL: https://issues.apache.org/jira/browse/MAHOUT-984
             Project: Mahout
          Issue Type: Sub-task
          Components: Clustering
    Affects Versions: 0.6
            Reporter: Paritosh Ranjan
            Assignee: Paritosh Ranjan
             Fix For: 0.7


Use ClusterClassificationDriver to refactor clustering out of FuzzyKMeansDriver with outlier pruning support.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAHOUT-984) Refactor Fuzzy K Means Clustering into a separate post process with outlier pruning

Posted by "Saikat Kanjilal (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228179#comment-13228179 ] 

Saikat Kanjilal commented on MAHOUT-984:
----------------------------------------

I will start researching this issue using the CCD class as well.
                
> Refactor Fuzzy K Means Clustering into a separate post process with outlier pruning
> -----------------------------------------------------------------------------------
>
>                 Key: MAHOUT-984
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-984
>             Project: Mahout
>          Issue Type: Sub-task
>          Components: Clustering
>    Affects Versions: 0.6
>            Reporter: Paritosh Ranjan
>            Assignee: Paritosh Ranjan
>              Labels: clustering
>             Fix For: 0.7
>
>
> Use ClusterClassificationDriver to refactor clustering out of FuzzyKMeansDriver with outlier pruning support.


        

[jira] [Commented] (MAHOUT-984) Refactor Fuzzy K Means Clustering into a separate post process with outlier pruning

Posted by "Paritosh Ranjan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228246#comment-13228246 ] 

Paritosh Ranjan commented on MAHOUT-984:
----------------------------------------

1) Yes
2) CCD (ClusterClassificationDriver) takes a ccThreshold. We need to take it as input and pass it to the CCD.
3) This depends on how the test cases behave after the change. In the end, what matters is that FuzzyKMeans clustering is covered by a test case.
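
To make point 2 concrete, here is a minimal, self-contained sketch of what threshold-based outlier pruning could look like. It does not use Mahout classes. The class name, the `classify` helper, and the threshold value 0.6 are all hypothetical, and the rule itself (drop a point when its best membership value is below the threshold) is an assumption about how ccThreshold is applied, not something confirmed in this thread.

```java
import java.util.OptionalInt;

final class ThresholdPruningSketch {

    /**
     * Returns the index of the most likely cluster, or empty when even the
     * best membership value is below the threshold (the point is pruned).
     */
    static OptionalInt classify(double[] memberships, double threshold) {
        int best = -1;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < memberships.length; i++) {
            if (memberships[i] > bestValue) {
                bestValue = memberships[i];
                best = i;
            }
        }
        // Assumed pruning rule: a point with no sufficiently strong
        // membership is treated as an outlier and not assigned anywhere.
        if (best < 0 || bestValue < threshold) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(best);
    }

    public static void main(String[] args) {
        double threshold = 0.6; // hypothetical stand-in for ccThreshold
        System.out.println(classify(new double[] {0.8, 0.15, 0.05}, threshold));
        System.out.println(classify(new double[] {0.4, 0.35, 0.25}, threshold));
    }
}
```

Under this reading, a threshold of 0 keeps every point, and raising it discards points whose membership is spread thinly across clusters.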
                

        

[jira] [Commented] (MAHOUT-984) Refactor Fuzzy K Means Clustering into a separate post process with outlier pruning

Posted by "Paritosh Ranjan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231406#comment-13231406 ] 

Paritosh Ranjan commented on MAHOUT-984:
----------------------------------------

The code has been committed now, so you can update your checkout instead of applying the patch.
                

        

[jira] [Commented] (MAHOUT-984) Refactor Fuzzy K Means Clustering into a separate post process with outlier pruning

Posted by "Saikat Kanjilal (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229863#comment-13229863 ] 

Saikat Kanjilal commented on MAHOUT-984:
----------------------------------------

Paritosh,
I'm ready to start working on this issue. Could I take a look at the implementation you did for MAHOUT-981? Has it been committed yet? I saw that a patch was available, so I guessed that it had not been.
                

        

[jira] [Commented] (MAHOUT-984) Refactor Fuzzy K Means Clustering into a separate post process with outlier pruning

Posted by "Saikat Kanjilal (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243163#comment-13243163 ] 

Saikat Kanjilal commented on MAHOUT-984:
----------------------------------------

Paritosh,
Thanks for the update. Sorry, my availability is limited at the moment by my day job :). As I mentioned, I got the tests working when run individually but ran into errors when I ran them together. Let me know more about the changes you make for this one. Meanwhile I will look at the other two issues and assign one (or both) of them to myself.
                

        

[jira] [Commented] (MAHOUT-984) Refactor Fuzzy K Means Clustering into a separate post process with outlier pruning

Posted by "Saikat Kanjilal (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232455#comment-13232455 ] 

Saikat Kanjilal commented on MAHOUT-984:
----------------------------------------

Paritosh,
I am in the middle of the refactoring and have some questions. I removed clusterDataMR and clusterDataSeq and replaced them with a single clusterData, similar to what you set up for k-means. However, fuzzy k-means has two additional parameters, convergenceDelta and m, and I was wondering how and where to take them into account. The signature of the new clusterData function is shown below:

public static void clusterData(Path input,
                               Path clustersIn,
                               Path output,
                               DistanceMeasure measure,
                               double convergenceDelta,
                               float m,
                               boolean emitMostLikely,
                               double threshold,
                               boolean runSequential)
    throws IOException, ClassNotFoundException, InterruptedException {
  if (log.isInfoEnabled()) {
    log.info("Running Clustering");
    log.info("Input: {} Clusters In: {} Out: {} Distance: {}", new Object[] {input, clustersIn, output, measure});
  }
  ClusterClassifier.writePolicy(new FuzzyKMeansClusteringPolicy(), clustersIn);
  ClusterClassificationDriver.run(input, output, new Path(output, CLUSTERED_POINTS_DIRECTORY),
      threshold, true, runSequential);
}


Let me know your thoughts
Thanks
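
As background to the question about m, the following is a rough, self-contained sketch of the textbook fuzzy c-means membership formula, u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1)). It is not Mahout's implementation, and the class and method names are hypothetical. In this formulation m shapes every membership value, so it matters whenever points are assigned to clusters, whereas a convergence delta normally only decides when the iterative centroid updates stop.

```java
import java.util.Arrays;

final class FuzzyMembershipSketch {

    /** Textbook fuzzy c-means membership of one point, given its distance to each centroid. */
    static double[] memberships(double[] distances, double m) {
        if (m <= 1.0) {
            throw new IllegalArgumentException("m must be greater than 1");
        }
        int n = distances.length;
        double[] u = new double[n];

        // A point lying exactly on one or more centroids belongs fully to them.
        int zeros = 0;
        for (double d : distances) {
            if (d == 0.0) {
                zeros++;
            }
        }
        if (zeros > 0) {
            for (int i = 0; i < n; i++) {
                u[i] = distances[i] == 0.0 ? 1.0 / zeros : 0.0;
            }
            return u;
        }

        double exponent = 2.0 / (m - 1.0);
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            for (int j = 0; j < n; j++) {
                sum += Math.pow(distances[i] / distances[j], exponent);
            }
            u[i] = 1.0 / sum;
        }
        return u;
    }

    public static void main(String[] args) {
        double[] distances = {1.0, 2.0, 4.0};
        // A larger m flattens the memberships; a value near 1 approaches hard assignment.
        System.out.println(Arrays.toString(memberships(distances, 2.0)));
        System.out.println(Arrays.toString(memberships(distances, 4.0)));
    }
}
```

Since the membership values depend on m, the classification step needs it from somewhere, for example through whatever policy object is written alongside the clusters. How the real FuzzyKMeansClusteringPolicy receives it is not shown in this thread.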
                

        

[jira] [Commented] (MAHOUT-984) Refactor Fuzzy K Means Clustering into a separate post process with outlier pruning

Posted by "Paritosh Ranjan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243083#comment-13243083 ] 

Paritosh Ranjan commented on MAHOUT-984:
----------------------------------------

Saikat, I am picking this up now since I need to fix it in order to commit https://issues.apache.org/jira/browse/MAHOUT-989.

Would you like to help on MAHOUT-940 or MAHOUT-966? These two issues are good candidates for a first contribution. Looking forward to your enthusiasm and contributions.
                

        

[jira] [Commented] (MAHOUT-984) Refactor Fuzzy K Means Clustering into a separate post process with outlier pruning

Posted by "Saikat Kanjilal (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228238#comment-13228238 ] 

Saikat Kanjilal commented on MAHOUT-984:
----------------------------------------

Paritosh,
I've read through the FuzzyKMeansDriver and I have some initial questions:
1) Do we want to refactor both the sequential and the map-reduce versions of buildClusters?
2) How does the outlier pruning relate to this effort, if at all?
3) Do we also need to refactor the tests for this class? I'm guessing not, since the ClusterClassificationDriver is abstracted away.


Thanks
                

        

[jira] [Commented] (MAHOUT-984) Refactor Fuzzy K Means Clustering into a separate post process with outlier pruning

Posted by "Saikat Kanjilal (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13240213#comment-13240213 ] 

Saikat Kanjilal commented on MAHOUT-984:
----------------------------------------

Paritosh,
Some updates for you, now that I have finally had some time to work on this:

1) I noticed that the following tests fail when run from the Unix command line:
  testFuzzyKMeansSeqJob(org.apache.mahout.clustering.fuzzykmeans.TestFuzzyKmeansClustering): 0
  testFuzzyKMeansMRJob(org.apache.mahout.clustering.fuzzykmeans.TestFuzzyKmeansClustering): Cluster Classification Driver Job failed processing file:/tmp/mahout-TestFuzzyKmeansClustering-4873685436613465088/points

2) When I run TestFuzzyKmeansClustering individually, the test always passes.

Is there a way to run the class under the Eclipse debugger? I can't seem to get any of the breakpoints to be hit. I tried debugging it and running the tests above individually, but in both cases they pass.


Any thoughts or ideas on things to try? I am getting ready to put in debug statements and wanted to check with you beforehand.

One other thing: I was wondering whether modeling the FuzzyKMeansDriver on the KMeansDriver is the right approach.

Your input is much appreciated
                

        

[jira] [Issue Comment Edited] (MAHOUT-984) Refactor Fuzzy K Means Clustering into a separate post process with outlier pruning

Posted by "Paritosh Ranjan (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228246#comment-13228246 ] 

Paritosh Ranjan edited comment on MAHOUT-984 at 3/13/12 6:34 AM:
-----------------------------------------------------------------

1) Yes ( this can be done just by passing a runSequential parameter to CCD )
2) CCD takes a ccThreshold. We need to take it as input and pass it to the CCD.
3) This depends on how the test cases behave after the change. In the end, we need the FuzzyKMeans clustering tested via test case, this is what matters.
                
      was (Author: paritoshranjan):
    1) Yes ( this can be done just by passing a runSequential method to CCD )
2) CCD takes a ccThreshold. We need to take it as input and pass it to the CCD.
3) This depends on how the test cases behave after the change. In the end, we need the FuzzyKMeans clustering tested via test case, this is what matters.
                  

        

[jira] [Commented] (MAHOUT-984) Refactor Fuzzy K Means Clustering into a separate post process with outlier pruning

Posted by "Paritosh Ranjan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235423#comment-13235423 ] 

Paritosh Ranjan commented on MAHOUT-984:
----------------------------------------

Debugging the issue might help you find the exact problem.
                

        

[jira] [Issue Comment Edited] (MAHOUT-984) Refactor Fuzzy K Means Clustering into a separate post process with outlier pruning

Posted by "Paritosh Ranjan (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228246#comment-13228246 ] 

Paritosh Ranjan edited comment on MAHOUT-984 at 3/13/12 6:34 AM:
-----------------------------------------------------------------

1) Yes ( this can be done just by passing a runSequential parameter to CCD )
2) CCD takes a ccThreshold. We need to take it as input in FuzzyK and pass it to the CCD.
3) This depends on how the test cases behave after the change. In the end, we need the FuzzyKMeans clustering tested via test case, this is what matters.
                
      was (Author: paritoshranjan):
    1) Yes ( this can be done just by passing a runSequential parameter to CCD )
2) CCD takes a ccThreshold. We need to take it as input and pass it to the CCD.
3) This depends on how the test cases behave after the change. In the end, we need the FuzzyKMeans clustering tested via test case, this is what matters.
                  

        

[jira] [Commented] (MAHOUT-984) Refactor Fuzzy K Means Clustering into a separate post process with outlier pruning

Posted by "Saikat Kanjilal (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236761#comment-13236761 ] 

Saikat Kanjilal commented on MAHOUT-984:
----------------------------------------

Great, will do for sure. I was just confused when looking at MAHOUT-994. Thanks, I will keep debugging this.
                

        

[jira] [Commented] (MAHOUT-984) Refactor Fuzzy K Means Clustering into a separate post process with outlier pruning

Posted by "Saikat Kanjilal (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231414#comment-13231414 ] 

Saikat Kanjilal commented on MAHOUT-984:
----------------------------------------

Thanks, will do. I will upload a patch when I finish.
                

        

[jira] [Commented] (MAHOUT-984) Refactor Fuzzy K Means Clustering into a separate post process with outlier pruning

Posted by "Paritosh Ranjan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229886#comment-13229886 ] 

Paritosh Ranjan commented on MAHOUT-984:
----------------------------------------

You can apply the patch on the trunk.
                

        

[jira] [Issue Comment Edited] (MAHOUT-984) Refactor Fuzzy K Means Clustering into a separate post process with outlier pruning

Posted by "Paritosh Ranjan (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228246#comment-13228246 ] 

Paritosh Ranjan edited comment on MAHOUT-984 at 3/13/12 6:33 AM:
-----------------------------------------------------------------

1) Yes ( this can be done just by passing a runSequential method to CCD )
2) CCD takes a ccThreshold. We need to take it as input and pass it to the CCD.
3) This depends on how the test cases behave after the change. In the end, we need the FuzzyKMeans clustering tested via test case, this is what matters.
                
      was (Author: paritoshranjan):
    1) Yes
2) CCD takes a ccThreshold. We need to take it as input and pass it to the CCD.
3) This depends on how the test cases behave after the change. In the end, we need the FuzzyKMeans clustering tested via test case, this is what matters.
                  

        

[jira] [Issue Comment Edited] (MAHOUT-984) Refactor Fuzzy K Means Clustering into a separate post process with outlier pruning

Posted by "Paritosh Ranjan (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228246#comment-13228246 ] 

Paritosh Ranjan edited comment on MAHOUT-984 at 3/13/12 6:56 AM:
-----------------------------------------------------------------

1) Both the sequential and MapReduce versions need refactoring (which can be done just by passing a runSequential parameter to CCD); however, this is for clusterData and not for buildClusters.
2) CCD takes a ccThreshold. We need to take it as input in FuzzyK and pass it to the CCD.
3) This depends on how the test cases behave after the change. In the end, we need the FuzzyKMeans clustering tested via test case, this is what matters.
                
      was (Author: paritoshranjan):
    1) Yes ( this can be done just by passing a runSequential parameter to CCD )
2) CCD takes a ccThreshold. We need to take it as input in FuzzyK and pass it to the CCD.
3) This depends on how the test cases behave after the change. In the end, we need the FuzzyKMeans clustering tested via test case, this is what matters.
                  

        

[jira] [Commented] (MAHOUT-984) Refactor Fuzzy K Means Clustering into a separate post process with outlier pruning

Posted by "Paritosh Ranjan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228248#comment-13228248 ] 

Paritosh Ranjan commented on MAHOUT-984:
----------------------------------------

This refactoring is for the clusterData phase and not for the buildClusters phase.
                

        

[jira] [Commented] (MAHOUT-984) Refactor Fuzzy K Means Clustering into a separate post process with outlier pruning

Posted by "Saikat Kanjilal (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235364#comment-13235364 ] 

Saikat Kanjilal commented on MAHOUT-984:
----------------------------------------

Paritosh,
I'm running into a strange issue. I've refactored FuzzyKMeansDriver along the lines of KMeansDriver so that it uses FuzzyKMeansClusteringPolicy, with the rest of the logic pretty much the same. The unit test for FuzzyKMeansDriver passes when run individually, but it fails when I run all the unit tests together. I am attaching the clusterData function here. Any ideas on this?

Regards


  public static void clusterData(Path input,
                                 Path clustersIn,
                                 Path output,
                                 DistanceMeasure measure,
                                 double convergenceDelta,
                                 float m,
                                 boolean emitMostLikely,
                                 double threshold,
                                 boolean runSequential)
    throws IOException, ClassNotFoundException, InterruptedException {
    if (log.isInfoEnabled()) {
      log.info("Running Clustering");
      log.info("Input: {} Clusters In: {} Out: {} Distance: {}", new Object[] {input, clustersIn, output, measure});
    }
    ClusterClassifier.writePolicy(new FuzzyKMeansClusteringPolicy((double) m, convergenceDelta), clustersIn);
    ClusterClassificationDriver.run(input, output, new Path(output, CLUSTERED_POINTS_DIRECTORY),
        threshold, true, runSequential);
  }
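
The m argument in the function above is the fuzziness exponent of fuzzy k-means. As a rough illustration of how it shapes cluster membership, here is a minimal sketch based on the textbook membership formula u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)). The class and method names are hypothetical, and this is not Mahout's FuzzyKMeansClusteringPolicy; it only assumes that the distances from one point to each cluster center are already known.

```java
import java.util.Arrays;

/** Illustrative sketch only; names are hypothetical and not part of Mahout. */
final class FuzzyMembershipSketch {

  /**
   * Computes the membership of one point in each cluster from its distances
   * to the cluster centers, using fuzziness exponent m (must be > 1).
   */
  static double[] memberships(double[] distances, double m) {
    if (m <= 1.0) {
      throw new IllegalArgumentException("fuzziness m must be > 1");
    }
    double[] u = new double[distances.length];
    // A point lying exactly on a center belongs fully to that center.
    for (int i = 0; i < distances.length; i++) {
      if (distances[i] == 0.0) {
        u[i] = 1.0;
        return u;
      }
    }
    double exponent = 2.0 / (m - 1.0);
    for (int i = 0; i < distances.length; i++) {
      double sum = 0.0;
      for (double dk : distances) {
        sum += Math.pow(distances[i] / dk, exponent);
      }
      u[i] = 1.0 / sum;
    }
    return u;
  }

  public static void main(String[] args) {
    // A point at distance 1 from one center and 3 from another, with m = 2.
    System.out.println(Arrays.toString(memberships(new double[] {1.0, 3.0}, 2.0)));
  }
}
```

With m = 2 the nearer center receives roughly 0.9 of the membership and the farther one roughly 0.1. Larger values of m flatten the memberships toward equal shares.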

                

        

[jira] [Commented] (MAHOUT-984) Refactor Fuzzy K Means Clustering into a separate post process with outlier pruning

Posted by "Saikat Kanjilal (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229904#comment-13229904 ] 

Saikat Kanjilal commented on MAHOUT-984:
----------------------------------------

So I went through the patch-applying process and tried recompiling the code using mvn package; this is what I am seeing:


/Applications/mahout/trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletClusterMapper.java:[96,0] class, interface, or enum expected
/Applications/mahout/trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletClusterMapper.java:[98,0] class, interface, or enum expected
/Applications/mahout/trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletClusterMapper.java:[99,0] class, interface, or enum expected
/Applications/mahout/trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletClusterMapper.java:[101,0] class, interface, or enum expected
/Applications/mahout/trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletClusterMapper.java:[102,0] class, interface, or enum expected
/Applications/mahout/trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletClusterMapper.java:[103,0] class, interface, or enum expected
/Applications/mahout/trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletClusterMapper.java:[104,0] class, interface, or enum expected
/Applications/mahout/trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletClusterMapper.java:[105,0] class, interface, or enum expected
/Applications/mahout/trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletClusterMapper.java:[106,0] class, interface, or enum expected
/Applications/mahout/trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletClusterMapper.java:[107,0] class, interface, or enum expected
/Applications/mahout/trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletClusterMapper.java:[108,0] class, interface, or enum expected
/Applications/mahout/trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletClusterMapper.java:[109,0] class, interface, or enum expected
/Applications/mahout/trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletClusterMapper.java:[110,0] class, interface, or enum expected
/Applications/mahout/trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletClusterMapper.java:[111,0] class, interface, or enum expected


Thoughts on what I may be missing?

                

        

[jira] [Commented] (MAHOUT-984) Refactor Fuzzy K Means Clustering into a separate post process with outlier pruning

Posted by "Paritosh Ranjan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242585#comment-13242585 ] 

Paritosh Ranjan commented on MAHOUT-984:
----------------------------------------

Yes, you can debug that code in Eclipse. No special settings are needed for debugging it; just debug it the normal way.
                

        

[jira] [Commented] (MAHOUT-984) Refactor Fuzzy K Means Clustering into a separate post process with outlier pruning

Posted by "Paritosh Ranjan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236752#comment-13236752 ] 

Paritosh Ranjan commented on MAHOUT-984:
----------------------------------------

Saikat, I am expecting a patch from you on this issue, so I am not working on it right now. I will let you know if I pick up this issue.

I hope you will be able to fix the JUnit test cases with some debugging.
                

        

[jira] [Commented] (MAHOUT-984) Refactor Fuzzy K Means Clustering into a separate post process with outlier pruning

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243290#comment-13243290 ] 

Hudson commented on MAHOUT-984:
-------------------------------

Integrated in Mahout-Quality #1418 (See [https://builds.apache.org/job/Mahout-Quality/1418/])
    MAHOUT-984. Refactored clustering out of FuzzyKMeansDriver using ClusterClassificationDriver.
All junit tests pass. (Revision 1307859)

     Result = SUCCESS
pranjan : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1307859
Files : 
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/classify/ClusterClassificationDriver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/fuzzykmeans/FuzzyKMeansClusterMapper.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/fuzzykmeans/FuzzyKMeansDriver.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/fuzzykmeans/TestFuzzyKmeansClustering.java
* /mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/TestClusterEvaluator.java
* /mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/cdbw/TestCDbwEvaluator.java

                

        

[jira] [Work started] (MAHOUT-984) Refactor Fuzzy K Means Clustering into a separate post process with outlier pruning

Posted by "Paritosh Ranjan (Work started) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on MAHOUT-984 started by Paritosh Ranjan.


        

[jira] [Resolved] (MAHOUT-984) Refactor Fuzzy K Means Clustering into a separate post process with outlier pruning

Posted by "Paritosh Ranjan (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paritosh Ranjan resolved MAHOUT-984.
------------------------------------

    Resolution: Fixed

Now clustering is being done using ClusterClassificationDriver. FuzzyK already had a threshold, which is being used for outlier removal. All the code is committed. 

Resolving the issue.
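
To illustrate the threshold-based outlier removal mentioned above, here is a minimal sketch. It assumes that a point is kept only when its strongest cluster membership reaches the threshold; the class and method names are hypothetical, the exact comparison used by ClusterClassificationDriver is not confirmed here, and only the most-likely-cluster case is covered.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; names are hypothetical and not part of Mahout. */
final class OutlierPruningSketch {

  /** Returns the index of the most likely cluster, or -1 if the point is pruned. */
  static int classify(double[] memberships, double threshold) {
    int best = -1;
    double bestValue = Double.NEGATIVE_INFINITY;
    for (int i = 0; i < memberships.length; i++) {
      if (memberships[i] > bestValue) {
        bestValue = memberships[i];
        best = i;
      }
    }
    // Assumption: a point whose strongest membership is below the threshold is an outlier.
    return bestValue >= threshold ? best : -1;
  }

  /** Returns the indices of the points that survive pruning. */
  static List<Integer> retainedPoints(double[][] membershipsPerPoint, double threshold) {
    List<Integer> kept = new ArrayList<>();
    for (int p = 0; p < membershipsPerPoint.length; p++) {
      if (classify(membershipsPerPoint[p], threshold) >= 0) {
        kept.add(p);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    double[][] points = {
      {0.9, 0.1},        // clearly belongs to cluster 0
      {0.4, 0.35, 0.25}, // no membership reaches 0.5, so it is pruned
    };
    System.out.println(retainedPoints(points, 0.5));
  }
}
```

A threshold of 0.0 keeps every point, so pruning is effectively opt-in in this sketch.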
                

        

[jira] [Commented] (MAHOUT-984) Refactor Fuzzy K Means Clustering into a separate post process with outlier pruning

Posted by "Saikat Kanjilal (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232997#comment-13232997 ] 

Saikat Kanjilal commented on MAHOUT-984:
----------------------------------------

Never mind, I figured it out; I will be committing a patch in the next few days.
                