Posted to dev@mahout.apache.org by "Paritosh Ranjan (Created) (JIRA)" <ji...@apache.org> on 2012/02/23 08:47:51 UTC

[jira] [Created] (MAHOUT-981) Refactor KMeans Clustering into a separate post process with outlier pruning

Refactor KMeans Clustering into a separate post process with outlier pruning
----------------------------------------------------------------------------

                 Key: MAHOUT-981
                 URL: https://issues.apache.org/jira/browse/MAHOUT-981
             Project: Mahout
          Issue Type: Sub-task
    Affects Versions: 0.6
            Reporter: Paritosh Ranjan
            Assignee: Paritosh Ranjan
             Fix For: 0.7


Use ClusterClassificationDriver to refactor clustering out of KMeansDriver with outlier pruning support.
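
The outlier pruning described above can be illustrated with a small, self-contained Java sketch. This is not Mahout's ClusterClassificationDriver. The class OutlierPruningSketch and its classify method are hypothetical. The sketch assumes that a point's score for each cluster is its inverse Euclidean distance to that cluster's centroid, normalized so that the scores sum to 1. A point is assigned to its most likely cluster only when that score reaches a threshold; otherwise it is pruned as an outlier.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of threshold-based outlier pruning; not Mahout's API. */
public class OutlierPruningSketch {

    /** Euclidean distance between two equal-length vectors. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Returns the index of the most likely cluster, or -1 if the point is pruned.
     * Assumed scoring: score_k = (1 / d_k) / sum_j (1 / d_j).
     * A point that coincides with a centroid gets score 1 for that cluster.
     */
    static int classify(double[] point, List<double[]> centroids, double threshold) {
        double[] inverse = new double[centroids.size()];
        double total = 0.0;
        for (int k = 0; k < centroids.size(); k++) {
            double d = distance(point, centroids.get(k));
            if (d == 0.0) {
                return 1.0 >= threshold ? k : -1; // exact hit: score is 1
            }
            inverse[k] = 1.0 / d;
            total += inverse[k];
        }
        int best = -1;
        double bestScore = -1.0;
        for (int k = 0; k < inverse.length; k++) {
            double score = inverse[k] / total; // normalized membership score
            if (score > bestScore) {
                bestScore = score;
                best = k;
            }
        }
        // Emit only the most likely cluster, and only if it is likely enough.
        return bestScore >= threshold ? best : -1;
    }

    public static void main(String[] args) {
        List<double[]> centroids = new ArrayList<>();
        centroids.add(new double[] {0.0, 0.0});
        centroids.add(new double[] {10.0, 0.0});
        double threshold = 0.8;
        double[][] points = {{1.0, 0.0}, {9.0, 0.0}, {5.0, 0.0}};
        for (double[] p : points) {
            int c = classify(p, centroids, threshold);
            System.out.println(c < 0 ? "outlier" : "cluster " + c);
        }
    }
}
```

With a threshold of 0.8, main prints "cluster 0", "cluster 1" and "outlier", because the midpoint (5, 0) scores 0.5 for both clusters. Since the scores are normalized across clusters, they measure how ambiguous an assignment is rather than absolute distance. Another scoring rule, such as a per-cluster probability density, could be swapped in without changing the pruning step.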

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAHOUT-981) Refactor KMeans Clustering into a separate post process with outlier pruning

Posted by "Paritosh Ranjan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228176#comment-13228176 ] 

Paritosh Ranjan commented on MAHOUT-981:
----------------------------------------

Since I have already started on this issue, you could help with MAHOUT-984 if you want. It's similar to K-means. If you are willing, add a comment on MAHOUT-984 saying that you are looking into it. I will let you know before I start working on that issue.
                
> Refactor KMeans Clustering into a separate post process with outlier pruning
> ----------------------------------------------------------------------------
>
>                 Key: MAHOUT-981
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-981
>             Project: Mahout
>          Issue Type: Sub-task
>          Components: Classification, Clustering
>    Affects Versions: 0.6
>            Reporter: Paritosh Ranjan
>            Assignee: Paritosh Ranjan
>              Labels: classification, clustering
>             Fix For: 0.7
>
>
> Use ClusterClassificationDriver to refactor clustering out of KMeansDriver with outlier pruning support.

[jira] [Commented] (MAHOUT-981) Refactor KMeans Clustering into a separate post process with outlier pruning

Posted by "Paritosh Ranjan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229213#comment-13229213 ] 

Paritosh Ranjan commented on MAHOUT-981:
----------------------------------------

The patch is also uploaded on the review board.
                
[jira] [Updated] (MAHOUT-981) Refactor KMeans Clustering into a separate post process with outlier pruning

Posted by "Paritosh Ranjan (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paritosh Ranjan updated MAHOUT-981:
-----------------------------------

    Status: Patch Available  (was: In Progress)
    
[jira] [Commented] (MAHOUT-981) Refactor KMeans Clustering into a separate post process with outlier pruning

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231913#comment-13231913 ] 

Hudson commented on MAHOUT-981:
-------------------------------

Integrated in Mahout-Quality #1399 (See [https://builds.apache.org/job/Mahout-Quality/1399/])
    MAHOUT-981. Added an outlier removal option to the KMeansDriver method and CLI. (Revision 1301886)

     Result = SUCCESS
pranjan : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1301886
Files : 
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansDriver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/spectral/kmeans/SpectralKMeansDriver.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/kmeans/TestKmeansClustering.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/topdown/postprocessor/ClusterCountReaderTest.java
* /mahout/trunk/examples/src/main/java/org/apache/mahout/clustering/display/DisplayKMeans.java
* /mahout/trunk/examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/kmeans/Job.java
* /mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/TestClusterDumper.java
* /mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/TestClusterEvaluator.java
* /mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/cdbw/TestCDbwEvaluator.java

                
[jira] [Updated] (MAHOUT-981) Refactor KMeans Clustering into a separate post process with outlier pruning

Posted by "Paritosh Ranjan (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paritosh Ranjan updated MAHOUT-981:
-----------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Clustering has been refactored using ClusterClassificationDriver and outlier removal capability has been added. The code has been committed.

Resolving the issue.
                
[jira] [Updated] (MAHOUT-981) Refactor KMeans Clustering into a separate post process with outlier pruning

Posted by "Paritosh Ranjan (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paritosh Ranjan updated MAHOUT-981:
-----------------------------------

    Attachment: MAHOUT-981.txt

Refactored K-Means and Dirichlet to use ClusterClassificationDriver. 

I plan to commit this in a day or two. Please let me know if you have any concerns.
                
[jira] [Commented] (MAHOUT-981) Refactor KMeans Clustering into a separate post process with outlier pruning

Posted by "Saikat Kanjilal (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228156#comment-13228156 ] 

Saikat Kanjilal commented on MAHOUT-981:
----------------------------------------

Paritosh,
It looks like you've started on this issue. I haven't had any more time to commit my tests to the ClusterClassificationDriver, and I was wondering how best to help at this point. Should I shift gears and help with this issue, or focus on adding tests to the CCD?

Thoughts?
                
[jira] [Commented] (MAHOUT-981) Refactor KMeans Clustering into a separate post process with outlier pruning

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231544#comment-13231544 ] 

Hudson commented on MAHOUT-981:
-------------------------------

Integrated in Mahout-Quality #1397 (See [https://builds.apache.org/job/Mahout-Quality/1397/])
    MAHOUT-981, MAHOUT-983. Refactored K-Means Clustering and Dirichlet Clustering to use ClusterClassificationDriver. 
Using cluster.getModel().configure() in ClusterClassificationDriver in order to configure DirichletCluster for MahalanobisDistanceMeasure. 
Added/fixed test cases by:
Using separate directories in test cases for supplying the initial clusters and for storing the buildClusters output, to prevent two clusters-*-final files in the same directory.
Writing IntWritable keys in test cases instead of LongWritable (as the ClusterClassificationDriver clusters records with IntWritable keys). (Revision 1301654)

     Result = FAILURE
pranjan : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1301654
Files : 
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/classify/ClusterClassificationDriver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/classify/ClusterClassificationMapper.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletClusterMapper.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletClusterer.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletDriver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/iterator/DirichletClusteringPolicy.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansClusterMapper.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansClusterer.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansDriver.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/canopy/TestCanopyCreation.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/classify/ClusterClassificationDriverTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/dirichlet/TestDirichletClustering.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/dirichlet/TestMapReduce.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/kmeans/TestKmeansClustering.java
* /mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/TestClusterEvaluator.java
* /mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/cdbw/TestCDbwEvaluator.java
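
The commit message above keeps the initial clusters and the buildClusters output in separate directories so that no directory holds two clusters-*-final entries. The sketch below assumes that the final clusters are located by matching that name pattern inside an output directory. Under that assumption it shows why two matches are a problem: the lookup cannot tell which one is final. FinalClustersLocator and findFinalClustersDir are hypothetical names. The sketch uses only java.nio.file, not Mahout or Hadoop code.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not Mahout code: locate exactly one clusters-*-final entry. */
public class FinalClustersLocator {

    static Path findFinalClustersDir(Path outputDir) throws IOException {
        List<Path> matches = new ArrayList<>();
        // Glob over the entries directly under outputDir.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(outputDir, "clusters-*-final")) {
            for (Path p : stream) {
                matches.add(p);
            }
        }
        // Zero matches means no final clusters; two or more is ambiguous.
        if (matches.size() != 1) {
            throw new IllegalStateException("Expected exactly one clusters-*-final entry in "
                + outputDir + " but found " + matches.size());
        }
        return matches.get(0);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical layout: initial clusters and final k-means clusters share a directory.
        Path shared = Files.createTempDirectory("shared");
        Files.createDirectory(shared.resolve("clusters-0-final"));
        Files.createDirectory(shared.resolve("clusters-3-final"));
        try {
            findFinalClustersDir(shared);
        } catch (IllegalStateException e) {
            System.out.println("ambiguous: " + e.getMessage());
        }

        // Separate directory: only one final entry, so the lookup succeeds.
        Path separate = Files.createTempDirectory("separate");
        Files.createDirectory(separate.resolve("clusters-3-final"));
        System.out.println("found: " + findFinalClustersDir(separate).getFileName());
    }
}
```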

                
[jira] [Commented] (MAHOUT-981) Refactor KMeans Clustering into a separate post process with outlier pruning

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232269#comment-13232269 ] 

Hudson commented on MAHOUT-981:
-------------------------------

Integrated in Mahout-Quality #1401 (See [https://builds.apache.org/job/Mahout-Quality/1401/])
    MAHOUT-981. Changed the key to WritableComparable<?> to fix the Reuters examples build. Now any type can be fed as a key in the input sequence file. (Revision 1302100)

     Result = SUCCESS
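
Changing the key to WritableComparable<?> lets any comparable key type in the input sequence file reach the classifier, instead of only IntWritable. The following sketch shows the same kind of widening in plain Java. To stay self-contained, it uses java.lang.Comparable<?> as a stand-in for Hadoop's WritableComparable<?>. KeyTypeSketch and its methods are hypothetical and are not the code from Revision 1302100.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration of widening a key parameter's type.
 * java.lang.Comparable<?> stands in for Hadoop's WritableComparable<?>.
 */
public class KeyTypeSketch {

    /** Narrow version: only Integer keys compile (analogous to accepting only IntWritable). */
    static String classifyIntKey(Integer key, double[] vector) {
        return key + " -> " + vector.length + " dims";
    }

    /** Widened version: any Comparable key is accepted and echoed in the result. */
    static String classifyAnyKey(Comparable<?> key, double[] vector) {
        return key + " -> " + vector.length + " dims";
    }

    public static void main(String[] args) {
        double[] v = {1.0, 2.0};
        List<String> out = new ArrayList<>();
        out.add(classifyIntKey(7, v));
        // classifyIntKey(7L, v) would not compile: a Long is not an Integer.
        out.add(classifyAnyKey(7, v));        // Integer key
        out.add(classifyAnyKey(7L, v));       // Long key
        out.add(classifyAnyKey("doc-42", v)); // String key, e.g. a document id
        out.forEach(System.out::println);
    }
}
```

Because the key is only passed through to the output, the classifier does not need to know its concrete type. The assignment depends solely on the vector.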
                
[jira] [Commented] (MAHOUT-981) Refactor KMeans Clustering into a separate post process with outlier pruning

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231766#comment-13231766 ] 

Hudson commented on MAHOUT-981:
-------------------------------

Integrated in Mahout-Quality #1398 (See [https://builds.apache.org/job/Mahout-Quality/1398/])
    MAHOUT-981, MAHOUT-983. Fixing test cases that fail intermittently.
The build passes on my machine (even for the last commit).
Tried to identify and fix all test cases that can fail intermittently. (Revision 1301761)

     Result = SUCCESS
pranjan : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1301761
Files : 
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/kmeans/TestKmeansClustering.java
* /mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/TestClusterDumper.java

                
[jira] [Commented] (MAHOUT-981) Refactor KMeans Clustering into a separate post process with outlier pruning

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234184#comment-13234184 ] 

Hudson commented on MAHOUT-981:
-------------------------------

Integrated in Mahout-Quality #1405 (See [https://builds.apache.org/job/Mahout-Quality/1405/])
    MAHOUT-981. Fixing test cases that keep clusters-*-final in the same directory for Canopy and K-means. (Revision 1303282)

     Result = SUCCESS
                
[jira] [Work started] (MAHOUT-981) Refactor KMeans Clustering into a separate post process with outlier pruning

Posted by "Paritosh Ranjan (Work started) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on MAHOUT-981 started by Paritosh Ranjan.
