Posted to dev@mahout.apache.org by "Paritosh Ranjan (Created) (JIRA)" <ji...@apache.org> on 2012/02/23 08:53:52 UTC

[jira] [Created] (MAHOUT-983) Refactor Dirichlet Clustering into a separate post process with outlier pruning

Refactor Dirichlet Clustering into a separate post process with outlier pruning
-------------------------------------------------------------------------------

                 Key: MAHOUT-983
                 URL: https://issues.apache.org/jira/browse/MAHOUT-983
             Project: Mahout
          Issue Type: Sub-task
          Components: Clustering
    Affects Versions: 0.6
            Reporter: Paritosh Ranjan
            Assignee: Paritosh Ranjan
             Fix For: 0.7


Use ClusterClassificationDriver to refactor clustering out of DirichletDriver with outlier pruning support.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAHOUT-983) Refactor Dirichlet Clustering into a separate post process with outlier pruning

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231767#comment-13231767 ] 

Hudson commented on MAHOUT-983:
-------------------------------

Integrated in Mahout-Quality #1398 (See [https://builds.apache.org/job/Mahout-Quality/1398/])
    MAHOUT-981, MAHOUT-983. Fixed test cases that fail intermittently.
The build passes on my machine (even for the last commit).
Tried to identify all test cases that can fail intermittently, and fixed them. (Revision 1301761)

     Result = SUCCESS
pranjan : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1301761
Files : 
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/kmeans/TestKmeansClustering.java
* /mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/TestClusterDumper.java

                
> Refactor Dirichlet Clustering into a separate post process with outlier pruning
> -------------------------------------------------------------------------------
>
>                 Key: MAHOUT-983
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-983
>             Project: Mahout
>          Issue Type: Sub-task
>          Components: Clustering
>    Affects Versions: 0.6
>            Reporter: Paritosh Ranjan
>            Assignee: Paritosh Ranjan
>              Labels: clustering
>             Fix For: 0.7
>
>         Attachments: MAHOUT-981.txt
>
>
> Use ClusterClassificationDriver to refactor clustering out of DirichletDriver with outlier pruning support.


        

[jira] [Updated] (MAHOUT-983) Refactor Dirichlet Clustering into a separate post process with outlier pruning

Posted by "Paritosh Ranjan (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paritosh Ranjan updated MAHOUT-983:
-----------------------------------

    Status: Patch Available  (was: In Progress)
    

        

[jira] [Updated] (MAHOUT-983) Refactor Dirichlet Clustering into a separate post process with outlier pruning

Posted by "Paritosh Ranjan (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paritosh Ranjan updated MAHOUT-983:
-----------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Refactored clustering out of DirichletDriver using ClusterClassificationDriver. Dirichlet already had a threshold option, so the issue is now fully implemented.
Resolving the issue.
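
For readers unfamiliar with the threshold option mentioned above, the following is a minimal, dependency-free sketch of threshold-based outlier pruning in a classification post process. It is not Mahout code: the class OutlierPruningSketch, the ToyCluster type, the Gaussian-style score, and the -1 return value for pruned points are all assumptions made for illustration. The idea is only that each point is scored against the already-built clusters and is assigned to the best one if that score reaches the threshold; otherwise it is dropped as an outlier.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; none of these names are Mahout APIs. */
final class OutlierPruningSketch {

  /** Toy spherical cluster: an id, a center, and one standard deviation. */
  static final class ToyCluster {
    final int id;
    final double[] center;
    final double stdDev;

    ToyCluster(int id, double[] center, double stdDev) {
      this.id = id;
      this.center = center;
      this.stdDev = stdDev;
    }

    /** Assumed score in [0, 1]: exp(-d^2 / (2 * stdDev^2)), d = Euclidean distance. */
    double score(double[] point) {
      double squaredDistance = 0.0;
      for (int i = 0; i < center.length; i++) {
        double diff = point[i] - center[i];
        squaredDistance += diff * diff;
      }
      return Math.exp(-squaredDistance / (2.0 * stdDev * stdDev));
    }
  }

  /**
   * Returns the id of the best-scoring cluster, or -1 if the list is empty
   * or the best score is below the threshold (the point is pruned).
   */
  static int classify(List<ToyCluster> clusters, double[] point, double threshold) {
    double bestScore = -1.0;
    int bestId = -1;
    for (ToyCluster cluster : clusters) {
      double s = cluster.score(point);
      if (s > bestScore) {
        bestScore = s;
        bestId = cluster.id;
      }
    }
    return (bestId >= 0 && bestScore >= threshold) ? bestId : -1;
  }

  public static void main(String[] args) {
    List<ToyCluster> clusters = Arrays.asList(
        new ToyCluster(0, new double[] {0.0, 0.0}, 1.0),
        new ToyCluster(1, new double[] {10.0, 10.0}, 1.0));
    double[][] points = {{0.5, 0.0}, {10.0, 9.5}, {4.0, 4.0}};
    for (double[] p : points) {
      // A result of -1 means the point was pruned as an outlier.
      System.out.println(Arrays.toString(p) + " -> " + classify(clusters, p, 0.5));
    }
  }
}
```

In this sketch a threshold of 0.0 assigns every point to its best-scoring cluster, so nothing is pruned; raising the threshold prunes more points.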
                

        

[jira] [Commented] (MAHOUT-983) Refactor Dirichlet Clustering into a separate post process with outlier pruning

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231545#comment-13231545 ] 

Hudson commented on MAHOUT-983:
-------------------------------

Integrated in Mahout-Quality #1397 (See [https://builds.apache.org/job/Mahout-Quality/1397/])
    MAHOUT-981, MAHOUT-983. Refactored K-Means Clustering and Dirichlet Clustering to use ClusterClassificationDriver.
ClusterClassificationDriver now calls cluster.getModel().configure() to configure DirichletCluster for MahalanobisDistanceMeasure.
Added/fixed test cases by:
* Using separate directories in test cases for the initial clusters and for the buildClusters output, to prevent two cluster-*-final files in the same directory.
* Writing IntWritable keys in test cases instead of LongWritable (as ClusterClassificationDriver clusters records with IntWritable keys).
(Revision 1301654)

     Result = FAILURE
pranjan : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1301654
Files : 
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/classify/ClusterClassificationDriver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/classify/ClusterClassificationMapper.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletClusterMapper.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletClusterer.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletDriver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/iterator/DirichletClusteringPolicy.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansClusterMapper.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansClusterer.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansDriver.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/canopy/TestCanopyCreation.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/classify/ClusterClassificationDriverTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/dirichlet/TestDirichletClustering.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/dirichlet/TestMapReduce.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/kmeans/TestKmeansClustering.java
* /mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/TestClusterEvaluator.java
* /mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/cdbw/TestCDbwEvaluator.java

                

        

[jira] [Work started] (MAHOUT-983) Refactor Dirichlet Clustering into a separate post process with outlier pruning

Posted by "Paritosh Ranjan (Work started) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on MAHOUT-983 started by Paritosh Ranjan.


        

[jira] [Commented] (MAHOUT-983) Refactor Dirichlet Clustering into a separate post process with outlier pruning

Posted by "Paritosh Ranjan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229214#comment-13229214 ] 

Paritosh Ranjan commented on MAHOUT-983:
----------------------------------------

The patch has also been uploaded to the review board.
                

        

[jira] [Updated] (MAHOUT-983) Refactor Dirichlet Clustering into a separate post process with outlier pruning

Posted by "Paritosh Ranjan (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paritosh Ranjan updated MAHOUT-983:
-----------------------------------

    Attachment: MAHOUT-981.txt

Refactored K-Means and Dirichlet to use ClusterClassificationDriver. 

I plan to commit this in a day or two. Please let me know if you have any concerns.
                