Posted to dev@mahout.apache.org by "Paritosh Ranjan (Created) (JIRA)" <ji...@apache.org> on 2012/02/23 08:47:51 UTC
[jira] [Created] (MAHOUT-981) Refactor KMeans Clustering into a
separate post process with outlier pruning
Refactor KMeans Clustering into a separate post process with outlier pruning
----------------------------------------------------------------------------
Key: MAHOUT-981
URL: https://issues.apache.org/jira/browse/MAHOUT-981
Project: Mahout
Issue Type: Sub-task
Affects Versions: 0.6
Reporter: Paritosh Ranjan
Assignee: Paritosh Ranjan
Fix For: 0.7
Use ClusterClassificationDriver to refactor clustering out of KMeansDriver with outlier pruning support.
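As a rough illustration of what "clustering as a separate post process with outlier pruning" can mean, the sketch below assigns each point to its best-scoring cluster and leaves it unclustered when that score falls below a threshold. It is a plain-Java stand-in, not Mahout code: the class name, the method names, the 1/(1+distance) score and the -1 return value are all assumptions made for this example, and the real ClusterClassificationDriver may score and prune differently.

```java
/** Illustrative sketch only; not Mahout's ClusterClassificationDriver. */
final class OutlierPruningSketch {

  /** Euclidean distance between two vectors of equal length. */
  static double distance(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /**
   * Returns the index of the best-scoring centroid, or -1 when the point is
   * pruned as an outlier. The score 1/(1+distance) lies in (0, 1] and is a
   * hypothetical stand-in for whatever membership score a real model provides.
   */
  static int classify(double[] point, double[][] centroids, double threshold) {
    int best = -1;
    double bestScore = -1.0;
    for (int c = 0; c < centroids.length; c++) {
      double score = 1.0 / (1.0 + distance(point, centroids[c]));
      if (score > bestScore) {
        bestScore = score;
        best = c;
      }
    }
    // Points whose best score is below the threshold are left unclustered.
    return bestScore >= threshold ? best : -1;
  }

  public static void main(String[] args) {
    double[][] centroids = {{0.0, 0.0}, {10.0, 10.0}};
    double[][] points = {{0.5, 0.0}, {9.0, 10.0}, {50.0, 50.0}};
    for (double[] p : points) {
      System.out.println(classify(p, centroids, 0.2));
    }
  }
}
```

With a threshold of 0.0 nothing is pruned and every point goes to its nearest centroid; raising the threshold drops points that are far from all centroids.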
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAHOUT-981) Refactor KMeans Clustering into a
separate post process with outlier pruning
Posted by "Paritosh Ranjan (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228176#comment-13228176 ]
Paritosh Ranjan commented on MAHOUT-981:
----------------------------------------
Since I have already started on this issue, you could help with MAHOUT-984 if you want. It is similar to K-means. If you are willing, add a comment on MAHOUT-984 saying you are looking into it. I will let you know before I start working on that issue.
> Refactor KMeans Clustering into a separate post process with outlier pruning
> ----------------------------------------------------------------------------
>
> Key: MAHOUT-981
> URL: https://issues.apache.org/jira/browse/MAHOUT-981
> Project: Mahout
> Issue Type: Sub-task
> Components: Classification, Clustering
> Affects Versions: 0.6
> Reporter: Paritosh Ranjan
> Assignee: Paritosh Ranjan
> Labels: classification, clustering
> Fix For: 0.7
>
>
> Use ClusterClassificationDriver to refactor clustering out of KMeansDriver with outlier pruning support.
[jira] [Commented] (MAHOUT-981) Refactor KMeans Clustering into a
separate post process with outlier pruning
Posted by "Paritosh Ranjan (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229213#comment-13229213 ]
Paritosh Ranjan commented on MAHOUT-981:
----------------------------------------
The patch has also been uploaded to the review board.
[jira] [Updated] (MAHOUT-981) Refactor KMeans Clustering into a
separate post process with outlier pruning
Posted by "Paritosh Ranjan (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paritosh Ranjan updated MAHOUT-981:
-----------------------------------
Status: Patch Available (was: In Progress)
[jira] [Commented] (MAHOUT-981) Refactor KMeans Clustering into a
separate post process with outlier pruning
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231913#comment-13231913 ]
Hudson commented on MAHOUT-981:
-------------------------------
Integrated in Mahout-Quality #1399 (See [https://builds.apache.org/job/Mahout-Quality/1399/])
MAHOUT-981, Added outlier removal option in method and CLI for KMeansDriver. (Revision 1301886)
Result = SUCCESS
pranjan : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1301886
Files :
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansDriver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/spectral/kmeans/SpectralKMeansDriver.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/kmeans/TestKmeansClustering.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/topdown/postprocessor/ClusterCountReaderTest.java
* /mahout/trunk/examples/src/main/java/org/apache/mahout/clustering/display/DisplayKMeans.java
* /mahout/trunk/examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/kmeans/Job.java
* /mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/TestClusterDumper.java
* /mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/TestClusterEvaluator.java
* /mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/cdbw/TestCDbwEvaluator.java
[jira] [Updated] (MAHOUT-981) Refactor KMeans Clustering into a
separate post process with outlier pruning
Posted by "Paritosh Ranjan (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paritosh Ranjan updated MAHOUT-981:
-----------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
Clustering has been refactored using ClusterClassificationDriver and outlier removal capability has been added. The code has been committed.
Resolving the issue.
[jira] [Updated] (MAHOUT-981) Refactor KMeans Clustering into a
separate post process with outlier pruning
Posted by "Paritosh Ranjan (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paritosh Ranjan updated MAHOUT-981:
-----------------------------------
Attachment: MAHOUT-981.txt
Refactored K-Means and Dirichlet to use ClusterClassificationDriver.
I plan to commit this in a day or two. Please let me know if you have any concerns.
[jira] [Commented] (MAHOUT-981) Refactor KMeans Clustering into a
separate post process with outlier pruning
Posted by "Saikat Kanjilal (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228156#comment-13228156 ]
Saikat Kanjilal commented on MAHOUT-981:
----------------------------------------
Paritosh,
It looks like you've started on this issue. I have not had any more time to commit my tests for the ClusterClassificationDriver, so I was wondering how best to help at this point. Should I shift gears and help with this issue, or focus on adding tests to the CCD?
Thoughts?
[jira] [Commented] (MAHOUT-981) Refactor KMeans Clustering into a
separate post process with outlier pruning
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231544#comment-13231544 ]
Hudson commented on MAHOUT-981:
-------------------------------
Integrated in Mahout-Quality #1397 (See [https://builds.apache.org/job/Mahout-Quality/1397/])
MAHOUT-981, MAHOUT-983. Refactored K-Means Clustering and Dirichlet Clustering to use ClusterClassificationDriver.
Using cluster.getModel().configure() in ClusterClassificationDriver in order to configure DirichletCluster for MahalanobisDistanceMeasure.
Added/fixed test cases by:
Using separate directories in test cases for supplying initial clusters and to store buildClusters to prevent two cluster-*-final files in the same directory.
Writing IntWritable in test cases instead of LongWritable ( As the ClusterClassificationDriver clusters records with IntWritable keys). (Revision 1301654)
Result = FAILURE
pranjan : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1301654
Files :
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/classify/ClusterClassificationDriver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/classify/ClusterClassificationMapper.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletClusterMapper.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletClusterer.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletDriver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/iterator/DirichletClusteringPolicy.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansClusterMapper.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansClusterer.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansDriver.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/canopy/TestCanopyCreation.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/classify/ClusterClassificationDriverTest.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/dirichlet/TestDirichletClustering.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/dirichlet/TestMapReduce.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/kmeans/TestKmeansClustering.java
* /mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/TestClusterEvaluator.java
* /mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/cdbw/TestCDbwEvaluator.java
[jira] [Commented] (MAHOUT-981) Refactor KMeans Clustering into a
separate post process with outlier pruning
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232269#comment-13232269 ]
Hudson commented on MAHOUT-981:
-------------------------------
Integrated in Mahout-Quality #1401 (See [https://builds.apache.org/job/Mahout-Quality/1401/])
MAHOUT-981, Changed key to WritableComparable<?> to fix the reuters examples build. Now any type can be fed as a key in the input sequence file. (Revision 1302100)
Result = SUCCESS
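The effect of widening a key type to a wildcard can be sketched in plain Java. The example below does not use Hadoop or Mahout: Comparable<?> stands in for WritableComparable<?>, and KeyTypeSketch, KeyedVector, countIntKeyed and describe are hypothetical names chosen only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; Comparable<?> stands in for WritableComparable<?>. */
final class KeyTypeSketch {

  /** A keyed input record; K plays the role of the sequence-file key type. */
  static final class KeyedVector<K extends Comparable<?>> {
    final K key;
    final double[] vector;

    KeyedVector(K key, double[] vector) {
      this.key = key;
      this.vector = vector;
    }
  }

  /** Narrow signature: only Integer-keyed records are accepted. */
  static int countIntKeyed(List<KeyedVector<Integer>> records) {
    return records.size();
  }

  /** Wide signature: records with any comparable key type are accepted. */
  static List<String> describe(List<? extends KeyedVector<?>> records) {
    List<String> out = new ArrayList<>();
    for (KeyedVector<?> r : records) {
      out.add(r.key.getClass().getSimpleName() + ":" + r.key);
    }
    return out;
  }

  public static void main(String[] args) {
    List<KeyedVector<?>> mixed = new ArrayList<>();
    mixed.add(new KeyedVector<Integer>(7, new double[] {1.0}));
    mixed.add(new KeyedVector<String>("doc-1", new double[] {2.0}));
    System.out.println(describe(mixed));
  }
}
```

A caller holding String- or Long-keyed records cannot pass them to the narrow method without converting the keys first, whereas the wildcard version takes them as they are.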
[jira] [Commented] (MAHOUT-981) Refactor KMeans Clustering into a
separate post process with outlier pruning
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231766#comment-13231766 ]
Hudson commented on MAHOUT-981:
-------------------------------
Integrated in Mahout-Quality #1398 (See [https://builds.apache.org/job/Mahout-Quality/1398/])
MAHOUT-981, MAHOUT-983. Fixing test cases which fail intermittently.
The build passes on my machine (even for the last commit).
Tried to identify all test cases that can fail intermittently and fixed them. (Revision 1301761)
Result = SUCCESS
pranjan : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1301761
Files :
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/kmeans/TestKmeansClustering.java
* /mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/TestClusterDumper.java
[jira] [Commented] (MAHOUT-981) Refactor KMeans Clustering into a
separate post process with outlier pruning
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234184#comment-13234184 ]
Hudson commented on MAHOUT-981:
-------------------------------
Integrated in Mahout-Quality #1405 (See [https://builds.apache.org/job/Mahout-Quality/1405/])
Mahout-981, Fixing test cases which are keeping clusters-*-final in the same directory for canopy and kmeans. (Revision 1303282)
Result = SUCCESS
[jira] [Work started] (MAHOUT-981) Refactor KMeans Clustering into
a separate post process with outlier pruning
Posted by "Paritosh Ranjan (Work started) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Work on MAHOUT-981 started by Paritosh Ranjan.