Posted to dev@mahout.apache.org by "Jeff Eastman (Created) (JIRA)" <ji...@apache.org> on 2012/03/09 21:08:56 UTC

[jira] [Created] (MAHOUT-990) Convert Dirichlet buildClusters to use new ClusterIterator

Convert Dirichlet buildClusters to use new ClusterIterator
----------------------------------------------------------

                 Key: MAHOUT-990
                 URL: https://issues.apache.org/jira/browse/MAHOUT-990
             Project: Mahout
          Issue Type: Sub-task
          Components: Clustering
    Affects Versions: 0.6
            Reporter: Jeff Eastman
            Assignee: Jeff Eastman
             Fix For: 0.7


Refactor the current Dirichlet implementation to use the ClusterIterator/Classifier implementation. This will replace the mapper, combiner, reducer, clusterer, and many unit tests, but will not modify the other driver APIs, thus retaining compatibility with the existing CLI.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Work started] (MAHOUT-990) Convert Dirichlet buildClusters to use new ClusterIterator

Posted by "Paritosh Ranjan (Work started) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on MAHOUT-990 started by Paritosh Ranjan.



[jira] [Commented] (MAHOUT-990) Convert Dirichlet buildClusters to use new ClusterIterator

Posted by "Jeff Eastman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13270130#comment-13270130 ] 

Jeff Eastman commented on MAHOUT-990:
-------------------------------------

The initial clusters are a bit tricky with the new implementation. I'm taking this over to drive it home.
                


[jira] [Resolved] (MAHOUT-990) Convert Dirichlet buildClusters to use new ClusterIterator

Posted by "Jeff Eastman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Eastman resolved MAHOUT-990.
---------------------------------

    Resolution: Fixed

Committed revision 1336424, which is based on the above patches. Some changes were required to properly initialize the prior. All tests run.
                


[jira] [Assigned] (MAHOUT-990) Convert Dirichlet buildClusters to use new ClusterIterator

Posted by "Jeff Eastman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Eastman reassigned MAHOUT-990:
-----------------------------------

    Assignee: Jeff Eastman  (was: Paritosh Ranjan)
    


[jira] [Commented] (MAHOUT-990) Convert Dirichlet buildClusters to use new ClusterIterator

Posted by "Paritosh Ranjan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245594#comment-13245594 ] 

Paritosh Ranjan commented on MAHOUT-990:
----------------------------------------

I am not sure whether I am loading the initial clusters properly. All the JUnit test cases run successfully, and I have implemented similar code before. Still, I think a review would help.
                


[jira] [Updated] (MAHOUT-990) Convert Dirichlet buildClusters to use new ClusterIterator

Posted by "Paritosh Ranjan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paritosh Ranjan updated MAHOUT-990:
-----------------------------------

    Attachment: DirichletUtil.java
                MAHOUT-990.txt

Attached the patch file (MAHOUT-990.txt) and DirichletUtil.java (it's a new file and is not present in the patch). You will need to apply both to see the changes. There might be a few test failures, which I can fix once we are sure that the overall clustering logic is correct.
                


[jira] [Commented] (MAHOUT-990) Convert Dirichlet buildClusters to use new ClusterIterator

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245590#comment-13245590 ] 

jiraposter@reviews.apache.org commented on MAHOUT-990:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4625/
-----------------------------------------------------------

Review request for mahout.


Summary
-------

MAHOUT-990: Changed DirichletClustering so that buildClusters uses ClusterIterator.


This addresses bug MAHOUT-990.
    https://issues.apache.org/jira/browse/MAHOUT-990


Diffs
-----

  trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletDriver.java 1307457 
  trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletMapper.java 1307457 
  trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletReducer.java 1307457 
  trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletState.java 1307457 
  trunk/core/src/main/java/org/apache/mahout/clustering/iterator/ClusterIterator.java 1307457 
  trunk/core/src/test/java/org/apache/mahout/clustering/dirichlet/TestMapReduce.java 1307457 

Diff: https://reviews.apache.org/r/4625/diff


Testing
-------

All JUnit tests pass.


Thanks,

Paritosh


                


[jira] [Commented] (MAHOUT-990) Convert Dirichlet buildClusters to use new ClusterIterator

Posted by "Paritosh Ranjan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267590#comment-13267590 ] 

Paritosh Ranjan commented on MAHOUT-990:
----------------------------------------

Sure, I will post the patch soon.
                


[jira] [Commented] (MAHOUT-990) Convert Dirichlet buildClusters to use new ClusterIterator

Posted by "Jeff Eastman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267461#comment-13267461 ] 

Jeff Eastman commented on MAHOUT-990:
-------------------------------------

Sorry for the long delay. I looked at the review and have trouble following all the details too. It seems that writeInitialState should be creating the prior models (which it does) and writing the policy, but I'm not sure how it's doing that. Could you post the patch so I can see the net effects?
                


[jira] [Commented] (MAHOUT-990) Convert Dirichlet buildClusters to use new ClusterIterator

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271975#comment-13271975 ] 

Hudson commented on MAHOUT-990:
-------------------------------

Integrated in Mahout-Quality #1469 (See [https://builds.apache.org/job/Mahout-Quality/1469/])
    MAHOUT-990: fixed problems with patch and all tests and displays run (Revision 1336424)

     Result = SUCCESS
jeastman : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1336424
Files : 
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/classify/ClusterClassificationDriver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/classify/ClusterClassificationMapper.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/classify/ClusterClassifier.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletCluster.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletClusterer.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletDriver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletMapper.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletReducer.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/DirichletState.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/dirichlet/models/DistributionDescription.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/fuzzykmeans/FuzzyKMeansDriver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/iterator/CIMapper.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/iterator/CIReducer.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/iterator/ClusterIterator.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansDriver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/common/distance/MahalanobisDistanceMeasure.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/dirichlet/TestDirichletClustering.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/dirichlet/TestMapReduce.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/iterator/TestClusterClassifier.java
* /mahout/trunk/examples/src/main/java/org/apache/mahout/clustering/display/DisplayCanopy.java
* /mahout/trunk/examples/src/main/java/org/apache/mahout/clustering/display/DisplayClustering.java
* /mahout/trunk/examples/src/main/java/org/apache/mahout/clustering/display/DisplayDirichlet.java
* /mahout/trunk/examples/src/main/java/org/apache/mahout/clustering/display/DisplayFuzzyKMeans.java
* /mahout/trunk/examples/src/main/java/org/apache/mahout/clustering/display/DisplayKMeans.java
* /mahout/trunk/examples/src/main/java/org/apache/mahout/clustering/display/DisplayMeanShift.java
* /mahout/trunk/examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/dirichlet/Job.java
* /mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/TestClusterEvaluator.java
* /mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/cdbw/TestCDbwEvaluator.java
* /mahout/trunk/integration/src/test/java/org/apache/mahout/clustering/dirichlet/TestL1ModelClustering.java

                


[jira] [Assigned] (MAHOUT-990) Convert Dirichlet buildClusters to use new ClusterIterator

Posted by "Paritosh Ranjan (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paritosh Ranjan reassigned MAHOUT-990:
--------------------------------------

    Assignee: Paritosh Ranjan  (was: Jeff Eastman)
    
