Posted to dev@mahout.apache.org by "Vasil Vasilev (JIRA)" <ji...@apache.org> on 2011/03/06 14:11:25 UTC

[jira] Created: (MAHOUT-616) Cannot use MahalanobisDistanceMeasure with Dirichlet clustering

Cannot use MahalanobisDistanceMeasure with Dirichlet clustering
---------------------------------------------------------------

                 Key: MAHOUT-616
                 URL: https://issues.apache.org/jira/browse/MAHOUT-616
             Project: Mahout
          Issue Type: Bug
          Components: Clustering
    Affects Versions: 0.5
            Reporter: Vasil Vasilev
            Priority: Minor


When Dirichlet clustering is run with DistanceMeasureClusterDistribution and MahalanobisDistanceMeasure, the configure method of the distance measure is never called, so configuration data cannot be supplied to the distance measure. In addition, there is a bug in MahalanobisDistanceMeasure: it does not take the matrix class into account.
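
To illustrate why a missing configure call matters for this kind of measure: a Mahalanobis distance needs an inverse covariance matrix before it can compute anything. The following is a minimal, self-contained sketch and not Mahout's actual code. The names ConfigurableMeasure and SketchMahalanobisMeasure, the Map-based configuration, and the "inverseCovariance" key are all assumptions made for illustration.

```java
import java.util.Map;

/** Hypothetical stand-in for a configurable distance measure; not Mahout's interface. */
interface ConfigurableMeasure {
    void configure(Map<String, double[][]> conf);
    double distance(double[] a, double[] b);
}

/** Simplified Mahalanobis-style measure that is unusable until configure() has run. */
final class SketchMahalanobisMeasure implements ConfigurableMeasure {
    private double[][] inverseCovariance; // stays null until configure() is called

    @Override
    public void configure(Map<String, double[][]> conf) {
        // "inverseCovariance" is an illustrative key, not a real Mahout setting.
        inverseCovariance = conf.get("inverseCovariance");
    }

    @Override
    public double distance(double[] a, double[] b) {
        if (inverseCovariance == null) {
            throw new IllegalStateException("configure() was never called");
        }
        int n = a.length;
        double[] d = new double[n];
        for (int i = 0; i < n; i++) {
            d[i] = a[i] - b[i];
        }
        // Quadratic form d^T * S^-1 * d, then the square root.
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                sum += d[i] * inverseCovariance[i][j] * d[j];
            }
        }
        return Math.sqrt(sum);
    }
}
```

In this sketch, calling distance() on a fresh instance fails immediately, which mirrors the reported symptom: if nothing in the clustering code path invokes configure, the measure never receives its matrix.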

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAHOUT-616) Cannot use MahalanobisDistanceMeasure with Dirichlet clustering

Posted by "Vasil Vasilev (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009049#comment-13009049 ] 

Vasil Vasilev commented on MAHOUT-616:
--------------------------------------

Only the config method in DistanceMeasureCluster calls getMeasure().configure(job);

> Cannot use MahalanobisDistanceMeasure with Dirichlet clustering
> ---------------------------------------------------------------
>
>                 Key: MAHOUT-616
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-616
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.5
>            Reporter: Vasil Vasilev
>            Assignee: Sean Owen
>            Priority: Minor
>             Fix For: 0.5
>
>         Attachments: mahalanobis_fix.patch, mahalanobis_fix.patch
>
>
> When Dirichlet clustering is run with DistanceMeasureClusterDistribution and MahalanobisDistanceMeasure the configure method of the distance measure is not called. In this way the configuration data cannot be supplied to the distance measure. In addition there is a bug in the MahalanobisDistanceMeasure that does not take into account the matrix class


[jira] Commented: (MAHOUT-616) Cannot use MahalanobisDistanceMeasure with Dirichlet clustering

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008929#comment-13008929 ] 

Hudson commented on MAHOUT-616:
-------------------------------

Integrated in Mahout-Quality #678 (See [https://hudson.apache.org/hudson/job/Mahout-Quality/678/])
    MAHOUT-616 add "configure" hooks to clusters and configure throughout the code



[jira] Resolved: (MAHOUT-616) Cannot use MahalanobisDistanceMeasure with Dirichlet clustering

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved MAHOUT-616.
------------------------------

    Resolution: Fixed
      Assignee: Sean Owen

I committed it, with only style changes such as using foreach syntax. It looks harmless and the tests pass. Do any of the config methods actually do anything?


[jira] Updated: (MAHOUT-616) Cannot use MahalanobisDistanceMeasure with Dirichlet clustering

Posted by "Vasil Vasilev (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vasil Vasilev updated MAHOUT-616:
---------------------------------

    Attachment: mahalanobis_fix.patch

A newer version of the patch. Two tests were also added to TestMapReduce to verify the fix.


[jira] Commented: (MAHOUT-616) Cannot use MahalanobisDistanceMeasure with Dirichlet clustering

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13003457#comment-13003457 ] 

Ted Dunning commented on MAHOUT-616:
------------------------------------

Yes.

And pushing the test for a non-null measure down into the cluster seems like a good idea.  That could be done using a configure method on the cluster that calls the configure method on the measure.  The measure can probably even be guaranteed to be non-null if we think about things right.
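
One way the delegation described above could look, as a rough sketch only: the cluster owns the measure, rejects null at construction time, and forwards configure to it. SketchMeasure, RecordingMeasure and DelegatingCluster are hypothetical names, and java.util.Properties stands in for the job configuration; none of this is taken from the committed patch.

```java
import java.util.Objects;
import java.util.Properties;

/** Hypothetical measure contract; only the configure hook matters for this sketch. */
interface SketchMeasure {
    void configure(Properties job);
}

/** Test double that records whether configure() reached it. */
final class RecordingMeasure implements SketchMeasure {
    boolean configured; // package-private so a caller can inspect it

    @Override
    public void configure(Properties job) {
        configured = true;
    }
}

/** Cluster that guarantees a non-null measure and forwards configuration to it. */
final class DelegatingCluster {
    private final SketchMeasure measure;

    DelegatingCluster(SketchMeasure measure) {
        // Rejecting null here means configure() needs no null check later.
        this.measure = Objects.requireNonNull(measure, "measure");
    }

    void configure(Properties job) {
        measure.configure(job);
    }

    SketchMeasure getMeasure() {
        return measure;
    }
}
```

With the null check in the constructor, callers only ever call configure on the cluster and do not need to know whether a measure is present.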


[jira] Commented: (MAHOUT-616) Cannot use MahalanobisDistanceMeasure with Dirichlet clustering

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13003344#comment-13003344 ] 

Sean Owen commented on MAHOUT-616:
----------------------------------

Hmm, surely there is a better approach that doesn't use "instanceof" everywhere. Wouldn't it be cleaner to give all Clusters a configure() method and always call it, even if it is a no-op for other implementations?
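
A small sketch of that suggestion, under assumed names: every cluster type exposes configure(), implementations with nothing to set up leave it empty, and the caller loops over all clusters without any type checks. SketchCluster, PlainCluster, MeasuredCluster, ConfigureAllSketch and the "sketch.measure.setting" key are invented for illustration and are not Mahout classes or settings.

```java
import java.util.List;
import java.util.Properties;

/** Hypothetical cluster contract with a configure hook on every implementation. */
interface SketchCluster {
    void configure(Properties job);
}

/** A cluster with nothing to configure: the hook is a deliberate no-op. */
final class PlainCluster implements SketchCluster {
    @Override
    public void configure(Properties job) {
        // intentionally empty
    }
}

/** A cluster that does need a job setting, represented here by one string. */
final class MeasuredCluster implements SketchCluster {
    String measureSetting; // package-private so a caller can inspect it

    @Override
    public void configure(Properties job) {
        measureSetting = job.getProperty("sketch.measure.setting");
    }
}

final class ConfigureAllSketch {
    /** Configures every cluster the same way; no instanceof checks are needed. */
    static void configureAll(List<? extends SketchCluster> clusters, Properties job) {
        for (SketchCluster cluster : clusters) {
            cluster.configure(job);
        }
    }
}
```

The trade-off is one extra empty method per implementation in exchange for call sites that stay the same when a new cluster type is added.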


[jira] Updated: (MAHOUT-616) Cannot use MahalanobisDistanceMeasure with Dirichlet clustering

Posted by "Vasil Vasilev (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vasil Vasilev updated MAHOUT-616:
---------------------------------

    Comment: was deleted

(was: Patch with fix for the issue)


[jira] Updated: (MAHOUT-616) Cannot use MahalanobisDistanceMeasure with Dirichlet clustering

Posted by "Vasil Vasilev (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vasil Vasilev updated MAHOUT-616:
---------------------------------

    Attachment: mahalanobis_fix.patch

Patch for fixing the issue
