You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by "Nate Paymer (JIRA)" <ji...@apache.org> on 2011/03/12 06:05:59 UTC

[jira] Created: (MATH-547) KMeansPlusPlusClusterer should not call equals()

KMeansPlusPlusClusterer should not call equals()
------------------------------------------------

                 Key: MATH-547
                 URL: https://issues.apache.org/jira/browse/MATH-547
             Project: Commons Math
          Issue Type: Improvement
    Affects Versions: 3.0
            Reporter: Nate Paymer
            Priority: Minor


In determining whether the clusters have changed between iterations, the KMeansPlusPlusClusterer currently calls equals to determine whether the cluster centers have changed.  It would be better to avoid relying on equals by instead checking whether any points have moved between clusters.

equals can be problematic because floating point operations are not strictly commutative or associative, so getCentroid may return slightly different values even when called with the same set of inputs.  Additionally, the client may choose not to override equals at all, since it's not clear that it's required.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MATH-547) KMeansPlusPlusClusterer should not call equals()

Posted by "Thomas Neidhart (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Neidhart updated MATH-547:
---------------------------------

    Attachment: MATH-547.patch

Added a simple patch to prevent comparing cluster centroids as an exit condition. 
Instead, the assignments of the data points to the clusters are tracked using an int array.

> KMeansPlusPlusClusterer should not call equals()
> ------------------------------------------------
>
>                 Key: MATH-547
>                 URL: https://issues.apache.org/jira/browse/MATH-547
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 3.0
>            Reporter: Nate Paymer
>            Priority: Minor
>         Attachments: MATH-547.patch
>
>
> In determining whether the clusters have changed between iterations, the KMeansPlusPlusClusterer currently calls equals to determine whether the cluster centers have changed.  It would be better to avoid relying on equals by instead checking whether any points have moved between clusters.
> equals can be problematic because floating point operations are not strictly commutative or associative, so getCentroid may return slightly different values even when called with the same set of inputs.  Additionally, the client may choose not to override equals at all, since it's not clear that it's required.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MATH-547) KMeansPlusPlusClusterer should not call equals()

Posted by "Luc Maisonobe (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015538#comment-13015538 ] 

Luc Maisonobe commented on MATH-547:
------------------------------------

I thought you wrote this code but in fact it was Ben, sorry for the confusion.


> KMeansPlusPlusClusterer should not call equals()
> ------------------------------------------------
>
>                 Key: MATH-547
>                 URL: https://issues.apache.org/jira/browse/MATH-547
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 3.0
>            Reporter: Nate Paymer
>            Priority: Minor
>         Attachments: MATH-547.patch
>
>
> In determining whether the clusters have changed between iterations, the KMeansPlusPlusClusterer currently calls equals to determine whether the cluster centers have changed.  It would be better to avoid relying on equals by instead checking whether any points have moved between clusters.
> equals can be problematic because floating point operations are not strictly commutative or associative, so getCentroid may return slightly different values even when called with the same set of inputs.  Additionally, the client may choose not to override equals at all, since it's not clear that it's required.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MATH-547) KMeansPlusPlusClusterer should not call equals()

Posted by "Gilles (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015351#comment-13015351 ] 

Gilles commented on MATH-547:
-----------------------------

I'm not familiar with the code; so please go ahead with the changes if they are satisfactory to you.


> KMeansPlusPlusClusterer should not call equals()
> ------------------------------------------------
>
>                 Key: MATH-547
>                 URL: https://issues.apache.org/jira/browse/MATH-547
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 3.0
>            Reporter: Nate Paymer
>            Priority: Minor
>         Attachments: MATH-547.patch
>
>
> In determining whether the clusters have changed between iterations, the KMeansPlusPlusClusterer currently calls equals to determine whether the cluster centers have changed.  It would be better to avoid relying on equals by instead checking whether any points have moved between clusters.
> equals can be problematic because floating point operations are not strictly commutative or associative, so getCentroid may return slightly different values even when called with the same set of inputs.  Additionally, the client may choose not to override equals at all, since it's not clear that it's required.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (MATH-547) KMeansPlusPlusClusterer should not call equals()

Posted by "Luc Maisonobe (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luc Maisonobe resolved MATH-547.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 3.0
         Assignee: Luc Maisonobe

Fixed in subversion repository as of r1088702.

Thanks to Nate for identifying the problem and thanks to Thomas for providing the patch

> KMeansPlusPlusClusterer should not call equals()
> ------------------------------------------------
>
>                 Key: MATH-547
>                 URL: https://issues.apache.org/jira/browse/MATH-547
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 3.0
>            Reporter: Nate Paymer
>            Assignee: Luc Maisonobe
>            Priority: Minor
>             Fix For: 3.0
>
>         Attachments: MATH-547.patch
>
>
> In determining whether the clusters have changed between iterations, the KMeansPlusPlusClusterer currently calls equals to determine whether the cluster centers have changed.  It would be better to avoid relying on equals by instead checking whether any points have moved between clusters.
> equals can be problematic because floating point operations are not strictly commutative or associative, so getCentroid may return slightly different values even when called with the same set of inputs.  Additionally, the client may choose not to override equals at all, since it's not clear that it's required.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MATH-547) KMeansPlusPlusClusterer should not call equals()

Posted by "Luc Maisonobe (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015155#comment-13015155 ] 

Luc Maisonobe commented on MATH-547:
------------------------------------

This seems good to me.
I let Gilles review it too.

> KMeansPlusPlusClusterer should not call equals()
> ------------------------------------------------
>
>                 Key: MATH-547
>                 URL: https://issues.apache.org/jira/browse/MATH-547
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 3.0
>            Reporter: Nate Paymer
>            Priority: Minor
>         Attachments: MATH-547.patch
>
>
> In determining whether the clusters have changed between iterations, the KMeansPlusPlusClusterer currently calls equals to determine whether the cluster centers have changed.  It would be better to avoid relying on equals by instead checking whether any points have moved between clusters.
> equals can be problematic because floating point operations are not strictly commutative or associative, so getCentroid may return slightly different values even when called with the same set of inputs.  Additionally, the client may choose not to override equals at all, since it's not clear that it's required.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira