You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by "Nate Paymer (JIRA)" <ji...@apache.org> on 2011/03/12 06:05:59 UTC
[jira] Created: (MATH-547) KMeansPlusPlusClusterer should not call
equals()
KMeansPlusPlusClusterer should not call equals()
------------------------------------------------
Key: MATH-547
URL: https://issues.apache.org/jira/browse/MATH-547
Project: Commons Math
Issue Type: Improvement
Affects Versions: 3.0
Reporter: Nate Paymer
Priority: Minor
In determining whether the clusters have changed between iterations, the KMeansPlusPlusClusterer currently calls equals to determine whether the cluster centers have changed. It would be better to avoid relying on equals by instead checking whether any points have moved between clusters.
equals can be problematic because floating point operations are not strictly commutative or associative, so getCentroid may return slightly different values even when called with the same set of inputs. Additionally, the client may choose not to override equals at all, since it's not clear that it's required.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MATH-547) KMeansPlusPlusClusterer should not call
equals()
Posted by "Thomas Neidhart (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MATH-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thomas Neidhart updated MATH-547:
---------------------------------
Attachment: MATH-547.patch
Added a simple patch to prevent comparing cluster centroids as an exit condition.
Instead, the assignments of the data points to the clusters are tracked using an int array.
> KMeansPlusPlusClusterer should not call equals()
> ------------------------------------------------
>
> Key: MATH-547
> URL: https://issues.apache.org/jira/browse/MATH-547
> Project: Commons Math
> Issue Type: Improvement
> Affects Versions: 3.0
> Reporter: Nate Paymer
> Priority: Minor
> Attachments: MATH-547.patch
>
>
> In determining whether the clusters have changed between iterations, the KMeansPlusPlusClusterer currently calls equals to determine whether the cluster centers have changed. It would be better to avoid relying on equals by instead checking whether any points have moved between clusters.
> equals can be problematic because floating point operations are not strictly commutative or associative, so getCentroid may return slightly different values even when called with the same set of inputs. Additionally, the client may choose not to override equals at all, since it's not clear that it's required.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MATH-547) KMeansPlusPlusClusterer should not
call equals()
Posted by "Luc Maisonobe (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MATH-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015538#comment-13015538 ]
Luc Maisonobe commented on MATH-547:
------------------------------------
I thought you wrote this code but in fact it was Ben, sorry for the confusion.
> KMeansPlusPlusClusterer should not call equals()
> ------------------------------------------------
>
> Key: MATH-547
> URL: https://issues.apache.org/jira/browse/MATH-547
> Project: Commons Math
> Issue Type: Improvement
> Affects Versions: 3.0
> Reporter: Nate Paymer
> Priority: Minor
> Attachments: MATH-547.patch
>
>
> In determining whether the clusters have changed between iterations, the KMeansPlusPlusClusterer currently calls equals to determine whether the cluster centers have changed. It would be better to avoid relying on equals by instead checking whether any points have moved between clusters.
> equals can be problematic because floating point operations are not strictly commutative or associative, so getCentroid may return slightly different values even when called with the same set of inputs. Additionally, the client may choose not to override equals at all, since it's not clear that it's required.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MATH-547) KMeansPlusPlusClusterer should not
call equals()
Posted by "Gilles (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MATH-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015351#comment-13015351 ]
Gilles commented on MATH-547:
-----------------------------
I'm not familiar with the code; so please go ahead with the changes if they are satisfactory to you.
> KMeansPlusPlusClusterer should not call equals()
> ------------------------------------------------
>
> Key: MATH-547
> URL: https://issues.apache.org/jira/browse/MATH-547
> Project: Commons Math
> Issue Type: Improvement
> Affects Versions: 3.0
> Reporter: Nate Paymer
> Priority: Minor
> Attachments: MATH-547.patch
>
>
> In determining whether the clusters have changed between iterations, the KMeansPlusPlusClusterer currently calls equals to determine whether the cluster centers have changed. It would be better to avoid relying on equals by instead checking whether any points have moved between clusters.
> equals can be problematic because floating point operations are not strictly commutative or associative, so getCentroid may return slightly different values even when called with the same set of inputs. Additionally, the client may choose not to override equals at all, since it's not clear that it's required.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (MATH-547) KMeansPlusPlusClusterer should not
call equals()
Posted by "Luc Maisonobe (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MATH-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Luc Maisonobe resolved MATH-547.
--------------------------------
Resolution: Fixed
Fix Version/s: 3.0
Assignee: Luc Maisonobe
Fixed in subversion repository as of r1088702.
Thanks to Nate for identifying the problem and thanks to Thomas for providing the patch
> KMeansPlusPlusClusterer should not call equals()
> ------------------------------------------------
>
> Key: MATH-547
> URL: https://issues.apache.org/jira/browse/MATH-547
> Project: Commons Math
> Issue Type: Improvement
> Affects Versions: 3.0
> Reporter: Nate Paymer
> Assignee: Luc Maisonobe
> Priority: Minor
> Fix For: 3.0
>
> Attachments: MATH-547.patch
>
>
> In determining whether the clusters have changed between iterations, the KMeansPlusPlusClusterer currently calls equals to determine whether the cluster centers have changed. It would be better to avoid relying on equals by instead checking whether any points have moved between clusters.
> equals can be problematic because floating point operations are not strictly commutative or associative, so getCentroid may return slightly different values even when called with the same set of inputs. Additionally, the client may choose not to override equals at all, since it's not clear that it's required.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MATH-547) KMeansPlusPlusClusterer should not
call equals()
Posted by "Luc Maisonobe (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MATH-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015155#comment-13015155 ]
Luc Maisonobe commented on MATH-547:
------------------------------------
This seems good to me.
I let Gilles review it too.
> KMeansPlusPlusClusterer should not call equals()
> ------------------------------------------------
>
> Key: MATH-547
> URL: https://issues.apache.org/jira/browse/MATH-547
> Project: Commons Math
> Issue Type: Improvement
> Affects Versions: 3.0
> Reporter: Nate Paymer
> Priority: Minor
> Attachments: MATH-547.patch
>
>
> In determining whether the clusters have changed between iterations, the KMeansPlusPlusClusterer currently calls equals to determine whether the cluster centers have changed. It would be better to avoid relying on equals by instead checking whether any points have moved between clusters.
> equals can be problematic because floating point operations are not strictly commutative or associative, so getCentroid may return slightly different values even when called with the same set of inputs. Additionally, the client may choose not to override equals at all, since it's not clear that it's required.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira