Posted to issues@commons.apache.org by "Nate Paymer (JIRA)" <ji...@apache.org> on 2011/03/12 05:53:59 UTC

[jira] Created: (MATH-546) Truncation issue in KMeansPlusPlusClusterer

Truncation issue in KMeansPlusPlusClusterer
-------------------------------------------

                 Key: MATH-546
                 URL: https://issues.apache.org/jira/browse/MATH-546
             Project: Commons Math
          Issue Type: Bug
    Affects Versions: 3.0
            Reporter: Nate Paymer
            Priority: Minor


The for loop inside KMeansPlusPlusClusterer.chooseInitialClusters defines a variable
  int sum = 0;
This variable should have type double, rather than int.  Using an int causes the method to truncate the distances between points to (square roots of) integers.  It's especially bad when the distances between points are typically less than 1.
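
A minimal Java illustration of the truncation follows. The class and method names are hypothetical, and this is not the Commons Math source. The bug compiles silently because Java's compound assignment inserts an implicit narrowing cast, so with an int accumulator, sum += d * d is evaluated as sum = (int) (sum + d * d). When every squared distance is below 1, the running total never leaves 0.

```java
/** Hypothetical demo of the int accumulator bug; not the Commons Math source. */
public class TruncationDemo {

    /** Accumulates squared distances in an int, as the buggy loop did. */
    static int sumAsInt(double[] distances) {
        int sum = 0;
        for (double d : distances) {
            // Compiles because compound assignment narrows implicitly: sum = (int) (sum + d * d)
            sum += d * d;
        }
        return sum;
    }

    /** The fix suggested in the report: the accumulator is a double. */
    static double sumAsDouble(double[] distances) {
        double sum = 0.0;
        for (double d : distances) {
            sum += d * d;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] distances = {0.5, 0.3, 0.9}; // all distances below 1
        System.out.println(sumAsInt(distances));    // prints 0
        System.out.println(sumAsDouble(distances)); // prints approximately 1.15
    }
}
```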

As an aside, in version 2.2, this bug manifested itself by making the clusterer return empty clusters.  I wonder if the EmptyClusterStrategy would still be necessary if this bug were fixed.
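
To show why the accumulator's type matters, here is a rough sketch of the D^2-weighted selection step of k-means++. Each point is picked with probability proportional to the squared distance to its nearest existing center, using a running (cumulative) sum and a uniform random threshold. It assumes points are plain double[] arrays and uses hypothetical names. It is a generic sketch of the technique, not the actual chooseInitialClusters code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of one k-means++ seeding step; not the Commons Math implementation. */
public class KMeansPlusPlusSeedSketch {

    /** Euclidean distance between two points of equal dimension. */
    static double distance(double[] a, double[] b) {
        double s = 0.0;
        for (int k = 0; k < a.length; k++) {
            double diff = a[k] - b[k];
            s += diff * diff;
        }
        return Math.sqrt(s);
    }

    /**
     * Returns the index of the next center, chosen with probability proportional to D(x)^2,
     * the squared distance from x to its nearest existing center (centers must be non-empty).
     * Returns -1 if every point already coincides with a center.
     */
    static int chooseNextCenter(double[][] points, List<double[]> centers, Random random) {
        double sum = 0.0; // must be a double: an int would truncate the running total
        double[] cumulative = new double[points.length];
        for (int i = 0; i < points.length; i++) {
            double nearest = Double.POSITIVE_INFINITY;
            for (double[] c : centers) {
                nearest = Math.min(nearest, distance(points[i], c));
            }
            sum += nearest * nearest;
            cumulative[i] = sum;
        }
        if (sum == 0.0) {
            return -1;
        }
        // r lies in [0, sum); the first cumulative entry strictly above r has positive weight.
        double r = random.nextDouble() * sum;
        for (int i = 0; i < cumulative.length; i++) {
            if (cumulative[i] > r) {
                return i;
            }
        }
        return points.length - 1; // only reachable if rounding makes r equal to sum
    }

    public static void main(String[] args) {
        double[][] points = {{0.0, 0.0}, {0.1, 0.0}, {0.2, 0.1}};
        List<double[]> centers = new ArrayList<>();
        centers.add(points[0]);
        int next = chooseNextCenter(points, centers, new Random(42));
        System.out.println("next center index: " + next); // 1 or 2, never 0
    }
}
```

With the small distances used in main, an int accumulator would leave sum at 0, and the distance weighting would be lost entirely.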

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (MATH-546) Truncation issue in KMeansPlusPlusClusterer

Posted by "Luc Maisonobe (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006894#comment-13006894 ] 

Luc Maisonobe commented on MATH-546:
------------------------------------

The empty cluster strategy is needed regardless of this bug. Empty clusters may appear under other conditions, and handling them is a feature commonly found in clustering implementations.
This issue can be marked as resolved if the patch has been applied and works.
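
As one illustration of that point (this particular scenario is an assumption, not taken from the thread): if more clusters are requested than there are distinct points, and centers are chosen from the data points as k-means++ does, two centers must coincide. The nearest-center assignment step then leaves one of them with no members, whatever the seeding. The sketch below uses 1-D points and hypothetical names.

```java
import java.util.Arrays;

/** Hypothetical demo of an empty cluster arising without any seeding bug. */
public class EmptyClusterDemo {

    /** Assigns each 1-D point to its nearest center (ties go to the lower index) and counts members per center. */
    static int[] clusterSizes(double[] points, double[] centers) {
        int[] sizes = new int[centers.length];
        for (double p : points) {
            int best = 0;
            for (int c = 1; c < centers.length; c++) {
                if (Math.abs(p - centers[c]) < Math.abs(p - centers[best])) {
                    best = c;
                }
            }
            sizes[best]++;
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Only two distinct values but three clusters requested, so two centers coincide.
        double[] points = {1.0, 1.0, 1.0, 5.0};
        double[] centers = {1.0, 5.0, 1.0};
        System.out.println(Arrays.toString(clusterSizes(points, centers))); // prints [3, 1, 0]
    }
}
```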

Thanks to Nate for reporting and fixing the issue, and to Gilles for reviewing and applying the patch.

> Truncation issue in KMeansPlusPlusClusterer
> -------------------------------------------
>
>                 Key: MATH-546
>                 URL: https://issues.apache.org/jira/browse/MATH-546
>             Project: Commons Math
>          Issue Type: Bug
>    Affects Versions: 3.0
>            Reporter: Nate Paymer
>            Priority: Minor
>              Labels: cluster
>         Attachments: MATH-546.txt

[jira] Resolved: (MATH-546) Truncation issue in KMeansPlusPlusClusterer

Posted by "Gilles (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gilles resolved MATH-546.
-------------------------

       Resolution: Fixed
    Fix Version/s: 3.0

[jira] Updated: (MATH-546) Truncation issue in KMeansPlusPlusClusterer

Posted by "Nate Paymer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nate Paymer updated MATH-546:
-----------------------------

    Attachment: MATH-546.txt

I've attached a patch to fix this bug.

This is my first contribution to this project, so apologies if I've screwed something up :)

[jira] Commented: (MATH-546) Truncation issue in KMeansPlusPlusClusterer

Posted by "Gilles (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006890#comment-13006890 ] 

Gilles commented on MATH-546:
-----------------------------

Fixed in revision 1081744.
Thanks for the report and the patch.

Leaving open until an answer can be provided concerning the "EmptyClusterStrategy" question.

