Posted to dev@mahout.apache.org by "Paul Baclace (JIRA)" <ji...@apache.org> on 2011/05/11 01:39:48 UTC

[jira] [Created] (MAHOUT-692) OnlineSummarizer does not tolerate fewer than 100 samples

OnlineSummarizer does not tolerate fewer than 100 samples
---------------------------------------------------------

                 Key: MAHOUT-692
                 URL: https://issues.apache.org/jira/browse/MAHOUT-692
             Project: Mahout
          Issue Type: Bug
    Affects Versions: 0.4
            Reporter: Paul Baclace
            Priority: Minor


If fewer than 100 samples are add()ed to an instance of org.apache.mahout.math.stats.OnlineSummarizer, an exception is thrown during a sort when getQuartile() is called:

Caused by: java.lang.IndexOutOfBoundsException: from: 0, to: 99, size=89

    at org.apache.mahout.math.list.AbstractList.checkRangeFromTo(AbstractList.java:87)
    at org.apache.mahout.math.list.DoubleArrayList.sortFromTo(DoubleArrayList.java:573)
    at org.apache.mahout.math.stats.OnlineSummarizer.sort(OnlineSummarizer.java:116)
    at org.apache.mahout.math.stats.OnlineSummarizer.getQuartile(OnlineSummarizer.java:129)

The problem is that the sort runs over the fixed index range 0..99, but it should run over 0..n-1, where n is the number of samples added so far (89 in the trace above).
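
To illustrate the suggested change, here is a minimal, self-contained sketch. It is not Mahout's code: the class StarterSortSketch, its fields, and the buffer size of 100 (inferred from the "to: 99" in the trace) are assumptions made for this example, and a plain double[] stands in for DoubleArrayList. The only point is that the sort is bounded by the number of samples actually stored rather than by the buffer capacity.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the summarizer's starter buffer; not Mahout's actual class. */
public class StarterSortSketch {
    // Assumed capacity, inferred from the "to: 99" in the reported stack trace.
    private static final int CAPACITY = 100;

    private final double[] starter = new double[CAPACITY];
    private int n = 0; // number of samples stored so far

    /** Stores a sample while the buffer has room; extra samples are ignored in this sketch. */
    public void add(double x) {
        if (n < CAPACITY) {
            starter[n++] = x;
        }
    }

    /** Sorts only the filled prefix, indices 0..n-1, and returns a copy of it. */
    public double[] sortedSamples() {
        // Arrays.sort takes an exclusive upper bound, so (0, n) covers 0..n-1.
        Arrays.sort(starter, 0, n);
        return Arrays.copyOf(starter, n);
    }

    public static void main(String[] args) {
        StarterSortSketch s = new StarterSortSketch();
        // 89 samples, the same count as in the reported failure.
        for (int i = 89; i > 0; i--) {
            s.add(i);
        }
        double[] sorted = s.sortedSamples();
        System.out.println(sorted.length + " samples, min=" + sorted[0]
            + ", max=" + sorted[sorted.length - 1]);
    }
}
```

With fewer than 100 samples this sorts without touching indices beyond n-1, which is the behavior the report asks for.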


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (MAHOUT-692) OnlineSummarizer does not tolerate fewer than 100 samples

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved MAHOUT-692.
------------------------------

    Resolution: Fixed
      Assignee: Ted Dunning

Looks like a very simple fix: sort the whole "starter" array rather than potentially sorting off the end. I don't know the logic 100%, but at first glance I see no obvious reason that would be wrong.

[jira] [Commented] (MAHOUT-692) OnlineSummarizer does not tolerate fewer than 100 samples

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031553#comment-13031553 ] 

Ted Dunning commented on MAHOUT-692:
------------------------------------

Harumph. This bug looks familiar.

Let me look to see if I have a fix on a dev branch.

[jira] [Updated] (MAHOUT-692) OnlineSummarizer does not tolerate fewer than 100 samples

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Dunning updated MAHOUT-692:
-------------------------------

    Fix Version/s: 0.6

This definitely needs fixing.
