Posted to issues@commons.apache.org by "Paolo Repele (JIRA)" <ji...@apache.org> on 2011/05/16 16:35:49 UTC

[jira] [Created] (MATH-578) Decrease DescriptiveStatistics performance from 2.0 to 2.2

Decrease DescriptiveStatistics performance from 2.0 to 2.2
----------------------------------------------------------

                 Key: MATH-578
                 URL: https://issues.apache.org/jira/browse/MATH-578
             Project: Commons Math
          Issue Type: Bug
    Affects Versions: 2.2
         Environment: Linux
            Reporter: Paolo Repele
            Priority: Minor


After switching from commons-math 2.0 to 2.2, we noticed that the performance of
DescriptiveStatistics.addValue(double) has decreased.

I tested with 2 million values.

DescriptiveStatistics ds = new DescriptiveStatistics();
for (int i = 0; i < 1000 * 1000 * 2; i++) { // 2 million values
    ds.addValue(v); // v is 0, Math.random(), or a mix, depending on the case below
}

ds.getPercentile(50);


The time taken seems to depend on the values inserted into the DescriptiveStatistics:

* with a single value (0)
** 2.0 -> takes ~500 ms
** 2.2 -> takes more than 10 minutes
* with 50% fixed value (0) and 50% Math.random()
** 2.0 -> takes ~500 ms
** 2.2 -> takes ~250000 ms (~250 seconds)
* with 100% Math.random()
** 2.0 -> takes ~500 ms
** 2.2 -> takes ~70 ms
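
The three cases above can be set up, and the percentile step timed on its own, roughly as in the following stdlib-only sketch. It is not the reporter's benchmark: the class name, the seed, the way the 50% case is laid out (zeros first), and the sort-based median used as a stand-in for the library call are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Random;

/** Builds the three input shapes from the report and times a median step on each. */
final class PercentileTimingSketch {

    /**
     * constantFraction = 1.0 gives all zeros, 0.5 gives half zeros, 0.0 gives all random.
     * Assumption: zeros come first; the report does not say how the values were ordered.
     */
    static double[] buildData(int n, double constantFraction, long seed) {
        Random rng = new Random(seed);
        int constantCount = (int) Math.round(n * constantFraction);
        double[] data = new double[n];
        for (int i = 0; i < n; i++) {
            data[i] = i < constantCount ? 0.0 : rng.nextDouble();
        }
        return data;
    }

    /** Stand-in for the percentile call: sort a copy and read the middle of it. */
    static double sortBasedMedian(double[] data) {
        if (data.length == 0) {
            throw new IllegalArgumentException("data must not be empty");
        }
        double[] copy = data.clone();
        Arrays.sort(copy);
        int n = copy.length;
        return n % 2 == 1 ? copy[n / 2] : 0.5 * (copy[n / 2 - 1] + copy[n / 2]);
    }

    /** Elapsed milliseconds for the median step only; data generation is excluded. */
    static long timeMedianMillis(double[] data) {
        long start = System.nanoTime();
        sortBasedMedian(data);
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        int n = 2_000_000;
        for (double fraction : new double[] {1.0, 0.5, 0.0}) {
            double[] data = buildData(n, fraction, 42L);
            System.out.println("constant fraction " + fraction + ": "
                    + timeMedianMillis(data) + " ms");
        }
    }
}
```

A single unwarmed measurement like this is only a rough indication; repeating each case several times gives steadier numbers.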



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MATH-578) Decrease DescriptiveStatistics performance from 2.0 to 2.2

Posted by "Paolo Repele (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034070#comment-13034070 ] 

Paolo Repele commented on MATH-578:
-----------------------------------

* yep, the time was only for the getPercentile() method.
* I added an image where you can see the profile snapshot.

We usually use this library to analyze grids. These grids can be very large and can be generated using the same value for all cells, a continuous function over the grid, or any combination of both.
So we really have no idea in advance how these grids were generated.

> Decrease DescriptiveStatistics performance from 2.0 to 2.2
> ----------------------------------------------------------
>
>                 Key: MATH-578
>                 URL: https://issues.apache.org/jira/browse/MATH-578
>             Project: Commons Math
>          Issue Type: Bug
>    Affects Versions: 2.2
>         Environment: Linux
>            Reporter: Paolo Repele
>            Assignee: Mikkel Meyer Andersen
>            Priority: Minor
>         Attachments: percentile.png
>
>

[jira] [Commented] (MATH-578) Decrease DescriptiveStatistics performance from 2.0 to 2.2

Posted by "Mikkel Meyer Andersen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034056#comment-13034056 ] 

Mikkel Meyer Andersen commented on MATH-578:
--------------------------------------------

Have you tried more detailed profiling, e.g. in Eclipse, to see which methods take most of the time?


[jira] [Resolved] (MATH-578) Decrease DescriptiveStatistics performance from 2.0 to 2.2

Posted by "Thomas Neidhart (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Neidhart resolved MATH-578.
----------------------------------

    Resolution: Fixed

Fixed in r1364318.
See also MATH-805 with a description of the problem.
                

[jira] [Updated] (MATH-578) Decrease DescriptiveStatistics performance from 2.0 to 2.2

Posted by "Phil Steitz (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phil Steitz updated MATH-578:
-----------------------------

    Fix Version/s: 3.1

Not sure this is in fact a bug, but rather a feature resulting from overall performance improvements in Percentile (poorer performance for a relatively small number of problem instances).  I do not see it as a showstopper for 3.0, so moving to 3.1.


[jira] [Commented] (MATH-578) Decrease DescriptiveStatistics performance from 2.0 to 2.2

Posted by "Mikkel Meyer Andersen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034068#comment-13034068 ] 

Mikkel Meyer Andersen commented on MATH-578:
--------------------------------------------

Sorry for my (too) short first answer. Thanks for your proper introduction, Phil.

I'll try a more detailed profiling to see what's causing the performance problems.




[jira] [Commented] (MATH-578) Decrease DescriptiveStatistics performance from 2.0 to 2.2

Posted by "Phil Steitz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034061#comment-13034061 ] 

Phil Steitz commented on MATH-578:
----------------------------------

Thanks for reporting this.  I assume the timings include the percentile calculation, right? 

This could be related to the changes in the Percentile implementation in 2.2. If isolating the timing to just the percentile calculation shows that is where the latency difference is, we should reopen MATH-417.  The changes there were to improve Percentile performance, which in most cases they do.  The first two results above are disturbing, however.  If your data is largely constant and this creates a problem in your application, as a workaround, you can provide an alternative Percentile implementation to DescriptiveStatistics using setPercentileImpl.
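
As a rough illustration of what a replacement passed to setPercentileImpl might compute, here is a stdlib-only sketch of a sort-based percentile. The wiring into DescriptiveStatistics is not shown, since it needs Commons Math on the classpath. The class name is hypothetical, and the estimation rule (position p * (n + 1) / 100 with linear interpolation) is an assumption modeled on my reading of the Percentile Javadoc, so check it against the version you use.

```java
import java.util.Arrays;

/** Sort-based percentile estimate; its cost is dominated by sorting a copy of the data. */
final class SortedPercentileSketch {

    /**
     * Estimates the p-th percentile (0 < p <= 100) by sorting a copy of the values
     * and interpolating linearly at position p * (n + 1) / 100 (1-based).
     */
    static double evaluate(double[] values, double p) {
        if (values.length == 0) {
            return Double.NaN;
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]: " + p);
        }
        if (values.length == 1) {
            return values[0];
        }
        double[] sorted = values.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int n = sorted.length;
        double pos = p * (n + 1) / 100.0;
        if (pos < 1) {
            return sorted[0];       // below the first order statistic: use the minimum
        }
        if (pos >= n) {
            return sorted[n - 1];   // at or beyond the last one: use the maximum
        }
        int lowerIndex = (int) Math.floor(pos) - 1; // zero-based index of the lower neighbour
        double fraction = pos - Math.floor(pos);
        double lower = sorted[lowerIndex];
        double upper = sorted[lowerIndex + 1];
        return lower + fraction * (upper - lower);
    }
}
```

Sorting the full array does more work than a selection algorithm on well-behaved data, but its running time should not depend much on how many values are equal, which is the case that matters for largely constant grids.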


[jira] [Commented] (MATH-578) Decrease DescriptiveStatistics performance from 2.0 to 2.2

Posted by "Mikkel Meyer Andersen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034094#comment-13034094 ] 

Mikkel Meyer Andersen commented on MATH-578:
--------------------------------------------

Also, it seems like FastMath is new to 2.2. I'll try to investigate what causes this.


[jira] [Commented] (MATH-578) Decrease DescriptiveStatistics performance from 2.0 to 2.2

Posted by "Thomas Neidhart (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420188#comment-13420188 ] 

Thomas Neidhart commented on MATH-578:
--------------------------------------

I ran the provided test myself; it is indeed the same problem, and it is fixed by the suggested changes.
                

[jira] [Issue Comment Edited] (MATH-578) Decrease DescriptiveStatistics performance from 2.0 to 2.2

Posted by "Paolo Repele (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034070#comment-13034070 ] 

Paolo Repele edited comment on MATH-578 at 5/16/11 3:57 PM:
------------------------------------------------------------

No problem :)
* yep, the time was only for the getPercentile() method.
* I added an image where you can see the profile snapshot.

We usually use this library to analyze grids. These grids can be very large and can be generated using the same value for all cells, a continuous function over the grid, or any combination of both.
So we really have no idea in advance how these grids were generated.


[jira] [Commented] (MATH-578) Decrease DescriptiveStatistics performance from 2.0 to 2.2

Posted by "Mikkel Meyer Andersen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034152#comment-13034152 ] 

Mikkel Meyer Andersen commented on MATH-578:
--------------------------------------------

As far as I can see, Percentile contributes a lot to the longer execution time, so reopening MATH-417 for datasets of this type might be the right thing to do.
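
The thread does not spell out why datasets of this type are slow. One common mechanism in selection-based percentile code is a two-way partition that makes almost no progress on runs of equal values. The sketch below is a generic textbook quickselect written for illustration, not the Commons Math code, and all names in it are hypothetical. It counts comparisons so the constant and random cases can be compared.

```java
import java.util.Random;

/** Textbook quickselect with a two-way partition, instrumented to count comparisons. */
final class QuickselectDuplicatesSketch {

    /** Number of element-versus-pivot comparisons performed so far. */
    long comparisons = 0;

    /** Returns the k-th smallest element (zero-based); reorders the array in place. */
    double select(double[] a, int k) {
        int lo = 0;
        int hi = a.length - 1;
        while (lo < hi) {
            int p = partition(a, lo, hi);
            if (k == p) {
                return a[k];
            }
            if (k < p) {
                hi = p - 1;
            } else {
                lo = p + 1;
            }
        }
        return a[k];
    }

    /** Two-way partition: elements strictly smaller than the pivot go to the left. */
    private int partition(double[] a, int lo, int hi) {
        swap(a, lo + (hi - lo) / 2, hi); // use the middle element as pivot
        double pivot = a[hi];
        int store = lo;
        for (int i = lo; i < hi; i++) {
            comparisons++;
            if (a[i] < pivot) {
                swap(a, i, store++);
            }
        }
        swap(a, store, hi); // pivot lands at its final sorted position
        return store;
    }

    private static void swap(double[] a, int i, int j) {
        double tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    /** Comparisons needed to find the middle element of a copy of the data. */
    static long countComparisons(double[] data) {
        QuickselectDuplicatesSketch q = new QuickselectDuplicatesSketch();
        q.select(data.clone(), data.length / 2);
        return q.comparisons;
    }

    public static void main(String[] args) {
        int n = 20_000;
        double[] constant = new double[n]; // all zeros
        double[] random = new double[n];
        Random rng = new Random(42L);
        for (int i = 0; i < n; i++) {
            random[i] = rng.nextDouble();
        }
        System.out.println("constant: " + countComparisons(constant) + " comparisons");
        System.out.println("random:   " + countComparisons(random) + " comparisons");
    }
}
```

With all values equal, every partition step in this sketch shrinks the range by a single element, so the work grows roughly quadratically with the input size, while random data stays close to linear. Whether this is exactly what happened in Percentile is for MATH-417 and MATH-805 to confirm.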


[jira] [Updated] (MATH-578) Decrease DescriptiveStatistics performance from 2.0 to 2.2

Posted by "Paolo Repele (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paolo Repele updated MATH-578:
------------------------------

    Attachment: percentile.png

Image file to show the profile snapshot
