You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@commons.apache.org by Luc Maisonobe <Lu...@free.fr> on 2010/10/10 17:14:24 UTC

(math] Percentile performance improvement

Hi all,

The new implementation of Percentile improving performances has been
checked in. It should solve issue
<https://issues.apache.org/jira/browse/MATH-417>.

As discussed here, the implementation is based on a selection algorithm
instead of a complete sort. A setData method has also been added to the
AbstractUnivariateStatistic base class to allow caching some work
between calls when several percentiles are requested. Both the
partitioned data array and the first levels of pivots are cached. For
now, I have set the limit to 10 levels of pivots, but this can be
changed (the memory cost is a (2^n)-1 integer array for n levels).

>From the few performances tests I have done, the improvements are really
huge and depend on both the array size and the number of percentiles
desired. Compared with a single cached sort followed by direct array
accesses, the selection-based implementation is still about twice faster
up to a few hundreds percentiles for array sizes of a few millions
elements. For a single percentile, I saw up to a few hundreds times
faster computation.

Leanne, could you check this suits your needs ?

Luc

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: (math] Percentile performance improvement

Posted by Leanne Guy <Le...@unige.ch>.

Hi Luc,

Sorry to not reply sooner, I have been away most of October.  I've tested
the snapshot with these changes and see the same level of performance 
improvement.

So yes, this suits my needs. Thanks to everyone for their efforts
Leanne

On 10/10/10 17:14 PM, Luc Maisonobe wrote:
> Hi all,
>
> The new implementation of Percentile improving performances has been
> checked in. It should solve issue
> <https://issues.apache.org/jira/browse/MATH-417>.
>
> As discussed here, the implementation is based on a selection algorithm
> instead of a complete sort. A setData method has also been added to the
> AbstractUnivariateStatistic base class to allow caching some work
> between calls when several percentiles are requested. Both the
> partitioned data array and the first levels of pivots are cached. For
> now, I have set the limit to 10 levels of pivots, but this can be
> changed (the memory cost is a (2^n)-1 integer array for n levels).
>
>  From the few performances tests I have done, the improvements are really
> huge and depend on both the array size and the number of percentiles
> desired. Compared with a single cached sort followed by direct array
> accesses, the selection-based implementation is still about twice faster
> up to a few hundreds percentiles for array sizes of a few millions
> elements. For a single percentile, I saw up to a few hundreds times
> faster computation.
>
> Leanne, could you check this suits your needs ?
>
> Luc
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>

-- 
-----------------------------------------------------------------------
Leanne Guy
Variability Processing Group, Gaia Project, ESA
Observatoire de Geneve / ISDC Astrophysics Data Centre
Chemin d'Ecogia 16, CH-1290 Versoix,  Switzerland

Email: leanne.guy-at-unige.ch
Tel: +41 22 379 2151    Fax: +41 (0)22 37 92 135
WWW: http://isdc.unige.ch/~leanne/   Skype: leanneobsge
-----------------------------------------------------------------------


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org