You are viewing a plain text version of this content. The canonical link for it is here.
Posted to oak-dev@jackrabbit.apache.org by Andrei Dulceanu <an...@gmail.com> on 2017/07/04 08:52:35 UTC

Percentile implementation

Hi all,

I was working on an issue for which I needed to use *only* a percentile for
some recorded stats. Initially I used SynchronizedDescriptiveStatistics [0]
from commons-math3 [1], but then I thought that adding this dependency
would be too much for such a simple use case.

Now my question is this: do we have a simple percentile implementation in
Oak (I didn't find one)? If not, would you recommend writing my own or
adapting/extracting an existing one in a utility class?

Regards,
Andrei

[0]
http://commons.apache.org/proper/commons-math/javadocs/api-3.3/org/apache/commons/math3/stat/descriptive/SynchronizedDescriptiveStatistics.html
[1] http://commons.apache.org/proper/commons-math/

Re: Percentile implementation

Posted by Chetan Mehrotra <ch...@gmail.com>.
Oak has an optional dependency on dropwizard metric library which has
a Histogram implementation [1] which can be used.

So possibly we can use that. So far its optional and hidden behind the
statistics api. Recently I also had to make use of that for rate
estimation in indexing so used it in a way that Metrics based logic
gets used if avialable otherwise fallback to a simple mean based
implementation [2].

May be we can make it a required dependency?

Chetan Mehrotra
[1] http://metrics.dropwizard.io/3.1.0/apidocs/com/codahale/metrics/Histogram.html
[2] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/index/progress/MetricRateEstimator.java

On Tue, Jul 4, 2017 at 2:22 PM, Andrei Dulceanu
<an...@gmail.com> wrote:
> Hi all,
>
> I was working on an issue for which I needed to use *only* a percentile for
> some recorded stats. Initially I used SynchronizedDescriptiveStatistics [0]
> from commons-math3 [1], but then I thought that adding this dependency
> would be too much for such a simple use case.
>
> Now my question is this: do we have a simple percentile implementation in
> Oak (I didn't find one)? If not, would you recommend writing my own or
> adapting/extracting an existing one in a utility class?
>
> Regards,
> Andrei
>
> [0]
> http://commons.apache.org/proper/commons-math/javadocs/api-3.3/org/apache/commons/math3/stat/descriptive/SynchronizedDescriptiveStatistics.html
> [1] http://commons.apache.org/proper/commons-math/

Re: Percentile implementation

Posted by Andrei Dulceanu <an...@gmail.com>.
Hi Francesco,

Is exposing a percentile really necessary?
>

To give you some background, I'm talking about OAK-4732 [2]. I don't know
if we can achieve the same result without the percentile.


> If yes, how big of a problem is embedding of commons-math3?
>

2.1M commons-math3-3.6.1.jar

I'd say it's too much to add it as a dependency to oak-segment-tar.

[2] https://issues.apache.org/jira/browse/OAK-4732

Thanks,
Andrei

Re: Percentile implementation

Posted by Andrei Dulceanu <an...@gmail.com>.
> The only problem that I see is the fact that it doesn't provide a way to
> easily access a desired percentile (only mean and 75th, 95th, 98th, 99th
> and 999th). Currently we are using 50th percentile, i.e. mean, but in the
> future that might change.
>

Please read median instead of mean above. Implementing the change, I
discovered Histogram#getSnapshot().getValue(double quantile) which is
exactly what I was looking for.


> I will try to make the adjustments and will revisit the percentile
> implementation once we'll change our use pattern there.
>

This change is tracked in OAK-6430 [0] and fixed at r1801043.

[0] https://issues.apache.org/jira/browse/OAK-6430

2017-07-06 14:55 GMT+03:00 Andrei Dulceanu <an...@gmail.com>:

> Hi Chetan,
>
>
>> Instead of commons-math can we use Metric Histogram  (which I also
>> suggested earlier in the thread).
>
>
> I took another look at the Metric Histogram and I think at the moment it
> can be used instead of SynchronizedDescriptiveStatistics from
> commons-math3. The only problem that I see is the fact that it doesn't
> provide a way to easily access a desired percentile (only mean and 75th,
> 95th, 98th, 99th and 999th). Currently we are using 50th percentile, i.e.
> mean, but in the future that might change.
>
>
>> This would avoid downstream Oak
>> users to include another dependency as Oak is already using Metrics in
>> other places.
>>
>
> I will try to make the adjustments and will revisit the percentile
> implementation once we'll change our use pattern there.
>
> Regards,
> Andrei
>
> 2017-07-06 14:38 GMT+03:00 Chetan Mehrotra <ch...@gmail.com>:
>
>> Instead of commons-math can we use Metric Histogram  (which I also
>> suggested earlier in the thread). This would avoid downstream Oak
>> users to include another dependency as Oak is already using Metrics in
>> other places.
>>
>> Can we reconsider this decision?
>> Chetan Mehrotra
>>
>>
>> On Tue, Jul 4, 2017 at 4:45 PM, Julian Sedding <js...@gmail.com>
>> wrote:
>> > Maybe it is not necessary to embed *all* of commons-math3. The bnd
>> > tool (used by maven-bundle-plugin) can intelligently embed classes
>> > from specified java packages, but only if they are referenced.
>> > Depending on how well commons-math3 is modularized, that could allow
>> > for much less embedded classes. Neil Bartlett wrote a good blog post
>> > about this feature[0].
>> >
>> > Regards
>> > Julian
>> >
>> > [0] http://njbartlett.name/2014/05/26/static-linking.html
>> >
>> >
>> > On Tue, Jul 4, 2017 at 12:20 PM, Andrei Dulceanu
>> > <an...@gmail.com> wrote:
>> >> I'll add the dependency.
>> >>
>> >> Thanks,
>> >> Andrei
>> >>
>> >> 2017-07-04 13:10 GMT+03:00 Michael Dürig <md...@apache.org>:
>> >>
>> >>>
>> >>>
>> >>> On 04.07.17 11:15, Francesco Mari wrote:
>> >>>
>> >>>> 2017-07-04 10:52 GMT+02:00 Andrei Dulceanu <
>> andrei.dulceanu@gmail.com>:
>> >>>>
>> >>>>> Now my question is this: do we have a simple percentile
>> implementation in
>> >>>>> Oak (I didn't find one)?
>> >>>>>
>> >>>>
>> >>>> I'm not aware of a percentile implementation in Oak.
>> >>>>
>> >>>> If not, would you recommend writing my own or adapting/extracting an
>> >>>>> existing one in a utility class?
>> >>>>>
>> >>>>
>> >>>> In the past we copied and pasted source code from other projects in
>> >>>> Oak. As long as the license allows it and proper attribution is
>> given,
>> >>>> it shouldn't be a problem. That said, I'm not a big fan of either
>> >>>> rewriting an implementation from scratch or copying and pasting
>> source
>> >>>> code from other projects. Is exposing a percentile really necessary?
>> >>>> If yes, how big of a problem is embedding of commons-math3?
>> >>>>
>> >>>>
>> >>> We should avoid copy paste as we might miss important fixes in later
>> >>> releases. I only did this once for some code where we needed a fix
>> that
>> >>> wasn't yet released. It was a hassle.
>> >>> I would just add a dependency to commons-math3. Its a library
>> exposing the
>> >>> functionality we require, so let's use it.
>> >>>
>> >>> Michael
>> >>>
>>
>
>

Re: Percentile implementation

Posted by Andrei Dulceanu <an...@gmail.com>.
Hi Chetan,


> Instead of commons-math can we use Metric Histogram  (which I also
> suggested earlier in the thread).


I took another look at the Metric Histogram and I think at the moment it
can be used instead of SynchronizedDescriptiveStatistics from
commons-math3. The only problem that I see is the fact that it doesn't
provide a way to easily access a desired percentile (only mean and 75th,
95th, 98th, 99th and 999th). Currently we are using 50th percentile, i.e.
mean, but in the future that might change.


> This would avoid downstream Oak
> users to include another dependency as Oak is already using Metrics in
> other places.
>

I will try to make the adjustments and will revisit the percentile
implementation once we'll change our use pattern there.

Regards,
Andrei

2017-07-06 14:38 GMT+03:00 Chetan Mehrotra <ch...@gmail.com>:

> Instead of commons-math can we use Metric Histogram  (which I also
> suggested earlier in the thread). This would avoid downstream Oak
> users to include another dependency as Oak is already using Metrics in
> other places.
>
> Can we reconsider this decision?
> Chetan Mehrotra
>
>
> On Tue, Jul 4, 2017 at 4:45 PM, Julian Sedding <js...@gmail.com> wrote:
> > Maybe it is not necessary to embed *all* of commons-math3. The bnd
> > tool (used by maven-bundle-plugin) can intelligently embed classes
> > from specified java packages, but only if they are referenced.
> > Depending on how well commons-math3 is modularized, that could allow
> > for much less embedded classes. Neil Bartlett wrote a good blog post
> > about this feature[0].
> >
> > Regards
> > Julian
> >
> > [0] http://njbartlett.name/2014/05/26/static-linking.html
> >
> >
> > On Tue, Jul 4, 2017 at 12:20 PM, Andrei Dulceanu
> > <an...@gmail.com> wrote:
> >> I'll add the dependency.
> >>
> >> Thanks,
> >> Andrei
> >>
> >> 2017-07-04 13:10 GMT+03:00 Michael Dürig <md...@apache.org>:
> >>
> >>>
> >>>
> >>> On 04.07.17 11:15, Francesco Mari wrote:
> >>>
> >>>> 2017-07-04 10:52 GMT+02:00 Andrei Dulceanu <andrei.dulceanu@gmail.com
> >:
> >>>>
> >>>>> Now my question is this: do we have a simple percentile
> implementation in
> >>>>> Oak (I didn't find one)?
> >>>>>
> >>>>
> >>>> I'm not aware of a percentile implementation in Oak.
> >>>>
> >>>> If not, would you recommend writing my own or adapting/extracting an
> >>>>> existing one in a utility class?
> >>>>>
> >>>>
> >>>> In the past we copied and pasted source code from other projects in
> >>>> Oak. As long as the license allows it and proper attribution is given,
> >>>> it shouldn't be a problem. That said, I'm not a big fan of either
> >>>> rewriting an implementation from scratch or copying and pasting source
> >>>> code from other projects. Is exposing a percentile really necessary?
> >>>> If yes, how big of a problem is embedding of commons-math3?
> >>>>
> >>>>
> >>> We should avoid copy paste as we might miss important fixes in later
> >>> releases. I only did this once for some code where we needed a fix that
> >>> wasn't yet released. It was a hassle.
> >>> I would just add a dependency to commons-math3. Its a library exposing
> the
> >>> functionality we require, so let's use it.
> >>>
> >>> Michael
> >>>
>

Re: Percentile implementation

Posted by Chetan Mehrotra <ch...@gmail.com>.
Instead of commons-math can we use Metric Histogram  (which I also
suggested earlier in the thread). This would avoid downstream Oak
users to include another dependency as Oak is already using Metrics in
other places.

Can we reconsider this decision?
Chetan Mehrotra


On Tue, Jul 4, 2017 at 4:45 PM, Julian Sedding <js...@gmail.com> wrote:
> Maybe it is not necessary to embed *all* of commons-math3. The bnd
> tool (used by maven-bundle-plugin) can intelligently embed classes
> from specified java packages, but only if they are referenced.
> Depending on how well commons-math3 is modularized, that could allow
> for much less embedded classes. Neil Bartlett wrote a good blog post
> about this feature[0].
>
> Regards
> Julian
>
> [0] http://njbartlett.name/2014/05/26/static-linking.html
>
>
> On Tue, Jul 4, 2017 at 12:20 PM, Andrei Dulceanu
> <an...@gmail.com> wrote:
>> I'll add the dependency.
>>
>> Thanks,
>> Andrei
>>
>> 2017-07-04 13:10 GMT+03:00 Michael Dürig <md...@apache.org>:
>>
>>>
>>>
>>> On 04.07.17 11:15, Francesco Mari wrote:
>>>
>>>> 2017-07-04 10:52 GMT+02:00 Andrei Dulceanu <an...@gmail.com>:
>>>>
>>>>> Now my question is this: do we have a simple percentile implementation in
>>>>> Oak (I didn't find one)?
>>>>>
>>>>
>>>> I'm not aware of a percentile implementation in Oak.
>>>>
>>>> If not, would you recommend writing my own or adapting/extracting an
>>>>> existing one in a utility class?
>>>>>
>>>>
>>>> In the past we copied and pasted source code from other projects in
>>>> Oak. As long as the license allows it and proper attribution is given,
>>>> it shouldn't be a problem. That said, I'm not a big fan of either
>>>> rewriting an implementation from scratch or copying and pasting source
>>>> code from other projects. Is exposing a percentile really necessary?
>>>> If yes, how big of a problem is embedding of commons-math3?
>>>>
>>>>
>>> We should avoid copy paste as we might miss important fixes in later
>>> releases. I only did this once for some code where we needed a fix that
>>> wasn't yet released. It was a hassle.
>>> I would just add a dependency to commons-math3. Its a library exposing the
>>> functionality we require, so let's use it.
>>>
>>> Michael
>>>

Re: Percentile implementation

Posted by Julian Sedding <js...@gmail.com>.
Maybe it is not necessary to embed *all* of commons-math3. The bnd
tool (used by maven-bundle-plugin) can intelligently embed classes
from specified java packages, but only if they are referenced.
Depending on how well commons-math3 is modularized, that could allow
for much less embedded classes. Neil Bartlett wrote a good blog post
about this feature[0].

Regards
Julian

[0] http://njbartlett.name/2014/05/26/static-linking.html


On Tue, Jul 4, 2017 at 12:20 PM, Andrei Dulceanu
<an...@gmail.com> wrote:
> I'll add the dependency.
>
> Thanks,
> Andrei
>
> 2017-07-04 13:10 GMT+03:00 Michael Dürig <md...@apache.org>:
>
>>
>>
>> On 04.07.17 11:15, Francesco Mari wrote:
>>
>>> 2017-07-04 10:52 GMT+02:00 Andrei Dulceanu <an...@gmail.com>:
>>>
>>>> Now my question is this: do we have a simple percentile implementation in
>>>> Oak (I didn't find one)?
>>>>
>>>
>>> I'm not aware of a percentile implementation in Oak.
>>>
>>> If not, would you recommend writing my own or adapting/extracting an
>>>> existing one in a utility class?
>>>>
>>>
>>> In the past we copied and pasted source code from other projects in
>>> Oak. As long as the license allows it and proper attribution is given,
>>> it shouldn't be a problem. That said, I'm not a big fan of either
>>> rewriting an implementation from scratch or copying and pasting source
>>> code from other projects. Is exposing a percentile really necessary?
>>> If yes, how big of a problem is embedding of commons-math3?
>>>
>>>
>> We should avoid copy paste as we might miss important fixes in later
>> releases. I only did this once for some code where we needed a fix that
>> wasn't yet released. It was a hassle.
>> I would just add a dependency to commons-math3. Its a library exposing the
>> functionality we require, so let's use it.
>>
>> Michael
>>

Re: Percentile implementation

Posted by Andrei Dulceanu <an...@gmail.com>.
I'll add the dependency.

Thanks,
Andrei

2017-07-04 13:10 GMT+03:00 Michael Dürig <md...@apache.org>:

>
>
> On 04.07.17 11:15, Francesco Mari wrote:
>
>> 2017-07-04 10:52 GMT+02:00 Andrei Dulceanu <an...@gmail.com>:
>>
>>> Now my question is this: do we have a simple percentile implementation in
>>> Oak (I didn't find one)?
>>>
>>
>> I'm not aware of a percentile implementation in Oak.
>>
>> If not, would you recommend writing my own or adapting/extracting an
>>> existing one in a utility class?
>>>
>>
>> In the past we copied and pasted source code from other projects in
>> Oak. As long as the license allows it and proper attribution is given,
>> it shouldn't be a problem. That said, I'm not a big fan of either
>> rewriting an implementation from scratch or copying and pasting source
>> code from other projects. Is exposing a percentile really necessary?
>> If yes, how big of a problem is embedding of commons-math3?
>>
>>
> We should avoid copy paste as we might miss important fixes in later
> releases. I only did this once for some code where we needed a fix that
> wasn't yet released. It was a hassle.
> I would just add a dependency to commons-math3. Its a library exposing the
> functionality we require, so let's use it.
>
> Michael
>

Re: Percentile implementation

Posted by Michael Dürig <md...@apache.org>.

On 04.07.17 11:15, Francesco Mari wrote:
> 2017-07-04 10:52 GMT+02:00 Andrei Dulceanu <an...@gmail.com>:
>> Now my question is this: do we have a simple percentile implementation in
>> Oak (I didn't find one)?
> 
> I'm not aware of a percentile implementation in Oak.
> 
>> If not, would you recommend writing my own or adapting/extracting an
>> existing one in a utility class?
> 
> In the past we copied and pasted source code from other projects in
> Oak. As long as the license allows it and proper attribution is given,
> it shouldn't be a problem. That said, I'm not a big fan of either
> rewriting an implementation from scratch or copying and pasting source
> code from other projects. Is exposing a percentile really necessary?
> If yes, how big of a problem is embedding of commons-math3?
> 

We should avoid copy paste as we might miss important fixes in later 
releases. I only did this once for some code where we needed a fix that 
wasn't yet released. It was a hassle.
I would just add a dependency to commons-math3. Its a library exposing 
the functionality we require, so let's use it.

Michael

Re: Percentile implementation

Posted by Francesco Mari <ma...@gmail.com>.
2017-07-04 10:52 GMT+02:00 Andrei Dulceanu <an...@gmail.com>:
> Now my question is this: do we have a simple percentile implementation in
> Oak (I didn't find one)?

I'm not aware of a percentile implementation in Oak.

> If not, would you recommend writing my own or adapting/extracting an
> existing one in a utility class?

In the past we copied and pasted source code from other projects in
Oak. As long as the license allows it and proper attribution is given,
it shouldn't be a problem. That said, I'm not a big fan of either
rewriting an implementation from scratch or copying and pasting source
code from other projects. Is exposing a percentile really necessary?
If yes, how big of a problem is embedding of commons-math3?