Posted to dev@commons.apache.org by venkatesha murthy <ve...@gmail.com> on 2014/06/01 22:31:05 UTC

Re: [MATH-1120] Needed opinion about support on variations in percentile calculation

I have gone through Wikipedia and R functions to get an understanding.

My idea is to model the different estimation techniques as strategies
(enums) and inject one through the constructor when the Percentile object
is created. The evaluate method could then use this estimation technique to
complete the computation. The kth selection and pivoting can be further
encapsulated as nested classes and used within the EstimationTechnique enum.
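
A rough illustration of the constructor-injection idea described above. This
is only a sketch under stated assumptions, not the attached patch: the names
PercentileSketch and PositionRule are hypothetical, the two position formulas
are the ones quoted later in this thread, and linear interpolation between
neighbouring sorted values is assumed.

```java
import java.util.Arrays;

/** Hypothetical sketch: a percentile whose position rule is injected at construction. */
final class PercentileSketch {

    /** Hypothetical strategy enum; each constant returns a 1-based position in the sorted data. */
    enum PositionRule {
        /** p * (N + 1) / 100, the formula this thread attributes to the existing class. */
        N_PLUS_ONE {
            @Override
            double position(double p, int n) {
                return p * (n + 1) / 100.0;
            }
        },
        /** p * (N - 1) / 100 + 1, the formula this thread says the spreadsheet result resembles. */
        N_MINUS_ONE {
            @Override
            double position(double p, int n) {
                return p * (n - 1) / 100.0 + 1.0;
            }
        };

        abstract double position(double p, int n);
    }

    private final PositionRule rule;

    PercentileSketch(PositionRule rule) {
        this.rule = rule;
    }

    /** Estimates the pth percentile (p in (0, 100]) of a non-empty array. */
    double evaluate(double[] values, double p) {
        double[] sorted = values.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        int n = sorted.length;
        double pos = rule.position(p, n);
        if (pos < 1) {
            return sorted[0];               // below the first order statistic
        }
        if (pos >= n) {
            return sorted[n - 1];           // at or beyond the last order statistic
        }
        int lower = (int) Math.floor(pos);
        double frac = pos - lower;
        // Linear interpolation between the two neighbouring sorted values (assumed).
        return sorted[lower - 1] + frac * (sorted[lower] - sorted[lower - 1]);
    }
}
```

With the data {1, 2, 3, 4, 5} and p = 25, the first rule gives position 1.5
and the second gives position 2, so the two estimates are 1.5 and 2.0.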

I have updated bug 1120 with a patch that has more details. Please let me
know your opinions.

Thanks
Venkat.


On Thu, May 22, 2014 at 7:53 AM, venkatesha murthy <
venkateshamurthyts@gmail.com> wrote:

> All,
>
> Agreed, and thanks for the opinions.
> I will work through this, come up with a draft design, and
> propose it for review in some time.
>
> Thanks
> Venkat.
>
> On Thu, May 22, 2014 at 2:27 AM, Phil Steitz <ph...@gmail.com>
> wrote:
>
>>  On 5/21/14, 1:43 PM, Gilles wrote:
>> > On Wed, 21 May 2014 13:16:26 -0700, Phil Steitz wrote:
>> >> On 5/21/14, 12:18 PM, venkatesha murthy wrote:
>> >>> Hi All,
>> >>>
>> >>> The existing Percentile class calculates the percentile based on
>> >>> a quantile position in the array that is fixed as
>> >>> p * (N+1)/100 for the pth percentile of an array of size N.
>> >>> However, if we were to enter these numbers in MS Excel
>> >>> to calculate the percentile, it gives a different result that
>> >>> closely resembles the formula [p*(N-1)/100]+1.
>> >>>
>> >>> It's imperative at times to match the computations to standard
>> >>> spreadsheet calculations or to a standard tool;
>> >>
>> >> What is "imperative" is that the implementation matches what the
>> >> documentation says.  We do like to compare our results to other
>> >> packages, though, and to explain differences where they exist.  You
>> >> have basically done that above.
>> >>> which is why I request that the quantile position be made
>> >>> customizable.
>> >>
>> >> That is a reasonable request, as there are lots of different ways to
>> >> compute quantiles.
>> >>> In fact, even the kth selection used
>> >>> can be refactored as a strategy (rather than as private methods)
>> >>> as a further step.
>> >>
>> >> Agreed.
>> >>>
>> >>> So if at least the Percentile class were to allow the quantile
>> >>> position to be customized in subclasses, then
>> >>> end users could provide the formula of their choice.
>> >>>
>> >>> The most minimal change I am proposing here is to make the
>> >>> quantile position computation a protected method, and I have
>> >>> attached a possible patch
>> >>> in [MATH-1120] <https://issues.apache.org/jira/browse/MATH-1120>
>> >>>
>> >>> I request everyone's opinion on this.
>> >>
>> >> I think that what would be best here would be to really dig into the
>> >> different kinds of algorithms that see practical use and then
>> >> encapsulate a strategy object of some kind that could be passed in
>> >> as an optional constructor argument.  I would start with [1] as a
>> >> reference.  We don't actually have to implement anything but what
>> >> you have immediate need for; but we should design the
>> >> QuantileStrategy (or better name) object so that it can carry the
>> >> right configuration parameters for the different strategies likely
>> >> to be needed.
>> >
>> > Any objection to having a protected method, as the OP suggested?
>>
>> The problem there is that it forces the user to actually subclass
>> and once that is done the behavior is essentially undefined (i.e.,
>> the end user of whatever is created doesn't really have a clearly
>> defined contract unless they rewrite it).   Much better to actually
>> implement - and document - alternatives.
>>
>> That approach also only covers one aspect of the variability in
>> algorithms.
>>
>> Phil
>>  >
>> >
>> > Gilles
>> >
>> >>
>> >> Phil
>> >>
>> >> [1] Hyndman, R. J. and Fan, Y. (1996) Sample quantiles in
>> >> statistical packages, /American Statistician/ *50*, 361–365.
>> >>>
>> >>> thanks
>> >>> venkat
>> >>>
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> > For additional commands, e-mail: dev-help@commons.apache.org
>> >
>> >
>

Re: [MATH-1120] Needed opinion about support on variations in percentile calculation

Posted by Gilles <gi...@harfang.homelinux.org>.

On Wed, 11 Jun 2014 11:50:13 +0100, Schalk W. Cronjé wrote:
> As a consumer of the library I'll have no idea what R_1 means. Even if I
> know what it is, I might have forgotten it by the time of usage, so
> a mental reminder might be useful. At a minimum, the javadoc should refer
> to the link you have shown.

In the current state, the Javadoc refers to a Wikipedia article, which itself
refers to the R manual.
The Javadoc also shows the formulae used to implement the variant at hand.

> Even better if you use something like INV_EDF_R1, which at least
> might remind me that it is the inverse empirical distribution function and is
> equivalent to R's Type 1.

I also prefer explicit names but I have no idea of what would be explicit
enough (without being extremely lengthy).


Regards,
Gilles


Re: [MATH-1120] Needed opinion about support on variations in percentile calculation

Posted by "Schalk W. Cronjé" <ys...@gmail.com>.

As a consumer of the library I'll have no idea what R_1 means. Even if I
know what it is, I might have forgotten it by the time of usage, so
a mental reminder might be useful. At a minimum, the javadoc should refer
to the link you have shown.

Even better if you use something like INV_EDF_R1, which at least
might remind me that it is the inverse empirical distribution function and is
equivalent to R's Type 1.
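
A small sketch of what such self-describing constants could look like. The
enum name QuantileTypeSketch, the reminder field, and the fromRType helper are
hypothetical and only illustrate the naming idea; only the type 1 description
comes from this message, and the second constant is a placeholder.

```java
/**
 * Hypothetical naming sketch: each constant keeps the R type number and a
 * one-line reminder of what the variant is.
 *
 * @see <a href="http://stat.ethz.ch/R-manual/R-devel/library/stats/html/quantile.html">R quantile manual</a>
 */
enum QuantileTypeSketch {
    /** Inverse of the empirical distribution function; equivalent to R's type 1. */
    INV_EDF_R1(1, "inverse of the empirical distribution function"),
    /** Placeholder constant; its description is deliberately left to the R manual. */
    R_2(2, "see the R manual, type 2");

    private final int rType;
    private final String reminder;

    QuantileTypeSketch(int rType, String reminder) {
        this.rType = rType;
        this.reminder = reminder;
    }

    /** The type number used in the R manual. */
    int rType() {
        return rType;
    }

    /** Short human-readable reminder of what the variant computes. */
    String reminder() {
        return reminder;
    }

    /** Looks a constant up by its R type number. */
    static QuantileTypeSketch fromRType(int type) {
        for (QuantileTypeSketch t : values()) {
            if (t.rType == type) {
                return t;
            }
        }
        throw new IllegalArgumentException("No constant for R type " + type);
    }
}
```

The constant name carries the hint, while the Javadoc and the reminder string
keep the link back to the R numbering.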

HTH



Re: [MATH-1120] Needed opinion about support on variations in percentile calculation

Posted by venkatesha murthy <ve...@gmail.com>.

All comments are incorporated except for type 10 from Wikipedia, as I couldn't
find a tool to compare its results against.
I have also added the min and max limits to the javadoc.

Please let me know.



Re: [MATH-1120] Needed opinion about support on variations in percentile calculation

Posted by venkatesha murthy <ve...@gmail.com>.

On Wed, Jun 11, 2014 at 4:10 PM, Gilles <gi...@harfang.homelinux.org>
wrote:

> On Mon, 9 Jun 2014 20:03:57 +0800 (SGT), venkatesha m wrote:
>
>> Hi All,
>>
>> I am looking for opinions on the name of the enum for the various
>> estimation strategies.
>> This is a public static enum under Percentile and I wish to call it
>> EstimationTechnique.
>> I would appreciate feedback on the name, or confirmation that the
>> current proposed name is fine.
>>
>> I have attached the patch to MATH-1120
>> (percentile-wth-estimation-patch) for reference.
>>
>
> IIUC, in this reference
>   http://stat.ethz.ch/R-manual/R-devel/library/stats/html/quantile.html
> what you called "EstimationTechnique" is referred to as "Type".
>
> Then the R manual uses a numbering: 1 to 9.
>

Would it be OK to call it EstimateType, as mentioned on Wikipedia (rather
than just Type, since it is about different estimation styles/types)?
Please let me know.

>
> Was Commons Math's implementation none of those nine types?
> I wouldn't name the CM's implementation DEFAULT (and the R's manual
> refers to a paper that recommends "type 8").
>
Commons Math comes very close to R-6; however, the max and min limits that
determine when x1 and xN are used differ between CM and R-6.
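
A minimal sketch of how such limits can be expressed. The class and method
names are hypothetical, p is taken as a fraction in [0, 1], and the type 6
limits shown are simply the ones that follow from clamping the position
h = (N + 1) * p to the range [1, N]; the limits used by CM itself are not
shown here.

```java
/** Hypothetical helper: a 1-based quantile position with explicit limits on p. */
final class PositionWithLimits {

    private PositionWithLimits() {
    }

    /**
     * Returns the 1-based position for fraction p and sample size n.
     * Below minLimit the first value (x1) is used, at or above maxLimit the
     * last value (xN) is used, otherwise h = (n + 1) * p.
     */
    static double position(double p, int n, double minLimit, double maxLimit) {
        if (p < minLimit) {
            return 1.0;          // x1 is considered
        }
        if (p >= maxLimit) {
            return n;            // xN is considered
        }
        return (n + 1) * p;
    }

    /** Limits obtained by clamping h = (n + 1) * p to [1, n]: 1/(n+1) and n/(n+1). */
    static double type6Position(double p, int n) {
        return position(p, n, 1.0 / (n + 1), (double) n / (n + 1));
    }
}
```

Passing the limits as parameters keeps the position formula itself unchanged,
so two variants that share a formula can still differ only in when x1 and xN
take over.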


> If it's OK to keep a tight link to the R's description of the variants,
> I'd suggest
>
> public enum Type {
>   CM,  // instead of DEFAULT
>   R_1,
>   R_2,
>   R_3,
>   R_4,
>   R_5,
>   R_6,
>   R_7,
>   R_8,
>   R_9,
>   // TYPE_TEN ?
> }
>
Agreed and taken. Also, please let me know if R_10 is OK for the unnamed
estimation type (the TYPE_TEN that you mention).


> R_9 is not implemented in the patch. Is it intended?
> Then on the Wikipedia page there is an unnamed 10th variant, also
> not implemented.
>
Well, yes, I didn't implement all of them initially, but I can add those in
the next patch.


> People knowledgeable in what should be expected from such a
> functionality are most welcome to provide feedback...
>
>
> Regards,
> Gilles
>
>
Thanks so much for the comments.


Re: [MATH-1120] Needed opinion about support on variations in percentile calculation

Posted by Gilles <gi...@harfang.homelinux.org>.

On Mon, 9 Jun 2014 20:03:57 +0800 (SGT), venkatesha m wrote:
> Hi All,
>
> I am looking for opinions on the name of the enum for the various
> estimation strategies.
> This is a public static enum under Percentile and I wish to call it
> EstimationTechnique.
> I would appreciate feedback on the name, or confirmation that the
> current proposed name is fine.
>
> I have attached the patch to MATH-1120
> (percentile-wth-estimation-patch) for reference.

IIUC, in this reference
   http://stat.ethz.ch/R-manual/R-devel/library/stats/html/quantile.html
what you called "EstimationTechnique" is referred to as "Type".

Then the R manual uses a numbering: 1 to 9.

Was Commons Math's implementation none of those nine types?
I wouldn't name the CM's implementation DEFAULT (and the R's manual
refers to a paper that recommends "type 8").

If it's OK to keep a tight link to the R's description of the variants,
I'd suggest

public enum Type {
   CM,  // instead of DEFAULT
   R_1,
   R_2,
   R_3,
   R_4,
   R_5,
   R_6,
   R_7,
   R_8,
   R_9,
   // TYPE_TEN ?
}

R_9 is not implemented in the patch. Is it intended?
Then on the Wikipedia page there is an unnamed 10th variant, also
not implemented.

People knowledgeable in what should be expected from such a
functionality are most welcome to provide feedback...

Regards,
Gilles


Re: [MATH-1120] Needed opinion about support on variations in percentile calculation

Posted by venkatesha m <ts...@yahoo.com.INVALID>.

Hi All,

I am looking for opinions on the name of the enum for the various estimation strategies.
This is a public static enum under Percentile and I wish to call it EstimationTechnique.
I would appreciate feedback on the name, or confirmation that the current proposed name is fine.

I have attached the patch to MATH-1120 (percentile-wth-estimation-patch) for reference.

thanks
venkat.


