You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@commons.apache.org by Adriean Khisbe <ad...@live.fr> on 2015/01/15 10:02:32 UTC

[math] suggestion: introducing getSummary method on descriptive statistics

Hi, 
Working on a project I had to capture current state of a DescriptiveStatistics, and choosed to use a StatisticalSummaryValues to hold the value. 
I looked it if was possible to do it in one short method call, but didn't found the method.
However I found some equivalent in SummaryStatistics:
    /**     * Return a {@link StatisticalSummaryValues} instance reporting current     * statistics.     * @return Current values of statistics     */    public StatisticalSummary getSummary() {        return new StatisticalSummaryValues(getMean(), getVariance(), getN(),                getMax(), getMin(), getSum());    }
Might be a good thing to port this functionnality in DescriptiveStatistics so ones can do        recap.getSummary()instead of 
      new StatisticalSummaryValues(recap.getMean(), recap.getVariance(), recap.getN(),                    recap.getMax(), recap.getMin(), recap.getSum())
What do you think about this?

Regards :)

Adriean 		 	   		  

[Math] Fluent API or not? (Was: [math] suggestion: [...] on descriptive statistics)

Posted by Gilles <gi...@harfang.homelinux.org>.
Hello.

Some time ago, we had a discussion about the "fluent API"
paradigm:
   http://markmail.org/message/3fjvucyz7rax4cyi

The excerpt of the conversation below is typical of the antagonism
between proponents of the evolution of the library and proponents
of stability.

In particular, when most people agreed on an API change (based on
identified problems and "by-design" improvements, a.o. increased
robustness, easier maintenance), it is unexpected to encounter
opposition later on, and have to justify from scratch the value of
a similar proposal, for every part of the library.

It is a loss of time for everybody.

A project must have a clear direction, to avoid discussing matters
at lower levels when the high level policy has been defined.

A divisive stand is prone to scare would-be contributors: Not knowing
where CM is heading to does not encourage people to spend time looking
at ways to improve the codebase.


Regards,
Gilles



>>>>>> [...]
>>>>>>
>>>>>> As I said above, what we need is a simple value object.  I am 
>>>>>> fine
>>>>>> with dispensing with the interface though.  What is useful is 
>>>>>> the
>>>>>> value object.
>>>>>
>>>>> If so, I'd insist that each class provides its _own_ "Values" 
>>>>> type,
>>>>> as an inner class (to be as close as possible to the producer),
>>>>> thus:
>>>>>
>>>>>   StatisticalSummary.Values (instead of 
>>>>> "StatisticalSummaryValues")
>>>>>   DescriptiveStatistics.Values
>>>>
>>>> The synchronized versions also produce these.  Why insist on the
>>>> two-level names?
>>>
>>> 1. Sync vs non-sync:
>>>    They produce the (final) "values" of the _same_ type, so the 
>>> base
>>>    class's "Values" can (and should) be used.
>>>
>>> 2. StatisticalSummary and DescriptiveStatistics
>>>    Two different classes: thus a priori, one should not be forced 
>>> to
>>>    produce the same set of values.
>>
>> That is not what I am proposing.  DescriptiveStatistics would get
>> its own value object.
>>
>> I am -1 on eliminating the existing, already used StatisticalSummary
>> VO.  I would like to add DescriptiveStaticsValues that works
>> similarly, but includes more statistics.
>>
>>>    If the classes are meant to always evolve in parallel, I feel 
>>> that
>>>    it could (and, if so, should) be enforced by the design.
>>>
>>> 3. [After you draw my attention to the "Synchronized..." classes.]
>>>    (a)  It is dangerous to have the synchronized version inherit
>>>    from the non-sync one: forgetting to update the subclass (as a
>>>    new method is added to the parent) will make it thread-unsafe.
>>>    Composition is necessary (and, in this case, not more work since
>>>    all methods must be overridden either way) to detect this kind 
>>> of
>>>    bug as early as possible.
>>>    This is surely a change to be implemented in 4.0 in order to
>>>    increase robustness.
>>>    [Sorry to touch a venerable class (@since 1.2)!]
>>
>> -1 on this.  These classes and APIs have been stable for a very long
>> time.  As new methods are added, it is simple and easy to make the
>> required updates.
>
> It's also simple and easy to miss the updates.
>
>>>    (b)  Again, if the sync and non-sync versions are meant to 
>>> evolve
>>>    together (as they obviously are), there shouldn't be two
>>>    distinct classes, but the synchronized version should probably
>>>    be created by a factory method (à la 
>>> "Collection.synchronized..."
>>>    methods).
>>
>> -1 to gratuitous API change giving no benefit to users.
>
> It's no more "gratuitous" than what we have done along the
> years in other packages; it's a real improvement.
> But if you want to oppose modifying old code on the basis that it
> is old, there is no room for discussion.
>
> Synchronization is an implementation choice and does not warrant
> an independent type. The factory method would return an anonymous
> sub-type of "DescriptiveStatistics".
>
>
> Gilles
>> [...]


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] suggestion: introducing getSummary method on descriptive statistics

Posted by Gilles <gi...@harfang.homelinux.org>.
On Sat, 17 Jan 2015 10:16:31 -0700, Phil Steitz wrote:
> On 1/17/15 8:18 AM, Gilles wrote:
>> On Thu, 15 Jan 2015 17:01:46 -0700, Phil Steitz wrote:
>>> On 1/15/15 4:41 PM, Gilles wrote:
>>>> On Thu, 15 Jan 2015 15:09:29 -0700, Phil Steitz wrote:
>>>>> On 1/15/15 9:50 AM, Gilles wrote:
>>>>>> On Thu, 15 Jan 2015 10:02:32 +0100, Adriean Khisbe wrote:
>>>>>>> Hi,
>>>>>>> Working on a project I had to capture current state of a
>>>>>>> DescriptiveStatistics, and choosed to use a
>>>>>>> StatisticalSummaryValues
>>>>>>> to hold the value.
>>>>>>> I looked it if was possible to do it in one short method
>>>>>>> call, but
>>>>>>> didn't found the method.
>>>>>
>>>>> Good point. We should add this.
>>>>>>> However I found some equivalent in SummaryStatistics:
>>>>>>>     /**     * Return a {@link StatisticalSummaryValues} 
>>>>>>> instance
>>>>>>> reporting current     * statistics.     * @return Current
>>>>>>> values of
>>>>>>> statistics     */    public StatisticalSummary getSummary() {
>>>>>>> return new StatisticalSummaryValues(getMean(), getVariance(),
>>>>>>> getN(),
>>>>>>>              getMax(), getMin(), getSum());    }
>>>>>>> Might be a good thing to port this functionnality in
>>>>>>> DescriptiveStatistics so ones can do
>>>>>>> recap.getSummary()instead
>>>>>>> of
>>>>>>>       new StatisticalSummaryValues(recap.getMean(),
>>>>>>> recap.getVariance(), recap.getN(),
>>>>>>> recap.getMax(),
>>>>>>> recap.getMin(), recap.getSum())
>>>>>>> What do you think about this?
>>>>>>
>>>>>> Personally, I don't like the "...Values" class.
>>>>>> I think that it should be a private inner class of
>>>>>> "SummaryStatistics"
>>>>>> (or package-private).
>>>>>>
>>>>>
>>>>> I like it and use it a lot.  It is a handy way to persist / pass
>>>>> around the results of a SummaryStatistics calculation -
>>>>> basically a
>>>>> value object.
>>>>>> I wonder whether, to capture the state, we could have a
>>>>>>   Map<DataTypeId, Double> getState()
>>>>>> method, where "DataTypeId" would either be an "enum" or a
>>>>>> "String".
>>>>>
>>>>> The problem with the data bag type approach to these things is
>>>>> there
>>>>> is no type safety and you can end up with nasty bugs due to 
>>>>> things
>>>>> not being in the bag that you expect.  If you try to make it
>>>>> safe by
>>>>> making the DataTypeID an enum, you have just created an
>>>>> excessively
>>>>> complicated value object - basically, engineering your own 
>>>>> get/set
>>>>> property mgt.
>>>>
>>>> You might be right; I didn't think a lot about it.
>>>>
>>>>>>
>>>>>> Then couldn't "StatisticalSummary" itself be that enum rather
>>>>>> than
>>>>>> an interface (since it seems to only enumerate data and have no
>>>>>> "behaviour")?
>>>>>
>>>>> As I said above, what we need is a simple value object.  I am 
>>>>> fine
>>>>> with dispensing with the interface though.  What is useful is the
>>>>> value object.
>>>>
>>>> If so, I'd insist that each class provides its _own_ "Values" 
>>>> type,
>>>> as an inner class (to be as close as possible to the producer),
>>>> thus:
>>>>
>>>>   StatisticalSummary.Values (instead of 
>>>> "StatisticalSummaryValues")
>>>>   DescriptiveStatistics.Values
>>>
>>> The synchronized versions also produce these.  Why insist on the
>>> two-level names?
>>
>> 1. Sync vs non-sync:
>>    They produce the (final) "values" of the _same_ type, so the base
>>    class's "Values" can (and should) be used.
>>
>> 2. StatisticalSummary and DescriptiveStatistics
>>    Two different classes: thus a priori, one should not be forced to
>>    produce the same set of values.
>
> That is not what I am proposing.  DescriptiveStatistics would get
> its own value object.
>
> I am -1 on eliminating the existing, already used StatisticalSummary
> VO.  I would like to add DescriptiveStaticsValues that works
> similarly, but includes more statistics.
>
>>    If the classes are meant to always evolve in parallel, I feel 
>> that
>>    it could (and, if so, should) be enforced by the design.
>>
>> 3. [After you draw my attention to the "Synchronized..." classes.]
>>    (a)  It is dangerous to have the synchronized version inherit
>>    from the non-sync one: forgetting to update the subclass (as a
>>    new method is added to the parent) will make it thread-unsafe.
>>    Composition is necessary (and, in this case, not more work since
>>    all methods must be overridden either way) to detect this kind of
>>    bug as early as possible.
>>    This is surely a change to be implemented in 4.0 in order to
>>    increase robustness.
>>    [Sorry to touch a venerable class (@since 1.2)!]
>
> -1 on this.  These classes and APIs have been stable for a very long
> time.  As new methods are added, it is simple and easy to make the
> required updates.

It's also simple and easy to miss the updates.

>>    (b)  Again, if the sync and non-sync versions are meant to evolve
>>    together (as they obviously are), there shouldn't be two
>>    distinct classes, but the synchronized version should probably
>>    be created by a factory method (à la "Collection.synchronized..."
>>    methods).
>
> -1 to gratuitous API change giving no benefit to users.

It's no more "gratuitous" than what we have done along the
years in other packages; it's a real improvement.
But if you want to oppose modifying old code on the basis that it
is old, there is no room for discussion.

Synchronization is an implementation choice and does not warrant
an independent type. The factory method would return an anonymous
sub-type of "DescriptiveStatistics".


Gilles

>
> Phil
>>
>>
>> Gilles
>>
>>>
>>> Phil
>>>>
>>>>
>>>>
>>>> Gilles
>>>>
>>>>>
>>>>> For DescriptiveStatistics, the set of statistics is different,
>>>>> so I
>>>>> would suggest we just add DescriptiveStatisticsValues and a
>>>>> getSummary or getResults() method that returns one of those.
>>>>> Patches welcome.
>>>>>
>>>>> Phil
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Gilles
>>>>>>
>>>>>>>
>>>>>>> Regards :)
>>>>>>>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] suggestion: introducing getSummary method on descriptive statistics

Posted by Phil Steitz <ph...@gmail.com>.
On 1/17/15 8:18 AM, Gilles wrote:
> On Thu, 15 Jan 2015 17:01:46 -0700, Phil Steitz wrote:
>> On 1/15/15 4:41 PM, Gilles wrote:
>>> On Thu, 15 Jan 2015 15:09:29 -0700, Phil Steitz wrote:
>>>> On 1/15/15 9:50 AM, Gilles wrote:
>>>>> On Thu, 15 Jan 2015 10:02:32 +0100, Adriean Khisbe wrote:
>>>>>> Hi,
>>>>>> Working on a project I had to capture current state of a
>>>>>> DescriptiveStatistics, and choosed to use a
>>>>>> StatisticalSummaryValues
>>>>>> to hold the value.
>>>>>> I looked it if was possible to do it in one short method
>>>>>> call, but
>>>>>> didn't found the method.
>>>>
>>>> Good point. We should add this.
>>>>>> However I found some equivalent in SummaryStatistics:
>>>>>>     /**     * Return a {@link StatisticalSummaryValues} instance
>>>>>> reporting current     * statistics.     * @return Current
>>>>>> values of
>>>>>> statistics     */    public StatisticalSummary getSummary() {
>>>>>> return new StatisticalSummaryValues(getMean(), getVariance(),
>>>>>> getN(),
>>>>>>              getMax(), getMin(), getSum());    }
>>>>>> Might be a good thing to port this functionnality in
>>>>>> DescriptiveStatistics so ones can do
>>>>>> recap.getSummary()instead
>>>>>> of
>>>>>>       new StatisticalSummaryValues(recap.getMean(),
>>>>>> recap.getVariance(), recap.getN(),
>>>>>> recap.getMax(),
>>>>>> recap.getMin(), recap.getSum())
>>>>>> What do you think about this?
>>>>>
>>>>> Personally, I don't like the "...Values" class.
>>>>> I think that it should be a private inner class of
>>>>> "SummaryStatistics"
>>>>> (or package-private).
>>>>>
>>>>
>>>> I like it and use it a lot.  It is a handy way to persist / pass
>>>> around the results of a SummaryStatistics calculation -
>>>> basically a
>>>> value object.
>>>>> I wonder whether, to capture the state, we could have a
>>>>>   Map<DataTypeId, Double> getState()
>>>>> method, where "DataTypeId" would either be an "enum" or a
>>>>> "String".
>>>>
>>>> The problem with the data bag type approach to these things is
>>>> there
>>>> is no type safety and you can end up with nasty bugs due to things
>>>> not being in the bag that you expect.  If you try to make it
>>>> safe by
>>>> making the DataTypeID an enum, you have just created an
>>>> excessively
>>>> complicated value object - basically, engineering your own get/set
>>>> property mgt.
>>>
>>> You might be right; I didn't think a lot about it.
>>>
>>>>>
>>>>> Then couldn't "StatisticalSummary" itself be that enum rather
>>>>> than
>>>>> an interface (since it seems to only enumerate data and have no
>>>>> "behaviour")?
>>>>
>>>> As I said above, what we need is a simple value object.  I am fine
>>>> with dispensing with the interface though.  What is useful is the
>>>> value object.
>>>
>>> If so, I'd insist that each class provides its _own_ "Values" type,
>>> as an inner class (to be as close as possible to the producer),
>>> thus:
>>>
>>>   StatisticalSummary.Values (instead of "StatisticalSummaryValues")
>>>   DescriptiveStatistics.Values
>>
>> The synchronized versions also produce these.  Why insist on the
>> two-level names?
>
> 1. Sync vs non-sync:
>    They produce the (final) "values" of the _same_ type, so the base
>    class's "Values" can (and should) be used.
>
> 2. StatisticalSummary and DescriptiveStatistics
>    Two different classes: thus a priori, one should not be forced to
>    produce the same set of values.

That is not what I am proposing.  DescriptiveStatistics would get
its own value object.

I am -1 on eliminating the existing, already used StatisticalSummary
VO.  I would like to add DescriptiveStaticsValues that works
similarly, but includes more statistics.  

>    If the classes are meant to always evolve in parallel, I feel that
>    it could (and, if so, should) be enforced by the design.
>
> 3. [After you draw my attention to the "Synchronized..." classes.]
>    (a)  It is dangerous to have the synchronized version inherit
>    from the non-sync one: forgetting to update the subclass (as a
>    new method is added to the parent) will make it thread-unsafe.
>    Composition is necessary (and, in this case, not more work since
>    all methods must be overridden either way) to detect this kind of
>    bug as early as possible.
>    This is surely a change to be implemented in 4.0 in order to
>    increase robustness.
>    [Sorry to touch a venerable class (@since 1.2)!]

-1 on this.  These classes and APIs have been stable for a very long
time.  As new methods are added, it is simple and easy to make the
required updates. 
>    (b)  Again, if the sync and non-sync versions are meant to evolve
>    together (as they obviously are), there shouldn't be two
>    distinct classes, but the synchronized version should probably
>    be created by a factory method (à la "Collection.synchronized..."
>    methods).

-1 to gratuitous API change giving no benefit to users.

Phil
>
>
> Gilles
>
>>
>> Phil
>>>
>>>
>>>
>>> Gilles
>>>
>>>>
>>>> For DescriptiveStatistics, the set of statistics is different,
>>>> so I
>>>> would suggest we just add DescriptiveStatisticsValues and a
>>>> getSummary or getResults() method that returns one of those.
>>>> Patches welcome.
>>>>
>>>> Phil
>>>>>
>>>>>
>>>>> Regards,
>>>>> Gilles
>>>>>
>>>>>>
>>>>>> Regards :)
>>>>>>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] suggestion: introducing getSummary method on descriptive statistics

Posted by Gilles <gi...@harfang.homelinux.org>.
On Thu, 15 Jan 2015 17:01:46 -0700, Phil Steitz wrote:
> On 1/15/15 4:41 PM, Gilles wrote:
>> On Thu, 15 Jan 2015 15:09:29 -0700, Phil Steitz wrote:
>>> On 1/15/15 9:50 AM, Gilles wrote:
>>>> On Thu, 15 Jan 2015 10:02:32 +0100, Adriean Khisbe wrote:
>>>>> Hi,
>>>>> Working on a project I had to capture current state of a
>>>>> DescriptiveStatistics, and choosed to use a
>>>>> StatisticalSummaryValues
>>>>> to hold the value.
>>>>> I looked it if was possible to do it in one short method call, 
>>>>> but
>>>>> didn't found the method.
>>>
>>> Good point. We should add this.
>>>>> However I found some equivalent in SummaryStatistics:
>>>>>     /**     * Return a {@link StatisticalSummaryValues} instance
>>>>> reporting current     * statistics.     * @return Current
>>>>> values of
>>>>> statistics     */    public StatisticalSummary getSummary() {
>>>>> return new StatisticalSummaryValues(getMean(), getVariance(),
>>>>> getN(),
>>>>>              getMax(), getMin(), getSum());    }
>>>>> Might be a good thing to port this functionnality in
>>>>> DescriptiveStatistics so ones can do
>>>>> recap.getSummary()instead
>>>>> of
>>>>>       new StatisticalSummaryValues(recap.getMean(),
>>>>> recap.getVariance(), recap.getN(),
>>>>> recap.getMax(),
>>>>> recap.getMin(), recap.getSum())
>>>>> What do you think about this?
>>>>
>>>> Personally, I don't like the "...Values" class.
>>>> I think that it should be a private inner class of
>>>> "SummaryStatistics"
>>>> (or package-private).
>>>>
>>>
>>> I like it and use it a lot.  It is a handy way to persist / pass
>>> around the results of a SummaryStatistics calculation - basically a
>>> value object.
>>>> I wonder whether, to capture the state, we could have a
>>>>   Map<DataTypeId, Double> getState()
>>>> method, where "DataTypeId" would either be an "enum" or a 
>>>> "String".
>>>
>>> The problem with the data bag type approach to these things is 
>>> there
>>> is no type safety and you can end up with nasty bugs due to things
>>> not being in the bag that you expect.  If you try to make it safe 
>>> by
>>> making the DataTypeID an enum, you have just created an excessively
>>> complicated value object - basically, engineering your own get/set
>>> property mgt.
>>
>> You might be right; I didn't think a lot about it.
>>
>>>>
>>>> Then couldn't "StatisticalSummary" itself be that enum rather than
>>>> an interface (since it seems to only enumerate data and have no
>>>> "behaviour")?
>>>
>>> As I said above, what we need is a simple value object.  I am fine
>>> with dispensing with the interface though.  What is useful is the
>>> value object.
>>
>> If so, I'd insist that each class provides its _own_ "Values" type,
>> as an inner class (to be as close as possible to the producer), 
>> thus:
>>
>>   StatisticalSummary.Values (instead of "StatisticalSummaryValues")
>>   DescriptiveStatistics.Values
>
> The synchronized versions also produce these.  Why insist on the
> two-level names?

1. Sync vs non-sync:
    They produce the (final) "values" of the _same_ type, so the base
    class's "Values" can (and should) be used.

2. StatisticalSummary and DescriptiveStatistics
    Two different classes: thus a priori, one should not be forced to
    produce the same set of values.
    If the classes are meant to always evolve in parallel, I feel that
    it could (and, if so, should) be enforced by the design.

3. [After you draw my attention to the "Synchronized..." classes.]
    (a)  It is dangerous to have the synchronized version inherit
    from the non-sync one: forgetting to update the subclass (as a
    new method is added to the parent) will make it thread-unsafe.
    Composition is necessary (and, in this case, not more work since
    all methods must be overridden either way) to detect this kind of
    bug as early as possible.
    This is surely a change to be implemented in 4.0 in order to
    increase robustness.
    [Sorry to touch a venerable class (@since 1.2)!]
    (b)  Again, if the sync and non-sync versions are meant to evolve
    together (as they obviously are), there shouldn't be two
    distinct classes, but the synchronized version should probably
    be created by a factory method (à la "Collection.synchronized..."
    methods).


Gilles

>
> Phil
>>
>>
>>
>> Gilles
>>
>>>
>>> For DescriptiveStatistics, the set of statistics is different, so I
>>> would suggest we just add DescriptiveStatisticsValues and a
>>> getSummary or getResults() method that returns one of those.
>>> Patches welcome.
>>>
>>> Phil
>>>>
>>>>
>>>> Regards,
>>>> Gilles
>>>>
>>>>>
>>>>> Regards :)
>>>>>



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] suggestion: introducing getSummary method on descriptive statistics

Posted by Phil Steitz <ph...@gmail.com>.
On 1/15/15 4:41 PM, Gilles wrote:
> On Thu, 15 Jan 2015 15:09:29 -0700, Phil Steitz wrote:
>> On 1/15/15 9:50 AM, Gilles wrote:
>>> On Thu, 15 Jan 2015 10:02:32 +0100, Adriean Khisbe wrote:
>>>> Hi,
>>>> Working on a project I had to capture current state of a
>>>> DescriptiveStatistics, and choosed to use a
>>>> StatisticalSummaryValues
>>>> to hold the value.
>>>> I looked it if was possible to do it in one short method call, but
>>>> didn't found the method.
>>
>> Good point. We should add this.
>>>> However I found some equivalent in SummaryStatistics:
>>>>     /**     * Return a {@link StatisticalSummaryValues} instance
>>>> reporting current     * statistics.     * @return Current
>>>> values of
>>>> statistics     */    public StatisticalSummary getSummary() {
>>>> return new StatisticalSummaryValues(getMean(), getVariance(),
>>>> getN(),
>>>>              getMax(), getMin(), getSum());    }
>>>> Might be a good thing to port this functionnality in
>>>> DescriptiveStatistics so ones can do
>>>> recap.getSummary()instead
>>>> of
>>>>       new StatisticalSummaryValues(recap.getMean(),
>>>> recap.getVariance(), recap.getN(),
>>>> recap.getMax(),
>>>> recap.getMin(), recap.getSum())
>>>> What do you think about this?
>>>
>>> Personally, I don't like the "...Values" class.
>>> I think that it should be a private inner class of
>>> "SummaryStatistics"
>>> (or package-private).
>>>
>>
>> I like it and use it a lot.  It is a handy way to persist / pass
>> around the results of a SummaryStatistics calculation - basically a
>> value object.
>>> I wonder whether, to capture the state, we could have a
>>>   Map<DataTypeId, Double> getState()
>>> method, where "DataTypeId" would either be an "enum" or a "String".
>>
>> The problem with the data bag type approach to these things is there
>> is no type safety and you can end up with nasty bugs due to things
>> not being in the bag that you expect.  If you try to make it safe by
>> making the DataTypeID an enum, you have just created an excessively
>> complicated value object - basically, engineering your own get/set
>> property mgt.
>
> You might be right; I didn't think a lot about it.
>
>>>
>>> Then couldn't "StatisticalSummary" itself be that enum rather than
>>> an interface (since it seems to only enumerate data and have no
>>> "behaviour")?
>>
>> As I said above, what we need is a simple value object.  I am fine
>> with dispensing with the interface though.  What is useful is the
>> value object.
>
> If so, I'd insist that each class provides its _own_ "Values" type,
> as an inner class (to be as close as possible to the producer), thus:
>
>   StatisticalSummary.Values (instead of "StatisticalSummaryValues")
>   DescriptiveStatistics.Values

The synchronized versions also produce these.  Why insist on the
two-level names?

Phil
>
>
>
> Gilles
>
>>
>> For DescriptiveStatistics, the set of statistics is different, so I
>> would suggest we just add DescriptiveStatisticsValues and a
>> getSummary or getResults() method that returns one of those.
>> Patches welcome.
>>
>> Phil
>>>
>>>
>>> Regards,
>>> Gilles
>>>
>>>>
>>>> Regards :)
>>>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] suggestion: introducing getSummary method on descriptive statistics

Posted by Gilles <gi...@harfang.homelinux.org>.
On Thu, 15 Jan 2015 15:09:29 -0700, Phil Steitz wrote:
> On 1/15/15 9:50 AM, Gilles wrote:
>> On Thu, 15 Jan 2015 10:02:32 +0100, Adriean Khisbe wrote:
>>> Hi,
>>> Working on a project I had to capture current state of a
>>> DescriptiveStatistics, and choosed to use a 
>>> StatisticalSummaryValues
>>> to hold the value.
>>> I looked it if was possible to do it in one short method call, but
>>> didn't found the method.
>
> Good point. We should add this.
>>> However I found some equivalent in SummaryStatistics:
>>>     /**     * Return a {@link StatisticalSummaryValues} instance
>>> reporting current     * statistics.     * @return Current values of
>>> statistics     */    public StatisticalSummary getSummary() {
>>> return new StatisticalSummaryValues(getMean(), getVariance(),
>>> getN(),
>>>              getMax(), getMin(), getSum());    }
>>> Might be a good thing to port this functionnality in
>>> DescriptiveStatistics so ones can do
>>> recap.getSummary()instead
>>> of
>>>       new StatisticalSummaryValues(recap.getMean(),
>>> recap.getVariance(), recap.getN(),
>>> recap.getMax(),
>>> recap.getMin(), recap.getSum())
>>> What do you think about this?
>>
>> Personally, I don't like the "...Values" class.
>> I think that it should be a private inner class of
>> "SummaryStatistics"
>> (or package-private).
>>
>
> I like it and use it a lot.  It is a handy way to persist / pass
> around the results of a SummaryStatistics calculation - basically a
> value object.
>> I wonder whether, to capture the state, we could have a
>>   Map<DataTypeId, Double> getState()
>> method, where "DataTypeId" would either be an "enum" or a "String".
>
> The problem with the data bag type approach to these things is there
> is no type safety and you can end up with nasty bugs due to things
> not being in the bag that you expect.  If you try to make it safe by
> making the DataTypeID an enum, you have just created an excessively
> complicated value object - basically, engineering your own get/set
> property mgt.

You might be right; I didn't think a lot about it.

>>
>> Then couldn't "StatisticalSummary" itself be that enum rather than
>> an interface (since it seems to only enumerate data and have no
>> "behaviour")?
>
> As I said above, what we need is a simple value object.  I am fine
> with dispensing with the interface though.  What is useful is the
> value object.

If so, I'd insist that each class provides its _own_ "Values" type,
as an inner class (to be as close as possible to the producer), thus:

   StatisticalSummary.Values (instead of "StatisticalSummaryValues")
   DescriptiveStatistics.Values
   etc.


Gilles

>
> For DescriptiveStatistics, the set of statistics is different, so I
> would suggest we just add DescriptiveStatisticsValues and a
> getSummary or getResults() method that returns one of those.
> Patches welcome.
>
> Phil
>>
>>
>> Regards,
>> Gilles
>>
>>>
>>> Regards :)
>>>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] suggestion: introducing getSummary method on descriptive statistics

Posted by Phil Steitz <ph...@gmail.com>.
On 1/15/15 9:50 AM, Gilles wrote:
> On Thu, 15 Jan 2015 10:02:32 +0100, Adriean Khisbe wrote:
>> Hi,
>> Working on a project I had to capture current state of a
>> DescriptiveStatistics, and choosed to use a StatisticalSummaryValues
>> to hold the value.
>> I looked it if was possible to do it in one short method call, but
>> didn't found the method.

Good point. We should add this.
>> However I found some equivalent in SummaryStatistics:
>>     /**     * Return a {@link StatisticalSummaryValues} instance
>> reporting current     * statistics.     * @return Current values of
>> statistics     */    public StatisticalSummary getSummary() {
>> return new StatisticalSummaryValues(getMean(), getVariance(),
>> getN(),
>>              getMax(), getMin(), getSum());    }
>> Might be a good thing to port this functionnality in
>> DescriptiveStatistics so ones can do       
>> recap.getSummary()instead
>> of
>>       new StatisticalSummaryValues(recap.getMean(),
>> recap.getVariance(), recap.getN(),                   
>> recap.getMax(),
>> recap.getMin(), recap.getSum())
>> What do you think about this?
>
> Personally, I don't like the "...Values" class.
> I think that it should be a private inner class of
> "SummaryStatistics"
> (or package-private).
>

I like it and use it a lot.  It is a handy way to persist / pass
around the results of a SummaryStatistics calculation - basically a
value object. 
> I wonder whether, to capture the state, we could have a
>   Map<DataTypeId, Double> getState()
> method, where "DataTypeId" would either be an "enum" or a "String".

The problem with the data bag type approach to these things is there
is no type safety and you can end up with nasty bugs due to things
not being in the bag that you expect.  If you try to make it safe by
making the DataTypeID an enum, you have just created an excessively
complicated value object - basically, engineering your own get/set
property mgt.
>
> Then couldn't "StatisticalSummary" itself be that enum rather than
> an interface (since it seems to only enumerate data and have no
> "behaviour")?

As I said above, what we need is a simple value object.  I am fine
with dispensing with the interface though.  What is useful is the
value object.

For DescriptiveStatistics, the set of statistics is different, so I
would suggest we just add DescriptiveStatisticsValues and a
getSummary or getResults() method that returns one of those. 
Patches welcome.

Phil
>
>
> Regards,
> Gilles
>
>>
>> Regards :)
>>
>> Adriean
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] suggestion: introducing getSummary method on descriptive statistics

Posted by Gilles <gi...@harfang.homelinux.org>.
On Thu, 15 Jan 2015 10:02:32 +0100, Adriean Khisbe wrote:
> Hi,
> Working on a project I had to capture current state of a
> DescriptiveStatistics, and choosed to use a StatisticalSummaryValues
> to hold the value.
> I looked it if was possible to do it in one short method call, but
> didn't found the method.
> However I found some equivalent in SummaryStatistics:
>     /**     * Return a {@link StatisticalSummaryValues} instance
> reporting current     * statistics.     * @return Current values of
> statistics     */    public StatisticalSummary getSummary() {
> return new StatisticalSummaryValues(getMean(), getVariance(), getN(),
>              getMax(), getMin(), getSum());    }
> Might be a good thing to port this functionnality in
> DescriptiveStatistics so ones can do        recap.getSummary()instead
> of
>       new StatisticalSummaryValues(recap.getMean(),
> recap.getVariance(), recap.getN(),                    recap.getMax(),
> recap.getMin(), recap.getSum())
> What do you think about this?

Personally, I don't like the "...Values" class.
I think that it should be a private inner class of "SummaryStatistics"
(or package-private).

I wonder whether, to capture the state, we could have a
   Map<DataTypeId, Double> getState()
method, where "DataTypeId" would either be an "enum" or a "String".

Then couldn't "StatisticalSummary" itself be that enum rather than
an interface (since it seems to only enumerate data and have no
"behaviour")?


Regards,
Gilles

>
> Regards :)
>
> Adriean


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org