You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@commons.apache.org by Eric Barnhill <er...@gmail.com> on 2018/01/30 09:02:03 UTC

[statistics] Storeless statistics - two points about Mean()

Overall I think the old math-statistics functioned well and I would not  be
inclined to mess with the old object hierarchy without reason. But there
are some strange design choices in this code. Mean() is used here as an
example.

1) In Mean() the two constructors create a FirstMoment() object:

--
/** Constructs a Mean. */
    public Mean() {
        incMoment = true;
        moment = new FirstMoment();
    }

    /**
     * Constructs a Mean with an External Moment.
     *
     * @param m1 the moment
     */
    public Mean(final FirstMoment m1) {
        this.moment = m1;
        incMoment = false;
    }
--

And then this moment object, as far as I can tell, is never used in the
mean calculation at all:

---

@Override
    public double evaluate(final double[] values, final int begin, final
int length)
        throws MathIllegalArgumentException {

        if (MathArrays.verifyValues(values, begin, length)) {
            Sum sum = new Sum();
            double sampleSize = length;

            // Compute initial estimate using definitional formula
            double xbar = sum.evaluate(values, begin, length) / sampleSize;

            // Compute correction factor in second pass
            double correction = 0;
            for (int i = begin; i < begin + length; i++) {
                correction += values[i] - xbar;
            }
            return xbar + (correction/sampleSize);
        }
        return Double.NaN;
    }
---

FirstMoment just seems  to be in the constructor for its own sake.

I think  it would make more sense to at the very least make Mean and
similar classes very lean, and see if there are any use cases for which
this is not sufficient.

2)

Mean extends AbstractStorelessUnivariateStatistics. I have no  problem with
the class being Storeless, which I think means immutable. But, in that case
the methods might as well be static.  However the old grammar of

new Mean().evaluate()

is only marginally different from what the static grammar would be:

Mean.evaluate

-Eric

Re: [statistics] Storeless statistics - two points about Mean()

Posted by Gilles <gi...@harfang.homelinux.org>.
Hi.

On Tue, 30 Jan 2018 10:02:03 +0100, Eric Barnhill wrote:
> Overall I think the old math-statistics functioned well and I would 
> not  be
> inclined to mess with the old object hierarchy without reason.

There are good reasons:
   https://issues.apache.org/jira/browse/MATH-1228
   https://issues.apache.org/jira/browse/MATH-1281

Porting without fixing such things is nonsense.

> But there
> are some strange design choices in this code.

Strange designs should not stay without reason. ;-)

> Mean() is used here as an
> example.
>
> 1) In Mean() the two constructors create a FirstMoment() object:
>
> --
> /** Constructs a Mean. */
>     public Mean() {
>         incMoment = true;
>         moment = new FirstMoment();
>     }
>
>     /**
>      * Constructs a Mean with an External Moment.
>      *
>      * @param m1 the moment
>      */
>     public Mean(final FirstMoment m1) {
>         this.moment = m1;
>         incMoment = false;
>     }
> --

"FirstMoment" is mentioned in the above JIRA issue.

> And then this moment object, as far as I can tell, is never used in 
> the
> mean calculation at all:
>
> ---
>
> @Override
>     public double evaluate(final double[] values, final int begin, 
> final
> int length)
>         throws MathIllegalArgumentException {
>
>         if (MathArrays.verifyValues(values, begin, length)) {
>             Sum sum = new Sum();
>             double sampleSize = length;
>
>             // Compute initial estimate using definitional formula
>             double xbar = sum.evaluate(values, begin, length) / 
> sampleSize;
>
>             // Compute correction factor in second pass
>             double correction = 0;
>             for (int i = begin; i < begin + length; i++) {
>                 correction += values[i] - xbar;
>             }
>             return xbar + (correction/sampleSize);
>         }
>         return Double.NaN;
>     }
> ---

Indeed, problem 1 is, as you note below, that an instance method
is used to compute the mean of external data.  It should have been
"static".
Another possibility is to pass the data to the instance, overloading
"increment":
   void increment(double[] values, int startIndex, int len);

>
> FirstMoment just seems  to be in the constructor for its own sake.

Not so; "moment" is used in "getResult()".
IIRC, the constructor you refer to enables aggregation.
But this seems an ad-hoc addition to the API (and the cause of a bug,
as the OP reported on JIRA).

> I think  it would make more sense to at the very least make Mean and
> similar classes very lean, and see if there are any use cases for 
> which
> this is not sufficient.

So indeed, you suggest to either change the design or drop some
of the current functionality (aggregation).

>
> 2)
>
> Mean extends AbstractStorelessUnivariateStatistics. I have no  
> problem with
> the class being Storeless, which I think means immutable.

Not so: e.g. in "Mean", the "moment" field is
  * "protected",
  * not "final", and
  * its class ("FirstMoment") contains non-final fields.
We can't be farther from immutability.

In the new API, "protected" should be avoided unless there is a
good (documented) reason. ;-)

> But, in that case
> the methods might as well be static.  However the old grammar of
>
> new Mean().evaluate()
>
> is only marginally different from what the static grammar would be:
>
> Mean.evaluate

The latter is the right choice, given the semantics of the method.
However, if object creation is assumed to be cheap, the better
OO alternative is ti pass the data to a constructor and retrieve
the result:
   m = new Mean(data).get();

Then, given that the project is targetting Java 8, the new API
must take advantage of that.  Once it is "right", adapter can
be thought about to ease the transition of codes that used CM.
Concerning "evaluate", shouldn't it be based on streams?

Regards,
Gilles

>
> -Eric


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org