Posted to dev@commons.apache.org by Andrew Schaumberg <sc...@gmail.com> on 2015/08/23 03:04:59 UTC

[math] chi-squared test default deg freedom should be n-2 not n-1?

Dear devs,

Chi-squared test takes degrees of freedom to be one fewer than the sample
count.
https://commons.apache.org/proper/commons-math/apidocs/src-html/org/apache/commons/math3/stat/inference/ChiSquareTest.html#line.154

Should be two fewer, right?
http://courses.wcupa.edu/rbove/Berenson/10th%20ed%20CD-ROM%20topics/section12_5.pdf

d.f. = k - p - 1
k is sample count
p is # parameters being estimated (typically 1)

One fewer is typically for T-test, I think.

Not a statistician, thanks for your time, sorry if this is the wrong agent,
first time here,
-A

Re: [math] chi-squared test default deg freedom should be n-2 not n-1?

Posted by Phil Steitz <ph...@gmail.com>.

On 8/22/15 6:04 PM, Andrew Schaumberg wrote:
> Dear devs,
>
> Chi-squared test takes degrees of freedom to be one fewer than the sample
> count.

You mean one fewer than the number of categories (the common length
of the expected and observed count arrays).
> https://commons.apache.org/proper/commons-math/apidocs/src-html/org/apache/commons/math3/stat/inference/ChiSquareTest.html#line.154
>
> Should be two fewer, right?
> http://courses.wcupa.edu/rbove/Berenson/10th%20ed%20CD-ROM%20topics/section12_5.pdf
>
> d.f. = k - p - 1
> k is sample count
> p is # parameters being estimated (typically 1)

If the expected counts are computed from the same underlying dataset
used to get the observed counts and they depend on parameters
estimated from the data, then you are correct: you need to reduce
the degrees of freedom by the number of parameters estimated from
the data.  The [math] code has no way of knowing where the expected
counts come from.  The test evaluates the null hypothesis stated in
the javadoc - that the observed counts conform to the (fixed)
distribution described by the expected counts.
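
For reference, the two cases can be written out as follows.  The
notation is assumed here, not taken from the javadoc: O_i and E_i are
the observed and expected counts in category i, k is the number of
categories, and p is the number of parameters estimated from the same
data.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},
\qquad
\mathrm{d.f.} = k - 1 - p
```

With expected counts fixed in advance, p = 0 and d.f. = k - 1, which is
the case the test assumes.  With one parameter estimated from the data,
p = 1 and d.f. = k - 2.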

To handle the case where the expected counts are derived from a
parametric distribution fitted to the data, you need to use the
ChiSquaredDistribution class directly to perform the test.  That is a
little inconvenient, so an enhancement request allowing the degrees
of freedom to be passed to the test makes sense.  Feel free to open
a JIRA with this request.
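
A minimal sketch of that workaround follows.  It is written in plain
Java with no library dependency so that it runs on its own; the class
name ChiSquareGof, its methods, and the counts in main are all
hypothetical, and it assumes the expected counts already sum to the
observed total.  With [math] on the classpath, the statistic would
presumably come from ChiSquareTest.chiSquare(expected, observed) and
the p-value from
1 - new ChiSquaredDistribution(df).cumulativeProbability(statistic)
instead of the hand-rolled series used here.

```java
final class ChiSquareGof {

    /** Pearson statistic: sum over categories of (O - E)^2 / E. */
    static double statistic(double[] expected, long[] observed) {
        if (expected.length != observed.length || expected.length < 2) {
            throw new IllegalArgumentException("need two or more matching categories");
        }
        double sum = 0.0;
        for (int i = 0; i < expected.length; i++) {
            double d = observed[i] - expected[i];
            sum += d * d / expected[i];
        }
        return sum;
    }

    /** d.f. = k - 1 - p, with p parameters estimated from the data. */
    static int degreesOfFreedom(int categories, int estimatedParams) {
        int df = categories - 1 - estimatedParams;
        if (df < 1) {
            throw new IllegalArgumentException("degrees of freedom must be positive");
        }
        return df;
    }

    /** log Gamma(df / 2) for integer df, via Gamma(t + 1) = t * Gamma(t). */
    static double logGammaHalf(int df) {
        boolean even = df % 2 == 0;
        double lg = even ? 0.0 : 0.5 * Math.log(Math.PI); // Gamma(1) = 1, Gamma(1/2) = sqrt(pi)
        for (double t = even ? 1.0 : 0.5; t < df / 2.0; t += 1.0) {
            lg += Math.log(t);
        }
        return lg;
    }

    /**
     * Upper-tail probability P(X > stat) for X ~ chi-squared(df), using the
     * power series for the regularized lower incomplete gamma function.
     * Intended for moderate statistics only: very small p-values lose
     * precision because the result is computed as 1 - P.
     */
    static double pValue(double stat, int df) {
        if (df < 1) {
            throw new IllegalArgumentException("degrees of freedom must be positive");
        }
        if (stat <= 0.0) {
            return 1.0;
        }
        double a = df / 2.0;
        double x = stat / 2.0;
        double term = 1.0 / a; // n = 0 term of sum x^n / (a (a+1) ... (a+n))
        double sum = term;
        for (int n = 1; n < 100000; n++) {
            term *= x / (a + n);
            sum += term;
            if (term < sum * 1e-16) {
                break;
            }
        }
        double lower = Math.exp(a * Math.log(x) - x - logGammaHalf(df)) * sum;
        return Math.min(1.0, Math.max(0.0, 1.0 - lower));
    }

    public static void main(String[] args) {
        // Hypothetical counts for k = 5 categories; both arrays sum to 100.
        long[] observed = {18, 25, 30, 17, 10};
        double[] expected = {15, 25, 30, 20, 10};

        double stat = statistic(expected, observed);
        int dfFixed = degreesOfFreedom(expected.length, 0);  // expected counts fixed in advance
        int dfFitted = degreesOfFreedom(expected.length, 1); // one parameter estimated from the data

        System.out.println("statistic = " + stat);
        System.out.println("p-value, d.f. = " + dfFixed + ": " + pValue(stat, dfFixed));
        System.out.println("p-value, d.f. = " + dfFitted + ": " + pValue(stat, dfFitted));
    }
}
```

For the same statistic, the smaller degrees of freedom gives the
smaller p-value, so using k - 1 when a parameter was in fact estimated
from the data makes the test too lenient.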
>
> One fewer is typically for T-test, I think.
>
> Not a statistician, thanks for your time, sorry if this is the wrong agent,
> first time here,

Thanks for the question.  These kinds of questions probably belong
on the user list in the future, though.

Welcome to Commons!

Phil
> -A
>
