Posted to dev@madlib.apache.org by Frank McQuillan <fm...@pivotal.io> on 2016/02/24 01:38:59 UTC

Chi squared independence test question

Afra Ahmad <af...@patientiq.io> &
Jimmy Skuros <ji...@patientiq.io> &
Matt Gitelis <ma...@patientiq.io>

sent me a question about the Chi-squared independence test, which I have
pasted below along with my response:

"...regarding an issue we encountered with the "*Chi-squared independence
test*" in Madlib.  We are huge fans of Madlib but are having trouble
implementing this one test.  Can you please confirm that the documentation
below is correct (from most recent docs here:
http://doc.madlib.net/latest/group__grp__stats__tests.html)?

Also, what are we supposed to do to calculate the expected values?  Any
pointers would be greatly appreciated!

Thanks,
Matt"

From Frank:

"The MADlib software is correct; only the docs are wrong.  I have already
fixed them and opened a pull request.  The JIRA is
https://issues.apache.org/jira/browse/MADLIB-895

The correct query for the Chi-squared independence test is attached.

How to calculate expected value:

The Chi-squared independence test actually uses the Chi-squared
goodness-of-fit function.
The expected value needs to be computed in the SQL and passed
to the goodness-of-fit function. For MADlib, the expected value for each
element of the input matrix is computed as the sum of that element's row *
the sum of that element's column. For example, the expected value
for element (2,1) would be sum of row 2 * sum of column 1."
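
To make the arithmetic concrete, here is a minimal sketch in plain Python
(not MADlib SQL, and not the attached query) of the expected-value rule
described above. The function names and the 2x2 table are made up for
illustration. The sketch assumes that the goodness-of-fit step rescales the
expected values so that they sum to the observed total, which is why the
usual division by the grand total can be left out; please check that
assumption against the MADlib documentation before relying on it.

```python
def expected_products(table):
    """Return (row sum * column sum) for every cell of a contingency table."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    return [[r * c for c in col_sums] for r in row_sums]


def chi_squared_statistic(table):
    """Chi-squared statistic using the unnormalized expected values.

    ASSUMPTION: the expected values are rescaled so that their total
    matches the observed total before the statistic is computed.
    """
    expected = expected_products(table)
    total_observed = sum(sum(row) for row in table)
    total_expected = sum(sum(row) for row in expected)
    scale = total_observed / total_expected
    statistic = 0.0
    for observed_row, expected_row in zip(table, expected):
        for obs, exp in zip(observed_row, expected_row):
            scaled = exp * scale  # the textbook expected count
            statistic += (obs - scaled) ** 2 / scaled
    return statistic


# Hypothetical 2x2 table of observed counts (illustrative data only).
observed = [[10, 20],
            [30, 40]]

expected = expected_products(observed)
# Element (2,1) in 1-based terms: sum of row 2 (70) * sum of column 1 (40).
element_2_1 = expected[1][0]
statistic = chi_squared_statistic(observed)
```

In this example element (2,1) comes out as 70 * 40 = 2800, and after
rescaling it becomes 28, the textbook expected count. The degrees of
freedom for an independence test on an r x c table are (r - 1) * (c - 1),
so 1 here.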