Posted to dev@madlib.apache.org by Frank McQuillan <fm...@pivotal.io> on 2016/02/24 01:38:59 UTC
Chi squared independence test question
Afra Ahmad <af...@patientiq.io> &
Jimmy Skuros <ji...@patientiq.io> &
Matt Gitelis <ma...@patientiq.io>
sent me a question about the Chi-squared independence test, which I have
pasted below along with my response:
"...regarding an issue we encountered with the "*Chi-squared independence
test*" in Madlib. We are huge fans of Madlib but are having trouble
implementing this one test. Can you please confirm that the documentation
below is correct (from most recent docs here:
http://doc.madlib.net/latest/group__grp__stats__tests.html)?
Also, what are we supposed to do to calculate the expected values? Any
pointers would be greatly appreciated!
Thanks,
Matt"
From Frank:
"The MADlib software is correct; only the docs are wrong. I have already
fixed them and opened a pull request. The JIRA is
https://issues.apache.org/jira/browse/MADLIB-895
The correct query for the Chi-squared independence test is attached.
How to calculate the expected value:
The Chi-squared independence test actually uses the Chi-squared
goodness-of-fit function.
The expected value needs to be computed in the SQL and passed
to the goodness-of-fit function. For MADlib, the expected value is
computed as
(sum of the row) * (sum of the column) for each element of the input
matrix. For example, the expected value
for element (2,1) would be the sum of row 2 * the sum of column 1."
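
To make the row-sum times column-sum rule concrete, here is a minimal Python sketch (not MADlib code, and not the attached query). The contingency table and all function names are hypothetical and chosen only for illustration. The rescaling step is an assumption: the message above gives only the unnormalized product, so the sketch assumes the goodness-of-fit function rescales the expected values so that they sum to the observed total, which is what turns the product into the textbook expected count (row sum * column sum / grand total).

```python
# Hypothetical 2x3 contingency table of observed counts (illustrative data only).
observed = [
    [10, 20, 30],
    [20, 20, 20],
]

def unnormalized_expected(table):
    """Row sum * column sum for every cell, as described in the message."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    return [[r * c for c in col_sums] for r in row_sums]

def rescale_to_total(expected, total):
    """ASSUMPTION: rescale so the expected values sum to the observed total."""
    raw_total = sum(sum(row) for row in expected)
    return [[e * total / raw_total for e in row] for row in expected]

def chi2_statistic(obs, exp):
    """Pearson statistic: sum over cells of (observed - expected)^2 / expected."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(obs, exp)
        for o, e in zip(obs_row, exp_row)
    )

grand_total = sum(sum(row) for row in observed)
raw_expected = unnormalized_expected(observed)
expected = rescale_to_total(raw_expected, grand_total)
statistic = chi2_statistic(observed, expected)

# Degrees of freedom for an independence test: (rows - 1) * (columns - 1).
deg_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

# Element (2,1) in 1-based indexing is raw_expected[1][0]: sum of row 2 * sum of column 1.
print(raw_expected[1][0], expected[1][0], statistic, deg_freedom)
```

In this toy table, element (2,1) has row sum 60 and column sum 30, so the unnormalized expected value is 1800. Under the rescaling assumption it becomes 15, which matches 60 * 30 / 120.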