Posted to issues@madlib.apache.org by "Frank McQuillan (JIRA)" <ji...@apache.org> on 2016/02/24 01:10:18 UTC

[jira] [Commented] (MADLIB-895) Incorrect examples in hypothesis tests documentation

    [ https://issues.apache.org/jira/browse/MADLIB-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159924#comment-15159924 ] 

Frank McQuillan commented on MADLIB-895:
----------------------------------------

The SQL does have a couple of mistakes in it, but the result in the current docs is OK.  The docs also need some clarification.

The Chi-squared independence test actually uses the Chi-squared goodness-of-fit function, 
as shown in the example below.  The expected value needs to be computed in the SQL and passed 
to the goodness-of-fit function.  For MADlib, the expected value of each element of the 
input matrix is computed as <em>sum of its row * sum of its column</em>.  For example, the expected value for 
element (2,1) would be <em>sum of row 2 * sum of column 1</em>.  
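
To make the arithmetic concrete, here is a minimal Python sketch of the computation described above. It is an illustration only, not the SQL from the docs and not MADlib code: the function names and the 2x2 input are made up, and it assumes that the goodness-of-fit step rescales the expected values so that they sum to the observed total (which is why the division by the grand total can be left out of the formula above).

```python
def raw_expected(observed):
    """Unscaled expected values: (sum of row i) * (sum of column j)."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    return [[r * c for c in col_sums] for r in row_sums]


def chi2_independence(observed):
    """Return (statistic, df) for a contingency table given as a list of rows.

    Assumption: the expected values are rescaled to sum to the observed
    total before the Pearson statistic is computed.
    """
    expected = raw_expected(observed)
    observed_total = sum(sum(row) for row in observed)
    expected_total = sum(sum(row) for row in expected)
    scale = observed_total / expected_total

    statistic = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e_raw in zip(obs_row, exp_row):
            e = e_raw * scale  # rescaled expected count for this cell
            statistic += (o - e) ** 2 / e

    # Degrees of freedom for an independence test: (rows - 1) * (columns - 1)
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


# Hypothetical 2x2 table, for illustration only.
table = [[10, 20],
         [30, 40]]
statistic, df = chi2_independence(table)
```

For element (2,1) of this table, the unscaled expected value is 70 * 40 = 2800 (sum of row 2 times sum of column 1). After rescaling by the grand total of 100 it becomes 28, the usual expected count under independence.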

> Incorrect examples in hypothesis tests documentation
> ----------------------------------------------------
>
>                 Key: MADLIB-895
>                 URL: https://issues.apache.org/jira/browse/MADLIB-895
>             Project: Apache MADlib
>          Issue Type: Bug
>          Components: Module: Inferential Statistics
>            Reporter: Rahul Iyer
>            Assignee: Frank McQuillan
>            Priority: Minor
>             Fix For: v1.9
>
>
> The SQL and results for the example for Chi-2 tests are wrong. The documentation shows the result as 
> {code}
>      statistic     |       p_value        | df |       phi        | contingency_coef
>  ------------------+----------------------+----+------------------+-------------------
>   138.289841626008 | 2.32528678709871e-25 |  9 | 2.93991753313346 | 0.946730727519112
> {code}
> whereas it should be, 
> {code}
>     statistic     |       p_value        | df |       phi       | contingency_coef
> ------------------+----------------------+----+-----------------+-------------------
>  320.125868955635 | 1.39464882809491e-63 |  9 | 4.4730154045931 | 0.975909209031126
> (1 row)
> {code}
> The SQL also has a couple of errors and does not run as is.


