Posted to issues@commons.apache.org by "Radoslav Tsvetkov (JIRA)" <ji...@apache.org> on 2012/10/10 15:23:04 UTC

[jira] [Created] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Radoslav Tsvetkov created MATH-878:
--------------------------------------

             Summary: G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
                 Key: MATH-878
                 URL: https://issues.apache.org/jira/browse/MATH-878
             Project: Commons Math
          Issue Type: New Feature
            Reporter: Radoslav Tsvetkov
             Fix For: 3.1


1. Implementation of the G-test (log-likelihood ratio, LLR, test for independence and goodness-of-fit)

2. Reference: http://en.wikipedia.org/wiki/G-test

3. Reasons/usefulness: G-tests are increasingly being used in situations where chi-squared tests were previously recommended.

The approximation to the theoretical chi-squared distribution is better for the G-test than for Pearson's chi-squared test. In cases where Observed > 2 * Expected for some cell, the G-test is always better than the chi-squared test.

For testing goodness-of-fit, the G-test is infinitely more efficient than the chi-squared test in the sense of Bahadur, but the two tests are equally efficient in the sense of Pitman or in the sense of Hodges and Lehmann.
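To make the comparison concrete, here is a minimal sketch of the goodness-of-fit G statistic, G = 2 * sum_i O_i * ln(O_i / E_i), next to Pearson's X^2 = sum_i (O_i - E_i)^2 / E_i on the same counts. The class and method names are hypothetical and this is not the API of the attached patches; rescaling the expected counts to the observed total is an assumption of this sketch.

```java
/**
 * Hypothetical illustration (not the Commons Math API): the G statistic
 * for a goodness-of-fit test next to Pearson's chi-squared statistic.
 */
final class GStatisticSketch {

    private GStatisticSketch() {
    }

    /** Rescales expected counts so they sum to the observed total (an assumption of this sketch). */
    private static double[] rescale(double[] expected, long[] observed) {
        if (expected.length != observed.length || expected.length < 2) {
            throw new IllegalArgumentException("need two or more cells, arrays of equal length");
        }
        double sumExpected = 0.0;
        long sumObserved = 0L;
        for (int i = 0; i < expected.length; i++) {
            if (expected[i] <= 0.0 || observed[i] < 0L) {
                throw new IllegalArgumentException("expected > 0 and observed >= 0 required");
            }
            sumExpected += expected[i];
            sumObserved += observed[i];
        }
        double ratio = sumObserved / sumExpected;
        double[] scaled = new double[expected.length];
        for (int i = 0; i < expected.length; i++) {
            scaled[i] = ratio * expected[i];
        }
        return scaled;
    }

    /** G = 2 * sum_i O_i * ln(O_i / E_i); cells with O_i = 0 contribute nothing. */
    static double g(double[] expected, long[] observed) {
        double[] e = rescale(expected, observed);
        double sum = 0.0;
        for (int i = 0; i < observed.length; i++) {
            if (observed[i] > 0) {
                sum += observed[i] * Math.log(observed[i] / e[i]);
            }
        }
        return 2.0 * sum;
    }

    /** Pearson's X^2 = sum_i (O_i - E_i)^2 / E_i, for comparison. */
    static double pearson(double[] expected, long[] observed) {
        double[] e = rescale(expected, observed);
        double sum = 0.0;
        for (int i = 0; i < observed.length; i++) {
            double diff = observed[i] - e[i];
            sum += diff * diff / e[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] expected = {20, 20, 20};
        long[] observed = {10, 20, 30};
        System.out.println("G   = " + g(expected, observed));
        System.out.println("X^2 = " + pearson(expected, observed));
    }
}
```

For these counts the sketch prints G of roughly 10.465 and X^2 = 10.0; both would then be referred to a chi-squared distribution with k - 1 = 2 degrees of freedom.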

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475237#comment-13475237 ] 

Ted Dunning commented on MATH-878:
----------------------------------

{quote}
2. I added rootLogLikelihoodRatio using your code from Mahout. Could you help me with the rationale in the description comments? Unfortunately, the quoted discussion is no longer available on the internet. It would perhaps be better to add some info in-line in the comments.
{quote}
There is some more permanent discussion on the root LLR test here:

http://mail-archives.apache.org/mod_mbox/mahout-user/201001.mbox/%3Cc7d45fc71001121120r6b0482aat345014770ed32744@mail.gmail.com%3E

And see the response to Wataru's comment here:

http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html

If I can squeeze some time, I will write you some purpose-built rationale text, but you should be able to lift some of my other comments with small changes.
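Since the rationale for rootLogLikelihoodRatio is the open question here, a minimal sketch of what it computes may help. It follows, as we understand it, the entropy-based formulation in Apache Mahout's LogLikelihood class (Ted's code is the authoritative version; the class name below is hypothetical). For a 2x2 table, G = 2 * (H(rows) + H(columns) - H(cells)), with unnormalized entropies H(k) = N ln N - sum_i k_i ln k_i. The root variant takes the square root of G and makes it negative when k11 / (k11 + k12) is below k21 / (k21 + k22).

```java
/**
 * Hypothetical sketch of the 2x2 log-likelihood ratio (G statistic) and its
 * signed square root, modeled on the entropy formulation used in Apache Mahout.
 */
final class RootLlrSketch {

    private RootLlrSketch() {
    }

    /** x * ln(x), with the convention 0 * ln(0) = 0. */
    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    /** Unnormalized entropy: N ln N - sum_i k_i ln k_i, where N = sum_i k_i. */
    private static double entropy(long... counts) {
        long sum = 0L;
        double result = 0.0;
        for (long k : counts) {
            result += xLogX(k);
            sum += k;
        }
        return xLogX(sum) - result;
    }

    /** G statistic for the contingency table [[k11, k12], [k21, k22]]. */
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Guard against tiny negative values caused by floating-point round-off.
        if (rowEntropy + columnEntropy < matrixEntropy) {
            return 0.0;
        }
        return 2.0 * (rowEntropy + columnEntropy - matrixEntropy);
    }

    /**
     * Signed square root of the LLR. Under independence this is approximately
     * standard normal; the sign is negative when k11 is under-represented
     * relative to the second row.
     */
    static double rootLogLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double root = Math.sqrt(logLikelihoodRatio(k11, k12, k21, k22));
        if ((double) k11 / (k11 + k12) < (double) k21 / (k21 + k22)) {
            root = -root;
        }
        return root;
    }
}
```

For example, rootLogLikelihoodRatio(10, 0, 0, 10) is sqrt(40 ln 2), about 5.27, and the mirrored table (0, 10, 10, 0) gives the same magnitude with a negative sign. The sign is what makes the root form convenient for ranking: over- and under-represented co-occurrences no longer collapse onto the same score.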
                
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.1, 3.2, 4.0
>         Environment: Netbeans
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest_12102012.patch, MATH-878_gTest.patch, vcs-diff16294.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>

[jira] [Comment Edited] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Radoslav Tsvetkov (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484887#comment-13484887 ] 

Radoslav Tsvetkov edited comment on MATH-878 at 10/26/12 12:54 PM:
-------------------------------------------------------------------

I added most of the corrections.
* All corrections in the code part (no underscores in the names, etc.)
* The comments should look better now. A condensed form of the linked content has been added to the comments (the links are kept as well).

There are certainly some more points left to polish, but I expect they will be refined over time.


                
      was (Author: rtsvet):
    I added most of the corrections.
* All corrections in the code part (no underscores in the names, etc.)
* Comments should look better now. A condensed form of the linked content has been added to the comments.

There are certainly some more issues left to polish.


                  
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.1, 3.2, 4.0
>         Environment: Netbeans
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest_12102012.patch, MATH-878_gTest_15102012.patch, MATH-878_gTest_26102012.patch, vcs-diff16294.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>

[jira] [Updated] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Radoslav Tsvetkov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Radoslav Tsvetkov updated MATH-878:
-----------------------------------

    Attachment:     (was: vcs-diff56368.patch)
    
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.1, 3.2, 4.0
>         Environment: Netbeans
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest.patch, vcs-diff16294.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>

[jira] [Updated] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Radoslav Tsvetkov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Radoslav Tsvetkov updated MATH-878:
-----------------------------------

    Attachment: vcs-diff56368.patch
    
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.1, 3.2, 4.0
>         Environment: Netbeans
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest.patch, vcs-diff16294.patch, vcs-diff56368.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>

[jira] [Updated] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Radoslav Tsvetkov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Radoslav Tsvetkov updated MATH-878:
-----------------------------------

    Attachment:     (was: vcs-diff16289.patch)
    
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>

[jira] [Updated] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Radoslav Tsvetkov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Radoslav Tsvetkov updated MATH-878:
-----------------------------------

    Attachment: vcs-diff16294.patch
    
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest.patch, vcs-diff16294.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>

[jira] [Comment Edited] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Radoslav Tsvetkov (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474933#comment-13474933 ] 

Radoslav Tsvetkov edited comment on MATH-878 at 10/12/12 11:00 AM:
-------------------------------------------------------------------

Thanks Ted for your interest and quick comments. :)
I added rootLogLikelihoodRatio as you proposed, using your code from Mahout. I kept that name because it is the one most commonly used for this functionality.

On your comments:
1. Commons Math usually offers more convenience methods; ChiSquareTest, for example, has many more. As I share your opinion, I allowed myself to provide fewer. Concerning gTestGoodnessOfFit, let's not forget that the majority of users are not interested in p-values and G-values at all; all they want to know is true or false (can they reject the null hypothesis or not). ChiSquareTest provides exactly the same functionality and has been in Commons since 1.2, so it seems a good thing.

2. I added rootLogLikelihoodRatio using your code from Mahout. Could you help me with the rationale in the description comments? Unfortunately, the quoted discussion is no longer available on the internet. It would perhaps be better to add some info in-line in the comments.

3. The G-tests are fully integrated into the Commons TestUtils framework, like all the other tests (ChiSquare, ANOVA, etc.). With this patch I also added some more test cases.
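Point 1 argues for boolean convenience methods in the style of ChiSquareTest, which answer "can the null hypothesis be rejected at level alpha?". A minimal sketch of such a method, under stated assumptions: the class name is hypothetical, the expected counts are assumed to already sum to the observed total, the degrees of freedom are k - 1 for k cells, the (0, 0.5] range for alpha is an assumed check, and the chi-squared tail probability uses a textbook (Numerical Recipes style) regularized incomplete gamma function rather than Commons Math's distribution classes, only so the example has no dependencies.

```java
/** Hypothetical sketch: boolean G-test decision at significance level alpha. */
final class GTestDecisionSketch {

    private static final double EPS = 1e-14;
    private static final double FPMIN = 1e-300;
    private static final int MAX_ITER = 1000;

    private GTestDecisionSketch() {
    }

    /** G = 2 * sum O_i ln(O_i / E_i); expected is assumed to sum to the observed total. */
    static double g(double[] expected, long[] observed) {
        double sum = 0.0;
        for (int i = 0; i < observed.length; i++) {
            if (observed[i] > 0) {
                sum += observed[i] * Math.log(observed[i] / expected[i]);
            }
        }
        return 2.0 * sum;
    }

    /** p-value: P(X >= G) for X ~ chi-squared with (cells - 1) degrees of freedom. */
    static double pValue(double[] expected, long[] observed) {
        return chiSquaredSurvival(g(expected, observed), observed.length - 1);
    }

    /** true if the null hypothesis is rejected at level alpha, i.e. p < alpha. */
    static boolean gTest(double[] expected, long[] observed, double alpha) {
        if (alpha <= 0.0 || alpha > 0.5) { // assumed range check
            throw new IllegalArgumentException("alpha must be in (0, 0.5]");
        }
        return pValue(expected, observed) < alpha;
    }

    /** Survival function of chi-squared(df): Q(df / 2, x / 2). */
    static double chiSquaredSurvival(double x, int df) {
        return x <= 0.0 ? 1.0 : regularizedGammaQ(df / 2.0, x / 2.0);
    }

    /** Regularized upper incomplete gamma Q(a, x) for a > 0, x > 0. */
    static double regularizedGammaQ(double a, double x) {
        double logPrefactor = -x + a * Math.log(x) - logGamma(a);
        if (x < a + 1.0) {
            // Series expansion for P(a, x), then Q = 1 - P.
            double ap = a;
            double del = 1.0 / a;
            double sum = del;
            for (int n = 0; n < MAX_ITER && Math.abs(del) > Math.abs(sum) * EPS; n++) {
                ap += 1.0;
                del *= x / ap;
                sum += del;
            }
            return 1.0 - sum * Math.exp(logPrefactor);
        }
        // Continued fraction for Q(a, x), evaluated with the modified Lentz method.
        double b = x + 1.0 - a;
        double c = 1.0 / FPMIN;
        double d = 1.0 / b;
        double h = d;
        for (int i = 1; i <= MAX_ITER; i++) {
            double an = -i * (i - a);
            b += 2.0;
            d = an * d + b;
            if (Math.abs(d) < FPMIN) {
                d = FPMIN;
            }
            c = b + an / c;
            if (Math.abs(c) < FPMIN) {
                c = FPMIN;
            }
            d = 1.0 / d;
            double delta = d * c;
            h *= delta;
            if (Math.abs(delta - 1.0) < EPS) {
                break;
            }
        }
        return Math.exp(logPrefactor) * h;
    }

    /** Lanczos approximation of ln Gamma(x) for x > 0 (Numerical Recipes coefficients). */
    static double logGamma(double x) {
        double[] cof = {76.18009172947146, -86.50532032941677, 24.01409824083091,
                        -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5};
        double y = x;
        double tmp = x + 5.5;
        tmp -= (x + 0.5) * Math.log(tmp);
        double ser = 1.000000000190015;
        for (double cf : cof) {
            ser += cf / ++y;
        }
        return -tmp + Math.log(2.5066282746310005 * ser / x);
    }
}
```

With expected {20, 20, 20} and observed {10, 20, 30}, G is about 10.46 on 2 degrees of freedom, so p = exp(-G / 2), about 0.0053: gTest(expected, observed, 0.05) returns true and gTest(expected, observed, 0.001) returns false. Inside Commons Math one would presumably delegate to ChiSquaredDistribution instead of hand-rolling the gamma function.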

One request:

Could you please provide some reference data for testing rootLogLikelihoodRatio?

                
      was (Author: rtsvet):
    Added rootLogLikelihoodRatio as proposed by Ted Dunning and additional framework tests 
                  
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.1, 3.2, 4.0
>         Environment: Netbeans
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest.patch, vcs-diff16294.patch, vcs-diff56368.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>

[jira] [Commented] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Gilles (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473263#comment-13473263 ] 

Gilles commented on MATH-878:
-----------------------------

Will you provide a patch?

                
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>

[jira] [Updated] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Radoslav Tsvetkov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Radoslav Tsvetkov updated MATH-878:
-----------------------------------

    Attachment: MATH-878_gTest_12102012.patch
    
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.1, 3.2, 4.0
>         Environment: Netbeans
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest_12102012.patch, MATH-878_gTest.patch, vcs-diff16294.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>

[jira] [Updated] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Radoslav Tsvetkov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Radoslav Tsvetkov updated MATH-878:
-----------------------------------

    Attachment: MATH-878_gTest_26102012.patch

I added most of the corrections.
* All corrections in the code part (no underscores in the names, etc.)
* Comments should look better now. A condensed form of the linked content has been added to the comments.

There are certainly some more issues left to polish.


                
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.1, 3.2, 4.0
>         Environment: Netbeans
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest_12102012.patch, MATH-878_gTest_15102012.patch, MATH-878_gTest_26102012.patch, vcs-diff16294.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>

[jira] [Updated] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Radoslav Tsvetkov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Radoslav Tsvetkov updated MATH-878:
-----------------------------------

    Attachment:     (was: MATH-878_gTest.patch)
    
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.1, 3.2, 4.0
>         Environment: Netbeans
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest_12102012.patch, vcs-diff16294.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>

[jira] [Updated] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Radoslav Tsvetkov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Radoslav Tsvetkov updated MATH-878:
-----------------------------------

    Environment: Netbeans
    
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>         Environment: Netbeans
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest.patch, vcs-diff16294.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>

[jira] [Commented] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Radoslav Tsvetkov (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483005#comment-13483005 ] 

Radoslav Tsvetkov commented on MATH-878:
----------------------------------------

I'll add the changes in the next 2 days.
                
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.1, 3.2, 4.0
>         Environment: Netbeans
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest_12102012.patch, MATH-878_gTest_15102012.patch, vcs-diff16294.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>

[jira] [Commented] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Gilles (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487709#comment-13487709 ] 

Gilles commented on MATH-878:
-----------------------------

bq. I think it does not touch any existing CM functionality and as such poses no risk. Then why bother keeping the GTest call away?

I don't understand what you mean. Keeping commits as small as possible is just a convenience for reviewing code, now and later. Anyway, Phil is taking care of this report; IIUC, there is no request that you modify your contribution at this point.

bq. If the slightly misleading name of TestUtils.java is to be changed, then that should be done in a dedicated issue.

Of course.

                
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.1, 3.2, 4.0
>         Environment: Netbeans
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest_12102012.patch, MATH-878_gTest_15102012.patch, MATH-878_gTest_26102012.patch, vcs-diff16294.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> 1. Implementation of G-Test (Log-Likelihood ratio LLR test for independence and goodness-of-fit)
> 2. Reference: http://en.wikipedia.org/wiki/G-test
> 3. Reasons-Usefulness: G-tests are increasingly being used in situations where chi-squared tests were previously recommended. 
> The approximation to the theoretical chi-squared distribution is better for the G-test than for Pearson's chi-squared test. In cases where Observed > 2*Expected for any cell, the G-test is always better than the chi-squared test.
> For testing goodness-of-fit the G-test is infinitely more efficient than the chi-squared test in the sense of Bahadur, but the two tests are equally efficient in the sense of Pitman or in the sense of Hodges and Lehmann. 
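
For reference, the statistic proposed above is G = 2 * sum(O_i * ln(O_i / E_i)) over cells with observed counts O_i and expected counts E_i, while Pearson's statistic is X^2 = sum((O_i - E_i)^2 / E_i); both are referred to a chi-squared distribution. The following is a minimal Java sketch (class and method names are hypothetical, not the patch's API) that computes both on one made-up goodness-of-fit example, assuming the expected counts already sum to the same total as the observed ones.

```java
// Hypothetical helper class; not the Commons Math implementation.
final class GVersusPearson {

    private GVersusPearson() { }

    /** G = 2 * sum(O_i * ln(O_i / E_i)); a cell with O_i = 0 contributes 0. */
    static double g(final double[] expected, final long[] observed) {
        double sum = 0;
        for (int i = 0; i < observed.length; i++) {
            if (observed[i] > 0) {
                sum += observed[i] * Math.log(observed[i] / expected[i]);
            }
        }
        return 2 * sum;
    }

    /** Pearson's X^2 = sum((O_i - E_i)^2 / E_i). */
    static double pearson(final double[] expected, final long[] observed) {
        double sum = 0;
        for (int i = 0; i < observed.length; i++) {
            final double diff = observed[i] - expected[i];
            sum += diff * diff / expected[i];
        }
        return sum;
    }

    public static void main(final String[] args) {
        // Made-up counts; observed and expected both sum to 100.
        final long[] observed = {30, 25, 45};
        final double[] expected = {25, 25, 50};
        System.out.println("G   = " + g(expected, observed));
        System.out.println("X^2 = " + pearson(expected, observed));
    }
}
```

Cells with O_i = 0 are skipped in G, following the usual convention 0 * ln 0 = 0.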


[jira] [Updated] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Radoslav Tsvetkov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Radoslav Tsvetkov updated MATH-878:
-----------------------------------

    Attachment: vcs-diff16289.patch
    
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h

[jira] [Commented] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Radoslav Tsvetkov (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473979#comment-13473979 ] 

Radoslav Tsvetkov commented on MATH-878:
----------------------------------------

The tests are taken from books on biological statistics.
                
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>         Environment: Netbeans
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest.patch, vcs-diff16294.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h

[jira] [Updated] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Radoslav Tsvetkov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Radoslav Tsvetkov updated MATH-878:
-----------------------------------

    Comment: was deleted

(was: Source Code, Test, Docs and Comments)
    
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>         Environment: Netbeans
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest.patch, vcs-diff16294.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h

[jira] [Commented] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Gilles (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486795#comment-13486795 ] 

Gilles commented on MATH-878:
-----------------------------

As I indicated, could you separate the introduction of the new functionality from calls to it in other parts of CM? The former is the subject of this feature request and should lead to the commit of files "GTest.java" and "GTestTest.java". The latter is the patch to "TestUtils" and "TestUtilsTest".

For new files it's fine to provide plain Java files.

Sorry for the pickiness; I was myself sometimes put off by such requirements, but I must admit that they come in handy when reviewing large chunks of unfamiliar code...

                
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.1, 3.2, 4.0
>         Environment: Netbeans
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest_12102012.patch, MATH-878_gTest_15102012.patch, MATH-878_gTest_26102012.patch, vcs-diff16294.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h

[jira] [Resolved] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Phil Steitz (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phil Steitz resolved MATH-878.
------------------------------

    Resolution: Fixed

In r1408172, I changed method names to match the conventions of the other classes in the inference package: g() returns the g stat, gTest does tests, etc.  I added G-test statistics to TestUtils in r1408173 and updated the user guide in r1408174.
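
To illustrate the naming convention described above (g() returns the statistic, gTest() performs the test), here is a self-contained Java sketch. It is not the committed code: the class name is hypothetical, and instead of the library's chi-squared distribution it uses a textbook series/continued-fraction evaluation of the regularized incomplete gamma function (as in Numerical Recipes) to obtain the p-value, assuming (number of cells - 1) degrees of freedom, as for a goodness-of-fit test with fully specified expected counts. The boolean overload shows that it reduces to a single comparison against alpha.

```java
// Hypothetical, self-contained sketch; not the Commons Math source.
final class GTestSketch {

    private GTestSketch() { }

    /** The G statistic: 2 * sum(O_i * ln(O_i / E_i)). */
    static double g(final double[] expected, final long[] observed) {
        double sum = 0;
        for (int i = 0; i < observed.length; i++) {
            if (observed[i] > 0) {
                sum += observed[i] * Math.log(observed[i] / expected[i]);
            }
        }
        return 2 * sum;
    }

    /** p-value: P(X >= G) for X ~ chi-squared with (number of cells - 1) degrees of freedom. */
    static double gTest(final double[] expected, final long[] observed) {
        return chiSquaredSurvival(g(expected, observed), observed.length - 1);
    }

    /** Rejects the null hypothesis when the p-value is below alpha. */
    static boolean gTest(final double[] expected, final long[] observed, final double alpha) {
        return gTest(expected, observed) < alpha;
    }

    /** Upper tail of the chi-squared distribution: Q(df / 2, x / 2). */
    static double chiSquaredSurvival(final double x, final double df) {
        return regularizedGammaQ(df / 2, x / 2);
    }

    /** Regularized upper incomplete gamma function Q(a, x), for a > 0. */
    static double regularizedGammaQ(final double a, final double x) {
        if (x <= 0) {
            return 1;
        }
        final double prefactor = Math.exp(-x + a * Math.log(x) - lnGamma(a));
        if (x < a + 1) {
            // Series for P(a, x), then Q = 1 - P.
            double term = 1 / a;
            double sum = term;
            for (int n = 1; n < 1000 && Math.abs(term) > Math.abs(sum) * 1e-16; n++) {
                term *= x / (a + n);
                sum += term;
            }
            return 1 - sum * prefactor;
        }
        // Continued fraction for Q(a, x), evaluated with the modified Lentz method.
        final double tiny = 1e-300;
        double b = x + 1 - a;
        double c = 1 / tiny;
        double d = 1 / b;
        double h = d;
        for (int i = 1; i < 1000; i++) {
            final double an = -i * (i - a);
            b += 2;
            d = an * d + b;
            if (Math.abs(d) < tiny) {
                d = tiny;
            }
            c = b + an / c;
            if (Math.abs(c) < tiny) {
                c = tiny;
            }
            d = 1 / d;
            final double delta = d * c;
            h *= delta;
            if (Math.abs(delta - 1) < 1e-15) {
                break;
            }
        }
        return h * prefactor;
    }

    /** ln(Gamma(x)) for x > 0 via the Lanczos approximation (about 1e-10 relative accuracy). */
    static double lnGamma(final double x) {
        final double[] cof = {76.18009172947146, -86.50532032941677, 24.01409824083091,
            -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5};
        double y = x;
        double tmp = x + 5.5;
        tmp -= (x + 0.5) * Math.log(tmp);
        double ser = 1.000000000190015;
        for (final double c : cof) {
            y += 1;
            ser += c / y;
        }
        return -tmp + Math.log(2.5066282746310005 * ser / x);
    }
}
```

A production version would delegate the tail probability to a tested distribution class rather than hand-rolled numerics; the point here is only the g()/gTest() split.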

Thanks again for the patch.
                
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>         Environment: Netbeans
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest_12102012.patch, MATH-878_gTest_15102012.patch, MATH-878_gTest_26102012.patch, vcs-diff16294.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h

[jira] [Comment Edited] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Radoslav Tsvetkov (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473978#comment-13473978 ] 

Radoslav Tsvetkov edited comment on MATH-878 at 10/11/12 9:09 AM:
------------------------------------------------------------------

Corrected some typos in the comments and docs.
                
      was (Author: rtsvet):
    Code, Test and Comments and Docs
                  
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>         Environment: Netbeans
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest.patch, vcs-diff16294.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h

[jira] [Commented] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Gilles (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480696#comment-13480696 ] 

Gilles commented on MATH-878:
-----------------------------

About the patch, I'm only able to suggest "cosmetic" improvements:
* The comments are not easy to read:
** Enumerated lists should appear as such in the Javadoc (for developers' sake)
** <p>...</p> should be avoided, as they are almost never necessary (I use <br/> if a new paragraph is really needed).
** I prefer {@code ...} over <code>...</code> (whenever there are no other HTML tags inside them)
** Sometimes you use "The" as the first word of the description for "@param", sometimes not. (I prefer to always skip it).
** Often there is no text for <a> tags: the (sometimes ugly) link will appear in the processed apidocs. (And they are not closed with a </a>.)
** Comments should preferably end with a period (".").
** The Javadoc main description should always come before the various Javadoc tags (i.e. not partly before and partly after).
** I'm wary of the docs containing links to less widely used web sites (e.g. blogs and mailing-list archives). (AFAIK no other CM code does this, so whether it is allowed has never been discussed.) In any case, it would be fine to just provide an inline summary of the conclusions, and possibly put the links on the issue's page in the bug tracking system.
** The "@version" tag should read "@version $Id$".
* About the code formatting:
** Some weird alignments.
** An empty constructor is not necessary.
** Redundant sets of parentheses.
** Writing a double constant as "0.0d" is redundant: use either "0.0" or "0d" (I prefer the latter). And in a statement where doubles are already involved, "0" is even clearer (IMO).
** "gTestGoodnessOfFit_pValue" is not a "standard" method name because of the underscore.
** Missing "final" keywords.
** Additions to "TestUtils" might come as a separate patch (first introduce the new functionality, then use it in other parts of CM).
** Name of the unit test methods: the underscore should be removed.
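
As a rough illustration of several of the points above ({@code} instead of <code>, no leading "The" in @param, sentences ending with a period, main description before the tags, "@version $Id$", final parameters, no underscores, no empty constructor, and "0" where the context is already double), here is a hypothetical Java method comparing two binned data sets with a G statistic. The class name, method name, and behavior are assumptions for the example, not taken from the patch.

```java
/**
 * Hypothetical example following the Javadoc and formatting conventions listed above.
 *
 * @version $Id$
 */
final class GDataSetsExample {

    /**
     * Computes the G statistic for testing whether two binned data sets come from the same distribution.
     * The data sets form the two rows of a 2 x k contingency table, and each cell is compared with
     * the count expected under independence.
     *
     * @param observed1 counts for the first data set.
     * @param observed2 counts for the second data set, with the same length as {@code observed1}.
     * @return the G statistic {@code 2 * sum(O * ln(O / E))} over all cells.
     * @throws IllegalArgumentException if the two arrays differ in length.
     */
    static double gDataSetsComparison(final long[] observed1, final long[] observed2) {
        if (observed1.length != observed2.length) {
            throw new IllegalArgumentException("data sets must have the same length");
        }
        long total1 = 0;
        long total2 = 0;
        for (int i = 0; i < observed1.length; i++) {
            total1 += observed1[i];
            total2 += observed2[i];
        }
        final double total = total1 + total2;
        double sum = 0;
        for (int i = 0; i < observed1.length; i++) {
            final double column = observed1[i] + observed2[i];
            sum += term(observed1[i], total1 * column / total);
            sum += term(observed2[i], total2 * column / total);
        }
        return 2 * sum;
    }

    /**
     * Computes one cell's contribution, using the limit 0 when {@code o} is 0.
     *
     * @param o observed count.
     * @param e expected count.
     * @return {@code o * ln(o / e)}.
     */
    private static double term(final long o, final double e) {
        return o == 0 ? 0 : o * Math.log(o / e);
    }
}
```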

Sorry for the long list, which is certainly the result of _our_ failure to agree on the need to provide comprehensive guidelines for formatting!

                
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.1, 3.2, 4.0
>         Environment: Netbeans
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest_12102012.patch, MATH-878_gTest_15102012.patch, vcs-diff16294.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h

[jira] [Updated] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Phil Steitz (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phil Steitz updated MATH-878:
-----------------------------

    Affects Version/s:     (was: 3.2)
                           (was: 4.0)
                           (was: 3.1)
    
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>         Environment: Netbeans
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest_12102012.patch, MATH-878_gTest_15102012.patch, MATH-878_gTest_26102012.patch, vcs-diff16294.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h

[jira] [Commented] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474336#comment-13474336 ] 

Ted Dunning commented on MATH-878:
----------------------------------

1. The array of convenience methods seems excessive.  For instance, all that 
{code}
boolean gTestGoodnessOfFit(final double[] expected, final long[] observed,
            final double alpha)
{code}
adds is a single comparison against alpha.

2. Despite the vast number of convenience routines, you don't include the 2 x 2 test that returns the signed square root of G^2, where the sign indicates higher or lower frequency than expected.

3. This doesn't seem to integrate into the Commons Math framework for this kind of test.
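
Regarding point 2, a minimal Java sketch of the signed root of G^2 for a 2 x 2 table follows. It uses an unnormalized-entropy formulation, G^2 = 2 * (H_rows + H_columns - H_cells) with H(k_1..k_n) = N ln N - sum(k_i ln k_i), which is modeled on the Mahout code later mentioned in this thread; the details here are an assumption, not a copy of that code, and the class name is hypothetical. The sign is negative when k11 occurs less often than expected under independence, i.e. when k11 / (k11 + k12) < k21 / (k21 + k22).

```java
// Hypothetical sketch of a signed root log-likelihood ratio for a 2 x 2 table.
final class RootLlrSketch {

    private RootLlrSketch() { }

    private static double xLogX(final long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    /** Unnormalized entropy N * H of a set of counts: xLogX(N) - sum(xLogX(k)). */
    private static double entropy(final long... counts) {
        long total = 0;
        double sum = 0;
        for (final long k : counts) {
            total += k;
            sum += xLogX(k);
        }
        return xLogX(total) - sum;
    }

    /** G^2 for the 2 x 2 table [[k11, k12], [k21, k22]]. */
    static double logLikelihoodRatio(final long k11, final long k12, final long k21, final long k22) {
        final double rowEntropy = entropy(k11 + k12, k21 + k22);
        final double columnEntropy = entropy(k11 + k21, k12 + k22);
        final double matrixEntropy = entropy(k11, k12, k21, k22);
        // Rounding can turn a true zero into a tiny negative number.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    /** Signed sqrt(G^2): negative when row 1's share in column 1 is below row 2's share. */
    static double rootLogLikelihoodRatio(final long k11, final long k12, final long k21, final long k22) {
        final double root = Math.sqrt(logLikelihoodRatio(k11, k12, k21, k22));
        final double p1 = (double) k11 / (k11 + k12);
        final double p2 = (double) k21 / (k21 + k22);
        return p1 < p2 ? -root : root;
    }
}
```

A row with a zero total makes its proportion NaN, in which case the sketch returns the positive root; a real implementation would need to decide how to handle that.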

                
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.1, 3.2, 4.0
>         Environment: Netbeans
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest.patch, vcs-diff16294.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h

[jira] [Updated] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Radoslav Tsvetkov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Radoslav Tsvetkov updated MATH-878:
-----------------------------------

    Affects Version/s: 3.1
                       3.2
                       4.0
    
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.1, 3.2, 4.0
>         Environment: Netbeans
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest.patch, vcs-diff16294.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h

[jira] [Commented] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Radoslav Tsvetkov (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474934#comment-13474934 ] 

Radoslav Tsvetkov commented on MATH-878:
----------------------------------------

Thanks, Ted, for your interest and quick comments. 

I added rootLogLikelihoodRatio as you proposed, using your code from Mahout. I kept that name, as it is the one more commonly used for this functionality.
On your comments:

1. Commons Math usually provides more convenience methods; ChiSquareTest, for example, has many more. As I share your opinion, I allowed myself to provide fewer. Concerning gTestGoodnessOfFit, let's not forget that the majority of users are not interested in p-values and G-values at all; all they want to know is true or false (can they reject the null hypothesis or not). ChiSquareTest provides exactly the same functionality, and it has been in Commons Math since 1.2, so it seems a good thing.

2. I added rootLogLikelihoodRatio using your code from Mahout. Could you help me with the comments describing the rationale? Unfortunately, the discussion you cited is no longer available online. Perhaps it would be better to add some information inline in the comments.

3. The G-tests are fully integrated into the Commons Math TestUtils framework, like all the others (ChiSquare, ANOVA, etc.). With this patch I added some more test cases, as requested.
Could you please provide some reference data for testing rootLogLikelihoodRatio?





                
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.1, 3.2, 4.0
>         Environment: Netbeans
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest.patch, vcs-diff16294.patch, vcs-diff56368.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h

[jira] [Updated] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Radoslav Tsvetkov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Radoslav Tsvetkov updated MATH-878:
-----------------------------------

    Attachment:     (was: MATH-878_gTest_15102012.patch)
    
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.1, 3.2, 4.0
>         Environment: Netbeans
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest_12102012.patch, MATH-878_gTest_15102012.patch, vcs-diff16294.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h

[jira] [Commented] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Phil Steitz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490276#comment-13490276 ] 

Phil Steitz commented on MATH-878:
----------------------------------

Implementation code committed in r1405620.

I made no material changes: just Javadoc, making final a few variables that could be final, and incorporating the MATH-885 changes (externalizing array argument checks). I also added a few more tests.
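
For readers unfamiliar with MATH-885, the idea of externalizing array argument checks can be sketched as small reusable validation methods that each test calls before computing anything. The helper class and method names below are hypothetical and not necessarily those used in Commons Math:

```java
/** Hypothetical reusable argument checks, kept outside the test classes. */
final class ArgumentChecks {

    private ArgumentChecks() { }

    /** Throws if any entry is not strictly positive (NaN is rejected too). */
    static void checkPositive(final double[] values) {
        for (int i = 0; i < values.length; i++) {
            if (!(values[i] > 0)) {
                throw new IllegalArgumentException("entry " + i + " is not positive: " + values[i]);
            }
        }
    }

    /** Throws if any entry is negative. */
    static void checkNonNegative(final long[] values) {
        for (int i = 0; i < values.length; i++) {
            if (values[i] < 0) {
                throw new IllegalArgumentException("entry " + i + " is negative: " + values[i]);
            }
        }
    }

    /** Throws if the two lengths differ. */
    static void checkEqualLength(final int a, final int b) {
        if (a != b) {
            throw new IllegalArgumentException("lengths differ: " + a + " != " + b);
        }
    }
}

/** Example caller: validate first, then compute G = 2 * sum(O_i * ln(O_i / E_i)). */
final class CheckedG {

    private CheckedG() { }

    static double g(final double[] expected, final long[] observed) {
        ArgumentChecks.checkEqualLength(expected.length, observed.length);
        ArgumentChecks.checkPositive(expected);
        ArgumentChecks.checkNonNegative(observed);
        double sum = 0;
        for (int i = 0; i < observed.length; i++) {
            if (observed[i] > 0) {
                sum += observed[i] * Math.log(observed[i] / expected[i]);
            }
        }
        return 2 * sum;
    }
}
```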

I am still working on the TestUtils changes.  Name change there will have to wait until 4.0 if we decide to do it.  I am ambivalent, as the package name .inference is what you would end up logically adding - i.e., InferenceTestUtils - but that would be redundant.  I will add a reference to Ted's paper and other discussion in the User Guide.

I am also wondering whether it may be better to make the entropy methods public and move them to StatUtils.
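
If the entropy helpers were made public, one useful identity they expose is G = 2N * (H(rows) + H(columns) - H(cells)) = 2N * I, where H is the Shannon entropy (natural log) of the normalized counts, N is the grand total, and I is the mutual information between the row and column variables. A hypothetical Java sketch (names assumed; this is not StatUtils' API) of a public entropy method and a G statistic for independence built on it:

```java
// Hypothetical sketch; method names are assumptions, not StatUtils API.
final class EntropySketch {

    private EntropySketch() { }

    /** Shannon entropy (natural log) of the distribution given by non-negative counts. */
    static double entropy(final long[] counts) {
        long total = 0;
        for (final long k : counts) {
            total += k;
        }
        double h = 0;
        for (final long k : counts) {
            if (k > 0) {
                final double p = (double) k / total;
                h -= p * Math.log(p);
            }
        }
        return h;
    }

    /** G statistic for independence in an r x c table: 2 N (H(rows) + H(columns) - H(cells)). */
    static double gIndependence(final long[][] table) {
        final int rows = table.length;
        final int cols = table[0].length;
        final long[] rowSums = new long[rows];
        final long[] colSums = new long[cols];
        final long[] cells = new long[rows * cols];
        long n = 0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                rowSums[i] += table[i][j];
                colSums[j] += table[i][j];
                cells[i * cols + j] = table[i][j];
                n += table[i][j];
            }
        }
        return 2 * n * (entropy(rowSums) + entropy(colSums) - entropy(cells));
    }
}
```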
                
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.1, 3.2, 4.0
>         Environment: Netbeans
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest_12102012.patch, MATH-878_gTest_15102012.patch, MATH-878_gTest_26102012.patch, vcs-diff16294.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h

[jira] [Commented] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Phil Steitz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486940#comment-13486940 ] 

Phil Steitz commented on MATH-878:
----------------------------------

I have this just about ready to be committed. Gilles, I don't understand exactly what you mean by "separating the calls in". TestUtils is just a container for static convenience methods executing significance tests.  It is appropriate to include GTests there.  Do you mean these should be done in two separate commits?  I guess that is OK, but both should be associated with this issue.  I can commit the code in two commits, if that is what you mean.  
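A minimal sketch of the split being discussed: an instance class that does the work and a static facade that only delegates to a shared instance. The names GTestSketch and TestUtilsSketch are hypothetical stand-ins, not the committed GTest and TestUtils code, and the statistic shown is assumed to be the goodness-of-fit form G = 2 sum(o_i ln(o_i / e_i)), with expected counts rescaled to the observed total:

```java
/** Hypothetical instance class holding the G-test computation (illustrative names only). */
final class GTestSketch {

    /**
     * Goodness-of-fit G statistic: G = 2 * sum(o_i * ln(o_i / e_i)).
     * Expected counts are rescaled so that they sum to the observed total;
     * cells with o_i = 0 contribute 0 (the limit of o ln o as o -> 0).
     */
    double g(double[] expected, long[] observed) {
        if (expected.length != observed.length || expected.length < 2) {
            throw new IllegalArgumentException("need at least two cells, equal lengths");
        }
        double sumExpected = 0.0;
        double sumObserved = 0.0;
        for (int i = 0; i < observed.length; i++) {
            if (expected[i] <= 0.0 || observed[i] < 0) {
                throw new IllegalArgumentException("expected must be > 0, observed >= 0");
            }
            sumExpected += expected[i];
            sumObserved += observed[i];
        }
        double scale = sumObserved / sumExpected;
        double sum = 0.0;
        for (int i = 0; i < observed.length; i++) {
            if (observed[i] != 0) {
                sum += observed[i] * Math.log(observed[i] / (expected[i] * scale));
            }
        }
        return 2.0 * sum;
    }
}

/** Hypothetical static facade in the style of TestUtils: it holds no logic, only delegates. */
final class TestUtilsSketch {

    private static final GTestSketch G_TEST = new GTestSketch();

    private TestUtilsSketch() { }

    static double g(double[] expected, long[] observed) {
        return G_TEST.g(expected, observed);
    }
}
```

Because the facade holds no state of its own, it can be added or dropped without touching the instance class.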
                
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.1, 3.2, 4.0
>         Environment: Netbeans
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest_12102012.patch, MATH-878_gTest_15102012.patch, MATH-878_gTest_26102012.patch, vcs-diff16294.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h

[jira] [Commented] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Radoslav Tsvetkov (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474933#comment-13474933 ] 

Radoslav Tsvetkov commented on MATH-878:
----------------------------------------

Added rootLogLikelihoodRatio, as proposed by Ted Dunning, and additional framework tests.
                
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.1, 3.2, 4.0
>         Environment: Netbeans
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest.patch, vcs-diff16294.patch, vcs-diff56368.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h

[jira] [Updated] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Radoslav Tsvetkov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Radoslav Tsvetkov updated MATH-878:
-----------------------------------

    Comment: was deleted

(was: Thanks Ted for your interest and quick comments.

I added rootLogLikelihoodRatio as proposed by you, using your code from
Mahout. I kept the name, as it is the one more commonly used for this
functionality.
On your comments:

1. Usually commons.math has more convenience methods. For example,
ChiSquare has many more. As I share your opinion, I allowed myself to
provide fewer. Concerning gTestGoodnessOfFit: let's not forget that the
majority of users are not interested at all in p-values and G-values;
all they want to know is true or false (can they reject the null or
not). ChiSquareTest provides exactly the same functionality and has been in
Commons since 1.2, so it seems a good thing.

2. I added rootLogLikelihoodRatio using your code from Mahout. Could
you help me with the rationale description comments? Unfortunately, the
quoted discussion is no longer available on the internet. Perhaps it would be
better to add some info inline in the comments.

3. The G-Tests are fully integrated in the Commons TestUtils
framework, like all the others (ChiSquare, Anova, etc.). With this patch I added some more test cases.
On request:
Could you provide pls. some reference data for rootLogLikelihoodRatio test?
)
    
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.1, 3.2, 4.0
>         Environment: Netbeans
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest_12102012.patch, MATH-878_gTest.patch, vcs-diff16294.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h

[jira] [Updated] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Radoslav Tsvetkov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Radoslav Tsvetkov updated MATH-878:
-----------------------------------

    Attachment: MATH-878_gTest_15102012.patch

Hi Ted, 
Signed Root LLR is a really good idea! I added the test (also in TestUtilsTest)

And some comments on signed rLLR:

In some cases of unexpectedly small, similar p1 and p2 values
or large anomalies in the k11, ... counts, it is desirable to
get additional information on the rate through the signed root LLR.

Signed root LLR has two advantages over the basic LLR:
a) it is positive where k11 is bigger than expected and negative where it is
lower.  This resolves your current problem.
b) if there is no difference, it is asymptotically normally distributed.
This allows people to talk about "number of standard deviations", which is a
more common frame of reference than the chi^2 distribution.

See Discussions at: ....
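The two properties of the signed root LLR described above can be illustrated with a minimal sketch, assuming a 2x2 table [[k11, k12], [k21, k22]]. It computes G directly as 2 sum(O ln(O/E)) and negates sqrt(G) when k11 falls below its expected count, which is equivalent to k11/(k11 + k12) < k21/(k21 + k22). The class SignedRootLlrSketch is hypothetical and is not the code in the attached patches:

```java
/**
 * Hypothetical sketch (not the committed Commons Math code): signed root
 * log-likelihood ratio for the 2x2 table [[k11, k12], [k21, k22]].
 */
final class SignedRootLlrSketch {

    private SignedRootLlrSketch() { }

    /** G = 2 * sum(O * ln(O / E)) with E = rowTotal * columnTotal / N; empty cells add 0. */
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        long[][] k = {{k11, k12}, {k21, k22}};
        double n = k11 + k12 + k21 + k22;
        double[] rows = {k11 + k12, k21 + k22};
        double[] cols = {k11 + k21, k12 + k22};
        double sum = 0.0;
        for (int i = 0; i < 2; i++) {
            for (int j = 0; j < 2; j++) {
                if (k[i][j] > 0) {
                    double expected = rows[i] * cols[j] / n;
                    sum += k[i][j] * Math.log(k[i][j] / expected);
                }
            }
        }
        // Clamp tiny negative values caused by floating-point round-off.
        return Math.max(0.0, 2.0 * sum);
    }

    /**
     * sqrt(G), negated when k11 is lower than expected, i.e. when
     * k11 / (k11 + k12) < k21 / (k21 + k22).
     */
    static double rootLogLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double root = Math.sqrt(logLikelihoodRatio(k11, k12, k21, k22));
        double p1 = (double) k11 / (k11 + k12);
        double p2 = (double) k21 / (k21 + k22);
        return p1 < p2 ? -root : root;
    }
}
```

Under independence the returned value is approximately standard normal, so, for example, a result of 3.0 can be read as roughly three standard deviations more k11 than expected.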
                
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.1, 3.2, 4.0
>         Environment: Netbeans
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest_12102012.patch, MATH-878_gTest_15102012.patch, vcs-diff16294.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h

[jira] [Comment Edited] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Radoslav Tsvetkov (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473266#comment-13473266 ] 

Radoslav Tsvetkov edited comment on MATH-878 at 10/10/12 2:51 PM:
------------------------------------------------------------------

See the attachment: 

Source Code + Java Doc + Tests

Works fine at my place, although I already noticed a small typo in the comments ;) What is the JIRA URL to use in the Netbeans plugin?


                
      was (Author: rtsvet):
    See the attachment: 

Source Code + Java Doc + Tests

Works fine at my place :-)


                  
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h

[jira] [Commented] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Radoslav Tsvetkov (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473977#comment-13473977 ] 

Radoslav Tsvetkov commented on MATH-878:
----------------------------------------

Source Code, Test, Docs and Comments
                
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h

[jira] [Updated] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Radoslav Tsvetkov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Radoslav Tsvetkov updated MATH-878:
-----------------------------------

    Attachment: MATH-878_gTest.patch

Source Code + Java Doc + Tests
                
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h

[jira] [Comment Edited] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Radoslav Tsvetkov (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476015#comment-13476015 ] 

Radoslav Tsvetkov edited comment on MATH-878 at 10/15/12 8:14 AM:
------------------------------------------------------------------

Hi Ted, 
Signed Root LLR is a really good idea! I added the test (also in TestUtilsTest)

And some comments on signed rLLR:

In some cases of unexpectedly small, similar p1 and p2 values
or large anomalies in the k11, ... counts, it is desirable to
get additional information on the rate through the signed root LLR.

Signed root LLR has two advantages over the basic LLR:

a) it is positive where k11 is bigger than expected and negative where it is
lower.  This resolves your current problem.

b) if there is no difference, it is asymptotically normally distributed.
This allows people to talk about "number of standard deviations", which is a
more common frame of reference than the chi^2 distribution.
      
See Discussions at: ....
                
      was (Author: rtsvet):
    Hi Ted, 
Signed Root LLR is a really good idea! I added the test (also in TestUtilsTest)

And some comment on signed rLLR:
In some cases of unexpectedly small similar p1 and p2 values 
     * or large anomalies of k11, ... counts it is desired to 
     * get additional information on the rate trough signed root LLR.
     * 
     * Signed root LLR has two advantages over the basic LLR: 
     * a) it is positive where k11 is bigger than expected, negative where it is 
     * lower.  This resolves your current problem. 
     * b) if there is no difference it is asymptotically normally distributed. 
     * This allows people to talk about "number of standard deviations" which is a 
     * more common frame of reference than the chi^2 distribution.
     * 
     * See Discussions at: ....
                  
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.1, 3.2, 4.0
>         Environment: Netbeans
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest_12102012.patch, MATH-878_gTest_15102012.patch, vcs-diff16294.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h

[jira] [Commented] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Radoslav Tsvetkov (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487684#comment-13487684 ] 

Radoslav Tsvetkov commented on MATH-878:
----------------------------------------

What I can do is add the constructor to GTest and remove the calls from TestUtils... But, thinking further about it,
I see that it does not touch any existing CM functionality and as such poses no risk. So why bother keeping the GTest calls out?

Additionally: if the slightly misleading name of TestUtils.java is to be changed, then that should be done in a dedicated issue (with its own description).
                
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.1, 3.2, 4.0
>         Environment: Netbeans
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest_12102012.patch, MATH-878_gTest_15102012.patch, MATH-878_gTest_26102012.patch, vcs-diff16294.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h

[jira] [Comment Edited] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Radoslav Tsvetkov (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473266#comment-13473266 ] 

Radoslav Tsvetkov edited comment on MATH-878 at 10/10/12 2:43 PM:
------------------------------------------------------------------

See the attachment: 

Source Code + Java Doc + Tests

Works fine at my place :-)


                
      was (Author: rtsvet):
    See the attachment: 
Source Code + Java Doc + Tests

Works fine at my place :)
                  
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h

[jira] [Comment Edited] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Radoslav Tsvetkov (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473266#comment-13473266 ] 

Radoslav Tsvetkov edited comment on MATH-878 at 10/10/12 3:35 PM:
------------------------------------------------------------------

See the attachment: 

Source Code + Java Doc + Tests

Works fine at my place, although I already noticed a small typo in the comments ;)

                
      was (Author: rtsvet):
    See the attachment: 

Source Code + Java Doc + Tests

Works fine at my place  although I already noticed a small typo in the comments ;) What it the JIRA URL to use in the Netbeans plugin?


                  
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h

[jira] [Commented] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Radoslav Tsvetkov (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487054#comment-13487054 ] 

Radoslav Tsvetkov commented on MATH-878:
----------------------------------------

1. I'll implement it as an independent functionality; it is correct to wait until it is "stable".
2. It's a very good idea to change the name of the class to Statistic...
:)
                
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.1, 3.2, 4.0
>         Environment: Netbeans
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest_12102012.patch, MATH-878_gTest_15102012.patch, MATH-878_gTest_26102012.patch, vcs-diff16294.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h

[jira] [Commented] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Radoslav Tsvetkov (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473978#comment-13473978 ] 

Radoslav Tsvetkov commented on MATH-878:
----------------------------------------

Code, Test and Comments and Docs
                
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest.patch, vcs-diff16294.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h

[jira] [Commented] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475216#comment-13475216 ] 

Ted Dunning commented on MATH-878:
----------------------------------

{quote}
Could you provide pls. some reference data for rootLogLikelihoodRatio test?
{quote}
From Mahout (with a few extras added just now)
{code}
// Assumed imports: JUnit 4 and Mahout's org.apache.mahout.math.stats.LogLikelihood (mahout-math module).
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import org.apache.mahout.math.stats.LogLikelihood;
import org.junit.Test;

public class LogLikelihoodTest {
  @Test
  public void testRootLogLikelihood() {
    // positive where k11 is bigger than expected.
    assertTrue(LogLikelihood.rootLogLikelihoodRatio(904, 21060, 1144, 283012) > 0.0);

    // negative because k11 is lower than expected
    assertTrue(LogLikelihood.rootLogLikelihoodRatio(36, 21928, 60280, 623876) < 0.0);

    // The third argument of assertEquals is the absolute tolerance.
    assertEquals(Math.sqrt(2.772589), LogLikelihood.rootLogLikelihoodRatio(1, 0, 0, 1), 0.000001);
    assertEquals(-Math.sqrt(2.772589), LogLikelihood.rootLogLikelihoodRatio(0, 1, 1, 0), 0.000001);
    assertEquals(Math.sqrt(27.72589), LogLikelihood.rootLogLikelihoodRatio(10, 0, 0, 10), 0.00001);

    assertEquals(Math.sqrt(39.33052), LogLikelihood.rootLogLikelihoodRatio(5, 1995, 0, 100000), 0.00001);
    assertEquals(-Math.sqrt(39.33052), LogLikelihood.rootLogLikelihoodRatio(0, 100000, 5, 1995), 0.00001);

    assertEquals(Math.sqrt(4730.737), LogLikelihood.rootLogLikelihoodRatio(1000, 1995, 1000, 100000), 0.001);
    assertEquals(-Math.sqrt(4730.737), LogLikelihood.rootLogLikelihoodRatio(1000, 100000, 1000, 1995), 0.001);

    assertEquals(Math.sqrt(5734.343), LogLikelihood.rootLogLikelihoodRatio(1000, 1000, 1000, 100000), 0.001);
    assertEquals(Math.sqrt(5714.932), LogLikelihood.rootLogLikelihoodRatio(1000, 1000, 1000, 99000), 0.001);
  }
}
{code}
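To make the reference values above easier to check, here is a minimal Java sketch of one way to compute them. It assumes the entropy formulation of the 2x2 G statistic, G = 2 (H_row + H_col - H_matrix), where H(k_1, ..., k_n) = N ln N - sum(k_i ln k_i) and N = sum(k_i); expanding E_ij = R_i C_j / N shows this equals the textbook G = 2 sum(O ln(O/E)). The class name LlrSketch and its layout are hypothetical, not the Mahout or Commons Math API; the sign rule (negative when k11 is below its expected count) follows the comments in the test.

```java
/**
 * Hypothetical sketch (not Mahout's or Commons Math's code) of the 2x2 G-test
 * in entropy form; method names mirror the Mahout test above.
 */
final class LlrSketch {

  private LlrSketch() {}

  // x * ln(x), using the convention 0 * ln(0) = 0 for empty cells.
  private static double xLogX(long x) {
    return x == 0 ? 0.0 : x * Math.log(x);
  }

  // Unnormalized Shannon entropy: N*ln(N) - sum(k*ln(k)), where N = sum(k).
  static double entropy(long... counts) {
    long sum = 0;
    double sumXLogX = 0.0;
    for (long k : counts) {
      sumXLogX += xLogX(k);
      sum += k;
    }
    return xLogX(sum) - sumXLogX;
  }

  // G statistic for the contingency table [[k11, k12], [k21, k22]].
  static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
    double rowEntropy = entropy(k11 + k12, k21 + k22);
    double columnEntropy = entropy(k11 + k21, k12 + k22);
    double matrixEntropy = entropy(k11, k12, k21, k22);
    // Clamp tiny negative values caused by floating-point round-off.
    return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
  }

  // Signed square root of G: negative when k11 is lower than expected.
  static double rootLogLikelihoodRatio(long k11, long k12, long k21, long k22) {
    double root = Math.sqrt(logLikelihoodRatio(k11, k12, k21, k22));
    double p1 = (double) k11 / (k11 + k12);
    double p2 = (double) k21 / (k21 + k22);
    return p1 < p2 ? -root : root;
  }

  public static void main(String[] args) {
    System.out.println(rootLogLikelihoodRatio(1, 0, 0, 1)); // about 1.6651, i.e. sqrt(4 ln 2)
    System.out.println(rootLogLikelihoodRatio(0, 1, 1, 0)); // about -1.6651
  }
}
```

The clamp to zero only matters when the table is very close to independence. The sign test k11/(k11+k12) < k21/(k21+k22) is equivalent to k11 < (k11+k12)(k11+k21)/N whenever both row sums are positive, so it matches the "lower than expected" wording in the test.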
                
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.1, 3.2, 4.0
>         Environment: Netbeans
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest_12102012.patch, MATH-878_gTest.patch, vcs-diff16294.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>

[jira] [Updated] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Radoslav Tsvetkov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Radoslav Tsvetkov updated MATH-878:
-----------------------------------

    Attachment: MATH-878_gTest_15102012.patch
    
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.1, 3.2, 4.0
>         Environment: Netbeans
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest_12102012.patch, MATH-878_gTest_15102012.patch, vcs-diff16294.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>

[jira] [Comment Edited] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Radoslav Tsvetkov (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473266#comment-13473266 ] 

Radoslav Tsvetkov edited comment on MATH-878 at 10/10/12 2:42 PM:
------------------------------------------------------------------

See the attachment:
Source code + Javadoc + tests

Works fine on my machine :)
                
      was (Author: rtsvet):
    Source Code + Java Doc + Tests
                  
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>

[jira] [Commented] (MATH-878) G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference

Posted by "Gilles (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486971#comment-13486971 ] 

Gilles commented on MATH-878:
-----------------------------

In fact, I misread it as the "TestUtils" class from the "test" part of the repository.
We might consider making that name (the one in the "main" part, I mean) less ambiguous (e.g. something like "StatisticTestsUtils").

Still, two commits are always clearer, if they can be made independently. No? ;)
If you think a single commit is fine, no problem for me.

                
> G-Test (Log-Likelihood ratio - LLR test) in math.stat.inference
> ---------------------------------------------------------------
>
>                 Key: MATH-878
>                 URL: https://issues.apache.org/jira/browse/MATH-878
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.1, 3.2, 4.0
>         Environment: Netbeans
>            Reporter: Radoslav Tsvetkov
>              Labels: features, test
>             Fix For: 3.1
>
>         Attachments: MATH-878_gTest_12102012.patch, MATH-878_gTest_15102012.patch, MATH-878_gTest_26102012.patch, vcs-diff16294.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>