Posted to dev@commons.apache.org by "Matthias Hummel (JIRA)" <ji...@apache.org> on 2006/10/16 17:38:34 UTC

[jira] Created: (MATH-160) Chi-Square Test for Comparing two binned Data Sets

Chi-Square Test for Comparing two binned Data Sets
--------------------------------------------------

                 Key: MATH-160
                 URL: http://issues.apache.org/jira/browse/MATH-160
             Project: Commons Math
          Issue Type: New Feature
            Reporter: Matthias Hummel
            Priority: Minor


The current Chi-Square test implementation only supports the standard Chi-Square test against a known distribution. We needed a test that compares two sample data sets whose underlying distribution may be unknown. In that case the Chi-Square statistic has to be computed differently, so that the error contributions of both sample data sets are taken into account. See Press et al., Numerical Recipes, Second Edition, formula 14.3.2.
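For reference, the two-sample statistic this request refers to is commonly written as below, with R_i and S_i the counts in bin i of the two data sets. Under the usual Poisson assumption each count's variance is roughly its own value, so R_i + S_i estimates the variance of R_i - S_i; that is how both error contributions enter. Bins where R_i = S_i = 0 would contribute 0/0 and are left out of the sum. This is a sketch of the standard textbook form, not a transcription of the attached patch.

```latex
\chi^2 = \sum_{i \,:\, R_i + S_i > 0} \frac{(R_i - S_i)^2}{R_i + S_i}
```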

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


[jira] Updated: (MATH-160) Chi-Square Test for Comparing two binned Data Sets

Posted by "Matthias Hummel (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/MATH-160?page=all ]

Matthias Hummel updated MATH-160:
---------------------------------

    Attachment: commons-math.patch

Diff against SVN Revision 463286



[jira] Commented: (MATH-160) Chi-Square Test for Comparing two binned Data Sets

Posted by "Luc Maisonobe (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511356 ] 

Luc Maisonobe commented on MATH-160:
------------------------------------

The applied fix added new public methods to the interface. The clirr Maven plugin considers this an incompatible API change and now fails when comparing against version 1.1.
Should the next version be bumped to 2.0? Previous discussions on version numbering missed this issue.



[jira] Commented: (MATH-160) Chi-Square Test for Comparing two binned Data Sets

Posted by "Matthias Hummel (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/MATH-160?page=comments#action_12442869 ] 
            
Matthias Hummel commented on MATH-160:
--------------------------------------

There is no licensing problem with the included code: it was not copied from Numerical Recipes but was developed independently.
Nevertheless, Numerical Recipes is the only English-language reference I know of that explains the mathematical background.




[jira] Commented: (MATH-160) Chi-Square Test for Comparing two binned Data Sets

Posted by "Phil Steitz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12503276 ] 

Phil Steitz commented on MATH-160:
----------------------------------

With the reference in the last comment replacing the reference in the patch, this looks OK to me.   We also need test cases, ideally validated against R, another package or published results somewhere.  Patches welcome!



[jira] Resolved: (MATH-160) Chi-Square Test for Comparing two binned Data Sets

Posted by "Phil Steitz (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phil Steitz resolved MATH-160.
------------------------------

    Resolution: Fixed

Applied a modified version of the patch, along with test cases, verified against DATAPLOT.
Modifications:
* Changed input array data type to long[].  This is consistent with other ChiSquare tests and with the specification of the test (i.e., it is not clear what floats as arguments would mean)
* Added weighting as specified in the NIST reference provided to adjust for possibly different bin sums for the two samples. 
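The weighting described in the list above can be sketched as follows. With R_i and S_i the counts in bin i and R, S the two sample totals, each term becomes (K1 R_i - K2 S_i)^2 / (R_i + S_i), with K1 = sqrt(S/R) and K2 = sqrt(R/S); when R = S both factors are 1 and the statistic reduces to the unweighted form. This is a minimal sketch assuming the NIST/DATAPLOT form of the weights and long[] counts as in the list above; the class and method names are hypothetical, not the committed Commons Math API, and it computes only the statistic (no degrees of freedom or p-value).

```java
/**
 * Hypothetical sketch of a two-sample chi-square statistic for binned counts.
 * Not the Commons Math API; names are illustrative only.
 */
final class TwoSampleChiSquare {

    private TwoSampleChiSquare() {
    }

    /**
     * Returns sum_i (K1*R_i - K2*S_i)^2 / (R_i + S_i), where R = sum R_i,
     * S = sum S_i, K1 = sqrt(S/R) and K2 = sqrt(R/S). Bins that are empty
     * in both samples are skipped.
     */
    static double statistic(long[] observed1, long[] observed2) {
        if (observed1.length != observed2.length || observed1.length < 2) {
            throw new IllegalArgumentException("arrays must have equal length >= 2");
        }
        long sum1 = 0;
        long sum2 = 0;
        for (int i = 0; i < observed1.length; i++) {
            if (observed1[i] < 0 || observed2[i] < 0) {
                throw new IllegalArgumentException("counts must be non-negative");
            }
            sum1 += observed1[i];
            sum2 += observed2[i];
        }
        if (sum1 == 0 || sum2 == 0) {
            throw new IllegalArgumentException("each sample needs at least one count");
        }
        // Weights that rescale both samples to a common total; both are 1 when sum1 == sum2.
        double k1 = Math.sqrt((double) sum2 / sum1);
        double k2 = Math.sqrt((double) sum1 / sum2);
        double chiSquare = 0.0;
        for (int i = 0; i < observed1.length; i++) {
            long total = observed1[i] + observed2[i];
            if (total == 0) {
                continue; // empty in both samples: the term would be 0/0
            }
            double dev = k1 * observed1[i] - k2 * observed2[i];
            chiSquare += dev * dev / total;
        }
        return chiSquare;
    }
}
```

For counts {10, 30} and {5, 15}, which have the same shape but different totals, this weighted statistic is 0 up to rounding, whereas the unweighted sum of (R_i - S_i)^2 / (R_i + S_i) would be 25/15 + 225/45, about 6.67. That difference is what the weighting is meant to remove.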





[jira] Commented: (MATH-160) Chi-Square Test for Comparing two binned Data Sets

Posted by "aeriform (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/MATH-160?page=comments#action_12445327 ] 
            
aeriform commented on MATH-160:
-------------------------------

I'm not sure whether this is good enough, as I'm not entirely sure what you need, but the following article in Cytometry, volume 45, page 48, refers to the normalized chi-squared statistic:

http://www3.interscience.wiley.com/cgi-bin/fulltext/85011154/PDFSTART

Cytometry
ISSN: 1097-0320 (Online)
ISSN: 0196-4763 (Print)

Published 2001 Wiley-Liss, Inc.†
Cytometry 45:47-55 (2001)

Probability Binning Comparison: A Metric for Quantitating Multivariate Distribution Differences

"This work is a US government work, and as such, is in the public domain in the United States of America." (pg. 47)

Is a reference like this sufficient to develop code from?



[jira] Commented: (MATH-160) Chi-Square Test for Comparing two binned Data Sets

Posted by "Luc Maisonobe (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/MATH-160?page=comments#action_12442651 ] 
            
Luc Maisonobe commented on MATH-160:
------------------------------------

I'm afraid code from any of the Numerical Recipes books cannot be included in commons-math.
See the redistribution conditions on the NR site here: http://www.numerical-recipes.com/infotop.html#distinfo
If the code implements a well-known algorithm with public references independent of NR, then it is OK. But the comments in your patch directly reference the NR book in C++.
Of course, this is only my point of view; could anybody else give advice on this topic?



[jira] Updated: (MATH-160) Chi-Square Test for Comparing two binned Data Sets

Posted by "Phil Steitz (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phil Steitz updated MATH-160:
-----------------------------

    Fix Version/s: 1.2



[jira] Reopened: (MATH-160) Chi-Square Test for Comparing two binned Data Sets

Posted by "Phil Steitz (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phil Steitz reopened MATH-160:
------------------------------


Good catch, Luc.  I thought clirr was set up to fail the build when this happens.  In any case, this needs to be fixed somehow.  Probably best to use a separate interface.
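A minimal sketch of the separate-interface approach, assuming interfaces without default methods (as in the Java versions of that era): the released interface stays unchanged and the new method lives in a sub-interface, so a class written against 1.1 that implements only the old interface still compiles. All type names here are hypothetical stand-ins, the old interface is reduced to one method, and the two-sample method uses the unweighted form for brevity.

```java
/** Stand-in for the interface released in 1.1 (reduced to one method). */
interface ChiSquareTest {
    /** Pearson statistic: sum of (observed - expected)^2 / expected. */
    double chiSquare(double[] expected, long[] observed);
}

/** Hypothetical separate interface carrying the new two-sample method. */
interface DataSetsChiSquareTest extends ChiSquareTest {
    double chiSquareDataSetsComparison(long[] observed1, long[] observed2);
}

/** Hypothetical library implementation offering both interfaces. */
class ChiSquareTestSketch implements DataSetsChiSquareTest {
    @Override
    public double chiSquare(double[] expected, long[] observed) {
        double sum = 0.0;
        for (int i = 0; i < observed.length; i++) {
            double dev = observed[i] - expected[i];
            sum += dev * dev / expected[i];
        }
        return sum;
    }

    @Override
    public double chiSquareDataSetsComparison(long[] observed1, long[] observed2) {
        double sum = 0.0;
        for (int i = 0; i < observed1.length; i++) {
            long total = observed1[i] + observed2[i];
            if (total == 0) {
                continue; // bin empty in both samples: skipped
            }
            double dev = observed1[i] - observed2[i];
            sum += dev * dev / total;
        }
        return sum;
    }
}

/**
 * Hypothetical third-party class written against 1.1. It implements only
 * the old interface, which gained no new methods, so it still compiles.
 */
class CountingChiSquareTest implements ChiSquareTest {
    private final ChiSquareTest delegate = new ChiSquareTestSketch();
    private int calls;

    @Override
    public double chiSquare(double[] expected, long[] observed) {
        calls++;
        return delegate.chiSquare(expected, observed);
    }

    int calls() {
        return calls;
    }
}
```

Callers that need the two-sample comparison would program against the sub-interface; existing implementors and callers of the original interface are left untouched.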
