Posted to dev@commons.apache.org by Gilles Sadowski <gi...@harfang.homelinux.org> on 2012/10/20 12:58:43 UTC

[Math] MATH-878: Feature request with patch

Hello.

  https://issues.apache.org/jira/browse/MATH-878

Would someone well versed in statistics check that contribution?


Best regards,
Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [Math] MATH-878: Feature request with patch

Posted by Ted Dunning <te...@gmail.com>.
On Sun, Nov 4, 2012 at 11:56 AM, Phil Steitz <ph...@gmail.com> wrote:

> 0) Did you or anyone else ever analyze the bigram data in the paper
> using Fisher's test stats?
>

That bigram data isn't particularly interesting; any text will show similar
effects.

Others have tested Fisher's exact test, but only a few cases turned up
where there was any mileage.  The cost of Fisher's test makes it much less
interesting for the text, genomic, classification and recommendation
applications of G^2.
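
For reference, the cost difference is easy to see for a 2x2 table: G^2 has a
closed form that needs only a handful of logarithms. The following is a
minimal sketch in Java, assuming the usual definition
G^2 = 2 * sum O_ij * ln(O_ij / E_ij) with E_ij = (row total * column total) / N.
The class and method names are hypothetical; this is not the code from the
MATH-878 patch.

```java
public final class G2Sketch {

    private G2Sketch() { }

    /** Returns x * ln(x), using the convention 0 * ln(0) = 0. */
    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    /**
     * G^2 statistic for the 2x2 table of counts
     *
     *     k11 k12
     *     k21 k22
     *
     * Expanding 2 * sum O * ln(O / E) gives
     * 2 * (sum O ln O - sum R ln R - sum C ln C + N ln N),
     * where R and C are the row and column totals.
     * Counts are assumed to be non-negative (not validated here).
     */
    public static double g2(long k11, long k12, long k21, long k22) {
        long n = k11 + k12 + k21 + k22;
        double cells = xLogX(k11) + xLogX(k12) + xLogX(k21) + xLogX(k22);
        double rows = xLogX(k11 + k12) + xLogX(k21 + k22);
        double cols = xLogX(k11 + k21) + xLogX(k12 + k22);
        return 2.0 * (cells - rows - cols + xLogX(n));
    }
}
```

A table whose rows are exactly proportional, such as g2(10, 20, 30, 60),
gives a statistic of zero up to rounding error.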

> 1) Is the bigram data from [1] available anywhere?
>

I don't think so.  Any small technical text should exhibit similar
characteristics.

You can find more examples in my longer work on the subject:

http://arxiv.org/abs/1207.1847

Most of these examples are based on publicly available data.


> 2) Do you think a direct implementation of Fisher's test for 2x2
> designs and a monte carlo impl for r x c would be useful?  I have
> this in C from years ago and could translate it fairly easily.
>

I have no clue if people want this.   G^2 is pretty well entrenched in text
analysis and recommendations and there have been hundreds of citations to
my original paper, many of which replicated the value of the test.  As
such, I wouldn't expect a lot of value in those applications.

Other areas may well be a different story.  A fully featured implementation
of Fisher's exact test is pretty complex, however, since you have to take
such different tacks at different data scales and with differently shaped
tables.
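
To make the simplest case concrete, here is a minimal sketch in Java of a
two-sided Fisher's exact test for a 2x2 table. It assumes small counts, so
that enumerating every table with the observed margins is cheap, and it uses
the common "sum the probabilities no larger than the observed one" rule for
the two-sided p-value. The class and method names are hypothetical, and the
sketch does not attempt the large-count or r x c cases described above.

```java
public final class FisherExact2x2Sketch {

    private FisherExact2x2Sketch() { }

    /**
     * Two-sided p-value for the 2x2 table
     *
     *     a b
     *     c d
     *
     * With the margins fixed, the top-left cell follows a hypergeometric
     * distribution. Counts are assumed to be non-negative (not validated).
     */
    public static double twoSidedPValue(int a, int b, int c, int d) {
        int row1 = a + b;
        int row2 = c + d;
        int col1 = a + c;
        int n = row1 + row2;

        // Table of ln(k!) so that binomial coefficients stay in log space.
        double[] logFact = new double[n + 1];
        for (int i = 1; i <= n; i++) {
            logFact[i] = logFact[i - 1] + Math.log(i);
        }

        // Range of values the top-left cell can take given the margins.
        int lo = Math.max(0, col1 - row2);
        int hi = Math.min(row1, col1);

        double logObserved = logProb(logFact, a, row1, row2, col1);
        double p = 0.0;
        for (int x = lo; x <= hi; x++) {
            double lp = logProb(logFact, x, row1, row2, col1);
            // Small tolerance so that ties with the observed table are counted.
            if (lp <= logObserved + 1e-7) {
                p += Math.exp(lp);
            }
        }
        return Math.min(1.0, p);
    }

    /** ln of C(row1, x) * C(row2, col1 - x) / C(row1 + row2, col1). */
    private static double logProb(double[] lf, int x, int row1, int row2, int col1) {
        return logChoose(lf, row1, x)
                + logChoose(lf, row2, col1 - x)
                - logChoose(lf, row1 + row2, col1);
    }

    private static double logChoose(double[] lf, int n, int k) {
        return lf[n] - lf[k] - lf[n - k];
    }
}
```

For the table (3, 1, 1, 3) the hypergeometric probabilities for the top-left
cell taking the values 0 to 4 are 1/70, 16/70, 36/70, 16/70 and 1/70, so the
two-sided p-value is 34/70.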

Re: [Math] MATH-878: Feature request with patch

Posted by Phil Steitz <ph...@gmail.com>.
On 10/22/12 8:15 AM, Ted Dunning wrote:
> On Sun, Oct 21, 2012 at 11:34 PM, Phil Steitz <ph...@gmail.com> wrote:
>
>> On 10/21/12 11:25 PM, Ted Dunning wrote:
>>> What kind of check did you want?
>>>
>>> I checked the code by eye and supplied several test cases.  You might say
>>> that I am versed in statistics since I am the author of the major paper on
>>> this test as applied to computational linguistics.
>> I was going to mention that :)
>>
>> Have you carefully reviewed the code?
>>
> I have pretty high confidence in it.  The algorithm is the simplest that I
> know (increases likelihood of correctness) and he seems to have
> incorporated my test cases.
>
>
>> Thanks in advance if you have time.  I will look at it as well soon
>> and take a stab at moving some of the reference material into the
>> javadoc.  Thanks in any case for helping move this along.
>>
> Thanks for that.
>
Sorry it took me so long to get this committed.  It took me longer
than I expected to get myself educated.  I got a lot out of [1] and
thank you for writing it, Ted.  The bigram example there very nicely
illustrates how ChiSquare stats can be misleading.  You mention at
the end that Fisher's exact test might also be used in these
situations.  I am curious about the following:

0) Did you or anyone else ever analyze the bigram data in the paper
using Fisher's test stats?
1) Is the bigram data from [1] available anywhere?
2) Do you think a direct implementation of Fisher's test for 2x2
designs and a monte carlo impl for r x c would be useful?  I have
this in C from years ago and could translate it fairly easily.

Phil

[1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.14.5962


Re: [Math] MATH-878: Feature request with patch

Posted by Ted Dunning <te...@gmail.com>.
On Sun, Oct 21, 2012 at 11:34 PM, Phil Steitz <ph...@gmail.com> wrote:

> On 10/21/12 11:25 PM, Ted Dunning wrote:
> > What kind of check did you want?
> >
> > I checked the code by eye and supplied several test cases.  You might say
> > that I am versed in statistics since I am the author of the major paper on
> > this test as applied to computational linguistics.
>
> I was going to mention that :)
>
> Have you carefully reviewed the code?
>

I have pretty high confidence in it.  The algorithm is the simplest that I
know (increases likelihood of correctness) and he seems to have
incorporated my test cases.


> Thanks in advance if you have time.  I will look at it as well soon
> and take a stab at moving some of the reference material into the
> javadoc.  Thanks in any case for helping move this along.
>

Thanks for that.

Re: [Math] MATH-878: Feature request with patch

Posted by Phil Steitz <ph...@gmail.com>.
On 10/21/12 11:25 PM, Ted Dunning wrote:
> What kind of check did you want?
>
> I checked the code by eye and supplied several test cases.  You might say
> that I am versed in statistics since I am the author of the major paper on
> this test as applied to computational linguistics.

I was going to mention that :)

Have you carefully reviewed the code?

Thanks in advance if you have time.  I will look at it as well soon
and take a stab at moving some of the reference material into the
javadoc.  Thanks in any case for helping move this along.

Phil
>
> On Sun, Oct 21, 2012 at 11:07 PM, Phil Steitz <ph...@gmail.com> wrote:
>
>> On 10/20/12 3:58 AM, Gilles Sadowski wrote:
>>> Hello.
>>>
>>>   https://issues.apache.org/jira/browse/MATH-878
>>>
>>> Would someone well versed in statistics check that contribution?
>> I wanted to get to this this weekend, but was not able to.  I will
>> look at it as soon as I can get some free cycles.
>>
>> Phil
>>>
>>> Best regards,
>>> Gilles
>>>



Re: [Math] MATH-878: Feature request with patch

Posted by Ted Dunning <te...@gmail.com>.
On Mon, Oct 22, 2012 at 4:20 AM, Gilles Sadowski <gilles@harfang.homelinux.org> wrote:

> On Sun, Oct 21, 2012 at 11:25:08PM -0700, Ted Dunning wrote:
> > What kind of check did you want?
>
> Well, I'm seeking to know whether the code can be included in Commons Math's
> trunk.
>

Hard for me to say as I am usually out of step with c.m.


> Currently, the answer is a partial "no" (IMHO), because of the remarks which
> I formulated on the JIRA page.
> [If it were only that, I would have corrected the formatting problems (to my
> taste).]
>

Fair.


> Thus: I'd like people to confirm that the code itself fits with the design
> of the "o.a.c.m.stat" package, and to take the responsibility for committing
> the patch (adapted to their taste!). :-)
>

I can't comment on the design.  Only on whether it seems to do what it says
it should.


>
> > I checked the code by eye and supplied several test cases.  You might say
> > that I am versed in statistics since I am the author of the major paper on
> > this test as applied to computational linguistics.
>
> Thank you for the _contents_ review. Sorry for the misunderstanding that I
> was talking more about the form.
>

I can't comment on the form.

Re: [Math] MATH-878: Feature request with patch

Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.
On Sun, Oct 21, 2012 at 11:25:08PM -0700, Ted Dunning wrote:
> What kind of check did you want?

Well, I'm seeking to know whether the code can be included in Commons Math's
trunk.
Currently, the answer is a partial "no" (IMHO), because of the remarks which
I formulated on the JIRA page.
[If it were only that, I would have corrected the formatting problems (to my
taste).]

Thus: I'd like people to confirm that the code itself fits with the design
of the "o.a.c.m.stat" package, and to take the responsibility for committing
the patch (adapted to their taste!). :-)

> I checked the code by eye and supplied several test cases.  You might say
> that I am versed in statistics since I am the author of the major paper on
> this test as applied to computational linguistics.

Thank you for the _contents_ review. Sorry for the misunderstanding that I
was talking more about the form.


Regards,
Gilles


Re: [Math] MATH-878: Feature request with patch

Posted by Ted Dunning <te...@gmail.com>.
What kind of check did you want?

I checked the code by eye and supplied several test cases.  You might say
that I am versed in statistics since I am the author of the major paper on
this test as applied to computational linguistics.

On Sun, Oct 21, 2012 at 11:07 PM, Phil Steitz <ph...@gmail.com> wrote:

> On 10/20/12 3:58 AM, Gilles Sadowski wrote:
> > Hello.
> >
> >   https://issues.apache.org/jira/browse/MATH-878
> >
> > Would someone well versed in statistics check that contribution?
>
> I wanted to get to this this weekend, but was not able to.  I will
> look at it as soon as I can get some free cycles.
>
> Phil
> >
> >
> > Best regards,
> > Gilles
> >

Re: [Math] MATH-878: Feature request with patch

Posted by Phil Steitz <ph...@gmail.com>.
On 10/20/12 3:58 AM, Gilles Sadowski wrote:
> Hello.
>
>   https://issues.apache.org/jira/browse/MATH-878
>
> Would someone well versed in statistics check that contribution?

I wanted to get to this this weekend, but was not able to.  I will
look at it as soon as I can get some free cycles.

Phil
>
>
> Best regards,
> Gilles
>

