Posted to dev@opennlp.apache.org by Jörn Kottmann <ko...@gmail.com> on 2011/08/10 11:23:51 UTC

Re: [jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

On 8/10/11 2:10 AM, william.colen@gmail.com wrote:
> I think it would be much better, but we have different sample classes (one
> for each tool) and no common parent. As far as I can see there is no way to
> compare two samples without knowing the tool, and it makes it harder to
> implement the monitor. That is why I avoided using the sample itself and
> added 3 methods that cover the different kinds of samples we have.

Oops, accidentally replied to the issues list.

You need to know the sample class, and since they do not have a common
parent you always need to write some custom code to extract the knowledge
from them. This code has to be written somewhere; right now it is in the
individual evaluators, but it could also be moved to the command line monitors.
Extracting this information in the evaluators themselves might be a bit
easier, since they are going through the samples anyway.

So going down this road might be a bit more work, but to me it looks like
the solution is also much more usable.

Jörn

Re: [jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by Jörn Kottmann <ko...@gmail.com>.
On 8/11/11 7:19 PM, william.colen@gmail.com wrote:
> On Wed, Aug 10, 2011 at 6:23 AM, Jörn Kottmann<ko...@gmail.com>  wrote:
>
>> On 8/10/11 2:10 AM, william.colen@gmail.com wrote:
>>
>>> I think it would be much better, but we have different sample classes (one
>>> for each tool) and no common parent. As far as I can see there is no way to
>>> compare two samples without knowing the tool, and it makes it harder to
>>> implement the monitor. That is why I avoided using the sample itself and
>>> added 3 methods that cover the different kinds of samples we have.
>>>
>> Oops, accidentally replied to the issues list.
>>
>> You need to know the sample class, and since they do not have a common
>> parent you always need to write some custom code to extract the knowledge
>> from them. This code has to be written somewhere; right now it is in the
>> individual evaluators, but it could also be moved to the command line monitors.
>> Extracting this information in the evaluators themselves might be a bit
>> easier, since they are going through the samples anyway.
>>
>> So going down this road might be a bit more work, but to me it looks like
>> the solution is also much more usable.
>>
> Maybe we can leave it to a major release and we will have more flexibility
> in what we can do. What do you think?

I can also help out here, and if we leave it for later we should maybe
declare the new API as internal use only.
> Also, to me it is more important to improve dictionary creation to avoid
> errors like the one I was having, so I would choose to spend some effort
> there instead of on this. Is that OK?
>

I worked on making it fail fast; now the POSModel does the same check
independent of the constructor that was used to create it.

What do you have in mind to make dictionary creation easier?

I would also like to start making the first release candidate. It doesn't
matter if this change does not go into it; this way we could at least start
testing everything else.

What do you think?

Jörn

Re: [jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by "william.colen@gmail.com" <wi...@gmail.com>.
On Mon, Aug 15, 2011 at 7:20 PM, Jörn Kottmann <ko...@gmail.com> wrote:

> On 8/15/11 3:27 PM, Jörn Kottmann wrote:
>
>> I had something like this in mind
>> void misclassified(T gold, T predicted);
>> where T is the generics type for the Sample object.
>>
>> What do you think?
>>
>
> I just had a look at the current code, where we can pass
> the printErrors flag to the XYEvaluator classes.
>
> There we have a number of printError methods in the Evaluator class,
> which are called from the actual implementation of evaluateSample.
>
> I suggest that we create a new abstract EvaluationErrorPrinter class, which
> is then sub-classed by the individual error printer classes. In these
> individual classes we implement the above proposed misclassified method
> (which needs to be part of an interface) and call the printError methods.
>
> As an alternative to EvaluationErrorPrinter we could also use static
> imports for the printError methods.
>
> I don't think it is a big change, because we already have all the
> implementations; we just need to move them around a little for the proposed
> error reporting API.
>
> Let's see how that could look for the Name Finder:
>
> We define the MissclassifiedSampleListener interface,
> with one void missclassified(T reference, T prediction) method.
>
> Then the EvaluationErrorPrinter class, with all the printError methods.
>
> And we also have a NameEvaluationErrorListener class which extends
> EvaluationErrorPrinter and implements MissclassifiedSampleListener<NameSample>.
> This class implements the missclassified(NameSample reference, NameSample
> prediction) method according to the interface, and can simply call one of
> the printError methods from it.
>

Thank you, Jörn. I am working on it now.

Re: [jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by Jörn Kottmann <ko...@gmail.com>.
On 8/15/11 3:27 PM, Jörn Kottmann wrote:
> I had something like this in mind
> void misclassified(T gold, T predicted);
> where T is the generics type for the Sample object.
>
> What do you think? 

I just had a look at the current code, where we can pass
the printErrors flag to the XYEvaluator classes.

There we have a number of printError methods in the Evaluator class,
which are called from the actual implementation of evaluateSample.

I suggest that we create a new abstract EvaluationErrorPrinter class, which
is then sub-classed by the individual error printer classes. In these
individual classes we implement the above proposed misclassified method
(which needs to be part of an interface) and call the printError methods.

As an alternative to EvaluationErrorPrinter we could also use static
imports for the printError methods.

I don't think it is a big change, because we already have all the
implementations; we just need to move them around a little for the proposed
error reporting API.

Let's see how that could look for the Name Finder:

We define the MissclassifiedSampleListener interface,
with one void missclassified(T reference, T prediction) method.

Then the EvaluationErrorPrinter class, with all the printError methods.

And we also have a NameEvaluationErrorListener class which extends
EvaluationErrorPrinter and implements MissclassifiedSampleListener<NameSample>.
This class implements the missclassified(NameSample reference, NameSample
prediction) method according to the interface, and can simply call one of
the printError methods from it.
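
In code that would be roughly the following (just a sketch; the names and
signatures are placeholders, and each type would of course go into its own file):

import opennlp.tools.namefind.NameSample;

// The generic listener interface, T is the sample type.
public interface MissclassifiedSampleListener<T> {
    void missclassified(T reference, T prediction);
}

// Holds the printError implementations we already have in the evaluators.
public abstract class EvaluationErrorPrinter {

    protected void printError(Object reference, Object prediction) {
        System.err.println("Expected:  " + reference);
        System.err.println("Predicted: " + prediction);
    }
}

// The Name Finder specific listener just delegates to a printError method.
public class NameEvaluationErrorListener extends EvaluationErrorPrinter
        implements MissclassifiedSampleListener<NameSample> {

    public void missclassified(NameSample reference, NameSample prediction) {
        printError(reference, prediction);
    }
}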

Jörn




Re: [jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by Jörn Kottmann <ko...@gmail.com>.
On 8/15/11 3:39 PM, william.colen@gmail.com wrote:
>> >  I had something like this in mind
>> >  void misclassified(T gold, T predicted);
>> >  where T is the generics type for the Sample object.
>> >
>> >  What do you think?
>> >
>> >  An implementation would then need to figure out the exact
>> >  differences it is interested in.
>> >
>> >  Jörn
>> >
> Should I modify the hierarchy of the sample classes?
>

If we can find a good base class it would make sense.
Most of the classes rely on a tokenized sentence; that could
be part of a base class, e.g. TokenizedSentenceSample.
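
Just as a sketch of the idea (the name and what it carries are of course open
for discussion):

// Possible common base class carrying only what the samples share.
public abstract class TokenizedSentenceSample {

    private final String[] sentence;

    protected TokenizedSentenceSample(String[] sentence) {
        this.sentence = sentence;
    }

    // The tokenized sentence the sample's annotations refer to.
    public String[] getSentence() {
        return sentence;
    }
}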

Jörn


Re: [jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by "william.colen@gmail.com" <wi...@gmail.com>.
On Mon, Aug 15, 2011 at 10:27 AM, Jörn Kottmann <ko...@gmail.com> wrote:

> On 8/15/11 2:55 PM, william.colen@gmail.com wrote:
>
>> Thanks Jörn, I'm trying what you suggested to improve my POS tagger.
>>
>> Now back to the misclassified report interface. I could not find a good
>> design for it because I could not take advantage of the sample classes, so
>> what I proposed was 3 methods to handle the different kinds of samples:
>>
>> // for the sentence detector
>> void missclassified(Span references[], Span predictions[], String
>> referenceSample, String predictedSample, String sentence)
>>
>> // for namefinder, chunker...
>> void missclassified(Span references[], Span predictions[], String
>> referenceSample, String predictedSample, String[] sentenceTokens)
>>
>> // for pos tagger
>> void missclassified(String references[], String predictions[], String
>> referenceSample, String predictedSample, String[] sentenceTokens)
>>
>
> I had something like this in mind
> void misclassified(T gold, T predicted);
> where T is the generics type for the Sample object.
>
> What do you think?
>
> An implementation would then need to figure out the exact
> differences it is interested in.
>
> Jörn
>

Should I modify the hierarchy of the sample classes?

Re: [jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by Jörn Kottmann <ko...@gmail.com>.
On 8/15/11 2:55 PM, william.colen@gmail.com wrote:
> Thanks Jörn, I'm trying what you suggested to improve my POS tagger.
>
> Now back to the misclassified report interface. I could not find a good
> design for it because I could not take advantage of the sample classes, so
> what I proposed was 3 methods to handle the different kinds of samples:
>
> // for the sentence detector
> void missclassified(Span references[], Span predictions[], String
> referenceSample, String predictedSample, String sentence)
>
> // for namefinder, chunker...
> void missclassified(Span references[], Span predictions[], String
> referenceSample, String predictedSample, String[] sentenceTokens)
>
> // for pos tagger
> void missclassified(String references[], String predictions[], String
> referenceSample, String predictedSample, String[] sentenceTokens)

I had something like this in mind
void misclassified(T gold, T predicted);
where T is the generics type for the Sample object.

What do you think?

An implementation would then need to figure out the exact
differences it is interested in.
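
For the POS tagger, for example, an implementation could look roughly like this
(just a sketch, the interface and class names are placeholders):

import opennlp.tools.postag.POSSample;

// The proposed listener interface.
interface MisclassifiedSampleListener<T> {
    void misclassified(T gold, T predicted);
}

// An implementation that extracts the differences it is interested in:
// here the token positions where the predicted tag differs from the gold tag.
class POSTaggerErrorListener implements MisclassifiedSampleListener<POSSample> {

    public void misclassified(POSSample gold, POSSample predicted) {
        String[] tokens = gold.getSentence();
        String[] goldTags = gold.getTags();
        String[] predictedTags = predicted.getTags();

        for (int i = 0; i < tokens.length; i++) {
            if (!goldTags[i].equals(predictedTags[i])) {
                System.err.println(tokens[i] + ": expected " + goldTags[i]
                        + " but got " + predictedTags[i]);
            }
        }
    }
}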

Jörn


Re: [jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by "william.colen@gmail.com" <wi...@gmail.com>.
On Fri, Aug 12, 2011 at 10:46 AM, Jörn Kottmann <ko...@gmail.com> wrote:

> On 8/12/11 3:28 PM, william.colen@gmail.com wrote:
>
>>> If you know the tags which are causing trouble you might just want to
>>> remove all tokens from your dictionary which contain them. Removing a few
>>> words will not make a big difference in accuracy anyway.
>>>
>> Doing it during training is not a good idea? I thought it would help other
>> people.
>>
>>
>>
> No, I don't think so, because it makes it difficult to understand what
> is going on, and with the current system you really need enough training
> data to cover all the tags.
> If one tag is only mentioned 5 or 6 times I doubt that an accurate
> detection is possible.
>
> As said before, it might be possible to create a POS Tagger which can deal
> better with less training data, but the one we have right now seems to have
> its limits when you want to use a tag dict.
>
> Jörn
>

Thanks Jörn, I'm trying what you suggested to improve my POS tagger.

Now back to the misclassified report interface. I could not find a good
design for it because I could not take advantage of the sample classes, so
what I proposed was 3 methods to handle the different kinds of samples:

// for the sentence detector
void missclassified(Span references[], Span predictions[], String
referenceSample, String predictedSample, String sentence)

// for namefinder, chunker...
void missclassified(Span references[], Span predictions[], String
referenceSample, String predictedSample, String[] sentenceTokens)

// for pos tagger
void missclassified(String references[], String predictions[], String
referenceSample, String predictedSample, String[] sentenceTokens)


Can you help me with a better design?

Re: [jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by Jörn Kottmann <ko...@gmail.com>.
On 8/12/11 3:28 PM, william.colen@gmail.com wrote:
>> If you know the tags which are causing trouble you might just want to
>> remove all tokens from your dictionary which contain them. Removing a few
>> words will not make a big difference in accuracy anyway.
>>
> Doing it during training is not a good idea? I thought it would help other
> people.
>
>

No, I don't think so, because it makes it difficult to understand what
is going on, and with the current system you really need enough training
data to cover all the tags.
If one tag is only mentioned 5 or 6 times I doubt that an accurate
detection is possible.

As said before, it might be possible to create a POS Tagger which can
deal better with less training data, but the one we have right now seems
to have its limits when you want to use a tag dict.

Jörn

Re: [jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by "william.colen@gmail.com" <wi...@gmail.com>.
On Fri, Aug 12, 2011 at 8:04 AM, Jörn Kottmann <ko...@gmail.com> wrote:

> On 8/12/11 12:53 PM, william.colen@gmail.com wrote:
>
>> Should I iterate over the training data or do it after model training? I
>> thought that not every tag would be in the outcome list because of the
>> cutoff. Also it would be difficult to know in advance which tags would be
>> in the outcome list while performing cross validation because we train with a
>> subset of the corpus.
>>
>
> Well, you have two points there. You can try to use the perceptron, which is
> usually trained without a cutoff. Anyway, that doesn't really help you with
> the cross validation.
> Maybe you can add a little training data to your corpus, so you are
> covering all tags?
>

That is a good idea, but I would have to strategically distribute the
sentences around the corpus to make sure the training partition of cross
validation will use these sentences. I'll probably need to build a better
corpus anyway.

> If you know the tags which are causing trouble you might just want to remove
> all tokens from your dictionary which contain them. Removing a few words
> will not make a big difference in accuracy anyway.
>

Doing it during training is not a good idea? I thought it would help other
people.


>
> Sorry for not having a better answer.
>
> Our current POS Tagger is completely statistical; to improve your situation
> we would need a hybrid approach, where it can fall back to some rules in
> case the statistical decision is not plausible according to a tag dict, or
> other rules.
>
> We also had a user here, who wanted to define short sequences in a tag
> dict, to fix mistakes
> he observed in the output of the tagger.
>
> Maybe both things could be done for 1.6. What do you think?
>

Yes, a hybrid approach would add some flexibility. We can discuss it for
1.6.

Re: [jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by Jörn Kottmann <ko...@gmail.com>.
On 8/12/11 12:53 PM, william.colen@gmail.com wrote:
> Should I iterate over the training data or do it after model training? I
> thought that not every tag would be in the outcome list because of the
> cutoff. Also it would be difficult to know in advance which tags would be
> in the outcome list while performing cross validation because we train with a
> subset of the corpus.

Well, you have two points there. You can try to use the perceptron, which
is usually trained without a cutoff. Anyway, that doesn't really help you
with the cross validation.
Maybe you can add a little training data to your corpus, so you are
covering all tags?

If you know the tags which are causing trouble you might just want to
remove all tokens from your dictionary which contain them. Removing a few
words will not make a big difference in accuracy anyway.

Sorry for not having a better answer.

Our current POS Tagger is completely statistical; to improve your
situation we would need a hybrid approach, where it can fall back to some
rules in case the statistical decision is not plausible according to a
tag dict, or other rules.

We also had a user here, who wanted to define short sequences in a tag 
dict, to fix mistakes
he observed in the output of the tagger.

Maybe both things could be done for 1.6. What do you think?

Jörn

Re: [jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by "william.colen@gmail.com" <wi...@gmail.com>.
On Fri, Aug 12, 2011 at 6:18 AM, Jörn Kottmann <ko...@gmail.com> wrote:

> On 8/12/11 4:25 AM, william.colen@gmail.com wrote:
>
>> If the text I am processing has any occurrence of a verb present second
>> person singular it will crash the tagger!
>>
>
> This should be fixed now: if there are any tags in the dict which are not
> maxent model outcomes, the model package validation code will fail to load
> it. So now it at least fails fast.
>
>
>> To fix that I am thinking about optionally filtering the dictionary entries
>> according to the known outcomes, which will only be available after the
>> model is trained by our training tool or by the cross validator. So after
>> training we could iterate over the entries and remove the tags that are
>> unknown to the model. But I am not sure if it is the best approach.
>>
> You can easily iterate over the training data, and create a set which
> contains all tags which are in the model and then use this set to
> create/filter your tag dict.
>

Should I iterate over the training data or do it after model training? I
thought that not every tag would be in the outcome list because of the
cutoff. Also it would be difficult to know in advance which tags would be in
the outcome list while performing cross validation because we train with a
subset of the corpus.

Re: [jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by Jörn Kottmann <ko...@gmail.com>.
On 8/12/11 4:25 AM, william.colen@gmail.com wrote:
> If the text I am processing has any occurrence of a verb present second
> person singular it will crash the tagger!

This should be fixed now: if there are any tags in the dict which are not
maxent model outcomes, the model package validation code will fail to load
it. So now it at least fails fast.

> To fix that I am thinking about optionally filtering the dictionary entries
> according to the known outcomes, which will only be available after the
> model is trained by our training tool or by the cross validator. So after
> training we could iterate over the entries and remove the tags that are
> unknown to the model. But I am not sure if it is the best approach.
You can easily iterate over the training data, and create a set which
contains all tags which are in the model and then use this set to
create/filter your tag dict.
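
Roughly like this (only a sketch: the tag dict is shown as a plain word-to-tags
map here, the real POSDictionary would need the equivalent filtering while it
is built):

import java.io.IOException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

import opennlp.tools.postag.POSSample;
import opennlp.tools.util.ObjectStream;

public class TagDictFilter {

    // Collect every tag which actually occurs in the training data.
    public static Set<String> collectTags(ObjectStream<POSSample> samples)
            throws IOException {
        Set<String> tags = new HashSet<String>();
        POSSample sample;
        while ((sample = samples.read()) != null) {
            for (String tag : sample.getTags()) {
                tags.add(tag);
            }
        }
        return tags;
    }

    // Remove every word whose entry contains a tag the model cannot predict.
    public static void filter(Map<String, String[]> tagDict, Set<String> knownTags) {
        Iterator<Map.Entry<String, String[]>> entries = tagDict.entrySet().iterator();
        while (entries.hasNext()) {
            for (String tag : entries.next().getValue()) {
                if (!knownTags.contains(tag)) {
                    entries.remove();
                    break;
                }
            }
        }
    }
}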

Jörn



Re: [jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by "william.colen@gmail.com" <wi...@gmail.com>.
On Thu, Aug 11, 2011 at 11:09 PM, James Kosin <ja...@gmail.com> wrote:

> On 8/11/2011 1:19 PM, william.colen@gmail.com wrote:
>
>> On Wed, Aug 10, 2011 at 6:23 AM, Jörn Kottmann<ko...@gmail.com>
>>  wrote:
>>
>>  On 8/10/11 2:10 AM, william.colen@gmail.com wrote:
>>>
>>>> I think it would be much better, but we have different sample classes (one
>>>> for each tool) and no common parent. As far as I can see there is no way to
>>>> compare two samples without knowing the tool, and it makes it harder to
>>>> implement the monitor. That is why I avoided using the sample itself and
>>>> added 3 methods that cover the different kinds of samples we have.
>>>>
>>> Oops, accidentally replied to the issues list.
>>>
>>> You need to know the sample class, and since they do not have a common
>>> parent you always need to write some custom code to extract the knowledge
>>> from them. This code has to be written somewhere; right now it is in the
>>> individual evaluators, but it could also be moved to the command line monitors.
>>> Extracting this information in the evaluators themselves might be a bit
>>> easier, since they are going through the samples anyway.
>>>
>>> So going down this road might be a bit more work, but to me it looks like
>>> the solution is also much more usable.
>>>
>> Maybe we can leave it to a major release and we will have more flexibility
>> in what we can do. What do you think?
>> Also, to me it is more important to improve dictionary creation to avoid
>> errors like the one I was having, so I would choose to spend some effort
>> there instead of on this. Is that OK?
>>
> William,
>
> If you could check the changes I've already made, I'd be very
> appreciative. I'm going to try and expand the testing we are doing now on
> the dictionary; but I'd like some real feedback if at all possible.
>

Thank you, James. I'll be able to get back to the dictionary and tagger in
a couple of days.

The issue I have now is related to model outcomes and the tagset supported
by the dictionary.
If I use my full dictionary there will be words associated with tags that
are not in the model's outcomes.

It happens when I am using a corpus that doesn't cover the full range of tags.
For example, the 4k-sentence news corpus I am using does not include any
occurrence of a verb present second person singular. That is because of the
journalistic style.

If the text I am processing has any occurrence of a verb present second
person singular it will crash the tagger!

To fix that I am thinking about optionally filtering the dictionary entries
according to the known outcomes, which will only be available after the
model is trained by our training tool or by the cross validator. So after
training we could iterate over the entries and remove the tags that are
unknown to the model. But I am not sure if it is the best approach.

William

Re: [jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by James Kosin <ja...@gmail.com>.
On 8/11/2011 1:19 PM, william.colen@gmail.com wrote:
> On Wed, Aug 10, 2011 at 6:23 AM, Jörn Kottmann<ko...@gmail.com>  wrote:
>
>> On 8/10/11 2:10 AM, william.colen@gmail.com wrote:
>>
>>> I think it would be much better, but we have different sample classes (one
>>> for each tool) and no common parent. As far as I can see there is no way to
>>> compare two samples without knowing the tool, and it makes it harder to
>>> implement the monitor. That is why I avoided using the sample itself and
>>> added 3 methods that cover the different kinds of samples we have.
>>>
>> Oops, accidentally replied to the issues list.
>>
>> You need to know the sample class, and since they do not have a common
>> parent you always need to write some custom code to extract the knowledge
>> from them. This code has to be written somewhere; right now it is in the
>> individual evaluators, but it could also be moved to the command line monitors.
>> Extracting this information in the evaluators themselves might be a bit
>> easier, since they are going through the samples anyway.
>>
>> So going down this road might be a bit more work, but to me it looks like
>> the solution is also much more usable.
>>
> Maybe we can leave it to a major release and we will have more flexibility
> in what we can do. What do you think?
> Also, to me it is more important to improve dictionary creation to avoid
> errors like the one I was having, so I would choose to spend some effort
> there instead of on this. Is that OK?
>
William,

If you could check the changes I've already made, I'd be very
appreciative. I'm going to try and expand the testing we are doing now
on the dictionary; but I'd like some real feedback if at all possible.

James

Re: [jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by "william.colen@gmail.com" <wi...@gmail.com>.
On Wed, Aug 10, 2011 at 6:23 AM, Jörn Kottmann <ko...@gmail.com> wrote:

> On 8/10/11 2:10 AM, william.colen@gmail.com wrote:
>
>> I think it would be much better, but we have different sample classes (one
>> for each tool) and no common parent. As far as I can see there is no way to
>> compare two samples without knowing the tool, and it makes it harder to
>> implement the monitor. That is why I avoided using the sample itself and
>> added 3 methods that cover the different kinds of samples we have.
>>
>
> Oops, accidentally replied to the issues list.
>
> You need to know the sample class, and since they do not have a common
> parent you always need to write some custom code to extract the knowledge
> from them. This code has to be written somewhere; right now it is in the
> individual evaluators, but it could also be moved to the command line monitors.
> Extracting this information in the evaluators themselves might be a bit
> easier, since they are going through the samples anyway.
>
> So going down this road might be a bit more work, but to me it looks like
> the solution is also much more usable.
>

Maybe we can leave it to a major release and we will have more flexibility
in what we can do. What do you think?
Also, to me it is more important to improve dictionary creation to avoid
errors like the one I was having, so I would choose to spend some effort
there instead of on this. Is that OK?
