
Posted to issues@opennlp.apache.org by "William Colen (JIRA)" <ji...@apache.org> on 2011/07/14 17:41:00 UTC

[jira] [Created] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Evaluators should allow tools to register a misclassified report interface
--------------------------------------------------------------------------

                 Key: OPENNLP-226
                 URL: https://issues.apache.org/jira/browse/OPENNLP-226
             Project: OpenNLP
          Issue Type: New Feature
          Components: Chunker, Command Line Interface, Name Finder, POS Tagger, Sentence Detector, Tokenizer
    Affects Versions: tools-1.5.2-incubating
            Reporter: William Colen
            Assignee: William Colen
            Priority: Minor
             Fix For: tools-1.5.2-incubating


OPENNLP-220 introduced the -misclassified argument that enables evaluators to print misclassified items while using the command line evaluators. We should expand it to allow any other tool that uses evaluators to register an interface to get that information.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by "Jörn Kottmann (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/OPENNLP-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081515#comment-13081515 ] 

Jörn Kottmann commented on OPENNLP-226:
---------------------------------------

Should this go into 1.5.2, or should we defer it?


[jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by "William Colen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/OPENNLP-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081604#comment-13081604 ] 

William Colen commented on OPENNLP-226:
---------------------------------------

I will do it now; it should be easy. If we don't do it now, we will later have to deprecate the Evaluator constructor that takes the boolean printErrors.

I will add the interface opennlp.tools.util.eval.EvaluationMonitor that has the method:
void printMissclassified(String)

I will create the default implementation opennlp.tools.util.eval.DefaultEvaluationMonitor (or should it go in the cmdline package?).

I will also replace the Evaluator constructor that takes the boolean printErrors with one that takes an EvaluationMonitor.
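
A minimal sketch of the string-based proposal above, for illustration only. The names EvaluationMonitor, printMissclassified and DefaultEvaluationMonitor come from the comment; MonitoredEvaluator and its evaluate method are hypothetical stand-ins and not OpenNLP code.

```java
// Demo entry point, declared first so that `java File.java` runs it.
class EvaluationMonitorDemo {
  public static void main(String[] args) {
    MonitoredEvaluator evaluator = new MonitoredEvaluator(new DefaultEvaluationMonitor());
    evaluator.evaluate("NN", "NN"); // match: nothing is reported
    evaluator.evaluate("NN", "VB"); // mismatch: the monitor prints a line
  }
}

// Interface as proposed in the comment: one method receiving a preformatted string.
interface EvaluationMonitor {
  void printMissclassified(String message);
}

// Default implementation that writes to System.out, as a command line tool might.
class DefaultEvaluationMonitor implements EvaluationMonitor {
  @Override
  public void printMissclassified(String message) {
    System.out.println(message);
  }
}

// Hypothetical evaluator that takes a monitor instead of a boolean printErrors flag.
class MonitoredEvaluator {
  private final EvaluationMonitor monitor;

  MonitoredEvaluator(EvaluationMonitor monitor) {
    this.monitor = monitor;
  }

  // Compares a reference label with a prediction and reports mismatches.
  void evaluate(String reference, String prediction) {
    if (monitor != null && !reference.equals(prediction)) {
      monitor.printMissclassified("expected " + reference + " but got " + prediction);
    }
  }
}
```

Because the monitor only receives a formatted string, any other consumer would have to parse that string to learn what actually went wrong.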


[jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by "William Colen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/OPENNLP-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085910#comment-13085910 ] 

William Colen commented on OPENNLP-226:
---------------------------------------

Oops, sorry! That wasn't clear.
What I meant is that we should let FMeasure.updateScores return a boolean. Evaluator.evaluateSample would check it while processing the sample and notify the listeners.
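
As a rough illustration of that idea, here is a sketch of an accumulator whose updateScores reports whether the prediction contained an error. SketchFMeasure is a hypothetical, set-based simplification written for this example; it is not the OpenNLP FMeasure class, and the exact matching rules are an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Demo entry point, declared first so that `java File.java` runs it.
class UpdateScoresDemo {
  public static void main(String[] args) {
    SketchFMeasure fmeasure = new SketchFMeasure();
    boolean hasError = fmeasure.updateScores(
        new Object[] {"PER", "LOC"}, new Object[] {"PER", "ORG"});
    if (hasError) {
      // This is where an evaluator would notify its listeners.
      System.out.println("prediction differs from reference");
    }
    System.out.println("F-measure so far: " + fmeasure.getFMeasure());
  }
}

// Hypothetical, set-based stand-in for an F-measure accumulator.
class SketchFMeasure {
  private long selected;     // predicted items seen so far
  private long target;       // reference items seen so far
  private long truePositive; // predicted items that were also in the reference

  // Updates the counts and returns true if the prediction contains an error.
  boolean updateScores(Object[] references, Object[] predictions) {
    Set<Object> referenceSet = new HashSet<>(Arrays.asList(references));
    Set<Object> predictionSet = new HashSet<>(Arrays.asList(predictions));

    long matched = 0;
    for (Object prediction : predictionSet) {
      if (referenceSet.contains(prediction)) {
        matched++;
      }
    }

    truePositive += matched;
    selected += predictionSet.size();
    target += referenceSet.size();

    // Any missed reference or spurious prediction counts as an error.
    return matched != referenceSet.size() || matched != predictionSet.size();
  }

  double getFMeasure() {
    if (selected == 0 || target == 0 || truePositive == 0) {
      return 0.0;
    }
    double precision = (double) truePositive / selected;
    double recall = (double) truePositive / target;
    return 2 * precision * recall / (precision + recall);
  }
}
```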


[jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by "Jörn Kottmann (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/OPENNLP-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081611#comment-13081611 ] 

Jörn Kottmann commented on OPENNLP-226:
---------------------------------------

The proposed EvaluationMonitor interface makes it difficult for other use cases to track what was changed. To "understand" the change an implementor would need to parse the provided string.

I think we should define interfaces that can report the individual errors in a more structured way. It will be easy to make one for the F-Measure classes and one for the cases where we use an accuracy score.


[jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by "William Colen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/OPENNLP-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085937#comment-13085937 ] 

William Colen commented on OPENNLP-226:
---------------------------------------

That was exactly my initial approach, but I noticed that some sample implementations do not override equals.
Maybe I should implement equals in each XYSample class where it is missing.
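
For reference, a sketch of what overriding equals (together with hashCode) could look like on an array-backed sample class. SketchSample and its fields are hypothetical and only illustrate the pattern; they do not reproduce any OpenNLP sample class.

```java
import java.util.Arrays;

// Demo entry point, declared first so that `java File.java` runs it.
class SampleEqualsDemo {
  public static void main(String[] args) {
    SketchSample reference =
        new SketchSample(new String[] {"a", "dog"}, new String[] {"DT", "NN"});
    SketchSample predicted =
        new SketchSample(new String[] {"a", "dog"}, new String[] {"DT", "VB"});
    System.out.println("samples equal: " + reference.equals(predicted));
  }
}

// Hypothetical sample holding tokens and one tag per token.
final class SketchSample {
  private final String[] tokens;
  private final String[] tags;

  SketchSample(String[] tokens, String[] tags) {
    this.tokens = tokens.clone();
    this.tags = tags.clone();
  }

  // Arrays do not compare by content with equals, so Arrays.equals is used.
  @Override
  public boolean equals(Object obj) {
    if (this == obj) {
      return true;
    }
    if (!(obj instanceof SketchSample)) {
      return false;
    }
    SketchSample other = (SketchSample) obj;
    return Arrays.equals(tokens, other.tokens) && Arrays.equals(tags, other.tags);
  }

  // Kept consistent with equals, as the Object contract requires.
  @Override
  public int hashCode() {
    return 31 * Arrays.hashCode(tokens) + Arrays.hashCode(tags);
  }
}
```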


[jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by "William Colen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/OPENNLP-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085876#comment-13085876 ] 

William Colen commented on OPENNLP-226:
---------------------------------------

As discussed on the dev list, I will implement the following:

- We define the MissclassifiedSampleListener interface, with one method: void missclassified(T reference, T prediction). Implementations of this interface will be passed to our evaluators (for example TokenNameFinderEvaluator), which will use them to notify when a prediction fails.
- We will have one default MissclassifiedSampleListener implementation for each tool, for example NameEvaluationErrorListener. The default implementation will print errors to System.out and will be used by our command line interface.
- We will move the printErrors methods to the util class EvaluationErrorPrinter.

Other things:
We need to know when to notify the listener while executing the Evaluator.evaluateSample(..) method. There we already use the FMeasure class, which can easily check whether there is an error in the prediction. Maybe we should simply change the return type of the updateScores method from void to boolean, so that it returns true if there is an error and we should notify the listener.
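
To show how a tool other than the command line could consume such a listener, here is a small sketch. MissclassifiedSampleListener and its missclassified method follow the plan above; CollectingListener and SketchTagEvaluator are hypothetical, and the evaluator receives the prediction directly only to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Demo entry point, declared first so that `java File.java` runs it.
class MissclassifiedListenerDemo {
  public static void main(String[] args) {
    CollectingListener<String> listener = new CollectingListener<>();
    SketchTagEvaluator evaluator = new SketchTagEvaluator(listener);
    evaluator.evaluateSample("NN", "NN"); // correct, not reported
    evaluator.evaluateSample("NN", "VB"); // wrong, reported to the listener
    System.out.println(listener.errors());
  }
}

// Listener interface as described in the plan; T is the sample type.
interface MissclassifiedSampleListener<T> {
  void missclassified(T reference, T prediction);
}

// Hypothetical listener that stores errors instead of printing them.
class CollectingListener<T> implements MissclassifiedSampleListener<T> {
  private final List<String> errors = new ArrayList<>();

  @Override
  public void missclassified(T reference, T prediction) {
    errors.add(reference + " -> " + prediction);
  }

  List<String> errors() {
    return errors;
  }
}

// Hypothetical evaluator that notifies the listener when a prediction fails.
class SketchTagEvaluator {
  private final MissclassifiedSampleListener<String> listener;

  SketchTagEvaluator(MissclassifiedSampleListener<String> listener) {
    this.listener = listener;
  }

  void evaluateSample(String reference, String prediction) {
    if (listener != null && !reference.equals(prediction)) {
      listener.missclassified(reference, prediction);
    }
  }
}
```

Since the listener receives both objects rather than a formatted string, the caller decides what to do with the difference.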


[jira] [Closed] (OPENNLP-226) Evaluators should allow tools to register a report interface

Posted by "William Colen (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/OPENNLP-226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

William Colen closed OPENNLP-226.
---------------------------------

    Resolution: Fixed

Work is finished for this release. Later we will improve it by refactoring the Cross Validator classes.


[jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by "Jörn Kottmann (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/OPENNLP-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085934#comment-13085934 ] 

Jörn Kottmann commented on OPENNLP-226:
---------------------------------------

I would do it like this:
if (this.sampleListener != null) { 
      NameSample predicted = new NameSample(sentence, predictedNames, 
          reference.isClearAdaptiveDataSet());
      if (!reference.equals(predicted)) {
            this.sampleListener.missclassified(reference, predicted);
      }
}

Then it is easier to move it up to Evaluator later.


[jira] [Updated] (OPENNLP-226) Evaluators should allow tools to register a report interface

Posted by "Jörn Kottmann (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/OPENNLP-226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jörn Kottmann updated OPENNLP-226:
----------------------------------

    Summary: Evaluators should allow tools to register a report interface  (was: Evaluators should allow tools to register a misclassified report interface)


[jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by "William Colen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/OPENNLP-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087050#comment-13087050 ] 

William Colen commented on OPENNLP-226:
---------------------------------------

Revision #1159270 modified the listeners to make them more general.

- Renamed the listener to EvaluationSampleListener and added the correctlyClassified method. The method list is now:

  void correctlyClassified(T reference, T prediction);
  void missclassified(T reference, T prediction);

- The EvaluationErrorPrinter is now abstract and implements EvaluationSampleListener, adding a default implementation for correctlyClassified. The missclassified method should be implemented by the subclass, for example ChunkEvaluationErrorListener.

The abstract class Evaluator now has the following new methods:

  void notifyCorrectlyClassified(T reference, T prediction);
  void notifyMissclassified(T reference, T prediction);
  void addListener(EvaluationSampleListener<T> listener);
  void removeListener(EvaluationSampleListener<T> listener);
  T processSample

The method processSample should be implemented by subclasses. In the future it will be made abstract. The default implementation of evaluateSample calls the notifyCorrectlyClassified and notifyMissclassified according to the result of processSample.
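
The flow described above can be sketched roughly as follows. EvaluationSampleListener and the method names are taken from the comment; SketchEvaluator, UpperCaseEvaluator and CountingListener are hypothetical. As assumptions for the sketch, processSample is declared abstract right away and correctness is decided with equals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Demo entry point, declared first so that `java File.java` runs it.
class EvaluatorListenerDemo {
  public static void main(String[] args) {
    UpperCaseEvaluator evaluator = new UpperCaseEvaluator();
    CountingListener<String> counter = new CountingListener<>();
    evaluator.addListener(counter);
    evaluator.evaluateSample("ABC"); // prediction equals the reference
    evaluator.evaluateSample("abc"); // prediction "ABC" differs from the reference
    System.out.println(counter.correct + " correct, " + counter.wrong + " wrong");
  }
}

interface EvaluationSampleListener<T> {
  void correctlyClassified(T reference, T prediction);
  void missclassified(T reference, T prediction);
}

// Hypothetical base evaluator: subclasses only supply processSample.
abstract class SketchEvaluator<T> {
  private final List<EvaluationSampleListener<T>> listeners = new ArrayList<>();

  void addListener(EvaluationSampleListener<T> listener) {
    listeners.add(listener);
  }

  void removeListener(EvaluationSampleListener<T> listener) {
    listeners.remove(listener);
  }

  // Runs the tool on the reference sample and returns the predicted sample.
  protected abstract T processSample(T reference);

  // Default flow: predict, compare, then notify the matching callback.
  void evaluateSample(T reference) {
    T prediction = processSample(reference);
    if (reference.equals(prediction)) {
      notifyCorrectlyClassified(reference, prediction);
    } else {
      notifyMissclassified(reference, prediction);
    }
  }

  void notifyCorrectlyClassified(T reference, T prediction) {
    for (EvaluationSampleListener<T> listener : listeners) {
      listener.correctlyClassified(reference, prediction);
    }
  }

  void notifyMissclassified(T reference, T prediction) {
    for (EvaluationSampleListener<T> listener : listeners) {
      listener.missclassified(reference, prediction);
    }
  }
}

// Toy "tool" whose prediction is the upper-cased reference.
class UpperCaseEvaluator extends SketchEvaluator<String> {
  @Override
  protected String processSample(String reference) {
    return reference.toUpperCase(Locale.ROOT);
  }
}

// Listener that counts both kinds of notification.
class CountingListener<T> implements EvaluationSampleListener<T> {
  int correct;
  int wrong;

  @Override
  public void correctlyClassified(T reference, T prediction) {
    correct++;
  }

  @Override
  public void missclassified(T reference, T prediction) {
    wrong++;
  }
}
```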


[jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by "Jörn Kottmann (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/OPENNLP-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085913#comment-13085913 ] 

Jörn Kottmann commented on OPENNLP-226:
---------------------------------------

Let's look at the name finder again. In the TokenNameFinderEvaluator.evaluateSample method we construct a NameSample object for the prediction; the gold/reference one already exists.

A simple equals should now be able to figure out whether the two differ. If they do, it must be because of a mistake, and we then call the listener interface's misclassified method.


[jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by "William Colen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/OPENNLP-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085926#comment-13085926 ] 

William Colen commented on OPENNLP-226:
---------------------------------------

Yes, thank you! I couldn't see that it was that simple. It should be something like this:

  public void evaluateSample(NameSample reference) {

    String[] sentence = reference.getSentence();
    Span predictedNames[] = nameFinder.find(sentence);
    Span references[] = reference.getNames();
    fmeasure.updateScores(references, predictedNames);

    if (this.sampleListener != null && !Arrays.equals(references, predictedNames)) {
      NameSample predicted = new NameSample(sentence, predictedNames,
          reference.isClearAdaptiveDataSet());
      this.sampleListener.missclassified(reference, predicted);
    }
  }


[jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by "Jörn Kottmann (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/OPENNLP-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085907#comment-13085907 ] 

Jörn Kottmann commented on OPENNLP-226:
---------------------------------------

+1 to let evaluateSample return a boolean. It's not a backward compatible change, but it will only hit people who have implemented the Evaluator interface. Code that calls one of the evaluateSample methods will not break.


Re: [jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by Jörn Kottmann <ko...@gmail.com>.

On 8/11/11 7:19 PM, william.colen@gmail.com wrote:
> On Wed, Aug 10, 2011 at 6:23 AM, Jörn Kottmann<ko...@gmail.com>  wrote:
>
>> On 8/10/11 2:10 AM, william.colen@gmail.com wrote:
>>
>>> I think it would be much better, but we have different sample classes (one
>>> for each tool) and no common parent. As far as I can see there is no way to
>>> compare two samples without knowing the tool, and that makes it harder to
>>> implement the monitor. That is why I avoided using the sample itself and
>>> added 3 methods that cover the different kinds of samples we have.
>>>
>> Oops, accidentally replied to the issues list.
>>
>> You need to know the sample class, and since they do not have a common
>> parent you always need to write some custom code to extract the knowledge
>> from them. This code we have to write somewhere, now it is in the
>> individual
>> evaluators, but it could also be moved to command line monitors.
>> Extracting this information in the evaluators themselves might be a bit
>> easier, since they are going through the samples anyway.
>>
>> So going down this road might be a bit more work, but to me it looks like
>> the solution is also much more usable.
>>
> Maybe we can leave it to a major release and we will have more flexibility
> in what we can do. What do you think?

I can also help out here, and if we leave it for later we should maybe
declare the new API as internal use only.
> Also, to me it is more important to improve dictionary creation to avoid
> errors like the one I was having, so I would choose to spend some effort
> there instead of this. Is it OK?
>

I worked on making it fail fast; the POSModel now performs the same check
regardless of which constructor was used to create it.

What do you have in mind to ease dictionary creation?

I would also like to start making the first release candidate. It doesn't
matter if this change does not go into it; this way we could at least start
testing everything else.

What do you think?

Jörn

Re: [jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by "william.colen@gmail.com" <wi...@gmail.com>.

On Mon, Aug 15, 2011 at 7:20 PM, Jörn Kottmann <ko...@gmail.com> wrote:

> On 8/15/11 3:27 PM, Jörn Kottmann wrote:
>
>> I had something like this in mind
>> void misclassified(T gold, T predicted);
>> where T is the generics type for the Sample object.
>>
>> What do you think?
>>
>
> I just had a look at the current code, where we can pass
> the printErrors flag to the XYEvaluator classes.
>
> There we have a number of printError methods in the Evaluator class,
> which are called from the actual implementation from evaluateSample.
>
> I suggest that we create a new abstract EvaluationErrorPrinter class,
> which is then sub-classed by the individual error printer classes. In these
> individual classes we implement the above proposed misclassified (which
> needs to be part of an interface) and call the printError methods.
>
> As an alternative to EvaluationErrorPrinter we could also use static
> imports for the printError methods.
>
> I don't think it is a big change, because we already have all the
> implementations,
> we just need to move them a little around for the proposed error reporting
> API.
>
> Let's see how that could look for Name Finder:
>
> We define the MissclassifiedSampleListener interface,
> with one void missclassified(T reference, T prediction) method.
>
> Then the EvaluationErrorPrinter class, with all the printError methods.
>
> And we also have a NameEvaluationErrorListener class which extends
> EvaluationErrorPrinter
> and implements MissclassifiedSampleListener<NameSample>.
> This class implements the missclassified(NameSample reference, NameSample
> prediction)
> method according to the interface, and can simply call one of the
> printError methods from it.
>

Thank you, Jörn. I am working on it now.

Re: [jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by Jörn Kottmann <ko...@gmail.com>.

On 8/15/11 3:27 PM, Jörn Kottmann wrote:
> I had something like this in mind
> void misclassified(T gold, T predicted);
> where T is the generics type for the Sample object.
>
> What do you think? 

I just had a look at the current code, where we can pass
the printErrors flag to the XYEvaluator classes.

There we have a number of printError methods in the Evaluator class,
which are called from the actual implementation from evaluateSample.

I suggest that we create a new abstract EvaluationErrorPrinter class, which
is then sub-classed by the individual error printer classes. In these
individual classes we implement the above proposed misclassified (which
needs to be part of an interface) and call the printError methods.

As an alternative to EvaluationErrorPrinter we could also use static
imports for the printError methods.

I don't think it is a big change, because we already have all the 
implementations,
we just need to move them a little around for the proposed error 
reporting API.

Let's see how that could look for Name Finder:

We define the MissclassifiedSampleListener interface,
with one void missclassified(T reference, T prediction) method.

Then the EvaluationErrorPrinter class, with all the printError methods.

And we also have a NameEvaluationErrorListener class which extends
EvaluationErrorPrinter
and implements MissclassifiedSampleListener<NameSample>.
This class implements the missclassified(NameSample reference, 
NameSample prediction)
method according to the interface, and can simply call one of the 
printError methods from it.
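
A sketch of how these three pieces could fit together. The class and interface names follow the description above; SketchNameSample, the printError signature and the PrintStream constructor argument are assumptions made so that the example is self-contained, and the real NameSample works with spans rather than per-token labels.

```java
import java.io.PrintStream;

// Demo entry point, declared first so that `java File.java` runs it.
class ErrorPrinterDemo {
  public static void main(String[] args) {
    String[] tokens = {"Anna", "lives", "here"};
    SketchNameSample reference =
        new SketchNameSample(tokens, new String[] {"person", "other", "other"});
    SketchNameSample prediction =
        new SketchNameSample(tokens, new String[] {"other", "other", "other"});
    new NameEvaluationErrorListener(System.out).missclassified(reference, prediction);
  }
}

interface MissclassifiedSampleListener<T> {
  void missclassified(T reference, T prediction);
}

// Hypothetical, simplified sample: one label per token.
final class SketchNameSample {
  final String[] tokens;
  final String[] labels;

  SketchNameSample(String[] tokens, String[] labels) {
    this.tokens = tokens.clone();
    this.labels = labels.clone();
  }
}

// Shared printing helpers live in the abstract base class.
abstract class EvaluationErrorPrinter {
  private final PrintStream out;

  protected EvaluationErrorPrinter(PrintStream out) {
    this.out = out;
  }

  protected void printError(String[] tokens, String[] expected, String[] predicted) {
    out.println("Sentence:  " + String.join(" ", tokens));
    out.println("Expected:  " + String.join(" ", expected));
    out.println("Predicted: " + String.join(" ", predicted));
  }
}

// Tool-specific listener: extracts what it needs and delegates to printError.
class NameEvaluationErrorListener extends EvaluationErrorPrinter
    implements MissclassifiedSampleListener<SketchNameSample> {

  NameEvaluationErrorListener(PrintStream out) {
    super(out);
  }

  @Override
  public void missclassified(SketchNameSample reference, SketchNameSample prediction) {
    printError(reference.tokens, reference.labels, prediction.labels);
  }
}
```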

Jörn

Re: [jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by Jörn Kottmann <ko...@gmail.com>.

On 8/15/11 3:39 PM, william.colen@gmail.com wrote:
>> >  I had something like this in mind
>> >  void misclassified(T gold, T predicted);
>> >  where T is the generics type for the Sample object.
>> >
>> >  What do you think?
>> >
>> >  An implementation would then need to figure out the exact
>> >  differences it is interested in.
>> >
>> >  Jörn
>> >
> Should I modify the hierarchy of the sample classes?
>

If we can find a good base class it would make sense.
Most of the classes rely on a tokenized sentence, which could
be part of a base class, e.g. TokenizedSentenceSample.
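
One possible shape for such a base class, as a sketch only. TokenizedSentenceSample is the name suggested above; SketchPosSample, SketchChunkSample and the sentenceText helper are hypothetical and exist just to show code that works on any sample through the shared parent.

```java
// Demo entry point, declared first so that `java File.java` runs it.
class BaseSampleDemo {
  public static void main(String[] args) {
    TokenizedSentenceSample sample = new SketchPosSample(
        new String[] {"time", "flies"}, new String[] {"NN", "VBZ"});
    System.out.println(sentenceText(sample));
  }

  // Works for any sample type because it only needs the shared token array.
  static String sentenceText(TokenizedSentenceSample sample) {
    return String.join(" ", sample.getSentence());
  }
}

// Common parent holding the tokenized sentence.
abstract class TokenizedSentenceSample {
  private final String[] sentence;

  protected TokenizedSentenceSample(String[] sentence) {
    this.sentence = sentence.clone();
  }

  String[] getSentence() {
    return sentence.clone();
  }
}

// Hypothetical POS sample: adds one tag per token.
class SketchPosSample extends TokenizedSentenceSample {
  private final String[] tags;

  SketchPosSample(String[] sentence, String[] tags) {
    super(sentence);
    this.tags = tags.clone();
  }

  String[] getTags() {
    return tags.clone();
  }
}

// Hypothetical chunk sample: adds one chunk tag per token.
class SketchChunkSample extends TokenizedSentenceSample {
  private final String[] chunkTags;

  SketchChunkSample(String[] sentence, String[] chunkTags) {
    super(sentence);
    this.chunkTags = chunkTags.clone();
  }

  String[] getChunkTags() {
    return chunkTags.clone();
  }
}
```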

Jörn

Re: [jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by "william.colen@gmail.com" <wi...@gmail.com>.

On Mon, Aug 15, 2011 at 10:27 AM, Jörn Kottmann <ko...@gmail.com> wrote:

> On 8/15/11 2:55 PM, william.colen@gmail.com wrote:
>
>> Thanks Jörn, I'm trying the suggested to improve my pos tagger.
>>
>> Now back to the misclassified report interface. I could not find a good
>> design for it because I could not take advantage of the sample classes, so
>> what I proposed was 3 methods to handle different methods:
>>
>> // for the sentence detector
>> void missclassified(Span references[], Span predictions[], String
>> referenceSample, String predictedSample, String sentence)
>>
>> // for namefinder, chunker...
>> void missclassified(Span references[], Span predictions[], String
>> referenceSample, String predictedSample, String[] sentenceTokens)
>>
>> // for pos tagger
>> void missclassified(String references[], String predictions[], String
>> referenceSample, String predictedSample, String[] sentenceTokens)
>>
>
> I had something like this in mind
> void misclassified(T gold, T predicted);
> where T is the generics type for the Sample object.
>
> What do you think?
>
> An implementation would then need to figure out the exact
> differences it is interested in.
>
> Jörn
>

Should I modify the hierarchy of the sample classes?

Re: [jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by Jörn Kottmann <ko...@gmail.com>.

On 8/15/11 2:55 PM, william.colen@gmail.com wrote:
> Thanks Jörn, I'm trying the suggested to improve my pos tagger.
>
> Now back to the misclassified report interface. I could not find a good
> design for it because I could not take advantage of the sample classes, so
> what I proposed was 3 methods to handle different methods:
>
> // for the sentence detector
> void missclassified(Span references[], Span predictions[], String
> referenceSample, String predictedSample, String sentence)
>
> // for namefinder, chunker...
> void missclassified(Span references[], Span predictions[], String
> referenceSample, String predictedSample, String[] sentenceTokens)
>
> // for pos tagger
> void missclassified(String references[], String predictions[], String
> referenceSample, String predictedSample, String[] sentenceTokens)

I had something like this in mind
void misclassified(T gold, T predicted);
where T is the generics type for the Sample object.

What do you think?

An implementation would then need to figure out the exact
differences it is interested in.

Jörn

Re: [jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by "william.colen@gmail.com" <wi...@gmail.com>.

On Fri, Aug 12, 2011 at 10:46 AM, Jörn Kottmann <ko...@gmail.com> wrote:

> On 8/12/11 3:28 PM, william.colen@gmail.com wrote:
>
>>> If you know the tags which are causing trouble you might just want to
>>> remove all tokens from your dictionary which contain them. Removing a
>>> few words will not make a big difference in accuracy anyway.
>>>
>> Doing it during training is not a good idea? I thought it would help other
>> people.
>>
>>
> No, I don't think so, because it makes it difficult to understand what
> is going on and with the current system you really need enough training
> data to cover all the tags.
> If one tag is only mentioned 5 or 6 times I doubt that an an accurate
> detection
> is possible.
>
> As said before it might be possible to create a POS Tagger which can deal
> better
> with less training data, but the one we have right now seems to have it
> limits when
> you want to use a tag dict.
>
> Jörn
>

Thanks Jörn, I'm trying what you suggested to improve my POS tagger.

Now back to the misclassified report interface. I could not find a good
design for it because I could not take advantage of the sample classes, so
what I proposed was three methods to handle the different kinds of samples:

// for the sentence detector
void missclassified(Span references[], Span predictions[], String
referenceSample, String predictedSample, String sentence)

// for namefinder, chunker...
void missclassified(Span references[], Span predictions[], String
referenceSample, String predictedSample, String[] sentenceTokens)

// for pos tagger
void missclassified(String references[], String predictions[], String
referenceSample, String predictedSample, String[] sentenceTokens)

Can you help me with a better design?

Re: [jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by Jörn Kottmann <ko...@gmail.com>.

On 8/12/11 3:28 PM, william.colen@gmail.com wrote:
>> If you know the tags which are causing trouble you might just want to remove
>> all tokens from your dictionary which contain them. Removing a few words will
>> not make a big difference in accuracy anyway.
>>
> Doing it during training is not a good idea? I thought it would help other
> people.
>
>

No, I don't think so, because it makes it difficult to understand what
is going on, and with the current system you really need enough training
data to cover all the tags.
If a tag is only mentioned 5 or 6 times, I doubt that accurate detection
is possible.

As said before, it might be possible to create a POS Tagger which can deal
better with less training data, but the one we have right now seems to have
its limits when you want to use a tag dict.

Jörn

Re: [jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by "william.colen@gmail.com" <wi...@gmail.com>.

On Fri, Aug 12, 2011 at 8:04 AM, Jörn Kottmann <ko...@gmail.com> wrote:

> On 8/12/11 12:53 PM, william.colen@gmail.com wrote:
>
>> Should I iterate over the training data or do it after model training? I
>> thought that not every tag would be in the outcome list because of the
>> cutoff. Also it would be difficult to preview which tags would be at the
>> outcome list while performing cross validation because we train with a
>> subset of the corpus.
>>
>
> Well there you got two points. You can try to use the perceptron, that is
> usually
> trained without a cutoff. Anyway that doesn't really help you for the cross
> validation.
> Maybe you can add a little training data to your corpus, so you are
> covering all tags?
>

That is a good idea, but I would have to strategically distribute the
sentences around the corpus to make sure the training partition of cross
validation will use these sentences. I'll probably need to build a better
corpus anyway.

> If you know the tags which are causing trouble you might just want to remove
> all tokens from your dictionary which contain them. Removing a few words will
> not make a big difference in accuracy anyway.
>

Doing it during training is not a good idea? I thought it would help other
people.


>
> Sorry for not having a better answer.
>
> Our current POS Tagger is completely statistical, to improve your situation
> we would
> need an hybrid approach, where we it can fallback to some rules in case the
> statistical
> decision is not plausible according to a tag dict, or other rules.
>
> We also had a user here, who wanted to define short sequences in a tag
> dict, to fix mistakes
> he observed in the output of the tagger.
>
> Maybe both things could be done for 1.6. What do you think?
>

Yes, a hybrid approach would add some flexibility. We can discuss it for
1.6.

Re: [jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by Jörn Kottmann <ko...@gmail.com>.

On 8/12/11 12:53 PM, william.colen@gmail.com wrote:
> Should I iterate over the training data or do it after model training? I
> thought that not every tag would be in the outcome list because of the
> cutoff. Also it would be difficult to preview which tags would be at the
> outcome list while performing cross validation because we train with a
> subset of the corpus.

Well, you have two points there. You can try to use the perceptron, which
is usually trained without a cutoff. Anyway, that doesn't really help you
with cross validation.
Maybe you can add a little training data to your corpus, so that you cover
all tags?

If you know the tags which are causing trouble, you might just want to
remove all tokens from your dictionary which contain them. Removing a few
words will not make a big difference in accuracy anyway.

Sorry for not having a better answer.

Our current POS Tagger is completely statistical. To improve your situation
we would need a hybrid approach, where it can fall back to some rules in case
the statistical decision is not plausible according to a tag dict, or other
rules.

We also had a user here who wanted to define short sequences in a tag dict,
to fix mistakes he observed in the output of the tagger.

Maybe both things could be done for 1.6. What do you think?

Jörn

Re: [jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by "william.colen@gmail.com" <wi...@gmail.com>.

On Fri, Aug 12, 2011 at 6:18 AM, Jörn Kottmann <ko...@gmail.com> wrote:

> On 8/12/11 4:25 AM, william.colen@gmail.com wrote:
>
>> If the text I am processing has any occurrence of a verb present second
>> person singular it will crash the tagger!
>>
>
> This should be fixed now, if there are any tags in the dict which are not
> maxent model outcomes, the model package validation code will fail to load
> it. So now it is at least fail fast.
>
>
>  To fix that I am thinking about optionally filter the dictionary entries
>> according to the known outcomes, that will be only available after having
>> the model trained by our training tool or by the cross validator. So after
>> training we could iterate over the entries and remove the tags that are
>> unknown by the model. But I am not sure if it is the best approach.
>>
> You can easily iterate over the training data, and create a set which
> contains
> all tags which are in the model and then use this set to create/filter your
> tag dict.
>

Should I iterate over the training data or do it after model training? I
thought that not every tag would be in the outcome list because of the
cutoff. Also, it would be difficult to know in advance which tags would be
in the outcome list while performing cross validation, because we train with
a subset of the corpus.

Re: [jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by Jörn Kottmann <ko...@gmail.com>.

On 8/12/11 4:25 AM, william.colen@gmail.com wrote:
> If the text I am processing has any occurrence of a verb present second
> person singular it will crash the tagger!

This should be fixed now: if there are any tags in the dict which are not
maxent model outcomes, the model package validation code will fail to load
it. So now it at least fails fast.

> To fix that I am thinking about optionally filter the dictionary entries
> according to the known outcomes, that will be only available after having
> the model trained by our training tool or by the cross validator. So after
> training we could iterate over the entries and remove the tags that are
> unknown by the model. But I am not sure if it is the best approach.

You can easily iterate over the training data and create a set which
contains all tags which are in the model, and then use this set to
create/filter your tag dict.

Jörn
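
The filtering step suggested above could be sketched as follows. This is only
an illustration using plain Java collections: the class TagDictFilter, its
methods, and the Map-based dictionary are hypothetical stand-ins, not the
OpenNLP tag dictionary API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; the tag dictionary is modelled as word -> allowed tags.
final class TagDictFilter {

    // Collects every tag that occurs in the training data,
    // given as one tag array per training sentence.
    static Set<String> collectTags(List<String[]> taggedSentences) {
        Set<String> tags = new HashSet<>();
        for (String[] sentenceTags : taggedSentences) {
            for (String tag : sentenceTags) {
                tags.add(tag);
            }
        }
        return tags;
    }

    // Returns a filtered copy of the dictionary that keeps only known tags.
    // Words left without any tag are dropped entirely.
    static Map<String, List<String>> filter(Map<String, List<String>> dict,
                                            Set<String> knownTags) {
        Map<String, List<String>> filtered = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : dict.entrySet()) {
            List<String> kept = new ArrayList<>(entry.getValue());
            kept.retainAll(knownTags);
            if (!kept.isEmpty()) {
                filtered.put(entry.getKey(), kept);
            }
        }
        return filtered;
    }
}
```

Note that this sketch takes the tag set from the training data. If a cutoff
removes rare tags from the model, the set of known tags would have to come
from the trained model's outcomes instead.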

Re: [jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by "william.colen@gmail.com" <wi...@gmail.com>.

On Thu, Aug 11, 2011 at 11:09 PM, James Kosin <ja...@gmail.com> wrote:

> On 8/11/2011 1:19 PM, william.colen@gmail.com wrote:
>
>> On Wed, Aug 10, 2011 at 6:23 AM, Jörn Kottmann<ko...@gmail.com> wrote:
>>
>>> On 8/10/11 2:10 AM, william.colen@gmail.com wrote:
>>>
>>>> I think it would be much better, but we have different sample classes
>>>> (one for each tool) and no common parent. As far as I can see there is
>>>> no way to compare two samples without knowing the tool and it makes
>>>> harder to implement the monitor. That is way I avoided using the sample
>>>> itself and added 3 methods that covers different kinds of samples we have.
>>>>
>>> Ups, accidentally replied to the issues list.
>>>
>>> You need to know the sample class, and since they do not have a common
>>> parent you always need to write some custom code to extract the knowledge
>>> from them. This code we have to write somewhere, now it is in the
>>> individual evaluators, but it could also be moved to command line monitors.
>>> Extracting this information in the evaluators itself, might be a bit
>>> easier since it is going through the samples anyway.
>>>
>>> So going down this road might be a bit more work, but to me it looks like
>>> the solution is also much more useable.
>>>
>> Maybe we can leave it to a major release and we will have more flexibility
>> in what we can do. What do you think?
>> Also to me it is more important to improve dictionary creation to avoid
>> that errors like the one I was having, so I would choose to spend some
>> efforts there instead of this. Is it OK?
>>
> William,
>
> If you could change the changes I've already made, I'd be very
> appreciative.  I'm going to try and expand the testing we are doing now on
> the dictionary; but, I'd like some real feedback if at all possible.
>

Thank you, James. I'll be able to get back to the dictionary and tagger in
a couple of days.

The issue I have now is related to the model outcomes and the tagset
supported by the dictionary.
If I use my full dictionary, there will be words associated with tags that
are not in the model's outcomes.

It happens when I am using a corpus that doesn't cover the full range of tags.
For example, the 4k-sentence news corpus I am using does not include any
occurrence of a verb in the present second person singular. That is because
of the journalistic style.

If the text I am processing has any occurrence of a verb in the present
second person singular, it will crash the tagger!

To fix that I am thinking about optionally filtering the dictionary entries
according to the known outcomes, which will only be available after the
model has been trained by our training tool or by the cross validator. So
after training we could iterate over the entries and remove the tags that
are unknown to the model. But I am not sure if it is the best approach.

William

Re: [jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by James Kosin <ja...@gmail.com>.

On 8/11/2011 1:19 PM, william.colen@gmail.com wrote:
> On Wed, Aug 10, 2011 at 6:23 AM, Jörn Kottmann<ko...@gmail.com>  wrote:
>
>> On 8/10/11 2:10 AM, william.colen@gmail.com wrote:
>>
>>> I think it would be much better, but we have different sample classes (one
>>> for each tool) and no common parent. As far as I can see there is no way
>>> to
>>> compare two samples without knowing the tool and it makes harder to
>>> implement the monitor. That is way I avoided using the sample itself and
>>> added 3 methods that covers different kinds of samples we have.
>>>
>> Ups, accidentally replied to the issues list.
>>
>> You need to know the sample class, and since they do not have a common
>> parent you always need to write some custom code to extract the knowledge
>> from them. This code we have to write somewhere, now it is in the
>> individual
>> evaluators, but it could also be moved to command line monitors.
>> Extracting this information in the evaluators itself, might be a bit easier
>> since
>> it is going through the samples anyway.
>>
>> So going down this road might be a bit more work, but to me it looks like
>> the
>> solution is also much more useable.
>>
> Maybe we can leave it to a major release and we will have more flexibility
> in what we can do. What do you think?
> Also to me it is more important to improve dictionary creation to avoid that
> errors like the one I was having, so I would choose to spend some efforts
> there instead of this. Is it OK?
>
William,

If you could change the changes I've already made, I'd be very 
appreciative.  I'm going to try and expand the testing we are doing now 
on the dictionary; but, I'd like some real feedback if at all possible.

James

Re: [jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by "william.colen@gmail.com" <wi...@gmail.com>.

On Wed, Aug 10, 2011 at 6:23 AM, Jörn Kottmann <ko...@gmail.com> wrote:

> On 8/10/11 2:10 AM, william.colen@gmail.com wrote:
>
>> I think it would be much better, but we have different sample classes (one
>> for each tool) and no common parent. As far as I can see there is no way
>> to
>> compare two samples without knowing the tool and it makes harder to
>> implement the monitor. That is way I avoided using the sample itself and
>> added 3 methods that covers different kinds of samples we have.
>>
>
> Ups, accidentally replied to the issues list.
>
> You need to know the sample class, and since they do not have a common
> parent you always need to write some custom code to extract the knowledge
> from them. This code we have to write somewhere, now it is in the
> individual
> evaluators, but it could also be moved to command line monitors.
> Extracting this information in the evaluators itself, might be a bit easier
> since
> it is going through the samples anyway.
>
> So going down this road might be a bit more work, but to me it looks like
> the
> solution is also much more useable.
>

Maybe we can leave it to a major release, when we will have more flexibility
in what we can do. What do you think?
Also, to me it is more important to improve dictionary creation to avoid
errors like the one I was having, so I would choose to spend some effort
there instead of on this. Is that OK?

Re: [jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by Jörn Kottmann <ko...@gmail.com>.

On 8/10/11 2:10 AM, william.colen@gmail.com wrote:
> I think it would be much better, but we have different sample classes (one
> for each tool) and no common parent. As far as I can see there is no way to
> compare two samples without knowing the tool and it makes harder to
> implement the monitor. That is way I avoided using the sample itself and
> added 3 methods that covers different kinds of samples we have.

Oops, accidentally replied to the issues list.

You need to know the sample class, and since they do not have a common
parent you always need to write some custom code to extract the knowledge
from them. We have to write this code somewhere; now it is in the individual
evaluators, but it could also be moved to command line monitors.
Extracting this information in the evaluators themselves might be a bit
easier, since they are going through the samples anyway.

So going down this road might be a bit more work, but to me it looks
like the solution is also much more usable.

Jörn

Re: [jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by "william.colen@gmail.com" <wi...@gmail.com>.

On Tue, Aug 9, 2011 at 8:31 PM, Jörn Kottmann <ko...@gmail.com> wrote:

> On 8/9/11 6:58 PM, William Colen (JIRA) wrote:
>
>> What about the methods from Evaluator?
>>
>> void missclassified(Span references[], Span predictions[], String
>> referenceSample, String predictedSample, String sentence)
>> void missclassified(Span references[], Span predictions[], String
>> referenceSample, String predictedSample, String[] sentenceTokens)
>> void missclassified(String references[], String predictions[], String
>> referenceSample, String predictedSample, String[] sentenceTokens)
>>
>> Or do you think we should take advantage of some structure provided by
>> F-Measure classes? I can't see it yet.
>>
>
> Don't we have the samples?
>
> The evaluator knows that a sample was incorrectly classified.
>
> It could provide the original gold sample, and the predicted sample,
> this way a report tool can calculate the difference between the two samples
> and output/mark it,
> or compute statistics about mistakes.
>
> What do you think?
>
> Jörn
>

I think it would be much better, but we have different sample classes (one
for each tool) and no common parent. As far as I can see there is no way to
compare two samples without knowing the tool, and that makes it harder to
implement the monitor. That is why I avoided using the sample itself and
added 3 methods that cover the different kinds of samples we have.

Re: [jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by Jörn Kottmann <ko...@gmail.com>.

On 8/9/11 6:58 PM, William Colen (JIRA) wrote:
> What about the methods from Evaluator?
>
> void missclassified(Span references[], Span predictions[], String referenceSample, String predictedSample, String sentence)
> void missclassified(Span references[], Span predictions[], String referenceSample, String predictedSample, String[] sentenceTokens)
> void missclassified(String references[], String predictions[], String referenceSample, String predictedSample, String[] sentenceTokens)
>
> Or do you think we should take advantage of some structure provided by F-Measure classes? I can't see it yet.

Don't we have the samples?

The evaluator knows that a sample was incorrectly classified.

It could provide the original gold sample, and the predicted sample,
this way a report tool can calculate the difference between the two 
samples and output/mark it,
or compute statistics about mistakes.

What do you think?

Jörn
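
To illustrate the "compute statistics about mistakes" idea above, here is a
small sketch of a report tool that receives the gold and the predicted sample
and counts which tag was confused with which. All names are hypothetical:
TaggedSample is a simplified stand-in for a real sample class, and
MisclassifiedListener and ConfusionCounter are not part of OpenNLP.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified sample: tokens plus one tag per token.
final class TaggedSample {
    final String[] tokens;
    final String[] tags;

    TaggedSample(String[] tokens, String[] tags) {
        this.tokens = tokens;
        this.tags = tags;
    }
}

// Hypothetical callback that an evaluator would invoke
// for each incorrectly classified sample.
interface MisclassifiedListener<T> {
    void misclassified(T gold, T predicted);
}

// Report tool: compares the two samples position by position and counts
// every "gold->predicted" confusion. Assumes both samples cover the same
// tokens, so the tag arrays have equal length.
final class ConfusionCounter implements MisclassifiedListener<TaggedSample> {
    private final Map<String, Integer> counts = new TreeMap<>();

    @Override
    public void misclassified(TaggedSample gold, TaggedSample predicted) {
        for (int i = 0; i < gold.tags.length; i++) {
            if (!gold.tags[i].equals(predicted.tags[i])) {
                counts.merge(gold.tags[i] + "->" + predicted.tags[i], 1, Integer::sum);
            }
        }
    }

    Map<String, Integer> counts() {
        return counts;
    }
}
```

Because the listener gets whole samples, the evaluator does not need to know
what kind of report is wanted; a different listener could print the sentence
with the differences marked instead.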

[jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by "William Colen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/OPENNLP-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081759#comment-13081759 ] 

William Colen commented on OPENNLP-226:
---------------------------------------

What about the methods from Evaluator?

void missclassified(Span references[], Span predictions[], String referenceSample, String predictedSample, String sentence)
void missclassified(Span references[], Span predictions[], String referenceSample, String predictedSample, String[] sentenceTokens)
void missclassified(String references[], String predictions[], String referenceSample, String predictedSample, String[] sentenceTokens)

Or do you think we should take advantage of some structure provided by F-Measure classes? I can't see it yet.

[jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by "Jörn Kottmann (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/OPENNLP-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085920#comment-13085920 ] 

Jörn Kottmann commented on OPENNLP-226:
---------------------------------------

Do you think that will work?

[jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by "Jörn Kottmann (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/OPENNLP-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085939#comment-13085939 ] 

Jörn Kottmann commented on OPENNLP-226:
---------------------------------------

Yes, that would be nice.

[jira] [Commented] (OPENNLP-226) Evaluators should allow tools to register a misclassified report interface

Posted by "Jörn Kottmann (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/OPENNLP-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085909#comment-13085909 ] 

Jörn Kottmann commented on OPENNLP-226:
---------------------------------------

Oops, I missed something here. You have been speaking about a different method.
I mistakenly thought you wanted to put this call in Evaluator.evaluateSample, and just let it call some abstract method to do the actual work in a sub-class, or something like this.

In this case I think we should just make the call from the individual evaluateSample methods; we can refactor that into the base class later.
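
A sketch of what "making the call from the individual evaluateSample methods" might look like. The evaluator below is a hypothetical, heavily simplified one whose samples are plain tag arrays and which receives the prediction as a parameter instead of running a tool; only the method names evaluateSample and missclassified come from the discussion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Listener interface as discussed in this issue.
interface MissclassifiedSampleListener<T> {
    void missclassified(T reference, T prediction);
}

// Hypothetical evaluator for a tagger-like tool.
final class TagEvaluatorSketch {
    private final List<MissclassifiedSampleListener<String[]>> listeners = new ArrayList<>();
    private int total;
    private int correct;

    void addListener(MissclassifiedSampleListener<String[]> listener) {
        listeners.add(listener);
    }

    // In a real evaluator the prediction would be produced by the tool;
    // here it is passed in to keep the sketch self-contained.
    void evaluateSample(String[] reference, String[] prediction) {
        total++;
        if (Arrays.equals(reference, prediction)) {
            correct++;
        } else {
            // The tool-specific evaluator notifies the listeners itself.
            for (MissclassifiedSampleListener<String[]> listener : listeners) {
                listener.missclassified(reference, prediction);
            }
        }
    }

    // Fraction of samples that were predicted entirely correctly.
    double accuracy() {
        return total == 0 ? 0.0 : (double) correct / total;
    }
}
```

Once every evaluator follows this pattern, the listener list and the notification loop are natural candidates for the later move into the base class.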
