Posted to issues@opennlp.apache.org by "Jim Piliouras (Created) (JIRA)" <ji...@apache.org> on 2012/04/04 17:19:26 UTC

[jira] [Created] (OPENNLP-494) Merging results from several name-finders

Merging results from several name-finders
-----------------------------------------

                 Key: OPENNLP-494
                 URL: https://issues.apache.org/jira/browse/OPENNLP-494
             Project: OpenNLP
          Issue Type: New Feature
          Components: Name Finder
    Affects Versions: tools-1.5.3
         Environment: Ubuntu oneiric x64 Java 7 update 3
            Reporter: Jim Piliouras
             Fix For: tools-1.5.3


I made some small changes to the TokenNameFinderEvaluator class which should allow the results of several name-finders to be merged. It simply calls the find method of every supplied name-finder. The only API break comes from the fact that Java allows a varargs parameter only at the end of the argument list, so I could not use "TokenNameFinder... namefinders" as the first parameter of the constructor and had to pass an array instead. I think it's worth reversing the order of the arguments, but that is a break too...
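The constraint described above comes from the Java language itself: a method or constructor can declare at most one varargs parameter, and it has to be the last one. The sketch below assumes, as the comment implies, that the constructor already ends with another parameter list (modelled here as evaluation monitors). `Finder`, `Monitor` and `Evaluator` are hypothetical stand-ins, not the real OpenNLP classes, and static factory methods are used only so that both orderings fit in one class:

```java
public class VarargsOrderSketch {

    // Hypothetical stand-ins for TokenNameFinder and an evaluation monitor.
    interface Finder { }
    interface Monitor { }

    static final class Evaluator {
        final Finder[] finders;
        final Monitor[] monitors;

        private Evaluator(Finder[] finders, Monitor[] monitors) {
            this.finders = finders;
            this.monitors = monitors;
        }

        // Order used in the patch: the finders must be an explicit array,
        // because the trailing monitor list already occupies the varargs slot.
        static Evaluator findersFirst(Finder[] finders, Monitor... monitors) {
            return new Evaluator(finders, monitors);
        }

        // Reversed order: the finders become varargs and the monitors an array.
        // Existing call sites change, so this is also a breaking change.
        static Evaluator findersLast(Monitor[] monitors, Finder... finders) {
            return new Evaluator(finders, monitors);
        }

        // Does not compile: only the last parameter may be varargs.
        // static Evaluator both(Finder... finders, Monitor... monitors)
    }

    public static void main(String[] args) {
        Finder a = new Finder() { };
        Finder b = new Finder() { };
        Evaluator e1 = Evaluator.findersFirst(new Finder[] {a, b});
        Evaluator e2 = Evaluator.findersLast(new Monitor[0], a, b);
        System.out.println(e1.finders.length + " " + e2.finders.length); // 2 2
    }
}
```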

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (OPENNLP-494) Merging results from several name-finders

Posted by "Jim Piliouras (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Piliouras updated OPENNLP-494:
----------------------------------

    Attachment:     (was: TokenNameFinderEvaluator.java)
    
> Merging results from several name-finders
> -----------------------------------------
>
>                 Key: OPENNLP-494
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-494
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Name Finder
>    Affects Versions: tools-1.5.3
>         Environment: Ubuntu oneiric x64 Java 7 update 3
>            Reporter: Jim Piliouras
>              Labels: patch
>             Fix For: tools-1.5.3
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>

[jira] [Updated] (OPENNLP-494) Merging results from several name-finders

Posted by "Jim Piliouras (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Piliouras updated OPENNLP-494:
----------------------------------

    Attachment: UnitTest.java

This is the main method turned into a JUnit test...
Basically, with the data provided, all 3 statistics (precision, recall and F-measure) should be 1.0. At the end of the source file you will find the test set (a small paragraph) and the dictionary (containing the one entry the maxent model can't find)... The model I used will be attached to this JIRA issue. Alternatively, if you don't want to depend on my model, you can create a regex name finder with hardcoded patterns for all the entities except "Folic acid" and use that instead to achieve perfect scores.
                

[jira] [Issue Comment Edited] (OPENNLP-494) Merging results from several name-finders

Posted by "Jim Piliouras (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250627#comment-13250627 ] 

Jim Piliouras edited comment on OPENNLP-494 at 4/10/12 1:01 PM:
----------------------------------------------------------------

I noticed this while examining the updateScores() method of the 'FMeasure' class:
-------------------------------------------------------------------------------------------
/*inside updateScores() method*/

for (int referenceIndex = 0; referenceIndex < references.length; referenceIndex++) {

  Object referenceName = references[referenceIndex];

  for (int predictedIndex = 0; predictedIndex < predictions.length; predictedIndex++) {
    if (referenceName.equals(predictions[predictedIndex])) {
      truePositives++;
    }
  }
}
---------------------------------------------------------------------------------------------

It appears that for each reference there is an inner loop that looks through all predictions. That looks rather inefficient at first, but it simplifies things a lot as far as merging results is concerned. Let me explain:

I went to more trouble than needed with the AggregateNameFinder simply because I tried to keep the predicted span array shorter than the whole sentence and also to sort the predictions from earliest to latest (by span.getStart()). Now, seeing that each reference is checked against all predictions, what I did seems redundant. If I understand correctly, with the current setup, simply merging all the predictions from several name-finders into one large span array would work out of the box! Each reference will be checked against all predictions regardless of where they came from or what their start index is. But that makes me wonder... if this is the case, then it sounds ridiculously easy to aggregate the results, so easy that I find it hard to believe no one has already seen or suggested it!

If my rationale is correct, I can make some changes to AggregateNameFinder.java (mainly getting rid of the comparators) so that it simply merges the results of .find() without being concerned with "time", i.e. the order of the spans. I was under the impression that the predictions had to be in the order they were predicted if they were to be evaluated correctly!
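A small standalone sketch makes both consequences of the nested loop concrete. It copies the loop quoted above into its own method and, as an assumption for brevity, represents spans as "start-end:type" strings instead of opennlp.tools.util.Span objects. Reordering the predictions leaves the count unchanged, but a span reported by two name-finders is counted twice, so duplicates still matter when results are merged:

```java
public class TruePositiveCount {

    // Same logic as the loop quoted from FMeasure: every reference is compared
    // with every prediction, so the order of the predictions does not matter.
    static int countTruePositives(Object[] references, Object[] predictions) {
        int truePositives = 0;
        for (Object reference : references) {
            for (Object prediction : predictions) {
                if (reference.equals(prediction)) {
                    truePositives++;
                }
            }
        }
        return truePositives;
    }

    public static void main(String[] args) {
        // Hypothetical spans encoded as "start-end:type" strings.
        Object[] references = {"0-2:drug", "5-6:protein"};
        Object[] sorted = {"0-2:drug", "5-6:protein"};
        Object[] unsorted = {"5-6:protein", "0-2:drug"};
        // The same span reported by two name-finders appears twice after a naive merge.
        Object[] duplicated = {"0-2:drug", "5-6:protein", "0-2:drug"};

        System.out.println(countTruePositives(references, sorted));     // 2
        System.out.println(countTruePositives(references, unsorted));   // 2
        System.out.println(countTruePositives(references, duplicated)); // 3
    }
}
```

With the duplicated array, three true positives are reported for only two references; assuming recall is computed as true positives divided by the number of references, it would exceed 1.0.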

Jim

                

[jira] [Updated] (OPENNLP-494) Merging results from several name-finders

Posted by "Joern Kottmann (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joern Kottmann updated OPENNLP-494:
-----------------------------------

    Fix Version/s:     (was: tools-1.5.3)
    

[jira] [Updated] (OPENNLP-494) Merging results from several name-finders

Posted by "Jim Piliouras (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Piliouras updated OPENNLP-494:
----------------------------------

    Attachment:     (was: AggregateNameFinder2.java)
    

[jira] [Commented] (OPENNLP-494) Merging results from several name-finders

Posted by "Jim Piliouras (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253364#comment-13253364 ] 

Jim Piliouras commented on OPENNLP-494:
---------------------------------------

What I suggested this morning worked like a charm!!! It makes perfect sense for the Dictionary itself to store its type. In other words, the patch you applied yesterday should move to the Dictionary class instead of the DictionaryNameFinder. This allows us to create a name finder that takes more than one dictionary, possibly of different types, giving it the freedom to detect multi-type entities without any help from a maxent model. All I did was move the patch from the finder to the dictionary. Now each dictionary has its own type that the finder can then 'get'. Same code, different place...
Also, the DictionaryNameFinder now looks a lot like it did before yesterday's patch (there is no type variable and no extra constructor), except that its constructor can take more than one dictionary. As a result, the "Dictionary mDictionary;" field became "List<Dictionary> dictionaries;". That is all... I produced a jar file and used it with 3 different small dictionaries (drug, protein and gene) and it worked the first time I pressed enter... no refactoring needed at all!!! This rarely happens!!! Should I open a new JIRA issue for this? It is a different improvement than the one described in the title...
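The arrangement described above (each dictionary carries its own type, and the finder holds a list of dictionaries) can be sketched in isolation. TypedDictionary, TypedSpan and MultiDictionaryFinder are hypothetical stand-ins, not the OpenNLP Dictionary or DictionaryNameFinder classes, and the longest-match scan is only one plausible matching strategy:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MultiDictionarySketch {

    // Hypothetical dictionary that stores its own entity type.
    // Entries are token sequences such as ["Folic", "acid"].
    static final class TypedDictionary {
        final String type;
        final Set<List<String>> entries = new HashSet<>();

        TypedDictionary(String type) {
            this.type = type;
        }

        void put(String... tokens) {
            entries.add(Arrays.asList(tokens));
        }
    }

    // Hypothetical result: token offsets [start, end) plus the dictionary's type.
    static final class TypedSpan {
        final int start;
        final int end;
        final String type;

        TypedSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        @Override
        public String toString() {
            return "[" + start + ".." + end + ") " + type;
        }
    }

    // The finder holds a list of dictionaries instead of a single one
    // and takes each span's type from the dictionary that matched.
    static final class MultiDictionaryFinder {
        private final List<TypedDictionary> dictionaries;

        MultiDictionaryFinder(TypedDictionary... dictionaries) {
            this.dictionaries = Arrays.asList(dictionaries);
        }

        List<TypedSpan> find(String[] tokens) {
            List<String> sentence = Arrays.asList(tokens);
            List<TypedSpan> found = new ArrayList<>();
            for (TypedDictionary dictionary : dictionaries) {
                for (int start = 0; start < tokens.length; start++) {
                    // Try the longest candidate first and keep the first match.
                    for (int end = tokens.length; end > start; end--) {
                        if (dictionary.entries.contains(sentence.subList(start, end))) {
                            found.add(new TypedSpan(start, end, dictionary.type));
                            break;
                        }
                    }
                }
            }
            return found;
        }
    }

    public static void main(String[] args) {
        TypedDictionary drugs = new TypedDictionary("drug");
        drugs.put("Folic", "acid");
        TypedDictionary proteins = new TypedDictionary("protein");
        proteins.put("insulin");

        MultiDictionaryFinder finder = new MultiDictionaryFinder(drugs, proteins);
        String[] tokens = {"Folic", "acid", "and", "insulin", "were", "given", "."};
        System.out.println(finder.find(tokens)); // [[0..2) drug, [3..4) protein]
    }
}
```

Because the scan does not skip tokens already covered by a match, overlapping entries from the same or different dictionaries are all reported; deciding which one wins is left open here.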
                

[jira] [Commented] (OPENNLP-494) Merging results from several name-finders

Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252474#comment-13252474 ] 

Joern Kottmann commented on OPENNLP-494:
----------------------------------------

There is no test I could use to run/debug it. We need a test anyway to ensure that the code works, please provide one.
Please also use patch files here. A patch can contain the diffs of multiple files; Eclipse's "Create Patch..." or
svn diff can produce one easily.
                

[jira] [Commented] (OPENNLP-494) Merging results from several name-finders

Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246425#comment-13246425 ] 

Joern Kottmann commented on OPENNLP-494:
----------------------------------------

I am just saying that this code will be needed in multiple places and that it should be reusable.
                

[jira] [Commented] (OPENNLP-494) Merging results from several name-finders

Posted by "Jim Piliouras (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246421#comment-13246421 ] 

Jim Piliouras commented on OPENNLP-494:
---------------------------------------

Why is that? Doesn't the UIMA name-finder implement TokenNameFinder? If it does, it should work; if not, why not?
To be honest, I haven't had a look at the UIMA wrappers, so I don't know what's happening there...

OK, I'll give it a go tomorrow morning.

Jim
                

[jira] [Commented] (OPENNLP-494) Merging results from several name-finders

Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246374#comment-13246374 ] 

Joern Kottmann commented on OPENNLP-494:
----------------------------------------

As discussed on the user list, we should make a proxy TokenNameFinder which calls a set of name finders and outputs the merged names. Such a class will be useful for many users, because names usually must be merged before they can be used.
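A minimal sketch of such a proxy, assuming a stand-in interface that mirrors the two methods of TokenNameFinder (find over a token array, and clearAdaptiveData). NameFinder, SimpleSpan, AggregateNameFinder and fixed are hypothetical names used only for illustration, so the block compiles without OpenNLP on the classpath:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ProxyNameFinderSketch {

    // Hypothetical stand-in for opennlp.tools.util.Span: token offsets plus a type.
    static final class SimpleSpan {
        final int start;
        final int end;
        final String type;

        SimpleSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        @Override
        public String toString() {
            return "[" + start + ".." + end + ") " + type;
        }
    }

    // Hypothetical interface mirroring the two methods of TokenNameFinder.
    interface NameFinder {
        SimpleSpan[] find(String[] tokens);
        void clearAdaptiveData();
    }

    // Proxy: callers see a single name finder, but every call is delegated to all of them.
    static final class AggregateNameFinder implements NameFinder {
        private final List<NameFinder> finders;

        AggregateNameFinder(NameFinder... finders) {
            this.finders = Arrays.asList(finders);
        }

        @Override
        public SimpleSpan[] find(String[] tokens) {
            List<SimpleSpan> merged = new ArrayList<>();
            for (NameFinder finder : finders) {
                merged.addAll(Arrays.asList(finder.find(tokens)));
            }
            return merged.toArray(new SimpleSpan[0]);
        }

        @Override
        public void clearAdaptiveData() {
            for (NameFinder finder : finders) {
                finder.clearAdaptiveData();
            }
        }
    }

    // Toy finder that ignores the tokens and always returns the same spans.
    static NameFinder fixed(final SimpleSpan... spans) {
        return new NameFinder() {
            @Override
            public SimpleSpan[] find(String[] tokens) {
                return spans.clone();
            }

            @Override
            public void clearAdaptiveData() {
                // nothing to clear
            }
        };
    }

    public static void main(String[] args) {
        NameFinder model = fixed(new SimpleSpan(3, 4, "protein"));
        NameFinder dictionary = fixed(new SimpleSpan(0, 2, "drug"));
        NameFinder aggregate = new AggregateNameFinder(model, dictionary);
        String[] tokens = {"Folic", "acid", "and", "insulin", "were", "given", "."};
        System.out.println(Arrays.toString(aggregate.find(tokens)));
        // [[3..4) protein, [0..2) drug]
    }
}
```

This proxy simply concatenates the results in the order of the finders; it neither removes duplicates nor resolves overlapping spans, which are the merge policies the rest of this thread is about.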
                

[jira] [Updated] (OPENNLP-494) Merging results from several name-finders

Posted by "Jim Piliouras (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Piliouras updated OPENNLP-494:
----------------------------------

    Attachment: AggregateNameFinder2.java
                AggregateNameFinder.java

After noticing that updateScores() in the FMeasure class checks each reference against all predictions, I realized that all the trouble I went through to sort the merged predictions by start offset is a bit redundant. All I need to do is merge the predictions from subsequent calls to find() and remove duplicates.
The code is a lot simpler now, and apart from the flattening, which will add some overhead when dealing with millions of predictions, there shouldn't be any inefficiencies anywhere else. Could someone please have a look and let me know what you think... does the approach make sense?
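The "merge and remove duplicates" step can be sketched as one reusable static method. It assumes spans compare equal when start, end and type all match; SimpleSpan and mergeAndDeduplicate are hypothetical names, not the code attached to this issue:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

public class MergePredictionsSketch {

    // Hypothetical value type standing in for opennlp.tools.util.Span.
    // Value-based equals/hashCode are what make duplicate removal work.
    static final class SimpleSpan {
        final int start;
        final int end;
        final String type;

        SimpleSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof SimpleSpan)) return false;
            SimpleSpan other = (SimpleSpan) o;
            return start == other.start && end == other.end && Objects.equals(type, other.type);
        }

        @Override
        public int hashCode() {
            return Objects.hash(start, end, type);
        }

        @Override
        public String toString() {
            return "[" + start + ".." + end + ") " + type;
        }
    }

    // Flattens the results of several find() calls into one array,
    // dropping exact duplicates and keeping first-seen order (no sorting by start offset).
    static SimpleSpan[] mergeAndDeduplicate(SimpleSpan[]... results) {
        Set<SimpleSpan> merged = new LinkedHashSet<>();
        for (SimpleSpan[] result : results) {
            merged.addAll(Arrays.asList(result));
        }
        return merged.toArray(new SimpleSpan[0]);
    }

    public static void main(String[] args) {
        SimpleSpan[] fromModel = {new SimpleSpan(3, 4, "protein"), new SimpleSpan(0, 2, "drug")};
        SimpleSpan[] fromDictionary = {new SimpleSpan(0, 2, "drug")};
        System.out.println(Arrays.toString(mergeAndDeduplicate(fromModel, fromDictionary)));
        // [[3..4) protein, [0..2) drug]
    }
}
```

A LinkedHashSet keeps the flattening roughly linear in the number of predictions. Two finders that predict the same offsets with different types are not treated as duplicates under this equality; whether they should be is a separate design decision.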
                

[jira] [Issue Comment Edited] (OPENNLP-494) Merging results from several name-finders

Posted by "Jim Piliouras (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246432#comment-13246432 ] 

Jim Piliouras edited comment on OPENNLP-494 at 4/4/12 4:29 PM:
---------------------------------------------------------------

OK, what if we make a "super evaluator" rather than a "super name-finder"? Then the code will be contained in a separate class... I still think the namefind package is the wrong place for something related only to evaluation...

In any case I will try it out tomorrow. Then you guys can review both approaches and decide accordingly...

                

[jira] [Updated] (OPENNLP-494) Merging results from several name-finders

Posted by "Jim Piliouras (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Piliouras updated OPENNLP-494:
----------------------------------

    Attachment: FINALLY4.bin

This is the model I used for the unit test. It finds everything in that paragraph except "Folic acid".
                
> Merging results from several name-finders
> -----------------------------------------
>
>                 Key: OPENNLP-494
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-494
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Name Finder
>    Affects Versions: tools-1.5.3
>         Environment: Ubuntu oneiric x64 Java 7 update 3
>            Reporter: Jim Piliouras
>              Labels: patch
>             Fix For: tools-1.5.3
>
>         Attachments: AggregateNameFinder.java, FINALLY4.bin, UnitTest.java
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Made some small changes to the TokenNameFinderEvaluator class which hopefully allow merging of results from several name-finders. It just does that by calling the find method of all supplied name-finders. The only break is the fact that Java does not allow varargs anywhere but at the end of the argument list so i could not use "TokenNameFinder... namefinders" as the first parameter in the constructor - i had to pass an array instead. I think it's worth reversing the order of arguments  but that is a break too...


        

[jira] [Updated] (OPENNLP-494) Merging results from several name-finders

Posted by "Jim Piliouras (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Piliouras updated OPENNLP-494:
----------------------------------

    Attachment: TokenNameFinderEvaluator.java

This is the patch I wrote. Let me know what you think! Joern has already suggested doing much the same thing on the name finder instead, in other words creating a super name-finder that acts as a proxy to the regular ones and merges their results there.
                
> Merging results from several name-finders
> -----------------------------------------
>
>                 Key: OPENNLP-494
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-494
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Name Finder
>    Affects Versions: tools-1.5.3
>         Environment: Ubuntu oneiric x64 Java 7 update 3
>            Reporter: Jim Piliouras
>              Labels: patch
>             Fix For: tools-1.5.3
>
>         Attachments: TokenNameFinderEvaluator.java
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Made some small changes to the TokenNameFinderEvaluator class which hopefully allow merging of results from several name-finders. It just does that by calling the find method of all supplied name-finders. The only break is the fact that Java does not allow varargs anywhere but at the end of the argument list so i could not use "TokenNameFinder... namefinders" as the first parameter in the constructor - i had to pass an array instead. I think it's worth reversing the order of arguments  but that is a break too...


        

[jira] [Updated] (OPENNLP-494) Merging results from several name-finders

Posted by "Jim Piliouras (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Piliouras updated OPENNLP-494:
----------------------------------

    Attachment: AggregateNameFinder2.java

Minor formatting changes, mainly to conform to the OpenNLP code formatting conventions.
                
> Merging results from several name-finders
> -----------------------------------------
>
>                 Key: OPENNLP-494
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-494
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Name Finder
>    Affects Versions: tools-1.5.3
>         Environment: Ubuntu oneiric x64 Java 7 update 3
>            Reporter: Jim Piliouras
>              Labels: patch
>             Fix For: tools-1.5.3
>
>         Attachments: AggregateNameFinder.java, AggregateNameFinder2.java, TokenNameFinderEvaluator.java
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Made some small changes to the TokenNameFinderEvaluator class which hopefully allow merging of results from several name-finders. It just does that by calling the find method of all supplied name-finders. The only break is the fact that Java does not allow varargs anywhere but at the end of the argument list so i could not use "TokenNameFinder... namefinders" as the first parameter in the constructor - i had to pass an array instead. I think it's worth reversing the order of arguments  but that is a break too...


        

[jira] [Updated] (OPENNLP-494) Merging results from several name-finders

Posted by "Jim Piliouras (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Piliouras updated OPENNLP-494:
----------------------------------

    Attachment: TokenNameFinderEvaluator.java

Last post for the day...

I tried to run it, and I get an ArrayStoreException I can't explain when I call System.arraycopy. I am 100% sure there is no type mismatch (both src and dest are Span[]). Maybe someone with more experience with generic types and arrays can help here. Apart from this exception, everything else is sorted out in this last attachment; you can just drop it into the namefind package and run it.
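For reference, here is a minimal sketch of concatenating several `Span[]` results with `System.arraycopy` and a running offset. `Span` is a stand-in record, not `opennlp.tools.util.Span`, and `SpanConcat` is a hypothetical helper. `System.arraycopy` throws `ArrayStoreException` when an element of the source cannot be assigned to the destination's runtime component type. One way that happens is copying from an `Object[]` that actually holds `Span[]` arrays rather than `Span` objects. Whether that is the cause here is only a guess.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for opennlp.tools.util.Span (Java 16+ record).
record Span(int start, int end, String type) {}

final class SpanConcat {

  // Concatenates the outputs of several name finders into one array.
  static Span[] concat(List<Span[]> results) {
    int total = 0;
    for (Span[] r : results) {
      total += r.length;
    }
    Span[] merged = new Span[total];
    int offset = 0; // where the next chunk starts in 'merged'
    for (Span[] r : results) {
      System.arraycopy(r, 0, merged, offset, r.length);
      offset += r.length;
    }
    return merged;
  }

  // One way to trigger ArrayStoreException: the source is an Object[]
  // whose elements are Span[] arrays, not Span objects.
  static boolean triggersArrayStore() {
    List<Span[]> results = new ArrayList<>();
    results.add(new Span[] {new Span(0, 1, "drug")});
    Object[] wrong = results.toArray(); // elements are Span[]
    Span[] dest = new Span[1];
    try {
      System.arraycopy(wrong, 0, dest, 0, 1);
      return false;
    } catch (ArrayStoreException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    List<Span[]> results = new ArrayList<>();
    results.add(new Span[] {new Span(0, 2, "drug")});
    results.add(new Span[] {new Span(5, 6, "drug"), new Span(8, 9, "drug")});
    System.out.println(Arrays.toString(concat(results)));
    System.out.println(triggersArrayStore()); // true
  }
}
```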

Jim

P.S. I will try to do something similar for the name-finder, either tomorrow or the day after.
                
> Merging results from several name-finders
> -----------------------------------------
>
>                 Key: OPENNLP-494
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-494
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Name Finder
>    Affects Versions: tools-1.5.3
>         Environment: Ubuntu oneiric x64 Java 7 update 3
>            Reporter: Jim Piliouras
>              Labels: patch
>             Fix For: tools-1.5.3
>
>         Attachments: TokenNameFinderEvaluator.java
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Made some small changes to the TokenNameFinderEvaluator class which hopefully allow merging of results from several name-finders. It just does that by calling the find method of all supplied name-finders. The only break is the fact that Java does not allow varargs anywhere but at the end of the argument list so i could not use "TokenNameFinder... namefinders" as the first parameter in the constructor - i had to pass an array instead. I think it's worth reversing the order of arguments  but that is a break too...


        

[jira] [Updated] (OPENNLP-494) Merging results from several name-finders

Posted by "Jim Piliouras (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Piliouras updated OPENNLP-494:
----------------------------------

    Attachment: AggregateNameFinder.java

Good news! I just managed to use the aggregate name-finder to merge the results and get 100% on all statistics. I'm posting the entire class, which has a main method at the end for testing. If you want me to include the actual test paragraph, the dictionary XML file or the model, please ask. I think this is ready to be committed.
                
> Merging results from several name-finders
> -----------------------------------------
>
>                 Key: OPENNLP-494
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-494
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Name Finder
>    Affects Versions: tools-1.5.3
>         Environment: Ubuntu oneiric x64 Java 7 update 3
>            Reporter: Jim Piliouras
>              Labels: patch
>             Fix For: tools-1.5.3
>
>         Attachments: AggregateNameFinder.java
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Made some small changes to the TokenNameFinderEvaluator class which hopefully allow merging of results from several name-finders. It just does that by calling the find method of all supplied name-finders. The only break is the fact that Java does not allow varargs anywhere but at the end of the argument list so i could not use "TokenNameFinder... namefinders" as the first parameter in the constructor - i had to pass an array instead. I think it's worth reversing the order of arguments  but that is a break too...


        

[jira] [Commented] (OPENNLP-494) Merging results from several name-finders

Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252501#comment-13252501 ] 

Joern Kottmann commented on OPENNLP-494:
----------------------------------------

Which strategy do you use to eliminate overlapping names of the same or different types?

We should have a test class before we commit it. Maybe you can turn the main method into a unit test.
                
> Merging results from several name-finders
> -----------------------------------------
>
>                 Key: OPENNLP-494
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-494
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Name Finder
>    Affects Versions: tools-1.5.3
>         Environment: Ubuntu oneiric x64 Java 7 update 3
>            Reporter: Jim Piliouras
>              Labels: patch
>             Fix For: tools-1.5.3
>
>         Attachments: AggregateNameFinder.java
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Made some small changes to the TokenNameFinderEvaluator class which hopefully allow merging of results from several name-finders. It just does that by calling the find method of all supplied name-finders. The only break is the fact that Java does not allow varargs anywhere but at the end of the argument list so i could not use "TokenNameFinder... namefinders" as the first parameter in the constructor - i had to pass an array instead. I think it's worth reversing the order of arguments  but that is a break too...


        

[jira] [Updated] (OPENNLP-494) Merging results from several name-finders

Posted by "Jim Piliouras (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Piliouras updated OPENNLP-494:
----------------------------------

    Attachment:     (was: AggregateNameFinder2.java)
    
> Merging results from several name-finders
> -----------------------------------------
>
>                 Key: OPENNLP-494
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-494
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Name Finder
>    Affects Versions: tools-1.5.3
>         Environment: Ubuntu oneiric x64 Java 7 update 3
>            Reporter: Jim Piliouras
>              Labels: patch
>             Fix For: tools-1.5.3
>
>         Attachments: AggregateNameFinder.java, TokenNameFinderEvaluator.java
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Made some small changes to the TokenNameFinderEvaluator class which hopefully allow merging of results from several name-finders. It just does that by calling the find method of all supplied name-finders. The only break is the fact that Java does not allow varargs anywhere but at the end of the argument list so i could not use "TokenNameFinder... namefinders" as the first parameter in the constructor - i had to pass an array instead. I think it's worth reversing the order of arguments  but that is a break too...


        

[jira] [Commented] (OPENNLP-494) Merging results from several name-finders

Posted by "Jim Piliouras (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252477#comment-13252477 ] 

Jim Piliouras commented on OPENNLP-494:
---------------------------------------

But this is a brand-new class; I'm not patching anything. I did my tests interactively from Clojure, so I don't have any Java code for this. I was hoping you would have some. Anyway, I will write one so I can at least see the stack trace.
                
> Merging results from several name-finders
> -----------------------------------------
>
>                 Key: OPENNLP-494
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-494
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Name Finder
>    Affects Versions: tools-1.5.3
>         Environment: Ubuntu oneiric x64 Java 7 update 3
>            Reporter: Jim Piliouras
>              Labels: patch
>             Fix For: tools-1.5.3
>
>         Attachments: AggregateNameFinder.java, AggregateNameFinder2.java, TokenNameFinderEvaluator.java
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Made some small changes to the TokenNameFinderEvaluator class which hopefully allow merging of results from several name-finders. It just does that by calling the find method of all supplied name-finders. The only break is the fact that Java does not allow varargs anywhere but at the end of the argument list so i could not use "TokenNameFinder... namefinders" as the first parameter in the constructor - i had to pass an array instead. I think it's worth reversing the order of arguments  but that is a break too...


        

[jira] [Updated] (OPENNLP-494) Merging results from several name-finders

Posted by "Jim Piliouras (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Piliouras updated OPENNLP-494:
----------------------------------

    Attachment:     (was: AggregateNameFinder2.java)
    
> Merging results from several name-finders
> -----------------------------------------
>
>                 Key: OPENNLP-494
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-494
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Name Finder
>    Affects Versions: tools-1.5.3
>         Environment: Ubuntu oneiric x64 Java 7 update 3
>            Reporter: Jim Piliouras
>              Labels: patch
>             Fix For: tools-1.5.3
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Made some small changes to the TokenNameFinderEvaluator class which hopefully allow merging of results from several name-finders. It just does that by calling the find method of all supplied name-finders. The only break is the fact that Java does not allow varargs anywhere but at the end of the argument list so i could not use "TokenNameFinder... namefinders" as the first parameter in the constructor - i had to pass an array instead. I think it's worth reversing the order of arguments  but that is a break too...


        

[jira] [Updated] (OPENNLP-494) Merging results from several name-finders

Posted by "Jim Piliouras (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Piliouras updated OPENNLP-494:
----------------------------------

    Attachment: AggregateNameFinder.java

I could not really sleep, so I did a bit of coding instead. Now I'm ready to sleep!

OK, so I tried to do what Joern recommended. It turned out to be harder than I expected, mainly because some serious questions come up when you try to do it, such as:
-- In what order do we use the name-finders?
-- What if two name-finders have different opinions about the same token(s)? Which one do we trust?
-- etc.

Anyway, the code is commented; I hope someone can take a look and give some feedback. What happens now is the following:

1) AggregateNameFinder implements TokenNameFinder, takes any number of TokenNameFinders as constructor arguments and adds them to a global List.
2) The find method calls the find method of every name-finder and puts each resulting Span[] in a list (and in a hash map keyed by the name-finder's class name).
3) The list is sorted in decreasing order of array length to get our "hero" name-finder (the one that found the most entities, i.e. the first in the sorted list).
4) All that is needed now is to see whether we can add to (not change) the hero's predictions using the remaining name-finders.
5) We do that one by one: first we improve the hero with the second best, then the improved hero with the third best, and so forth, keeping a list of the real improvements we were able to make.
6) Finally, we merge all the real improvements with the hero's predictions and sort the spans by increasing start offset. That way the natural order of the English sentence is preserved (smaller start offsets come first).
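The six steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the attached AggregateNameFinder.java: `Span` and `Finder` are hypothetical stand-ins for OpenNLP's `Span` and `TokenNameFinder`, and "adding to the hero" (step 4) is assumed to mean accepting a span only if it does not overlap any span already accepted. Identical spans count as overlapping, so duplicates are dropped too.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-ins for opennlp.tools.util.Span and TokenNameFinder.
record Span(int start, int end, String type) {
  // Half-open intervals [start, end) overlap if each starts before the other ends.
  boolean overlaps(Span o) {
    return start < o.end && o.start < end;
  }
}

interface Finder {
  Span[] find(String[] tokens);
}

final class AggregateSketch implements Finder {
  private final List<Finder> finders;

  // Step 1: any number of finders.
  AggregateSketch(Finder... finders) {
    this.finders = List.of(finders);
  }

  @Override
  public Span[] find(String[] tokens) {
    // Step 2: collect every finder's output.
    List<Span[]> outputs = new ArrayList<>();
    for (Finder f : finders) {
      outputs.add(f.find(tokens));
    }
    if (outputs.isEmpty()) {
      return new Span[0];
    }
    // Step 3: the "hero" is the output with the most spans (stable sort).
    outputs.sort(Comparator.comparingInt((Span[] a) -> a.length).reversed());
    List<Span> accepted = new ArrayList<>(Arrays.asList(outputs.get(0)));
    // Steps 4-5: add (never replace) non-overlapping spans from the rest, in order.
    for (Span[] other : outputs.subList(1, outputs.size())) {
      for (Span candidate : other) {
        if (accepted.stream().noneMatch(candidate::overlaps)) {
          accepted.add(candidate);
        }
      }
    }
    // Step 6: restore sentence order by start offset.
    accepted.sort(Comparator.comparingInt(Span::start));
    return accepted.toArray(new Span[0]);
  }
}
```

Because the hero is chosen by count alone, a finder that emits many low-precision names would dominate the result; that is one of the open questions listed above.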

Opinions, feedback, critique etc. are welcome. Just trying to help make OpenNLP better.

Jim



                
> Merging results from several name-finders
> -----------------------------------------
>
>                 Key: OPENNLP-494
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-494
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Name Finder
>    Affects Versions: tools-1.5.3
>         Environment: Ubuntu oneiric x64 Java 7 update 3
>            Reporter: Jim Piliouras
>              Labels: patch
>             Fix For: tools-1.5.3
>
>         Attachments: AggregateNameFinder.java, TokenNameFinderEvaluator.java
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Made some small changes to the TokenNameFinderEvaluator class which hopefully allow merging of results from several name-finders. It just does that by calling the find method of all supplied name-finders. The only break is the fact that Java does not allow varargs anywhere but at the end of the argument list so i could not use "TokenNameFinder... namefinders" as the first parameter in the constructor - i had to pass an array instead. I think it's worth reversing the order of arguments  but that is a break too...


        

[jira] [Commented] (OPENNLP-494) Merging results from several name-finders

Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252956#comment-13252956 ] 

Joern Kottmann commented on OPENNLP-494:
----------------------------------------

Just eliminating duplicates will not work well for most users, because what they want is something smart that can merge the names for them. We might need different merging strategies depending on the use case.

Usually a user wants names that do not intersect or overlap; otherwise they could just take the output of the various name finders directly.
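To make "different merging strategies" concrete, here is a hypothetical pluggable interface with one possible policy: the longest span wins, and the earlier start breaks ties. `MergeStrategy` and `LongestSpanWins` are illustrative names, not existing OpenNLP APIs, and `Span` is a stand-in record. For example, if a dictionary tags "folic acid" as a drug and a model tags "folic acid deficiency" as a disease, this policy keeps only the longer span.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for opennlp.tools.util.Span.
record Span(int start, int end, String type) {
  int length() {
    return end - start;
  }

  boolean overlaps(Span o) {
    return start < o.end && o.start < end;
  }
}

// A merging strategy turns possibly overlapping spans into non-overlapping ones.
interface MergeStrategy {
  Span[] merge(List<Span> candidates);
}

// Policy: prefer longer spans; among equal lengths, prefer the earlier start.
final class LongestSpanWins implements MergeStrategy {
  @Override
  public Span[] merge(List<Span> candidates) {
    List<Span> ranked = new ArrayList<>(candidates);
    ranked.sort(Comparator.comparingInt((Span s) -> -s.length())
        .thenComparingInt(Span::start));
    List<Span> kept = new ArrayList<>();
    for (Span s : ranked) {
      // Greedily keep a span only if it clashes with nothing kept so far.
      if (kept.stream().noneMatch(s::overlaps)) {
        kept.add(s);
      }
    }
    kept.sort(Comparator.comparingInt(Span::start));
    return kept.toArray(new Span[0]);
  }
}
```

A "first finder wins" or "most specific type wins" policy would simply be another implementation of the same interface.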
                
> Merging results from several name-finders
> -----------------------------------------
>
>                 Key: OPENNLP-494
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-494
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Name Finder
>    Affects Versions: tools-1.5.3
>         Environment: Ubuntu oneiric x64 Java 7 update 3
>            Reporter: Jim Piliouras
>              Labels: patch
>             Fix For: tools-1.5.3
>
>         Attachments: AggregateNameFinder.java, FINALLY4.bin, UnitTest.java
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Made some small changes to the TokenNameFinderEvaluator class which hopefully allow merging of results from several name-finders. It just does that by calling the find method of all supplied name-finders. The only break is the fact that Java does not allow varargs anywhere but at the end of the argument list so i could not use "TokenNameFinder... namefinders" as the first parameter in the constructor - i had to pass an array instead. I think it's worth reversing the order of arguments  but that is a break too...


        

[jira] [Commented] (OPENNLP-494) Merging results from several name-finders

Posted by "Jim Piliouras (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252511#comment-13252511 ] 

Jim Piliouras commented on OPENNLP-494:
---------------------------------------

I don't deal with different types. Isn't it odd to try to handle multiple types when some name-finders (e.g. the dictionary one) don't support them?
As far as overlapping spans are concerned, I think each name-finder handles that for its own findings. I'm only making sure the merged findings contain no duplicates. It worked fine on my big corpus: I got an increase of roughly 9% in my statistics.

A unit test, eh? I haven't written one of those in Java for a while! I know that's pathetic, but Java is already very verbose ;-)

I'll give it a go. I will only include a small test paragraph and a subset of my dictionary, though.
                
> Merging results from several name-finders
> -----------------------------------------
>
>                 Key: OPENNLP-494
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-494
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Name Finder
>    Affects Versions: tools-1.5.3
>         Environment: Ubuntu oneiric x64 Java 7 update 3
>            Reporter: Jim Piliouras
>              Labels: patch
>             Fix For: tools-1.5.3
>
>         Attachments: AggregateNameFinder.java
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Made some small changes to the TokenNameFinderEvaluator class which hopefully allow merging of results from several name-finders. It just does that by calling the find method of all supplied name-finders. The only break is the fact that Java does not allow varargs anywhere but at the end of the argument list so i could not use "TokenNameFinder... namefinders" as the first parameter in the constructor - i had to pass an array instead. I think it's worth reversing the order of arguments  but that is a break too...


        

[jira] [Commented] (OPENNLP-494) Merging results from several name-finders

Posted by "Jim Piliouras (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253300#comment-13253300 ] 

Jim Piliouras commented on OPENNLP-494:
---------------------------------------

Could you provide an example sentence where overlapping or intersecting spans would occur? Then I could test your concerns. I spent a couple of hours training a three-type model (drug, protein, gene), then combined it with the drug dictionary, and it worked just fine. Of course the dictionary only "helped" with the drug type, not the others, but that is expected. What I'm saying is that, again, no overlapping spans occurred!

It just occurred to me that we could have the Dictionary store the type (NOT the DictionaryNameFinder, as patched yesterday) and have the DictionaryNameFinder take a variable number of dictionaries, each with its own type. The dictionary finder would then be able to handle multi-type entities as well (assuming the user supplied several dictionaries). Of course, that means tracking which dictionary each prediction came from, so that the appropriate type can be attached to the span.
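A rough sketch of that idea, assuming each dictionary is reduced to a set of token sequences plus a type label. `TypedDictionary` and `MultiDictionaryFinder` are hypothetical names; OpenNLP's real `Dictionary` and `DictionaryNameFinder` classes have their own APIs, which this does not reproduce. Matching here is exact (case-sensitive) and greedy: at each position the longest match across all dictionaries wins, and the span carries the type of the dictionary that produced it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for opennlp.tools.util.Span.
record Span(int start, int end, String type) {}

// Hypothetical: a dictionary that remembers the entity type of its entries.
record TypedDictionary(String type, Set<List<String>> entries, int maxLength) {
  TypedDictionary(String type, Set<List<String>> entries) {
    this(type, entries, entries.stream().mapToInt(List::size).max().orElse(0));
  }
}

final class MultiDictionaryFinder {
  private final List<TypedDictionary> dictionaries;

  MultiDictionaryFinder(TypedDictionary... dictionaries) {
    this.dictionaries = List.of(dictionaries);
  }

  Span[] find(String[] tokens) {
    List<String> all = Arrays.asList(tokens);
    List<Span> found = new ArrayList<>();
    int i = 0;
    while (i < tokens.length) {
      Span best = null;
      for (TypedDictionary d : dictionaries) {
        // Try the longest window first; stop at this dictionary's first hit.
        for (int len = Math.min(d.maxLength(), tokens.length - i); len > 0; len--) {
          if (d.entries().contains(all.subList(i, i + len))) {
            if (best == null || len > best.end() - best.start()) {
              best = new Span(i, i + len, d.type()); // tagged with the source type
            }
            break;
          }
        }
      }
      if (best != null) {
        found.add(best);
        i = best.end(); // skip past the match
      } else {
        i++;
      }
    }
    return found.toArray(new Span[0]);
  }
}
```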
                
> Merging results from several name-finders
> -----------------------------------------
>
>                 Key: OPENNLP-494
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-494
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Name Finder
>    Affects Versions: tools-1.5.3
>         Environment: Ubuntu oneiric x64 Java 7 update 3
>            Reporter: Jim Piliouras
>              Labels: patch
>             Fix For: tools-1.5.3
>
>         Attachments: AggregateNameFinder.java, FINALLY4.bin, UnitTest.java
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Made some small changes to the TokenNameFinderEvaluator class which hopefully allow merging of results from several name-finders. It just does that by calling the find method of all supplied name-finders. The only break is the fact that Java does not allow varargs anywhere but at the end of the argument list so i could not use "TokenNameFinder... namefinders" as the first parameter in the constructor - i had to pass an array instead. I think it's worth reversing the order of arguments  but that is a break too...


        

[jira] [Updated] (OPENNLP-494) Merging results from several name-finders

Posted by "Jim Piliouras (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Piliouras updated OPENNLP-494:
----------------------------------

    Attachment: AggregateNameFinder2.java

I slightly updated the code of AggregateNameFinder2 so that it is easier to read. I still don't know what to do about the ArrayStoreException, though! I've done everything in my power to make sure the types match when copying, and debugging has confirmed it. Honestly, I don't know what else to do. I had a look at the code of other name finders, and the same idea is applied there as well: "Start with a List, do all the manipulation and shove the results into an array using list.toArray(new Span[list.size()])". That is exactly what I'm doing! I cannot proceed without some feedback. Actually, I am 95% sure that if the exception goes away, the merging of results will work. The only remaining problem would then be making the Dictionary smarter so it can deal with tags other than the default one, and I think I can do that as well.
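For comparison, the idiom quoted above does produce a real `Span[]`. The usual pitfall is the no-argument `toArray()`, which for an `ArrayList` returns a plain `Object[]` that cannot be cast to `Span[]`. The sketch below shows both; `Span` is a stand-in record and `ToArrayDemo` is a hypothetical helper. Since it does not reproduce the attached code, it can only suggest where to look.

```java
import java.util.List;

// Hypothetical stand-in for opennlp.tools.util.Span.
record Span(int start, int end, String type) {}

final class ToArrayDemo {
  // The idiom quoted above: the result's runtime type is Span[].
  static Span[] typed(List<Span> spans) {
    return spans.toArray(new Span[spans.size()]);
  }

  // Pitfall: toArray() on an ArrayList returns a plain Object[],
  // so the cast below fails at runtime with ClassCastException.
  static boolean untypedCastFails(List<Span> spans) {
    try {
      Object[] raw = spans.toArray();
      Span[] cast = (Span[]) raw;
      return false; // only reached if the cast unexpectedly succeeded
    } catch (ClassCastException e) {
      return true;
    }
  }
}
```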

Jim
                
> Merging results from several name-finders
> -----------------------------------------
>
>                 Key: OPENNLP-494
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-494
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Name Finder
>    Affects Versions: tools-1.5.3
>         Environment: Ubuntu oneiric x64 Java 7 update 3
>            Reporter: Jim Piliouras
>              Labels: patch
>             Fix For: tools-1.5.3
>
>         Attachments: AggregateNameFinder.java, AggregateNameFinder2.java, TokenNameFinderEvaluator.java
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Made some small changes to the TokenNameFinderEvaluator class which hopefully allow merging of results from several name-finders. It just does that by calling the find method of all supplied name-finders. The only break is the fact that Java does not allow varargs anywhere but at the end of the argument list so i could not use "TokenNameFinder... namefinders" as the first parameter in the constructor - i had to pass an array instead. I think it's worth reversing the order of arguments  but that is a break too...


        

[jira] [Commented] (OPENNLP-494) Merging results from several name-finders

Posted by "Jim Piliouras (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246410#comment-13246410 ] 

Jim Piliouras commented on OPENNLP-494:
---------------------------------------

I really don't see the difference between what I did and what you're proposing; it is the same idea in a different place! The question is whose job it is to merge the results: the name-finder's or the evaluator's? In my opinion it is the evaluator's job. It should use all the name-finders it is initialized with when reporting the statistics. The rationale behind the code I patched is to be able to pass any number of name-finders. The merging loop has a magicMarker that keeps track of the offset when copying the arrays, so it can deal with one or more name-finders.
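A minimal sketch of the evaluator-side idea, assuming the evaluator pools every finder's predictions for a sentence and scores them against the reference by exact span match. `Finder` and `MultiFinderEvaluator` are hypothetical; this is not OpenNLP's `TokenNameFinderEvaluator` API. Unlike plain array concatenation with an offset, the pooled set here drops duplicate spans before scoring, which is a design choice of the sketch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-ins for opennlp.tools.util.Span and TokenNameFinder.
record Span(int start, int end, String type) {}

interface Finder {
  Span[] find(String[] tokens);
}

// Pools the predictions of several finders and scores them (exact match).
final class MultiFinderEvaluator {
  private final Finder[] finders;
  private int truePositives;
  private int predicted;
  private int expected;

  MultiFinderEvaluator(Finder... finders) {
    this.finders = finders.clone();
  }

  void evaluateSample(String[] tokens, Span[] reference) {
    // Pool every finder's output; the set drops duplicate spans.
    Set<Span> predictions = new HashSet<>();
    for (Finder f : finders) {
      predictions.addAll(Arrays.asList(f.find(tokens)));
    }
    Set<Span> gold = new HashSet<>(Arrays.asList(reference));
    for (Span p : predictions) {
      if (gold.contains(p)) {
        truePositives++;
      }
    }
    predicted += predictions.size();
    expected += gold.size();
  }

  double precision() {
    return predicted == 0 ? 0.0 : (double) truePositives / predicted;
  }

  double recall() {
    return expected == 0 ? 0.0 : (double) truePositives / expected;
  }
}
```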

Unfortunately I cannot code what you suggested today, maybe tomorrow though...
BTW, all the users that would benefit from having the "super-name-finder" will also benefit from the new evaluator. The only problem is the 'default' tag the dictionary name-finder needs.

What do other people think?
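For context on the constructor signature discussed in this issue: Java allows at most one varargs parameter per method or constructor, and it must be the last one. Assuming, as the issue description implies, that the evaluator's constructor already ends with a varargs listener parameter, the name-finders have to come first as a plain array. A minimal sketch with hypothetical stand-in types (NameFinder, EvaluationListener and MultiFinderEvaluator are not OpenNLP classes), including a single-finder overload that delegates to the array form so existing callers keep compiling:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins; these are not the OpenNLP types.
interface NameFinder {
  String[] find(String[] tokens);
}

interface EvaluationListener {
  void onSample(String[] tokens);
}

final class MultiFinderEvaluator {
  private final NameFinder[] finders;
  private final List<EvaluationListener> listeners;

  // Only the last parameter may be varargs, so
  // (NameFinder... finders, EvaluationListener... listeners) would not compile;
  // the finders arrive as a plain array instead.
  MultiFinderEvaluator(NameFinder[] finders, EvaluationListener... listeners) {
    this.finders = finders.clone();
    this.listeners = Arrays.asList(listeners.clone());
  }

  // The original single-finder signature, kept so existing callers still compile.
  MultiFinderEvaluator(NameFinder finder, EvaluationListener... listeners) {
    this(new NameFinder[] {finder}, listeners);
  }

  int finderCount() {
    return finders.length;
  }

  int listenerCount() {
    return listeners.size();
  }
}
```

Reversing the order, e.g. (EvaluationListener[] listeners, NameFinder... finders), would allow varargs finders but, as the description says, would break existing callers.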
                
> Merging results from several name-finders
> -----------------------------------------
>
>                 Key: OPENNLP-494
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-494
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Name Finder
>    Affects Versions: tools-1.5.3
>         Environment: Ubuntu oneiric x64 Java 7 update 3
>            Reporter: Jim Piliouras
>              Labels: patch
>             Fix For: tools-1.5.3
>
>         Attachments: TokenNameFinderEvaluator.java
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Made some small changes to the TokenNameFinderEvaluator class which hopefully allow merging of results from several name-finders. It just does that by calling the find method of all supplied name-finders. The only break is the fact that Java does not allow varargs anywhere but at the end of the argument list so i could not use "TokenNameFinder... namefinders" as the first parameter in the constructor - i had to pass an array instead. I think it's worth reversing the order of arguments  but that is a break too...

        

[jira] [Updated] (OPENNLP-494) Merging results from several name-finders

Posted by "Jim Piliouras (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Piliouras updated OPENNLP-494:
----------------------------------

    Attachment:     (was: TokenNameFinderEvaluator.java)
    
> Merging results from several name-finders
> -----------------------------------------
>
>                 Key: OPENNLP-494
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-494
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Name Finder
>    Affects Versions: tools-1.5.3
>         Environment: Ubuntu oneiric x64 Java 7 update 3
>            Reporter: Jim Piliouras
>              Labels: patch
>             Fix For: tools-1.5.3
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Made some small changes to the TokenNameFinderEvaluator class which hopefully allow merging of results from several name-finders. It just does that by calling the find method of all supplied name-finders. The only break is the fact that Java does not allow varargs anywhere but at the end of the argument list so i could not use "TokenNameFinder... namefinders" as the first parameter in the constructor - i had to pass an array instead. I think it's worth reversing the order of arguments  but that is a break too...

        

[jira] [Commented] (OPENNLP-494) Merging results from several name-finders

Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246447#comment-13246447 ] 

Joern Kottmann commented on OPENNLP-494:
----------------------------------------

It's not only needed for evaluation; it's a common post-processing step after multiple name finders have detected names in a document/article.
                
> Merging results from several name-finders
> -----------------------------------------
>
>                 Key: OPENNLP-494
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-494
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Name Finder
>    Affects Versions: tools-1.5.3
>         Environment: Ubuntu oneiric x64 Java 7 update 3
>            Reporter: Jim Piliouras
>              Labels: patch
>             Fix For: tools-1.5.3
>
>         Attachments: TokenNameFinderEvaluator.java
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Made some small changes to the TokenNameFinderEvaluator class which hopefully allow merging of results from several name-finders. It just does that by calling the find method of all supplied name-finders. The only break is the fact that Java does not allow varargs anywhere but at the end of the argument list so i could not use "TokenNameFinder... namefinders" as the first parameter in the constructor - i had to pass an array instead. I think it's worth reversing the order of arguments  but that is a break too...

        

[jira] [Updated] (OPENNLP-494) Merging results from several name-finders

Posted by "Jim Piliouras (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Piliouras updated OPENNLP-494:
----------------------------------

    Attachment:     (was: AggregateNameFinder.java)
    
> Merging results from several name-finders
> -----------------------------------------
>
>                 Key: OPENNLP-494
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-494
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Name Finder
>    Affects Versions: tools-1.5.3
>         Environment: Ubuntu oneiric x64 Java 7 update 3
>            Reporter: Jim Piliouras
>              Labels: patch
>             Fix For: tools-1.5.3
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Made some small changes to the TokenNameFinderEvaluator class which hopefully allow merging of results from several name-finders. It just does that by calling the find method of all supplied name-finders. The only break is the fact that Java does not allow varargs anywhere but at the end of the argument list so i could not use "TokenNameFinder... namefinders" as the first parameter in the constructor - i had to pass an array instead. I think it's worth reversing the order of arguments  but that is a break too...

        

[jira] [Updated] (OPENNLP-494) Merging results from several name-finders

Posted by "Jim Piliouras (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Piliouras updated OPENNLP-494:
----------------------------------

    Attachment:     (was: AggregateNameFinder.java)
    
> Merging results from several name-finders
> -----------------------------------------
>
>                 Key: OPENNLP-494
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-494
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Name Finder
>    Affects Versions: tools-1.5.3
>         Environment: Ubuntu oneiric x64 Java 7 update 3
>            Reporter: Jim Piliouras
>              Labels: patch
>             Fix For: tools-1.5.3
>
>         Attachments: TokenNameFinderEvaluator.java
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Made some small changes to the TokenNameFinderEvaluator class which hopefully allow merging of results from several name-finders. It just does that by calling the find method of all supplied name-finders. The only break is the fact that Java does not allow varargs anywhere but at the end of the argument list so i could not use "TokenNameFinder... namefinders" as the first parameter in the constructor - i had to pass an array instead. I think it's worth reversing the order of arguments  but that is a break too...

        

[jira] [Commented] (OPENNLP-494) Merging results from several name-finders

Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246418#comment-13246418 ] 

Joern Kottmann commented on OPENNLP-494:
----------------------------------------

The point is, when you do the merging inside the Evaluator it is not possible to reuse the logic in other places, e.g. the UIMA Name Finder AE.
That is why I believe it should go into a separate reusable class.
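A minimal sketch of the "separate reusable class" idea, under stated assumptions: Span and SpanFinder are hypothetical stand-ins for OpenNLP's Span and TokenNameFinder, and AggregateSpanFinder is a hypothetical name. Because the aggregate implements the same interface as the finders it wraps, an evaluator or a UIMA annotator could use it wherever a single finder is expected, without knowing that merging happens.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical typed token span; the end offset is exclusive.
final class Span {
  final int start, end;
  final String type;

  Span(int start, int end, String type) {
    this.start = start;
    this.end = end;
    this.type = type;
  }

  @Override public boolean equals(Object o) {
    if (!(o instanceof Span)) return false;
    Span other = (Span) o;
    return start == other.start && end == other.end && Objects.equals(type, other.type);
  }

  @Override public int hashCode() {
    return Objects.hash(start, end, type);
  }
}

// Hypothetical finder contract: the tokens of one sentence in, the detected spans out.
interface SpanFinder {
  Span[] find(String[] tokens);
}

// Merges the output of any number of finders; nothing here is evaluation-specific.
final class AggregateSpanFinder implements SpanFinder {
  private final SpanFinder[] finders;

  AggregateSpanFinder(SpanFinder... finders) {
    this.finders = finders.clone();
  }

  @Override public Span[] find(String[] tokens) {
    List<Span> merged = new ArrayList<>();
    for (SpanFinder finder : finders) {
      Collections.addAll(merged, finder.find(tokens));
    }
    return merged.toArray(new Span[0]);
  }
}
```

OpenNLP's TokenNameFinder also declares clearAdaptiveData(); a real aggregate would presumably forward that call to each wrapped finder.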
                
> Merging results from several name-finders
> -----------------------------------------
>
>                 Key: OPENNLP-494
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-494
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Name Finder
>    Affects Versions: tools-1.5.3
>         Environment: Ubuntu oneiric x64 Java 7 update 3
>            Reporter: Jim Piliouras
>              Labels: patch
>             Fix For: tools-1.5.3
>
>         Attachments: TokenNameFinderEvaluator.java
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Made some small changes to the TokenNameFinderEvaluator class which hopefully allow merging of results from several name-finders. It just does that by calling the find method of all supplied name-finders. The only break is the fact that Java does not allow varargs anywhere but at the end of the argument list so i could not use "TokenNameFinder... namefinders" as the first parameter in the constructor - i had to pass an array instead. I think it's worth reversing the order of arguments  but that is a break too...

        

[jira] [Updated] (OPENNLP-494) Merging results from several name-finders

Posted by "Jim Piliouras (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Piliouras updated OPENNLP-494:
----------------------------------

    Attachment: TokenNameFinderEvaluator.java

Spotted a bug and refreshed the attachment...

The magicMarker needs to keep accumulating lengths for the array offset to work correctly.
Changed '=' to '+='.
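The offset bug is easy to reproduce in isolation. A minimal sketch, using String arrays as stand-ins for the per-finder Span[] results and a hypothetical ArrayMerge class, where offset plays the role of magicMarker: with '=' each copy starts right after the previous finder's results only, so later results overwrite earlier ones and the tail of the merged array stays empty; with '+=' the offset accumulates.

```java
// Hypothetical helper; String[] stands in for the Span[] each name-finder returns.
final class ArrayMerge {
  // Copies every per-finder result array into one array. "offset" plays the role
  // of the patch's magicMarker; accumulate=false reproduces the '=' bug.
  static String[] concat(String[][] parts, boolean accumulate) {
    int total = 0;
    for (String[] part : parts) {
      total += part.length;
    }
    String[] merged = new String[total];
    int offset = 0;
    for (String[] part : parts) {
      System.arraycopy(part, 0, merged, offset, part.length);
      if (accumulate) {
        offset += part.length; // correct: next copy starts after everything copied so far
      } else {
        offset = part.length;  // bug: next copy starts after the previous part only
      }
    }
    return merged;
  }
}
```

Collecting into a List with addAll and calling toArray at the end avoids tracking the offset by hand.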
                
> Merging results from several name-finders
> -----------------------------------------
>
>                 Key: OPENNLP-494
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-494
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Name Finder
>    Affects Versions: tools-1.5.3
>         Environment: Ubuntu oneiric x64 Java 7 update 3
>            Reporter: Jim Piliouras
>              Labels: patch
>             Fix For: tools-1.5.3
>
>         Attachments: TokenNameFinderEvaluator.java
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Made some small changes to the TokenNameFinderEvaluator class which hopefully allow merging of results from several name-finders. It just does that by calling the find method of all supplied name-finders. The only break is the fact that Java does not allow varargs anywhere but at the end of the argument list so i could not use "TokenNameFinder... namefinders" as the first parameter in the constructor - i had to pass an array instead. I think it's worth reversing the order of arguments  but that is a break too...

        

[jira] [Commented] (OPENNLP-494) Merging results from several name-finders

Posted by "Jim Piliouras (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252980#comment-13252980 ] 

Jim Piliouras commented on OPENNLP-494:
---------------------------------------

I'm not sure I follow... but I think I had exactly the same views as you before looking at the updateScores method in FMeasure. I too thought we needed a strategy to sort and merge the spans, but it turns out that each reference is checked against each prediction, so it makes no sense to try and fiddle around with the results... just merge them and say "look, here are my predictions". So what if 2 spans overlap? They can't both be correct (realistically)... either one is correct or none is. I'm telling you it works like a charm on my big corpus as well... having the dictionary boosts my recall by 24%, which is simply unbelievable! I'm getting an F-measure of 96%, which is more than most state-of-the-art systems give you. I'm not sure how overlapping/intersecting spans can occur, or why I'm not getting any of those even on my big corpus... This is a killer feature and I don't know of any other text-mining APIs that let you do that.
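A small sketch of how merged predictions are scored by the nested-loop counting quoted from FMeasure in this thread, using a hypothetical Span stand-in and made-up data. In the example the model finds one of two gold spans; the dictionary adds the second one plus an overlapping wrong span. Under these assumptions the overlap only costs precision (from 1.0 to about 0.67) while recall rises from 0.5 to 1.0. One caveat follows from the same loop: if two finders report the identical span, each copy counts as a true positive, so exact duplicates are worth removing before scoring.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Objects;

// Hypothetical typed span (not opennlp.tools.util.Span); equal when offsets and type match.
final class Span {
  final int start, end;
  final String type;

  Span(int start, int end, String type) {
    this.start = start;
    this.end = end;
    this.type = type;
  }

  @Override public boolean equals(Object o) {
    if (!(o instanceof Span)) return false;
    Span other = (Span) o;
    return start == other.start && end == other.end && Objects.equals(type, other.type);
  }

  @Override public int hashCode() {
    return Objects.hash(start, end, type);
  }
}

final class MergedScores {
  // Same nested-loop counting as the FMeasure snippet quoted in this thread.
  static int countTruePositives(Object[] references, Object[] predictions) {
    int truePositives = 0;
    for (Object reference : references) {
      for (Object prediction : predictions) {
        if (reference.equals(prediction)) {
          truePositives++;
        }
      }
    }
    return truePositives;
  }

  static double precision(Object[] references, Object[] predictions) {
    return predictions.length == 0
        ? 0.0 : (double) countTruePositives(references, predictions) / predictions.length;
  }

  static double recall(Object[] references, Object[] predictions) {
    return references.length == 0
        ? 0.0 : (double) countTruePositives(references, predictions) / references.length;
  }

  // Removes exact duplicates (the same span reported by two finders), keeping first-seen order.
  static Span[] dedupe(Span[] spans) {
    return new LinkedHashSet<>(Arrays.asList(spans)).toArray(new Span[0]);
  }
}
```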
                
> Merging results from several name-finders
> -----------------------------------------
>
>                 Key: OPENNLP-494
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-494
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Name Finder
>    Affects Versions: tools-1.5.3
>         Environment: Ubuntu oneiric x64 Java 7 update 3
>            Reporter: Jim Piliouras
>              Labels: patch
>             Fix For: tools-1.5.3
>
>         Attachments: AggregateNameFinder.java, FINALLY4.bin, UnitTest.java
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Made some small changes to the TokenNameFinderEvaluator class which hopefully allow merging of results from several name-finders. It just does that by calling the find method of all supplied name-finders. The only break is the fact that Java does not allow varargs anywhere but at the end of the argument list so i could not use "TokenNameFinder... namefinders" as the first parameter in the constructor - i had to pass an array instead. I think it's worth reversing the order of arguments  but that is a break too...

        

[jira] [Updated] (OPENNLP-494) Merging results from several name-finders

Posted by "Jim Piliouras (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Piliouras updated OPENNLP-494:
----------------------------------

    Attachment: TokenNameFinderEvaluator.java

OK, my bad, this is the final one... it compiles and everything! Just build the whole project...
I haven't tried to use it yet because I'm not sure whether it's worth the effort. Since the dictionary cannot deal with the 'drug' tag during evaluation, I have to re-train my maxent model with the 'default' tag in order to notice any difference in my results. This last version includes the original constructor that accepts a single name-finder, so it can still be used as normal. Nothing is broken (fingers crossed)...
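Why the 'default' tag matters for evaluation: if span equality compares the type as well as the offsets (an assumption here, modelled with a hypothetical Span class), a dictionary hit tagged 'default' never matches a reference tagged 'drug', even with identical offsets, so it can only count as a false positive. Besides re-training the model with the 'default' tag, another option would be to re-label the dictionary's spans before merging; the Retype helper below is a hypothetical sketch of that, not an OpenNLP API.

```java
import java.util.Objects;

// Hypothetical typed span stand-in; equality includes the type.
final class Span {
  final int start, end;
  final String type;

  Span(int start, int end, String type) {
    this.start = start;
    this.end = end;
    this.type = type;
  }

  @Override public boolean equals(Object o) {
    if (!(o instanceof Span)) return false;
    Span other = (Span) o;
    return start == other.start && end == other.end && Objects.equals(type, other.type);
  }

  @Override public int hashCode() {
    return Objects.hash(start, end, type);
  }
}

final class Retype {
  // Hypothetical helper: copy spans, replacing each type with the one used in the evaluation data.
  static Span[] withType(Span[] spans, String type) {
    Span[] out = new Span[spans.length];
    for (int i = 0; i < spans.length; i++) {
      out[i] = new Span(spans[i].start, spans[i].end, type);
    }
    return out;
  }
}
```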
                
> Merging results from several name-finders
> -----------------------------------------
>
>                 Key: OPENNLP-494
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-494
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Name Finder
>    Affects Versions: tools-1.5.3
>         Environment: Ubuntu oneiric x64 Java 7 update 3
>            Reporter: Jim Piliouras
>              Labels: patch
>             Fix For: tools-1.5.3
>
>         Attachments: TokenNameFinderEvaluator.java
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Made some small changes to the TokenNameFinderEvaluator class which hopefully allow merging of results from several name-finders. It just does that by calling the find method of all supplied name-finders. The only break is the fact that Java does not allow varargs anywhere but at the end of the argument list so i could not use "TokenNameFinder... namefinders" as the first parameter in the constructor - i had to pass an array instead. I think it's worth reversing the order of arguments  but that is a break too...

        

[jira] [Commented] (OPENNLP-494) Merging results from several name-finders

Posted by "Jim Piliouras (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250627#comment-13250627 ] 

Jim Piliouras commented on OPENNLP-494:
---------------------------------------

I noticed this while examining the updateScores method of the 'FMeasure' class:
-------------------------------------------------------------------------------------------
/* inside the updateScores() method */

for (int referenceIndex = 0; referenceIndex < references.length;
    referenceIndex++) {

  Object referenceName = references[referenceIndex];

  // compare this reference with every prediction, in any order
  for (int predictedIndex = 0; predictedIndex < predictions.length;
      predictedIndex++) {
    if (referenceName.equals(predictions[predictedIndex])) {
      truePositives++;
    }
  }
}
---------------------------------------------------------------------------------------------

It appears that for each reference there is an inner loop that looks through all predictions. That seems rather inefficient at first, but it simplifies things a lot as far as merging results is concerned. Let me explain:

I went to more trouble than needed with the AggregateNameFinder simply because I tried to keep the predicted span array shorter than the whole sentence and also to sort the predictions from earliest to latest (span.getStart()). Now, looking at how each reference is checked against all predictions, what I did seems redundant. If I understand correctly, with the current setup, simply merging all the predictions from several name-finders into one large span array would work out of the box! Each reference will be checked against all predictions regardless of where they came from or what the start index of the predicted span is. But that makes me wonder... if this is the case, then aggregating the results sounds ridiculously easy, so easy that I find it hard to believe that no one has already seen or suggested it!!!

If my rationale is correct, I can make some changes to AggregateNameFinder.java (mainly getting rid of the comparators) so that it simply merges the results of .find() without being concerned with "time". I was under the impression that the predictions had to be in the order they were predicted if they were to be evaluated correctly!
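The claim that order does not matter can be checked directly: the nested loop only asks whether each reference equals a prediction, so any permutation of the merged array yields the same count. The sketch below uses plain strings as hypothetical span stand-ins and also shows a map-based tally that gives the same count in linear time, addressing the efficiency remark; neither method is OpenNLP code, and the tally assumes the span type's equals and hashCode are consistent.

```java
import java.util.HashMap;
import java.util.Map;

final class TruePositiveCount {
  // Nested loop as in the quoted FMeasure snippet: O(references x predictions).
  static int nested(Object[] references, Object[] predictions) {
    int count = 0;
    for (Object reference : references) {
      for (Object prediction : predictions) {
        if (reference.equals(prediction)) {
          count++;
        }
      }
    }
    return count;
  }

  // Same count in O(references + predictions): tally predictions once, then add
  // each reference's tally (duplicates are counted exactly as the nested loop counts them).
  static int tallied(Object[] references, Object[] predictions) {
    Map<Object, Integer> tally = new HashMap<>();
    for (Object prediction : predictions) {
      tally.merge(prediction, 1, Integer::sum);
    }
    int count = 0;
    for (Object reference : references) {
      count += tally.getOrDefault(reference, 0);
    }
    return count;
  }
}
```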

Jim

                
> Merging results from several name-finders
> -----------------------------------------
>
>                 Key: OPENNLP-494
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-494
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Name Finder
>    Affects Versions: tools-1.5.3
>         Environment: Ubuntu oneiric x64 Java 7 update 3
>            Reporter: Jim Piliouras
>              Labels: patch
>             Fix For: tools-1.5.3
>
>         Attachments: AggregateNameFinder.java, TokenNameFinderEvaluator.java
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Made some small changes to the TokenNameFinderEvaluator class which hopefully allow merging of results from several name-finders. It just does that by calling the find method of all supplied name-finders. The only break is the fact that Java does not allow varargs anywhere but at the end of the argument list so i could not use "TokenNameFinder... namefinders" as the first parameter in the constructor - i had to pass an array instead. I think it's worth reversing the order of arguments  but that is a break too...

        

[jira] [Commented] (OPENNLP-494) Merging results from several name-finders

Posted by "Jim Piliouras (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246432#comment-13246432 ] 

Jim Piliouras commented on OPENNLP-494:
---------------------------------------

OK, what if we make a "super evaluator" rather than a "super name-finder"? Then the code would be contained in a separate class... I still think the namefind package is the wrong place for something related only to evaluation...

In any case I will try it out tomorrow. Then you guys can review both approaches and decide accordingly...

Jim
                
> Merging results from several name-finders
> -----------------------------------------
>
>                 Key: OPENNLP-494
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-494
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Name Finder
>    Affects Versions: tools-1.5.3
>         Environment: Ubuntu oneiric x64 Java 7 update 3
>            Reporter: Jim Piliouras
>              Labels: patch
>             Fix For: tools-1.5.3
>
>         Attachments: TokenNameFinderEvaluator.java
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Made some small changes to the TokenNameFinderEvaluator class which hopefully allow merging of results from several name-finders. It just does that by calling the find method of all supplied name-finders. The only break is the fact that Java does not allow varargs anywhere but at the end of the argument list so i could not use "TokenNameFinder... namefinders" as the first parameter in the constructor - i had to pass an array instead. I think it's worth reversing the order of arguments  but that is a break too...

        

[jira] [Commented] (OPENNLP-494) Merging results from several name-finders

Posted by "Jim Piliouras (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252497#comment-13252497 ] 

Jim Piliouras commented on OPENNLP-494:
---------------------------------------

Can someone please commit the class and close the issue? I've tested it with 2 different corpora and it works from Java and from Clojure as well...
Amazing news, yes? The class itself contains the test; please remove it before committing...
                
> Merging results from several name-finders
> -----------------------------------------
>
>                 Key: OPENNLP-494
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-494
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Name Finder
>    Affects Versions: tools-1.5.3
>         Environment: Ubuntu oneiric x64 Java 7 update 3
>            Reporter: Jim Piliouras
>              Labels: patch
>             Fix For: tools-1.5.3
>
>         Attachments: AggregateNameFinder.java
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Made some small changes to the TokenNameFinderEvaluator class which hopefully allow merging of results from several name-finders. It just does that by calling the find method of all supplied name-finders. The only break is the fact that Java does not allow varargs anywhere but at the end of the argument list so i could not use "TokenNameFinder... namefinders" as the first parameter in the constructor - i had to pass an array instead. I think it's worth reversing the order of arguments  but that is a break too...

        

[jira] [Updated] (OPENNLP-494) Merging results from several name-finders

Posted by "Jim Piliouras (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Piliouras updated OPENNLP-494:
----------------------------------

    Attachment:     (was: TokenNameFinderEvaluator.java)
    
> Merging results from several name-finders
> -----------------------------------------
>
>                 Key: OPENNLP-494
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-494
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Name Finder
>    Affects Versions: tools-1.5.3
>         Environment: Ubuntu oneiric x64 Java 7 update 3
>            Reporter: Jim Piliouras
>              Labels: patch
>             Fix For: tools-1.5.3
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Made some small changes to the TokenNameFinderEvaluator class which hopefully allow merging of results from several name-finders. It just does that by calling the find method of all supplied name-finders. The only break is the fact that Java does not allow varargs anywhere but at the end of the argument list so i could not use "TokenNameFinder... namefinders" as the first parameter in the constructor - i had to pass an array instead. I think it's worth reversing the order of arguments  but that is a break too...

        

[jira] [Updated] (OPENNLP-494) Merging results from several name-finders

Posted by "Jim Piliouras (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Piliouras updated OPENNLP-494:
----------------------------------

    Attachment:     (was: TokenNameFinderEvaluator.java)
    
> Merging results from several name-finders
> -----------------------------------------
>
>                 Key: OPENNLP-494
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-494
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Name Finder
>    Affects Versions: tools-1.5.3
>         Environment: Ubuntu oneiric x64 Java 7 update 3
>            Reporter: Jim Piliouras
>              Labels: patch
>             Fix For: tools-1.5.3
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Made some small changes to the TokenNameFinderEvaluator class which hopefully allow merging of results from several name-finders. It just does that by calling the find method of all supplied name-finders. The only break is the fact that Java does not allow varargs anywhere but at the end of the argument list so i could not use "TokenNameFinder... namefinders" as the first parameter in the constructor - i had to pass an array instead. I think it's worth reversing the order of arguments  but that is a break too...
