Posted to commits@stanbol.apache.org by "Ayeshmantha (JIRA)" <ji...@apache.org> on 2018/11/15 10:56:00 UTC
[jira] [Commented] (STANBOL-320) Named Entity detection engine should filter out some obviously wrong text annotations
[ https://issues.apache.org/jira/browse/STANBOL-320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16687822#comment-16687822 ]
Ayeshmantha commented on STANBOL-320:
-------------------------------------
Hi [~rafaharo],
Is anyone looking at this?
> Named Entity detection engine should filter out some obviously wrong text annotations
> -------------------------------------------------------------------------------------
>
> Key: STANBOL-320
> URL: https://issues.apache.org/jira/browse/STANBOL-320
> Project: Stanbol
> Issue Type: Improvement
> Components: Enhancement Engines
> Reporter: Olivier Grisel
> Assignee: Rafa Haro
> Priority: Major
>
> OpenNLP tends to return really weird results from time to time. For instance:
> "The researchers found the liver expresses higher levels of the gene encoding "selenoprotein P" (SEPP1) in people with type 2 diabetes - those with more insulin resistance." outputs a Person TextAnnotation for the mention 'P "' => note the double quote that is included as part of the mention and the additional whitespace separator, probably inserted by a confused detokenizer.
> Here is another example:
> "We are all very excited for Rahm as he takes on a new challenge for which he is extraordinarily well qualified," said the president. Obama appointed political consultant and senior advisor Pete Rouse as interim chief, calling Rouse "a skillful problem-solver" and a "wise, skillful and long-time counselor." => outputs 'Rouse "' as a Person annotation as well. This is again confusion caused by bad handling of quotation marks.
> I would like to use this JIRA issue to collect the most common annotation mistakes that could be filtered using ad-hoc Java code directly inside the enhancement engine.
> For the two previous cases, removing the quotation marks and filtering single-letter names should be enough. There might be other cases that don't match this simple pattern, though.
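
The two filters proposed in the issue description (removing quotation marks, then dropping single-letter names) could be sketched roughly as follows. This is only an illustration, not code from the Stanbol engine: the class name MentionFilter and the method clean are hypothetical, and the sketch assumes that only double quotation marks and whitespace at the edges of a mention need to be removed.

```java
import java.util.Optional;

// Hypothetical helper, not part of Stanbol or OpenNLP.
final class MentionFilter {

    // Straight and typographic double quotes (assumption: apostrophes are
    // left alone so that names such as O'Brien survive).
    private static final String QUOTES = "\"\u201C\u201D";

    private static boolean isNoise(char c) {
        return Character.isWhitespace(c) || QUOTES.indexOf(c) >= 0;
    }

    // Strips quotes and whitespace from both ends of the mention and
    // rejects it if fewer than two characters remain.
    static Optional<String> clean(String mention) {
        if (mention == null) {
            return Optional.empty();
        }
        int start = 0;
        int end = mention.length();
        while (start < end && isNoise(mention.charAt(start))) {
            start++;
        }
        while (end > start && isNoise(mention.charAt(end - 1))) {
            end--;
        }
        String cleaned = mention.substring(start, end);
        // A single remaining letter is treated as an obviously wrong name.
        return cleaned.length() < 2 ? Optional.empty() : Optional.of(cleaned);
    }

    public static void main(String[] args) {
        System.out.println(clean("P \""));      // the 'P "' mention is rejected
        System.out.println(clean("Rouse \""));  // the 'Rouse "' mention becomes Rouse
    }
}
```

With the two examples from the issue, clean("P \"") returns an empty Optional and clean("Rouse \"") returns "Rouse". An engine using something like this would presumably also need to adjust the annotation's start and end offsets to match the trimmed text, which this sketch does not cover.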
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)