Posted to commits@stanbol.apache.org by "Rupert Westenthaler (JIRA)" <ji...@apache.org> on 2014/01/21 10:24:19 UTC
[jira] [Created] (STANBOL-1262) Change/Improve processing of Chunks by EntityLinking
Rupert Westenthaler created STANBOL-1262:
--------------------------------------------
Summary: Change/Improve processing of Chunks by EntityLinking
Key: STANBOL-1262
URL: https://issues.apache.org/jira/browse/STANBOL-1262
Project: Stanbol
Issue Type: Improvement
Affects Versions: 0.12.0
Reporter: Rupert Westenthaler
Assignee: Rupert Westenthaler
The first step of EntityLinking (this applies to all EntityLinkingEngines, including the Lucene FST Linking Engine) is to classify Tokens as "linkable", "matchable" or "other". In addition, it determines the "processible" chunks that Tokens are contained in.
This issue is about changing how "processible" chunks are determined if the AnalyzedText contains multiple overlapping chunks.
A typical case where this can happen is when both Noun Phrase Detection and Named Entity Recognition are contained in the Chain. The chunks selected for Named Entities will typically be smaller than the corresponding Noun Phrases. There are even situations where the Named Entity does not include all Nouns contained in a Noun Phrase.
Here is an example taken from [1]:
After a disappointing start against an Everton side who led through Kevin Mirallas's first-half goal ...
While "Everton" is detected as an Organization by NER, the Noun Phrase "an Everton side" also includes 'side' as a second noun. Therefore 'Everton' is not considered for linking, as it only matches 1 of the 2 matchable tokens within the 'processible' phrase.
This is because EntityLinking currently merges overlapping processible phrases together, a semantic that is no longer optimal for EntityLinking.
To avoid recall problems like the one described, the intersection instead of the union of multiple processible chunks needs to be used.
For the given example this would result in:
- an [other]: an Everton side
- Everton [linkable]: Everton
- side [matchable]: an Everton side
So 'Everton' would be correctly linked to an Entity with the label Everton, but 'side' would not be linked to an Entity with the label Side, as it is in a Phrase with another linkable/matchable token.
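
The difference between the two semantics can be illustrated with a minimal sketch. This is not Stanbol code: the Chunk class, the function names and the token-index spans are hypothetical and chosen only to reproduce the example above. It assumes that a token's processible phrase is computed from the chunks that contain it, either by merging overlapping chunks (current behaviour) or by intersecting them (proposed behaviour).

```python
from dataclasses import dataclass

# Hypothetical model, not the Stanbol API: a chunk is a half-open range of token indexes.
@dataclass(frozen=True)
class Chunk:
    start: int  # index of the first token (inclusive)
    end: int    # index after the last token (exclusive)

    def contains(self, token_index):
        return self.start <= token_index < self.end


def merged_scope(token_index, chunks):
    """Current semantic: overlapping chunks are merged (union) before lookup."""
    merged = []
    for chunk in sorted(chunks, key=lambda c: (c.start, c.end)):
        if merged and chunk.start < merged[-1].end:
            last = merged[-1]
            merged[-1] = Chunk(last.start, max(last.end, chunk.end))
        else:
            merged.append(chunk)
    for chunk in merged:
        if chunk.contains(token_index):
            return chunk
    return None


def intersected_scope(token_index, chunks):
    """Proposed semantic: intersect all chunks that contain the token."""
    containing = [c for c in chunks if c.contains(token_index)]
    if not containing:
        return None
    return Chunk(max(c.start for c in containing),
                 min(c.end for c in containing))


TOKENS = ["an", "Everton", "side"]
CHUNKS = [
    Chunk(0, 3),  # Noun Phrase: "an Everton side"
    Chunk(1, 2),  # Named Entity: "Everton"
]


def text(chunk):
    """Render the tokens covered by a chunk as a phrase."""
    return " ".join(TOKENS[chunk.start:chunk.end])


for i, token in enumerate(TOKENS):
    print(f"{token}: union='{text(merged_scope(i, CHUNKS))}' "
          f"intersection='{text(intersected_scope(i, CHUNKS))}'")
```

With the union, all three tokens share the phrase "an Everton side". With the intersection, 'Everton' is evaluated within the phrase "Everton" alone, while 'an' and 'side' keep the full Noun Phrase, which matches the listing above.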
[1] http://www.theguardian.com/football/2014/jan/20/west-bromwich-albion-everton-premier-league-match-report