Posted to commits@stanbol.apache.org by "Rupert Westenthaler (JIRA)" <ji...@apache.org> on 2014/02/19 16:34:20 UTC

[jira] [Created] (STANBOL-1285) FST Linking Engine / Linkable Token Filter should consider Chunks

Rupert Westenthaler created STANBOL-1285:
--------------------------------------------

             Summary: FST Linking Engine / Linkable Token Filter should consider Chunks
                 Key: STANBOL-1285
                 URL: https://issues.apache.org/jira/browse/STANBOL-1285
             Project: Stanbol
          Issue Type: Improvement
          Components: Enhancement Engines
    Affects Versions: 0.12.0
            Reporter: Rupert Westenthaler
            Assignee: Rupert Westenthaler
            Priority: Minor
             Fix For: 1.0.0, 0.12.1


The LinkableTokenFilter, a Solr TokenFilter, is used by the FST linking engine to add the TaggingAttribute (supported by the Solr Text Tagger library) to tokens that should be looked up in the FST, i.e. the vocabulary.

This implementation can be improved by taking into consideration chunks that are either

* chunks representing named entities OR
* processable (typically noun phrases, but not verb phrases ...) AND
    * have a linkable token in the chunk OR
    * have two or more matchable tokens in the chunk

All tokens in such chunks should be classified as taggable by setting the TaggingAttribute to true.
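
The chunk rule described above could look roughly like the following. This is only a sketch to illustrate the decision logic, not the actual LinkableTokenFilter code: the Token and Chunk classes, the field names, and the methods shouldTagWholeChunk and applyChunkRule are hypothetical, and the boolean field taggable merely stands in for the TaggingAttribute. It also assumes that named entity chunks qualify on their own, independent of the "processable" condition.

```java
import java.util.Arrays;
import java.util.List;

public class ChunkTaggingSketch {

    /** Hypothetical token model; NOT the Stanbol or Lucene API. */
    static final class Token {
        final String text;
        final boolean linkable;   // token would be looked up on its own
        final boolean matchable;  // token may take part in a multi-token match
        boolean taggable;         // stand-in for the TaggingAttribute value

        Token(String text, boolean linkable, boolean matchable) {
            this.text = text;
            this.linkable = linkable;
            this.matchable = matchable;
            this.taggable = linkable; // assumed per-token default without chunks
        }
    }

    /** Hypothetical chunk model: a span of tokens plus two classification flags. */
    static final class Chunk {
        final List<Token> tokens;
        final boolean namedEntity;  // chunk represents a named entity
        final boolean processable;  // e.g. a noun phrase, but not a verb phrase

        Chunk(List<Token> tokens, boolean namedEntity, boolean processable) {
            this.tokens = tokens;
            this.namedEntity = namedEntity;
            this.processable = processable;
        }
    }

    /** Returns true if every token of the chunk should be marked as taggable. */
    static boolean shouldTagWholeChunk(Chunk chunk) {
        if (chunk.namedEntity) {
            return true; // assumption: named entity chunks always qualify
        }
        if (!chunk.processable) {
            return false;
        }
        int matchableCount = 0;
        for (Token token : chunk.tokens) {
            if (token.linkable) {
                return true; // one linkable token is enough
            }
            if (token.matchable) {
                matchableCount++;
            }
        }
        return matchableCount >= 2; // otherwise require two or more matchable tokens
    }

    /** Marks all tokens of a qualifying chunk as taggable; leaves other chunks untouched. */
    static void applyChunkRule(Chunk chunk) {
        if (shouldTagWholeChunk(chunk)) {
            for (Token token : chunk.tokens) {
                token.taggable = true;
            }
        }
    }

    public static void main(String[] args) {
        // A processable chunk with two matchable tokens and one stop word.
        List<Token> tokens = Arrays.asList(
                new Token("University", false, true),
                new Token("of", false, false),
                new Token("Salzburg", false, true));
        applyChunkRule(new Chunk(tokens, false, true));
        for (Token token : tokens) {
            System.out.println(token.text + " taggable=" + token.taggable);
        }
    }
}
```

With a rule like this, tokens such as "of" inside a qualifying chunk are also passed to the FST lookup, so a multi-word label in the vocabulary can be matched as a whole.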


