Posted to commits@stanbol.apache.org by rw...@apache.org on 2013/06/10 07:38:01 UTC

svn commit: r1491339 - /stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.mdtext

Author: rwesten
Date: Mon Jun 10 05:38:00 2013
New Revision: 1491339

URL: http://svn.apache.org/r1491339
Log:
minor: formatting

Modified:
    stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.mdtext

Modified: stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.mdtext
URL: http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.mdtext?rev=1491339&r1=1491338&r2=1491339&view=diff
==============================================================================
--- stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.mdtext (original)
+++ stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.mdtext Mon Jun 10 05:38:00 2013
@@ -155,7 +155,7 @@ __Token level Parameters:__
 * __lc__ {name}::LexicalCategory - The linked _Token Categories_. Valid values include the names of members of the LexicalCategory enumeration (e.g. "Noun", "Verb", "Adjective", "Adposition", …). Typical configurations include "lc=Noun" or an empty list ("lc" or "lc=") to deactivate all categories and provide a finer-grained configuration on the Pos or Tag level.
 * __pos__ {name}::Pos - The linked _Pos Types_. Valid values include the names of members of the Pos enumeration (e.g. "ProperNoun", "CommonNoun", "Infinitive", "Gerund", "PresentParticiple" and ~150 others). This parameter can be used for a very fine-grained configuration. For example, the _Link ProperNouns only_ setting uses it to define that only "pos=ProperNoun" tokens are linked.
 * __tag__ {tag}::String - The linked _Pos Tags_. This parameter allows configuring POS tags as used by the POS tagger. This is useful if those tags are not mapped to LexicalCategories or Pos types.
-*__prob__ [0..1)::double - the _Min PosTag Probability_. This parameter replaces the formerly used _Min POS tag probability_ _(org.apache.stanbol.enhancer.engines.keywordextraction.minPosTagProbability)_ property. It defines the minimum confidence required for a POS annotation to be accepted for linkable and matchable tokens ('value/2' is sufficient for rejecting non-linked/matched tokens).
+* __prob__ [0..1)::double - the _Min PosTag Probability_. This parameter replaces the formerly used _Min POS tag probability_ _(org.apache.stanbol.enhancer.engines.keywordextraction.minPosTagProbability)_ property. It defines the minimum confidence required for a POS annotation to be accepted for linkable and matchable tokens ('value/2' is sufficient for rejecting non-linked/matched tokens).
 * __uc__ {NONE/MATCH/LINK}::string - the _Upper Case Token Mode_ configures how upper case words are treated. There are three possible modes: (1) NONE: they are not treated specially; (2) MATCH: they are considered matchable tokens (independent of the POS tag or the token length); (3) LINK: they are always linked with the vocabulary. The default is "LINK", as upper case words often represent named entities, with the exception of German ('de'), where the mode is set to MATCH because all nouns in German are capitalized.
 
 NOTE: tokens are linked if any of "lc", "pos" or "tag" match the configuration. This means that adding "lc=Noun" renders "pos=ProperNoun" useless, as the Pos type ProperNoun is already included in the LexicalCategory Noun.
@@ -169,9 +169,10 @@ The default configuration for the Entity
     es;lc=Noun
     nl;lc=Noun
 
-The first line enables _Link Multiple Matchable Tokens in Phrases_ and linking of upper case tokens for all languages. In addition, it sets the minimum probabilities for Pos and Phrase annotations to 0.75 (which is also the default). The following three lines provide additional language-specific defaults. For German the upper case mode is reset to MATCH, as in German all nouns are capitalized. For Spanish and Dutch, linking for the LexicalCategory Noun is enabled. This is because the OpenNLP POS tagger for those languages does not support ProperNoun, and therefore the Engine would not link any tokens if _Link ProperNouns only_ is enabled. The same configuration in the OSGi '.config' file syntax looks as follows
+The first line enables _Link Multiple Matchable Tokens in Phrases_ and linking of upper case tokens for all languages. In addition, it sets the minimum probabilities for Pos and Phrase annotations to 0.75 (which is also the default). The following three lines provide additional language-specific defaults. For German the upper case mode is reset to MATCH, as in German all nouns are capitalized. For Spanish and Dutch, linking for the LexicalCategory Noun is enabled. This is because the OpenNLP POS tagger for those languages does not support ProperNoun, and therefore the Engine would not link any tokens if _Link ProperNouns only_ is enabled. The same configuration in the OSGi '.config' file syntax looks as follows _(NOTE: please omit the line break used here for better formatting)_
 
-    enhancer.engines.linking.processedLanguages=["*;lmmtip;uc\=LINK;prob\=0.75;pprob\=0.75","de;uc\=MATCH","es;lc\=Noun","nl;lc\=Noun"]
+    enhancer.engines.linking.processedLanguages=
+        ["*;lmmtip;uc\=LINK;prob\=0.75;pprob\=0.75","de;uc\=MATCH","es;lc\=Noun","nl;lc\=Noun"]
 
 The 2nd example shows how to define default settings without using the wildcard '*', which would enable processing of all languages. The following example shows a configuration that only enables English and ignores text in all other languages.
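The `processedLanguages` syntax described above ("lang;param;param=value" entries, with '*' as the wildcard language) can be illustrated with a small parser. This is a minimal Python sketch, not Stanbol's implementation (the engine itself is Java), and all function names are hypothetical. It assumes that (1) a backslash in the OSGi '.config' value escapes the following character, (2) a parameter without '=' (such as "lmmtip") is a flag stored as `None`, and (3) language-specific parameters override the '*' defaults.

```python
import re


def parse_osgi_string_array(line):
    """Extract the quoted entries of an OSGi '.config' array value.

    Assumption: a backslash escapes the following character (e.g. '\\=').
    """
    _, _, value = line.partition("=")  # drop the property key
    items = re.findall(r'"((?:[^"\\]|\\.)*)"', value)
    return [re.sub(r"\\(.)", r"\1", item) for item in items]


def parse_processed_languages(entries):
    """Map each language to its parameters: "lang;flag;name=value" -> dict."""
    config = {}
    for entry in entries:
        parts = [p.strip() for p in entry.split(";")]
        params = {}
        for part in parts[1:]:
            if not part:
                continue
            name, sep, value = part.partition("=")
            # Flags such as "lmmtip" carry no value; store them as None.
            params[name] = value if sep else None
        config[parts[0]] = params
    return config


def effective_params(config, lang):
    """Merge '*' defaults with language-specific overrides (assumed semantics).

    Returns None if the language is not processed at all.
    """
    if lang not in config and "*" not in config:
        return None
    merged = dict(config.get("*", {}))
    merged.update(config.get(lang, {}))
    return merged


line = (r'enhancer.engines.linking.processedLanguages='
        r'["*;lmmtip;uc\=LINK;prob\=0.75;pprob\=0.75",'
        r'"de;uc\=MATCH","es;lc\=Noun","nl;lc\=Noun"]')
config = parse_processed_languages(parse_osgi_string_array(line))
print(effective_params(config, "de"))
# {'lmmtip': None, 'uc': 'MATCH', 'prob': '0.75', 'pprob': '0.75'}
```

Under these assumptions, 'es' inherits "lmmtip" and "uc=LINK" from '*' and adds "lc=Noun", while a language without its own entry (e.g. 'fr') is processed with the '*' defaults alone; without a '*' entry it would not be processed. Whether the engine merges parameters exactly this way should be verified against the Stanbol sources.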