Posted to commits@stanbol.apache.org by rw...@apache.org on 2012/11/23 14:34:34 UTC

svn commit: r1412885 - in /stanbol/site/trunk/content/docs/trunk/components/enhancer/engines: opennlpchunker.mdtext opennlppos.mdtext opennlpsentence opennlptokenizer.mdtext

Author: rwesten
Date: Fri Nov 23 13:34:33 2012
New Revision: 1412885

URL: http://svn.apache.org/viewvc?rev=1412885&view=rev
Log:
STANBOL-733 - minor formatting related changes

Modified:
    stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/opennlpchunker.mdtext
    stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/opennlppos.mdtext
    stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/opennlpsentence
    stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/opennlptokenizer.mdtext

Modified: stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/opennlpchunker.mdtext
URL: http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/opennlpchunker.mdtext?rev=1412885&r1=1412884&r2=1412885&view=diff
==============================================================================
--- stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/opennlpchunker.mdtext (original)
+++ stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/opennlpchunker.mdtext Fri Nov 23 13:34:33 2012
@@ -1,13 +1,13 @@
 title: OpenNLP Chunker Engine
 
-The OpenNLP Chunker Engine support the detection of Phrases (Noun, Verb, ...) within the parsed Text. For that it uses the OpenNLP Chunker feature. Detected Phrases are added as _Chunk_s to the _[AnalyzedText](../nlp/analyzedtext)_ content part. In addition added _Chunk_s are annotated with an [Phrase Annotation](../nlp/nlpannotations#phrase-annotations) providing the type of the Phrase represented by the _Chunk_.
+The OpenNLP Chunker Engine support the detection of Phrases (Noun, Verb, ...) within the parsed Text. For that it uses the OpenNLP Chunker feature. Detected Phrases are added as _Chunks_ to the _[AnalyzedText](../nlp/analyzedtext)_ content part. In addition added _Chunks_ are annotated with an [Phrase Annotation](../nlp/nlpannotations#phrase-annotations) providing the type of the Phrase represented by the _Chunk_.
 
 
 ## Consumed information
 
 * __Language__ (required): The language of the text needs to be available. It is read as specified by [STANBOL-613](https://issues.apache.org/jira/browse/STANBOL-613) from the metadata of the ContentItem. Effectively this means that any Stanbol Language Detection engine will need to be executed before the OpenNLP POS Tagging Engine.
 * __Tokens with POS annotations__ (required): This Engine needs the Text to be tokenized and POS tagged. Even more the POS tags need to be compatible with the POS tags used to train the Chunker model. This effectively means that this Engine will only work as expected if the POS tagging was done by the OpenNLP POS Tagging Engine configured with a POS model using the same POS tag set as used for training the chunker model.
-* __Sentences__ (optional): In case _Sentence_s are available in the _AnalyzedText_ content part the tokenization of the text is done sentence by sentence. Otherwise the whole text is tokenized at once.
+* __Sentences__ (optional): In case _Sentences_ are available in the _AnalyzedText_ content part the tokenization of the text is done sentence by sentence. Otherwise the whole text is tokenized at once.
 
 ## Configuration
 

Modified: stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/opennlppos.mdtext
URL: http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/opennlppos.mdtext?rev=1412885&r1=1412884&r2=1412885&view=diff
==============================================================================
--- stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/opennlppos.mdtext (original)
+++ stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/opennlppos.mdtext Fri Nov 23 13:34:33 2012
@@ -5,7 +5,7 @@ POS tagging Engine using the [AnalyzedTe
 ## Consumed information
 
 * __Language__ (required): The language of the text needs to be available. It is read as specified by [STANBOL-613](https://issues.apache.org/jira/browse/STANBOL-613) from the metadata of the ContentItem. Effectively this means that any Stanbol Language Detection engine will need to be executed before the OpenNLP POS Tagging Engine.
-* __Sentences__ (optional): In case _Sentence_s are available in the _AnalyzedText_ content part the tokenization of the text is done sentence by sentence. If no _Sentence_s are available this engine detects sentences if a sentence detection model is available for that language (see below for more information). If no _Sentence_s are present and no OpenNLP sentence detection model is available for the language of the processed text, than the whole text is processed as a single sentence.
+* __Sentences__ (optional): In case _Sentences_ are available in the _AnalyzedText_ content part the tokenization of the text is done sentence by sentence. If no _Sentences_ are available this engine detects sentences if a sentence detection model is available for that language (see below for more information). If no _Sentences_ are present and no OpenNLP sentence detection model is available for the language of the processed text, than the whole text is processed as a single sentence.
 * __Tokens__ (optional): Foe POS tagging the Text needs to be tokenized. This Engine tries to consume _Tokens_ from the _AnalyzedText_ content part. If no Tokens are available it uses the OpenNLP tokenizer to tokenize the text (see below for more information).
 
 ## POS Tagging
@@ -42,18 +42,18 @@ The OpenNLP Pos Tagging engine supports 
 * Spanish: based on the PAROLE TagSet mapping to the [OLiA Ontology](http://nlp2rdf.lod2.eu/olia/) ([annotation model](http://purl.org/olia/parole_es_cat.owl))
 * Danish: mappings for the PAROLE Tagset as described by [this paper](http://korpus.dsl.dk/paroledoc_en.pdf).
 * Portuguese: mappings based on the [PALAVRAS tag set](http://beta.visl.sdu.dk/visl/pt/symbolset-floresta.html)
-* Dutch: mappings based on the WOTAN Tagset for Dutch as described by _"WOTAN: Een automatische grammatikale tagger voor het Nederlands", doctoral dissertation, Department of language & Speech, Nijmegen University (renamed to Radboud University), december 1994."_. _NOTE_ that this TagSet does NOT distinguish between _ProperNoun_s and _CommonNoun_s.
+* Dutch: mappings based on the WOTAN Tagset for Dutch as described by _"WOTAN: Een automatische grammatikale tagger voor het Nederlands", doctoral dissertation, Department of language & Speech, Nijmegen University (renamed to Radboud University), december 1994."_. _NOTE_ that this TagSet does NOT distinguish between _ProperNouns_ and _CommonNoun_s.
 * Swedish: based on the [Lexical categories in MAMBA](http://w3.msi.vxu.se/users/nivre/research/MAMBAlex.html)
 
 __TODO:__ Currently the Engine is limited to those TagSets as it is not yet possible to extend this by additional one.
 
 ## Tokenizing and Sentence Detection Support
 
-The OpenNLP POS Tagging engine implicitly supports tokenizing and sentence detection. That means if the _[AnalyzedText](../nlp/analysedtext)_ is not present or does not contain _Token_s than this engine will use the OpenNLP Tokenizer to tokenize the text. If no language specific OpenNLP tokenizer model is available, than it will use the SIMPLE_TOKENIZER.
+The OpenNLP POS Tagging engine implicitly supports tokenizing and sentence detection. That means if the _[AnalyzedText](../nlp/analysedtext)_ is not present or does not contain _Tokens_ than this engine will use the OpenNLP Tokenizer to tokenize the text. If no language specific OpenNLP tokenizer model is available, than it will use the SIMPLE_TOKENIZER.
 
-Sentence detection is only done if no _Sentence_s are present in the _AnalyzedText_ AND if a language specific sentence detection model is available.
+Sentence detection is only done if no _Sentences_ are present in the _AnalyzedText_ AND if a language specific sentence detection model is available.
 
-__NOTE__: Support for Tokenizing and Sentence Detection is not a replacement for explicitly adding a Tokenizing and Sentence Detection Engine to a Enhancement Chain as this Engine does not guarantee that _Token_s or _Sentence_s are added to the _AnalyzedText_ content part. If no POS model is available for a language or a language is not configured to be processed there will be no _Token_s nor _Sentence_s added. Chains the relay on _Token_s and/or _Sentence_s MUST explicitly include a Tokenizing and Sentence detection engine!
+__NOTE__: Support for Tokenizing and Sentence Detection is not a replacement for explicitly adding a Tokenizing and Sentence Detection Engine to a Enhancement Chain as this Engine does not guarantee that _Tokens_ or _Sentences_ are added to the _AnalyzedText_ content part. If no POS model is available for a language or a language is not configured to be processed there will be no _Tokens_ nor _Sentences_ added. Chains the relay on _Tokens_ and/or _Sentences_ MUST explicitly include a Tokenizing and Sentence detection engine!
 
 
 ## Configuration
@@ -91,10 +91,12 @@ The OpenNLP POS annotation engine suppor
 
 The syntax for parameters is as follows
 
+    :::text
     {language};{param-name}={param-value}
 
 So to use the "my-de-pos-model.zip" for POS tagging German texts one can use a configuration like follows
 
+    :::text
     de;model=my-de-pos-model.zip
     *
 

Modified: stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/opennlpsentence
URL: http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/opennlpsentence?rev=1412885&r1=1412884&r2=1412885&view=diff
==============================================================================
--- stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/opennlpsentence (original)
+++ stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/opennlpsentence Fri Nov 23 13:34:33 2012
@@ -1,6 +1,6 @@
 title: OpenNLP Sentence Detection Engine
 
-The OpenNLP Sentence Detection Engine adds _Sentence_s to the _[AnalyzedText](../nlp/analyzedtext)_ content part. If the _AnalyzedText_ content part is not yet present it is created by this engine.
+The OpenNLP Sentence Detection Engine adds _Sentences_ to the _[AnalyzedText](../nlp/analyzedtext)_ content part. If the _AnalyzedText_ content part is not yet present it is created by this engine.
 
 ## Consumed information
 
@@ -8,7 +8,7 @@ The OpenNLP Sentence Detection Engine ad
 
 ## Configuration
 
-The OpenNLP Sentence Detector Engine provides a default service instance (configuration policy is optional). This instance processes all languages and adds _Sentence_s for all languages where a OpenNLP sentence detection model is available. This Engine instance uses the name 'opennlp-sentence' and has a service ranking of '-100'.
+The OpenNLP Sentence Detector Engine provides a default service instance (configuration policy is optional). This instance processes all languages and adds _Sentences_ for all languages where a OpenNLP sentence detection model is available. This Engine instance uses the name 'opennlp-sentence' and has a service ranking of '-100'.
 
 This engine supports the default configuration for Enhancement Engines including the __name__ _(stanbol.enhancer.engine.name)_ and the __ranking__ _(service.ranking)_ In addition it is possible to configure the __processed languages__ _(org.apache.stanbol.enhancer.sentence.languages)_ and an parameter to specify the name of the sentence detection model used for a language.
 
@@ -41,10 +41,12 @@ The OpenNLP Sentence Detection engine su
 
 The syntax for parameters is as follows
 
+    :::text
     {language};{param-name}={param-value}
 
 So to use the "my-de-sentence-model.zip" for detecting sentences in German texts one can use a configuration like follows
 
+    :::text
     de;model=my-de-sentence-model.zip
     *
 

Modified: stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/opennlptokenizer.mdtext
URL: http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/opennlptokenizer.mdtext?rev=1412885&r1=1412884&r2=1412885&view=diff
==============================================================================
--- stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/opennlptokenizer.mdtext (original)
+++ stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/opennlptokenizer.mdtext Fri Nov 23 13:34:33 2012
@@ -1,11 +1,11 @@
 title: OpenNLP Tokenizer Engine
 
-The OpenNLP Tokenizer Engine adds _Token_s to the _AnalyzedText_ content part. If this content part is not yet present it adds it to the ContentItem.
+The OpenNLP Tokenizer Engine adds _Tokens_ to the _AnalyzedText_ content part. If this content part is not yet present it adds it to the ContentItem.
 
 ## Consumed information
 
 * __Language__ (required): The language of the text needs to be available. It is read as specified by [STANBOL-613](https://issues.apache.org/jira/browse/STANBOL-613) from the metadata of the ContentItem. Effectively this means that any Stanbol Language Detection engine will need to be executed before the OpenNLP POS Tagging Engine.
-* __Sentences__ (optional): In case _Sentence_s are available in the _AnalyzedText_ content part the tokenization of the text is done sentence by sentence. Otherwise the whole text is tokenized at once.
+* __Sentences__ (optional): In case _Sentences_ are available in the _AnalyzedText_ content part the tokenization of the text is done sentence by sentence. Otherwise the whole text is tokenized at once.
 
 ## Configuration
 
@@ -40,15 +40,18 @@ The OpenNLP Tokenizer engine supports th
 
 The syntax for parameters is as follows
 
+    :::text
     {language};{param-name}={param-value}
 
 So to use the "my-de-tokenizer-model.zip" for tokenizing German texts one can use a configuration like follows
 
+    :::text
     de;model=my-de-tokenizer-model.zip
     *
 
 To configure that the SimpleTokenizer should be used for a given language the 'model' parameter needs to be set to 'SIMPLE' as shown in the following example
 
+    :::text
     de;model=SIMPLE
     *
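
Note on the parameter syntax shown in the hunks above: the sketch below illustrates how lines of the form `{language};{param-name}={param-value}` could be read. It is not Stanbol's implementation. The function name `parse_language_config` is hypothetical, and two behaviours are assumptions not stated in the documentation text: several `;`-separated parameters are accepted on one line, and a bare `*` line is recorded as an entry with no parameters.

```python
def parse_language_config(lines):
    """Parse lines of the form '{language};{param-name}={param-value}'.

    Returns a dict mapping each language key (or '*') to a dict of its
    parameters. Hypothetical helper for illustration only.
    """
    config = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        # The first ';'-separated field is the language, the rest are parameters.
        language, *param_parts = line.split(";")
        params = config.setdefault(language.strip(), {})
        for part in param_parts:
            name, sep, value = part.partition("=")
            if not sep:
                raise ValueError(f"expected 'name=value' but got {part!r}")
            params[name.strip()] = value.strip()
    return config


if __name__ == "__main__":
    # The two example configurations from the tokenizer documentation.
    print(parse_language_config(["de;model=my-de-tokenizer-model.zip", "*"]))
    print(parse_language_config(["de;model=SIMPLE", "*"]))
```

A dict keyed by language keeps the sketch small. How the engine actually interprets the `*` entry is left open here.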