Posted to commits@stanbol.apache.org by rw...@apache.org on 2012/11/23 18:00:32 UTC

svn commit: r1412971 - in /stanbol/site/trunk/content/docs/trunk/components/enhancer/engines: keywordlinkingengine.mdtext list.mdtext namedentityextractionengine.mdtext opennlpner.mdtext

Author: rwesten
Date: Fri Nov 23 17:00:31 2012
New Revision: 1412971

URL: http://svn.apache.org/viewvc?rev=1412971&view=rev
Log:
initial documentation for STANBOL-733

Added:
    stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/opennlpner.mdtext
      - copied unchanged from r1405302, stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/namedentityextractionengine.mdtext
Removed:
    stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/namedentityextractionengine.mdtext
Modified:
    stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/keywordlinkingengine.mdtext
    stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext

Modified: stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/keywordlinkingengine.mdtext
URL: http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/keywordlinkingengine.mdtext?rev=1412971&r1=1412970&r2=1412971&view=diff
==============================================================================
--- stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/keywordlinkingengine.mdtext (original)
+++ stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/keywordlinkingengine.mdtext Fri Nov 23 17:00:31 2012
@@ -1,5 +1,11 @@
 Title: The Keyword Linking Engine: custom vocabularies and multiple languages
 
+---
+
+__WARNING:__ This engine is deprecated. Users are encouraged to use the [EntityhubLinkingEngine](entityhublinking) engine instead.
+
+---
+
 The KeywordLinkingEngine is intended to be used to extract occurrences of Entities part of a Controlled Vocabulary in content parsed to the Stanbol Enhancer. To do this words appearing within the text are compared with labels of entities. The Stanbol Entityhub is used to lookup Entities based on their labels.
 
 This documentation first provides information about the configuration options of this engine. This section is mainly intended for users of this engine. The remaining part of this document is rather technical and intended to be read by developers that want to extend this engine or want to know the technical details.

Modified: stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext
URL: http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext?rev=1412971&r1=1412970&r2=1412971&view=diff
==============================================================================
--- stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext (original)
+++ stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext Fri Nov 23 17:00:31 2012
@@ -4,12 +4,6 @@ This provides an overview about all [Enh
 
 ## Preprocessing
 
-* __[Language Identification Engine](langidengine.html)__
-	* language detection for textual content utilizing [Apache Tika](http://tika.apache.org/)
-
-* __[Language Detection Engine](langdetectengine.html)__
-	* language detection for textual content utilizing [language-detection](http://code.google.com/p/language-detection/) Project
-	
 * __[Tika Engine](tikaengine.html)__ (based on [Apache Tika](http://tika.apache.org/))
 	* content type detection
 	* text extraction from various document formats
@@ -18,44 +12,160 @@ This provides an overview about all [Enh
 * __[Metaxa Engine](metaxaengine.html)__
 	* text extraction from various document formats
 	* extraction of metadata from document formats
-	
-## Natural Language Processing
+	* _NOTE_ this engine is not included in the default Stanbol Launchers
+
+
+## Natural Language Processing (NLP)
+
+This category contains Engines that process textual content sent to the Stanbol Enhancer.
+
+### Language Detection
+
+Language detection engines add Language annotations as defined by [STANBOL-613](https://issues.apache.org/jira/browse/STANBOL-613) to the metadata of the [ContentItem](../contentitem)
+
+* __[Language Identification Engine](langidengine.html)__
+	* language detection for textual content utilizing [Apache Tika](http://tika.apache.org/)
+
+* __[Language Detection Engine](langdetectengine.html)__
+	* language detection for textual content utilizing [language-detection](http://code.google.com/p/language-detection/) Project
 
-* __[Named Entity Extraction Enhancement Engine](namedentityextractionengine.html)__ 
+* __CELI language detection Engine__: This engine is part of the CELI enhancement engines (see [STANBOL-583](https://issues.apache.org/jira/browse/STANBOL-583))
+	* language detection based on a linguagrid.org server hosted by CELI
+
+### Sentence Detection
+
+Sentence detection engines add _Sentences_ to the [AnalyzedText](../nlp/analyzedtext) content part 
+
+* __[OpenNLP Sentence Detection Engine](opennlpsentence)__
+	* Sentence Detection based on [OpenNLP](http://opennlp.apache.org)
+
+### Tokenizer Engines
+
+The responsibility of Tokenizer Engines is to add _Tokens_ to the [AnalyzedText](../nlp/analyzedtext) content part
+
+* __[OpenNLP Tokenizer Detection Engine](opennlptoken)__
+	* Tokenizer implementation based on [OpenNLP](http://opennlp.apache.org)
+
+### Part of Speech (POS) Tagging
+
+POS tagging engines add [Part-of-Speech annotations](../nlp/nlpannotations#part-of-speech-pos-annotations) to _Tokens_ present in the [AnalyzedText](../nlp/analyzedtext) content part
+
+* __[OpenNLP POS Tagging Engine](opennlppos)__
+	* POS tagger implementation based on [OpenNLP](http://opennlp.apache.org)
+
+### Chunk/Phrase detection
+
+Chunker (or Phrase Detection) Engines add detected _Chunks_ to the [AnalyzedText](../nlp/analyzedtext) content part. They also annotate added _Chunks_ with the [type of the detected phrase](../nlp/nlpannotations#phrase-annotations)
+
+* __[OpenNLP Chunker Engine](opennlpchunker)__
+	* Chunker implementation based on [OpenNLP](http://opennlp.apache.org)
+
+### Named Entity Recognition (NER) Engines
+
+NER engines need to write detected Named Entities as '[fise:TextAnnotation](../enhancementstructure.html#fisetextannotation)'s to the metadata of the [ContentItem](../contentitem). In addition they may also add [NER annotations](../nlp/nlpannotations#name-entity-ner-annotations) to _Chunks_ in the [AnalyzedText](../nlp/analyzedtext) content part
+
+* __[OpenNLP NER Engine](opennlpner)__ 
 	* NLP processing using OpenNLP NER
 	* detects occurrences of persons, places and organizations only
+	* supports [NER annotations](../nlp/nlpannotations#name-entity-ner-annotations)
 
 * __[Custom NER Model Extraction Enhancement Engine](customnermodelengine.html)__ 
 	* NLP processing using OpenNLP NER
-	* uses custom NameFinder modles (user configured)
+	* uses custom NameFinder models (user configured)
 	* supports custom Named Entity types (other than persons, places and organizations
-	
-* __[KeywordLinkingEngine](keywordlinkingengine.html)__
-	* NLP processing using OpenNLP
-	* supports multiple languages
-	* detects occurrences of untyped entities as concepts, takes local taxonomies as linking target	
 
-## Linking Suggestions
+* __CELI NER engine__: This engine is part of the CELI enhancement engines (see [STANBOL-583](https://issues.apache.org/jira/browse/STANBOL-583))
+	* NER based on a linguagrid.org server hosted by CELI
+	* detects occurrences of persons, places and organizations and some other types
+
+* __[OpenCalais Enhancement Engine](opencalaisengine.html)__
+ 	* integrates service from Open Calais. (Note: You need to provide a key in order to use this engine)
+	* can be configured to do only NER and no EntityLinking
+
+
+### Morphological Analysis
+
+This includes Engines that perform some sort of morphological analysis (e.g. lemmatization)
 
-* __[Named Entity Tagging Engine](namedentitytaggingengine.html)__
+* __CELI AnalyzedText Lemmatizer Engine__: This engine is part of the CELI enhancement engines (see [STANBOL-583](https://issues.apache.org/jira/browse/STANBOL-583) and [STANBOL-739](https://issues.apache.org/jira/browse/STANBOL-739))
+	* lemmatization support for "it", "da", "de", "ru", "ro"
+
+
+## Linking / Suggestions
+
+This category covers enhancement engines that suggest Entities for features present in the parsed content. An Entity is a uniquely identified resource. Typically it provides (or links to) further information such as the type, a description (text, pictures, videos …), spatial and/or temporal context, and links to other entities.
+
+* __[Named Entity Linking Engine](namedentitytaggingengine)__
 	* suggest links to several Linked Data Sources (e.g. DBpedia)
 
-* __[Geonames Enhancement Engine](geonamesengine.html)__ 
+* __[Entityhub Linking Engine](entityhublinking)__
+	* [EntityLinkingEngine](entity linking) configuration for the Stanbol Entityhub
+	* consumes NLP processing results from the [AnalyzedText](../nlp/analyzedtext) content part
+	* Links Entities managed by the Entityhub, ReferencedSites or ManagedSites
+	* Supports any language; however, quality/performance depends on NLP processing support
+
+* __DBpedia Spotlight Annotation Engine__: Integration of the DBpedia Spotlight with the Stanbol Enhancer (see [STANBOL-706](https://issues.apache.org/jira/browse/STANBOL-706))
+	* includes NLP, Entity Linking and Disambiguation of Entities using [DBpedia](http://dbpedia.org) as knowledge base
+	* accesses a remote service
+
+* __[Geonames Enhancement Engine](geonamesengine)__ 
 	* suggests links to geonames.org
 	* provides hierarchical links for locations
+	* accesses a remote service, requires a user account
 
-* __[OpenCalais Enhancement Engine](opencalaisengine.html)__
+* __[OpenCalais Enhancement Engine](opencalaisengine)__
  	* integrates service from Open Calais. (Note: You need to provide a key in order to use this engine)
+	* provides both NER and Entity Linking
+	* accesses a remote service, requires a user account
 
-* __[Zemanta Enhancement Engine](zemantaengine.html)__
+* __[Zemanta Enhancement Engine](zemantaengine)__
 	* integrates the Zemanta services. (Note: You need to provide a key in order to use this engine)
+	* provides both NLP and Entity Linking
+	* accesses a remote service, requires a user account
+
+
+* _[KeywordLinkingEngine](keywordlinkingengine)_ __deprecated__ use [EntityhubLinkingEngine](entityhublinking) instead!
+	* NLP processing using OpenNLP
+	* supports multiple languages
+	* detects occurrences of untyped entities as concepts, takes local taxonomies as linking target	
+
+
+### Sentiment Analysis
+
+This includes Engines that perform word/chunk level sentiment classifications on the [AnalyzedText](../nlp/analyzedtext) content part as well as Engines that summarize those lower level annotations to Sentiments for sentences, sections or the whole text. Sentiment summarizations are represented as 'fise:SentimentAnnotation's (TODO: not yet fully specified, see [STANBOL-760](https://issues.apache.org/jira/browse/STANBOL-760)).
+
+* __Sentiment WordClassifier Engine__: This engine annotates _Tokens_ of the [AnalyzedText](../nlp/analyzedtext) content part with sentiment annotations (a double value in the range [-1..1])
+	* supports de and en
+	* can be extended to support additional languages by implementing the _SentimentClassifier_ interface
+
+* _Sentiment Summarization Engine_: __under development__ (see [STANBOL-760](https://issues.apache.org/jira/browse/STANBOL-760))
+	* summarizes sentiments on word level to chunks, sentences and the whole text
+	* creates 'fise:SentimentAnnotation's
 
+### Disambiguation
+
+Enhancement Engines in this category can disambiguate Entities based on contextual information (e.g. whether "Apple" in a sentence refers to the fruit or the company). Based on that, such engines can adjust existing Entity suggestions or create new ones.
+
+* __DBpedia Spotlight Disambiguation Engine__: (see [STANBOL-706](https://issues.apache.org/jira/browse/STANBOL-706))
+	* consumes existing fise:TextAnnotations and disambiguates them by using DBpedia Spotlight
+	* creates Entity suggestions (fise:EntityAnnotations) for the processed fise:TextAnnotations
+	* accesses a remote service
+
+* _Solr More-like-This Disambiguation Engine_: __under development__ (see [STANBOL-723](https://issues.apache.org/jira/browse/STANBOL-723))
+	* disambiguates Entities managed by the Stanbol Entityhub by using Solr MLT queries
+	* only available via the [disambiguation-engine](http://svn.apache.org/repos/asf/stanbol/branches/disambiguation-engine/) branch
+	* adjusts the fise:confidence of existing fise:EntityAnnotations
 
 
 ## Postprocessing / Other
 
-* _CachingDereferencerEngine_ (deprecated, see dereferencing support of individual engines as well as  [STANBOL-336](https://issues.apache.org/jira/browse/STANBOL-336))
-	* retrieves additional content for presenting the enhancement results.
-	
-* __[Refactor Engine](refactorengine.html)__
+* _NLP 2 RDF Engine_: __under development__ (see [STANBOL-741](https://issues.apache.org/jira/browse/STANBOL-741))
+	* converts NLP processing results stored in the [AnalyzedText](../nlp/analyzedtext) content part to RDF and adds them to the metadata of the [ContentItem](../contentitem)
+	* generated RDF uses the NIF (NLP Interchange Format)
+
+* __[Refactor Engine](refactorengine)__
 	* transforms enhancements according to a target ontology, requires KRES launcher.
+
+
+* _CachingDereferencerEngine_ __deprecated__ (see dereferencing support of individual engines as well as  [STANBOL-336](https://issues.apache.org/jira/browse/STANBOL-336))
+	* retrieves additional content for presenting the enhancement results.