Posted to commits@stanbol.apache.org by rw...@apache.org on 2012/06/14 07:46:12 UTC

svn commit: r1350093 - /incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/enhancementstructure.mdtext

Author: rwesten
Date: Thu Jun 14 05:46:11 2012
New Revision: 1350093

URL: http://svn.apache.org/viewvc?rev=1350093&view=rev
Log:
first version of EntityTagging

Modified:
    incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/enhancementstructure.mdtext

Modified: incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/enhancementstructure.mdtext
URL: http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/enhancementstructure.mdtext?rev=1350093&r1=1350092&r2=1350093&view=diff
==============================================================================
--- incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/enhancementstructure.mdtext (original)
+++ incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/enhancementstructure.mdtext Thu Jun 14 05:46:11 2012
@@ -137,7 +137,55 @@ TopicAnnotation are used to categorize/c
 
 ## Entity Tagging
 
-TODO: Work in progress
+Entity Tagging is about suggesting Entities instead of Strings to users for tagging their Documents. The difference is easy to explain. Let's assume a Blogger uses the tag "Bob Marley" to tag a blog entry. Tagging is all about structuring content - so by tagging it with "Bob Marley" he can now easily find all Documents that use that tag. However, he would most likely also want to create a category of Documents about Reggae music, and he would want Documents tagged with "Bob Marley" to be part of that group.
+
+But while the knowledge that "Bob Marley" is related to "Reggae music" might be obvious to the Blogger, it cannot be known by the Blogging Tool he uses. So typically the only way to achieve this is for the Blogger to tag the document with both tags.
+
+Entity Tagging tries to work around that by linking Documents with Entities defined by a knowledge base. The fact that Bob Marley is related to Reggae music is nothing novel. [DBpedia](http://dbpedia.org) - the Wikipedia database - knows that, and a lot more, about the Entity [dbpedia:Bob_Marley](http://dbpedia.org/resource/Bob_Marley). So if the blogger tags his Document with "dbpedia:Bob_Marley" he tags it not only with "Bob Marley" but also with all the other contextual information provided by DBpedia - including the fact that Bob Marley was a Reggae artist.
+
+This does not only work with famous people, big cities, …; nowadays the web [links data](http://linked-data.org) from many different domains. Nor is this only about the Web - it works even better if you can also use Entities relevant to yourself and/or your working environment (Products, CRM information, …).
+
+### Suggest Entities with the Stanbol Enhancer
+
+To have the Stanbol Enhancer analyze a text, send a POST request to the [RESTful API](enhancerrest.html) of the Stanbol Enhancer.
+
+    curl -X POST -H "Accept: application/rdf+xml" -H "Content-type: text/plain" \
+     --data "The Stanbol enhancer can detect famous cities such as \
+             Paris and people such as Bob Marley." http://{host}:{port}/enhancer
+
+The response contains the enhancement results as an RDF graph in the serialization specified by the "Accept" header ('application/rdf+xml' in the example request above). This RDF graph contains the information about the Entities extracted from the parsed content.
+
+The following Figure shows how extracted entities are described in the enhancement results.
+!['fise:EntityAnnotation' example](es_entityannotation.png "This example shows an EntityAnnotation that suggests the Entity 'dbpedia:Bob_Marley' for the TextAnnotation")
+
+In principle there are two Resources that are of interest for the Entity tagging use case:
+
+1. EntityAnnotations: Resources with the 'rdf:type' 'fise:EntityAnnotation' represent the entity suggestions of the Stanbol Enhancer. These resources provide the label, type and, most importantly, the URI of the extracted Entity. In addition, the value of 'fise:confidence' [0..1] can be used as an indication of how certain the Stanbol Enhancer is about this Entity.
+2. Entities: This refers to all resources with an incoming 'fise:entity-reference' relation (such as 'dbpedia:Bob_Marley' in the above example). Enhancement Engines can be configured to "dereference" suggested entities - meaning to use the URI of the entity to retrieve additional information. In this case additional information about suggested Entities will be available in the Enhancement results. If this is not the case, users will need to dereference suggested entities themselves.
+
+The following steps are typically needed to acquire the information needed to implement an entity tagging user interface:
+
+1. Iterate over all suggested Entities: These are all resources matching "{entity-annotation} rdf:type fise:EntityAnnotation"
+2. Basic information: This is available directly via the {entity-annotation} to ensure its availability even if the {entity} itself is not included - dereferenced - in the enhancement results.
+    * URI of the suggested Entity: {entity-annotation} fise:entity-reference {entity}
+    * Label: The value of fise:entity-label is typically the label by which the Entity was recognized in the analyzed content. Additional labels are typically available via the {entity}
+    * Types: The values of the fise:entity-type property of the {entity-annotation} are the same as the rdf:type values of the {entity}.
+    * Confidence: The 'fise:confidence' value represents how confident the Stanbol Enhancer is about this suggestion. Values are in the range [0..1] where 0 means very uncertain and 1 represents high certainty.
+3. Dereferenced {entity}: Some EnhancementEngines can also add information about suggested Entities to the enhancement results - in other words: dereference suggested entities. In this case additional information about the {entity} can be retrieved directly from the enhancement results. Most importantly, this information includes all available labels (in all languages) of the Entity.
+4. Dereferencing suggested Entities: If the suggested Entity is available via the Stanbol Entityhub, the {entity-annotation} has the 'entityhub:site' property. The value of this property is the name of the ReferencedSite of the Entityhub. To dereference the Entity, send a GET request to "{stanbol-root-URL}/entityhub/site/{site-name}/entity?id={entity}". The "Accept" header of the request needs to be set to the desired RDF serialization (e.g. "application/rdf+json").
+
+### Content Categorizations
+
+'fise:TopicAnnotation' instances are used to formally represent categories assigned to the parsed Content. The main difference between extracted Entities and assigned Categories is that extracted Entities do have one or more explicit mentions within the text while assigned Categories are suggested based on the document as a whole - typically they are not explicitly mentioned in the text.
+
+Typically an entity tagging UI will want to distinguish between Categories and Entities because:
+
+* Categories are used to group Content (e.g. Blog posts about Work and private things)
+* Entities are used to search/suggest Blog posts about specific topics (e.g. A blog about some feature implemented with "Apache Solr", a nice event in the "Sternbräu" in "Salzburg")
+
+The usage of 'fise:TopicAnnotation' is similar to that of 'fise:EntityAnnotation'. Both use the exact same properties ('fise:entity-reference', 'fise:entity-label', 'fise:entity-type', 'fise:confidence', 'entityhub:site'). The only difference is that one needs to iterate over '{topic-annotation} rdf:type fise:TopicAnnotation'. So typically clients will want to use the exact same code to process {entity-annotation} and {topic-annotation} instances.
+
+In the next section "Entity Disambiguation" an improved version of Entity Tagging is described that allows users to: (1) accept/decline a spotted Entity and then (2) select one of several suggested Entities.
 
 ## Entity Disambiguation