You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@stanbol.apache.org by wk...@apache.org on 2011/06/30 10:22:03 UTC

svn commit: r1141435 - /incubator/stanbol/trunk/enhancer/engines/opencalais/README.md

Author: wkasper
Date: Thu Jun 30 08:22:02 2011
New Revision: 1141435

URL: http://svn.apache.org/viewvc?rev=1141435&view=rev
Log:
README.md added

Added:
    incubator/stanbol/trunk/enhancer/engines/opencalais/README.md

Added: incubator/stanbol/trunk/enhancer/engines/opencalais/README.md
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/enhancer/engines/opencalais/README.md?rev=1141435&view=auto
==============================================================================
--- incubator/stanbol/trunk/enhancer/engines/opencalais/README.md (added)
+++ incubator/stanbol/trunk/enhancer/engines/opencalais/README.md Thu Jun 30 08:22:02 2011
@@ -0,0 +1,93 @@
+# OpenCalais Enhancement Engine
+
+The **OpenCalais Enhancement Engine** provides an interface to the [OpenCalais
+Webservice](http://www.opencalais.com/) for Named Entity Recognition (NER).
+
+## Technical description
+
+The engine will send the text of content item to the OpenCalais service and
+retrieve the NER annotations in RDF format.  The OpenCalais annotations are
+added to the content item's metadata as Stanbol text enhancement structures.
+
+The engine natively supports the mime types *text/plain* and
+*text/html*. Additionally, text can be processed that is provided in the content
+item's metadata as value of the property
+
+    http://www.semanticdesktop.org/ontologies/2007/01/19/nie#plainTextContent
+
+Supported languages are
+
+* English (en)
+* French (fr)
+* Spanish (es)
+
+## Requirements for use and configuration options
+
+The use of this component requires an API key from OpenCalais. Without
+providing an API key, the engine will not do anything.  Such a key can be
+obtained from [http://www.opencalais.com/APIkey](http://www.opencalais.com/APIkey).
+
+In the OSGi configuration the key is set as value of the property
+
+    org.apache.stanbol.enhancer.engines.opencalais.license
+
+
+Also, the unit tests require the API key. Without the key some tests will be
+skipped. For Maven the key can be set as a system property on the command line:
+
+    mvn -Dorg.apache.stanbol.enhancer.engines.opencalais.license=YOUR_API_KEY [install|test]
+
+
+The following configuration properties are defined:
+
+* <tt>org.apache.stanbol.enhancer.engines.opencalais.license</tt>
+
+    The OpenCalais license key that **must** be defined.
+
+* <tt>org.apache.stanbol.enhancer.engines.opencalais.url</tt>
+
+    The URL of the OpenCalais RESTful service. That needs only be changed
+    when OpenCalais should change its web service address.
+
+* <tt>org.apache.stanbol.enhancer.engines.opencalais.typeMap</tt>
+    
+    The value is the name
+    of a file for mapping the NER types from OpenCalais to other types. By
+    default, a mapping to the DBPedia types is provided in order to achieve
+    compatibility with the Stanbol OpenLNLP-NER engine.  If no mapping is
+    desired one might pass an empty mapping file. Types for which no
+    mapping is defined are passed as is to the metadata.  The syntax of the
+    mapping table is similar to that of Java property files. Each entry
+    takes the form
+    
+    CalaisTypeURI=TargetTypeURI
+    
+* <tt>org.apache.stanbol.enhancer.engines.opencalais.NERonly</tt>
+
+    A Boolean property to
+    specify whether in addition to the NER enhancements also the OpenCalais
+    Linked Data references are included as entity references. By default,
+    these are omitted.
+
+## Usage
+
+Assuming that the Stanbol endpoint with the full launcher is running at
+
+    http://localhost:8080
+
+the license key has been defined and the engine is activated, from the
+command line commands like this can be used for submitting some text file as content item:
+
+* stateless interface
+
+    curl -i -X PUT -H "Content-Type:text/plain" -T testfile.txt http://localhost:8080/engines
+
+* stateful interface
+
+    curl -i -X PUT -H "Content-Type:text/plain" -T testfile.txt http://localhost:8080/contenthub/content/someFileId
+
+Alternatively, the Stanbol web interface can be used for submitting documents
+and viewing the metadata at
+
+    http://localhost:8080/contenthub
+