Posted to commits@stanbol.apache.org by bu...@apache.org on 2012/06/23 10:01:37 UTC

svn commit: r822837 - in /websites/staging/stanbol/trunk/content: ./ stanbol/docs/trunk/building.html stanbol/docs/trunk/customvocabulary.html stanbol/docs/trunk/tutorial.html

Author: buildbot
Date: Sat Jun 23 08:01:35 2012
New Revision: 822837

Log:
Staging update by buildbot for stanbol

Modified:
    websites/staging/stanbol/trunk/content/   (props changed)
    websites/staging/stanbol/trunk/content/stanbol/docs/trunk/building.html
    websites/staging/stanbol/trunk/content/stanbol/docs/trunk/customvocabulary.html
    websites/staging/stanbol/trunk/content/stanbol/docs/trunk/tutorial.html

Propchange: websites/staging/stanbol/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Sat Jun 23 08:01:35 2012
@@ -1 +1 @@
-1352665
+1353090

Modified: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/building.html
==============================================================================
--- websites/staging/stanbol/trunk/content/stanbol/docs/trunk/building.html (original)
+++ websites/staging/stanbol/trunk/content/stanbol/docs/trunk/building.html Sat Jun 23 08:01:35 2012
@@ -98,7 +98,7 @@
 <li>the <em>"kres launcher"</em> activates the rules machinery only.</li>
 </ul>
 <p>You can launch the server with:</p>
-<div class="codehilite"><pre><span class="nv">$</span> <span class="nv">java</span> <span class="o">-</span><span class="n">Xmx1g</span> <span class="o">-</span><span class="n">jar</span> <span class="n">full</span><span class="sr">/target/o</span><span class="n">rg</span><span class="o">.</span><span class="n">apache</span><span class="o">.</span><span class="n">stanbol</span><span class="o">.</span><span class="n">launchers</span><span class="o">.</span><span class="n">full</span><span class="o">-</span><span class="mf">0.9</span><span class="o">-</span><span class="n">SNAPSHOT</span><span class="o">.</span><span class="n">jar</span>
+<div class="codehilite"><pre><span class="nv">$</span> <span class="nv">java</span> <span class="o">-</span><span class="n">Xmx1g</span> <span class="o">-</span><span class="n">XX:MaxPermSize</span><span class="o">=</span><span class="mi">256</span><span class="n">m</span> <span class="o">-</span><span class="n">jar</span> <span class="n">full</span><span class="sr">/target/o</span><span class="n">rg</span><span class="o">.</span><span class="n">apache</span><span class="o">.</span><span class="n">stanbol</span><span class="o">.</span><span class="n">launchers</span><span class="o">.</span><span class="n">full</span><span class="o">-</span><span class="mf">0.9</span><span class="o">-</span><span class="n">SNAPSHOT</span><span class="o">.</span><span class="n">jar</span>
 </pre></div>
 
 

Modified: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/customvocabulary.html
==============================================================================
--- websites/staging/stanbol/trunk/content/stanbol/docs/trunk/customvocabulary.html (original)
+++ websites/staging/stanbol/trunk/content/stanbol/docs/trunk/customvocabulary.html Sat Jun 23 08:01:35 2012
@@ -88,84 +88,146 @@
 <p>Creating your own indexes is the preferred way of working with custom vocabularies. Small vocabularies can also be uploaded to the Entityhub as ontologies, directly. A downside to this approach is that only one ontology per installation is supported.</p>
 <p>If you want to use multiple datasets in parallel, you have to create a local index for these datasets and configure the Entityhub to use them. In the following we focus on the main case: creating and using a local <a href="http://lucene.apache.org/solr/">Apache Solr</a> index of a custom vocabulary, e.g. a SKOS thesaurus or taxonomy of your domain.</p>
 <h2 id="creating-and-working-with-custom-local-indexes">Creating and working with custom local indexes</h2>
-<p>Apache Stanbol provides the machinery to start with vocabularies in standard languages such as <a href="http://www.w3.org/2004/02/skos/">SKOS</a> or <a href="http://www.w3.org/TR/rdf-primer/">RDF</a> encoded data sets. The Apache Stanbol components, which are needed for this functionality are the Entityhub and its indexing tool for creating and managing the index and <a href="enhancer/engines/list.html">enhancement engines</a> that make use of the indexes during the enhancement process.</p>
-<h3 id="a-create-your-own-index">A. Create your own index</h3>
-<p><strong>Step 1 : Compile and assemble the indexing tool</strong></p>
+<p>Apache Stanbol provides the machinery to start with vocabularies in standard languages such as <a href="http://www.w3.org/2004/02/skos/">SKOS</a> or <a href="http://www.w3.org/TR/rdf-primer/">RDF</a> encoded data sets. The Apache Stanbol components needed for this functionality are the Entityhub with its indexing tool for creating and managing the index, and the <a href="enhancer/engines">enhancement engines</a> that make use of the indexes during the enhancement process.</p>
+<p>To create and import your own vocabulary into the Apache Stanbol Entityhub, follow these steps:</p>
+<h3 id="step-1-compile-and-assemble-the-indexing-tool">Step 1 : Compile and assemble the indexing tool</h3>
 <p>The indexing tool provides a default configuration for creating an <a href="http://lucene.apache.org/solr/">Apache Solr</a> index of RDF files (e.g. a SKOS export of a thesaurus or a set of foaf files).</p>
-<p>If not yet built during the Apache Stanbol build process of the Entityhub call</p>
-<div class="codehilite"><pre><span class="p">{</span><span class="n">root</span><span class="p">}</span><span class="o">/</span><span class="n">entityhub</span> <span class="nv">$</span> <span class="nv">mvn</span> <span class="n">install</span>
+<p>To build the indexing tool from source - recommended - you will need to check out Apache Stanbol from SVN (or <a href="../../downloads">download</a> a source release). Instructions for this can be found <a href="tutorial.html">here</a>. However, if you want to skip this step, you can also obtain a <a href="http://dev.iks-project.eu/downloads/stanbol-launchers/">binary version</a> from the IKS development server (search the sub-folders of the different versions for a file named like "<code>org.apache.stanbol.entityhub.indexing.genericrdf-*-jar-with-dependencies.jar</code>").</p>
+<p>If you downloaded or checked out the source to {stanbol-source} and successfully built it as described in the <a href="tutorial.html">Tutorial</a>, you still need to assemble the indexing tool:</p>
+<div class="codehilite"><pre><span class="o">{</span>stanbol-source<span class="o">}</span>/entityhub/indexing/genericrdf/ <span class="nv">$ </span>mvn assembly:single
 </pre></div>
 
 
-<p>and then</p>
-<div class="codehilite"><pre><span class="p">{</span><span class="n">root</span><span class="p">}</span><span class="sr">/entityhub/i</span><span class="n">ndexing</span><span class="sr">/genericrdf/</span> <span class="nv">$</span> <span class="nv">mvn</span> <span class="n">assembly:single</span>
+<p>and move the assembled indexing tool from</p>
+<div class="codehilite"><pre><span class="o">{</span>stanbol-source<span class="o">}</span>/entityhub/indexing/genericrdf/target/org.apache.stanbol.entityhub.indexing.genericrdf-*-jar-with-dependencies.jar
 </pre></div>
 
 
-<p>Move the generated tool from</p>
-<div class="codehilite"><pre><span class="p">{</span><span class="n">root</span><span class="p">}</span><span class="sr">/entityhub/i</span><span class="n">ndexing</span><span class="sr">/genericrdf/</span><span class="n">target</span><span class="o">/</span><span class="n">org</span><span class="o">.</span><span class="n">apache</span><span class="o">.</span><span class="n">stanbol</span><span class="o">.</span><span class="n">entityhub</span><span class="o">.</span><span class="n">indexing</span><span class="o">.</span><span class="n">genericrdf</span><span class="o">-*-</span><span class="n">jar</span><span class="o">-</span><span class="n">with</span><span class="o">-</span><span class="n">dependencies</span><span class="o">.</span><span class="n">jar</span>
-</pre></div>
-
-
-<p>into a new directory. We will refer to this new directory as {indexroot}.</p>
-<p><strong>Step 2 : Create the index</strong></p>
+<p>into the directory you plan to use for the indexing process. We will refer to this directory as {indexing-working-dir}.</p>
+<h3 id="step-2-create-the-index">Step 2 : Create the index</h3>
 <p>Initialize the tool with</p>
-<div class="codehilite"><pre><span class="p">{</span><span class="n">indexroot</span><span class="p">}</span> <span class="nv">$</span> <span class="nv">java</span> <span class="o">-</span><span class="n">jar</span> <span class="n">org</span><span class="o">.</span><span class="n">apache</span><span class="o">.</span><span class="n">stanbol</span><span class="o">.</span><span class="n">entityhub</span><span class="o">.</span><span class="n">indexing</span><span class="o">.</span><span class="n">genericrdf</span><span class="o">-*-</span><span class="n">jar</span><span class="o">-</span><span class="n">with</span><span class="o">-</span><span class="n">dependencies</span><span class="o">.</span><span class="n">jar</span> <span class="n">init</span>
+<div class="codehilite"><pre><span class="o">{</span>indexing-working-dir<span class="o">}</span> <span class="nv">$ </span>java -jar org.apache.stanbol.entityhub.indexing.genericrdf-*-jar-with-dependencies.jar init
 </pre></div>
 
 
-<p>This will create a directory for the configuration files with a default configuration, another directory for the sources, and a distribution directory for the resulting files. Make sure, that you adapt the default configuration with at least </p>
+<p>This will create/initialize the default configuration of the Indexing Tool, including (relative to {indexing-working-dir}):</p>
 <ul>
-<li>the id/name and license information of your data and </li>
-<li>namespaces and properties mapping you want to include in the index (see example of a <a href="examples/anl-mappings.txt">mappings.txt</a> including default and specific mappings for one dataset)</li>
+<li><code>/indexing/config</code>: Folder containing the default configuration including the "indexing.properties" and "mappings.txt" file.</li>
+<li><code>/indexing/resources</code>: Folder with the source files used for indexing, including the "rdfdata" folder where you need to copy the RDF files to be indexed.</li>
+<li><code>/indexing/destination</code>: Folder used to write the data during the indexing process.</li>
+<li><code>/indexing/dist</code>: Folder where you will find the <code>{name}.solrindex.zip</code> and <code>org.apache.stanbol.data.site.{name}-{version}.jar</code> files needed to install your index to the Apache Stanbol Entityhub.</li>
 </ul>
-<p>Then, copy your source files into the source directory <code>indexing/resources/rdfdata</code>. The Entityhub indexing tool supports several standard formats for RDF, multiple files and archives of them as source input. </p>
-<p><em>For more details about possible configurations, please consult the <a href="https://github.com/apache/stanbol/blob/trunk/entityhub/indexing/genericrdf/README.md">README</a>.</em></p>
+<p>After the initialization you will need to provide the following configurations in the files located in the configuration folder (<code>{indexing-working-dir}/indexing/config</code>):</p>
+<ul>
+<li>Within the <code>indexing.properties</code> file you need to set the {name} of your index by changing the value of the "name" property. In addition you should also provide a "description". At the end of the indexing.properties file you can also specify the license and attribution for the data you index. The Apache Stanbol Entityhub will ensure that this information is included with any entity data returned for requests.</li>
+<li>If the data you index uses some uncommon namespaces, you will need to add those to the <code>mappings.txt</code> file (here is an <a href="examples/anl-mappings.txt">example</a> including default and specific mappings for one dataset).</li>
+</ul>
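For illustration, a hypothetical <code>mappings.txt</code> fragment for a SKOS thesaurus might look like the following. The properties shown are generic SKOS/RDFS examples, not tailored to any particular dataset; consult the linked example and the indexing tool's README for the exact syntax.

```
# fields to include in the index (prefixed property names)
rdfs:label
skos:prefLabel
skos:altLabel
skos:broader
# copy the SKOS labels to rdfs:label so a single "Label Field" suffices
skos:prefLabel > rdfs:label
skos:altLabel > rdfs:label
```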
+<p>Finally you will also need to copy your source files into the source directory <code>{indexing-working-dir}/indexing/resources/rdfdata</code>. All files within this directory will be indexed. The indexing tool supports the most common RDF serializations, and you can also directly index compressed RDF files.</p>
+<p>For more details about possible configurations, please consult the <a href="https://github.com/apache/stanbol/blob/trunk/entityhub/indexing/genericrdf/README.md">README</a>.</p>
 <p>Once all source files are in place, you can start the index process by running</p>
-<div class="codehilite"><pre><span class="p">{</span><span class="n">indexroot</span><span class="p">}</span> <span class="nv">$</span> <span class="nv">java</span> <span class="o">-</span><span class="n">Xmx1024m</span> <span class="o">-</span><span class="n">jar</span> <span class="n">org</span><span class="o">.</span><span class="n">apache</span><span class="o">.</span><span class="n">stanbol</span><span class="o">.</span><span class="n">entityhub</span><span class="o">.</span><span class="n">indexing</span><span class="o">.</span><span class="n">genericrdf</span><span class="o">-*-</span><span class="n">jar</span><span class="o">-</span><span class="n">with</span><span class="o">-</span><span class="n">dependencies</span><span class="o">.</span><span class="n">jar</span> <span class="nb">index</span>
+<div class="codehilite"><pre><span class="o">{</span>indexing-working-dir<span class="o">}</span> <span class="nv">$ </span>java -Xmx1024m -jar org.apache.stanbol.entityhub.indexing.genericrdf-*-jar-with-dependencies.jar index
 </pre></div>
 
 
 <p>Depending on your hardware and on the complexity and size of your sources, it may take several hours to build the index. As a result, you will get an archive of an <a href="http://lucene.apache.org/solr/">Apache Solr</a> index together with an OSGi bundle to work with the index in Stanbol.</p>
-<p><strong>Step 3 : Initialize the index within Apache Stanbol</strong></p>
-<p>We assume that you already have a running Apache Stanbol instance. Copy the ZIP archive into the <code>datafiles</code> folder of that instance. Now open the OSGi administration console of your in a web browser. Navigate to the "Bundles" tab and start the newly created bundle named <code>org.apache.stanbol.data.site.{name}-{version}.jar</code>.</p>
-<h3 id="b-configure-and-use-the-index-with-enhancement-engines">B. Configure and use the index with enhancement engines</h3>
-<p>Before you can make use of the custom vocabulary you need to decide, which kind of enhancements you want to support. If your enhancements are named entities in its strict sense (e.g. persons, locations, organizations), then you may use the standard NER engine in combination with the EntityLinkingEngine to configure the link destinations of the found entities.</p>
-<p>In case, you want to match all kinds of named entities and concepts from your custom vocabulary, you should work with the <a href="enhancer/engines/keywordlinkingengine.html">KeywordLinkingEngine</a> to both, find occurrences and to link them to custom entities. In this case, you'll get only results, if there is a match, while in the case above, you even get entities, where you don't find exact links. This approach will have its advantages when you need to have a high recall rate on your custom entities.</p>
-<p>In the following the configuration options are described briefly.</p>
-<p><strong>Use the KeywordLinkingEngine only</strong></p>
-<p>(1) To make sure, that the enhancement process uses the KeywordLinkingEngine only, deactivate the "standard NLP" enhancement engines, especially the NamedEntityExtractionEnhancementEngine (NER) and the EntityLinkingEngine before to work with the TaxonomyLinkingEngine.</p>
-<p>(2) Open the configuration console at http://localhost:8080/system/console/configMgr and navigate to the KeywordLinkingEngine. Its main options are configurable via the UI.</p>
-<ul>
-<li>Referenced Site: {put the id/name of your index}</li>
-<li>Label Field: {the property to search for}</li>
-<li>Type Field: {types of matched entries} </li>
-<li>Redirect Field: {redirection links}</li>
-<li>Redirect Mode: {ignore, follow, add values}</li>
-<li>Min Token Length: {set minimal token length}</li>
-<li>Suggestions: {maximum number of suggestions}</li>
-<li>Languages: {languages to use}</li>
-</ul>
-<p><em>Full details on the engine and its configuration are available <a href="enhancer/engines/keywordlinkingengine.html">here</a>.</em></p>
-<p><strong>Use several instances of the KeywordLinkingEngine</strong></p>
-<p>To work at the same time with different instances of the KeywordLinkingEngine ... FIXME</p>
-<p>This can be useful in cases, where you have two or more distinct custom vocabularies/indexes and/or if you want to combine your specific domain vocabulary with general purpose datasets such as dbpedia or others.</p>
-<p><strong>Use the KeywordLinkingEngine together with the NER engine and the EntityLinkingEngine</strong></p>
-<p>If your text corpus contains common entities as well a enterprise specific entities and you are interested getting enhancements for both, you may also use the KeywordLinkingEngine for your custom thesaurus and the NER engine in combination with the EntityLinkingEngine, targeting at e.g. dbpedia, at the same time. </p>
-<h2 id="examples">Examples</h2>
-<p>You can find guidance for the following indexers in the README files at <code>{root}/entityhub/indexing/{name-for-indexer}</code></p>
+<p><em>IMPORTANT NOTES:</em> </p>
 <ul>
-<li><a href="http://dbpedia.org/">dbpedia</a> dataset (Wikipedia data)
- For dbpedia, there is also a <a href="http://svn.apache.org/repos/asf/incubator/stanbol/trunk/entityhub/indexing/dbpedia/fetch_prepare.sh">script</a> available, which helps in generating your own dbpedia index.</li>
-<li><a href="http://www.geonames.org">geonames.org</a> dataset (geolocation data)</li>
-<li><a href="http://dblp.uni-trier.de/">DBLP</a> dataset (scientific bibliography data)</li>
+<li>
+<p>The import of the RDF files to the Jena TDB triple store - used as the source for the indexing - takes a lot of time. Because of that, imported data is reused for multiple runs of the indexing tool. This has two important effects users need to be aware of:</p>
+<ol>
+<li>Already imported RDF files should be removed from the <code>{indexing-working-dir}/indexing/resources/rdfdata</code> folder to avoid re-importing them on every run of the tool.</li>
+<li>If the RDF data changes, you will need to delete the Jena TDB store so that those changes are reflected in the created index. To do this, delete the <code>{indexing-working-dir}/indexing/resources/tdb</code> folder.</li>
+</ol>
+</li>
+<li>
+<p>Also the destination folder <code>{indexing-working-dir}/indexing/destination</code> is NOT deleted between multiple calls to index, so entities indexed by previous runs are kept. While this allows you to index a dataset in multiple steps - or even to combine data of multiple datasets in a single index - it also means that you will need to delete the destination folder if the RDF data you index has changed, especially if some entities were deleted.</p>
+</li>
 </ul>
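The reset steps above can be sketched as follows; the demo uses a scratch directory that mirrors the layout created by "init", so adapt the path to your actual {indexing-working-dir}.

```shell
# Create a scratch directory mirroring the indexing tool's layout
work=$(mktemp -d)
mkdir -p "$work/indexing/resources/tdb" "$work/indexing/destination"

# Drop the Jena TDB import cache so changed RDF files are re-imported:
rm -rf "$work/indexing/resources/tdb"
# Drop previously indexed entities before re-running "index":
rm -rf "$work/indexing/destination"
```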
-<h2 id="demos-and-resources">Demos and Resources</h2>
+<h3 id="step-3-initialize-the-index-within-apache-stanbol">Step 3 : Initialize the index within Apache Stanbol</h3>
+<p>We assume that you already have a running Apache Stanbol instance at http://{stanbol-host} and that {stanbol-working-dir} is the working directory of that instance on the local hard disk. To install the created index you need to </p>
 <ul>
-<li>The full <a href="http://dev.iks-project.eu:8081/">demo</a> IKS installation of Apache Stanbol is configured to also work with an environmental thesaurus - if you test it with unstructured text from the domain, you should get enhancements with additional results for specific concepts.</li>
-<li>Download custom test indexes and installer bundles for Apache Stanbol from <a href="http://dev.iks-project.eu/downloads/stanbol-indices/">here</a> (e.g. for GEMET environmental thesaurus, or a big dbpedia index).</li>
-<li>Another example using metadata from the Austrian National Library is described <a href="http://blog.iks-project.eu/using-custom-vocabularies-with-apache-stanbol/">here</a>.</li>
+<li>copy the "{name}.solrindex.zip" file to the <code>{stanbol-working-dir}/stanbol/datafiles</code> directory (NOTE: if you run the 0.9.0-incubating version, the path is <code>{stanbol-working-dir}/sling/datafiles</code>).</li>
+<li>install the <code>org.apache.stanbol.data.site.{name}-{version}.jar</code> into the OSGi environment of your Stanbol instance, e.g. by using the Bundles tab of the Apache Felix web console at <code>http://{stanbol-host}/system/console/bundles</code>.</li>
 </ul>
+<p>You find both files in the <code>{indexing-working-dir}/indexing/dist/</code> folder.</p>
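The copy step can be sketched like this; "myvocab" stands in for {name} and a scratch directory for the real Stanbol working directory, so all paths are placeholders.

```shell
# Scratch directory simulating {indexing-working-dir} and {stanbol-working-dir}
work=$(mktemp -d)
mkdir -p "$work/indexing/dist" "$work/stanbol/datafiles"
: > "$work/indexing/dist/myvocab.solrindex.zip"   # stands in for the built archive

# Step 1: copy the index archive into Stanbol's datafiles folder
cp "$work/indexing/dist/myvocab.solrindex.zip" "$work/stanbol/datafiles/"
# Step 2 (manual): install org.apache.stanbol.data.site.myvocab-{version}.jar
# via the Bundles tab at http://{stanbol-host}/system/console/bundles
```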
+<p>After the installation your data will be available at</p>
+<div class="codehilite"><pre><span class="n">http:</span><span class="sr">//</span><span class="p">{</span><span class="n">stanbol</span><span class="o">-</span><span class="n">instance</span><span class="p">}</span><span class="sr">/entityhub/si</span><span class="n">te</span><span class="o">/</span><span class="p">{</span><span class="n">name</span><span class="p">}</span>
+</pre></div>
+
+
+<p>You can use the Web UI of the Stanbol Entityhub to explore your vocabulary. Note that in the case of a big vocabulary it might take some time until the site becomes functional.</p>
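Entities on the installed site can also be looked up via the Entityhub's RESTful interface. The sketch below only builds the request URL; the site name "myvocab", the entity URI, and the default host localhost:8080 are placeholders (note that in practice the id parameter should be URL-encoded).

```shell
# Build the lookup URL for an entity on the referenced site (placeholders)
site="myvocab"
entity="http://example.org/concept/42"
url="http://localhost:8080/entityhub/site/${site}/entity?id=${entity}"
echo "$url"
# curl "$url"   # against a running instance this returns the entity data
```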
+<h2 id="b-configure-and-use-the-index-with-the-stanbol-enhancer">B. Configure and use the index with the Stanbol Enhancer</h2>
+<p>This section covers how to configure the Stanbol Enhancer to recognize and link Entities of your custom vocabulary with processed Documents.</p>
+<p>Generally there are two possible ways you can use to recognize Entities of your Vocabulary:</p>
+<ol>
+<li><strong>Named Entity Linking</strong>: This first uses Named Entity Recognition (NER) to spot "Named Entities" in the text and then tries to link those "Named Entities" with Entities defined in your vocabulary. This approach is limited to Entities with the types Person, Organization and Place, so Entities of other types in your vocabulary will not be recognized. In addition it requires the availability of NER for the language(s) of the processed documents.</li>
+<li><strong>Keyword Linking</strong>: This uses the labels of Entities in your vocabulary for the recognition and linking process. Natural Language Processing (NLP) techniques such as Part-of-Speech (POS) detection can be used to improve performance and results, but this also works without NLP support. As extraction and linking are based on labels mentioned in the analyzed content, this method has no restrictions regarding the types of your Entities.</li>
+</ol>
+<p>For more information you might also have a look at the introduction of the <a href="multilingual">multilingual</a> usage scenario.</p>
+<p><em>TIP</em>: If you are unsure about which to use, you can also start by configuring both options and give them a try.</p>
+<p>Depending on whether you want to use Named Entity Linking or Keyword Linking, the configuration of the <a href="enhancer/chains">Enhancement Chain</a> and of the <a href="enhancer/engines">Enhancement Engine</a> making use of your vocabulary will be different.</p>
+<h3 id="configuring-named-entity-linking">Configuring Named Entity Linking</h3>
+<p>In case "Named Entity Linking" is used, the linking with the custom vocabulary is done by the <a href="enhancer/engines/namedentitytaggingengine.html">Named Entity Tagging Engine</a>.
For the configuration of this engine you need to provide the following parameters:</p>
+<ol>
+<li>The "name" of the enhancement engine. It is recommended to use "{name}Linking" - where {name} is the name of your vocabulary as used in part A. of this scenario</li>
+<li>The name of the Referenced Site holding your vocabulary. Here you have to configure the {name}</li>
+<li>Enable/Disable Persons, Organizations and Places and, if enabled, configure the <code>rdf:type</code> used by your vocabulary for those types. If you do not want to restrict the type you can also leave the type field empty.</li>
+<li>Define the property used to match against the "Named Entities" detected by the used NER engine(s).</li>
+</ol>
+<p>For detailed information please see the documentation of the <a href="enhancer/engines/namedentitytaggingengine.html">Named Entity Tagging Engine</a>.</p>
+<p>Note that for using Named Entity Linking you also need to ensure that an Enhancement Engine providing NER is available in the <a href="enhancer/chains">Enhancement Chain</a>. By default Apache Stanbol includes three different Engines that provide this feature: (1) the <a href="enhancer/engines/namedentityextractionengine.html">Named Entity Extraction Enhancement Engine</a> based on <a href="http://opennlp.apache.org">OpenNLP</a>, (2) the CELI NER engine based on the <a href="http://Linguagrid.org">linguagrid.org</a> service and (3) the <a href="enhancer/engines/opencalaisengine.html">OpenCalais Enhancement Engine</a> based on <a href="http://opencalais.com">OpenCalais</a>. Note that the latter two options require sending your content to the respective services.</p>
+<p>So a typical <a href="enhancer/chains">Enhancement Chain</a> for Named Entity Linking with your vocabulary might look like:</p>
+<ul>
+<li>"langid" - the <a href="enhancer/engines/langidengine.html">Language Identification Engine</a> - to detect the language of the parsed content, a prerequisite of all NER engines</li>
+<li>"ner" - for NER support in English, Spanish and Dutch via OpenNLP</li>
+<li>"celiNer" - for NER support in French and Italian via the CELI NER engine</li>
+<li>"{name}Linking" - the <a href="enhancer/engines/namedentitytaggingengine.html">Named Entity Tagging Engine</a> for your vocabulary as configured above.</li>
+</ul>
+<p>Both the <a href="enhancer/chains/weightedchain.html">Weighted Chain</a> and the <a href="enhancer/chains/listchain.html">List Chain</a> can be used for the configuration of such a chain.</p>
+<h3 id="configure-keyword-linking">Configure Keyword Linking</h3>
+<p>In case you want to use "Keyword Linking" to extract and link Entities of your vocabulary you will need to configure the <a href="enhancer/engines/keywordlinkingengine.html">Keyword Linking Engine</a> accordingly.</p>
+<p>Here are the most important configuration options provided by the Keyword Linking Engine when configured via the <a href="http://localhost:8080/system/console/configMgr">Configuration Tab</a> of the Apache Felix web console - http://{host}:{port}/system/console/configMgr. For the full list and detailed information please see the <a href="enhancer/engines/keywordlinkingengine.html">documentation</a>.</p>
+<ol>
+<li>The "Name" of the enhancement engine. It is recommended to use "{name}Keyword" - where {name} is the name of your vocabulary as used in part A. of this scenario</li>
+<li>The name of the "Referenced Site" holding your vocabulary. Here you have to configure the {name}</li>
+<li>The "Label Field" is the URI of the property in your vocabulary providing the labels used for matching. You can only use a single field. If you want to use the values of several fields, you have two options: (1) adapt your indexing configuration to copy the values of those fields to a single one (e.g. the values of "skos:prefLabel" and "skos:altLabel" are copied to "rdfs:label" in the default configuration of the Entityhub indexing tool; see {indexing-working-dir}/indexing/config/mappings.txt), or (2) configure multiple Keyword Linking Engines - one for each label field. Option (1) is preferable as long as you do not need to use different configurations for the different labels.</li>
+<li>The "Type Mappings" might be interesting for you if your vocabulary contains custom types as those mappings can be used to map 'rdf:type's of Entities in your Vocabulary to 'dc:type's used for 'fise:TextAnnotation's - created by the Stanbol Enhancer to annotate occurrences of extracted Entities in the parsed text. See the <a href="enhancer/engines/keywordlinkingengine.html#type-mappings-syntax">Type Mapping Syntax</a> and the <a href="enhancementusage.html#entity-tagging-with-disambiguation-support">Usage Scenario for the Stanbol Enhancement Structure</a> for details.</li>
+</ol>
+<p>A typical <a href="enhancer/chains">Enhancement Chain</a> for Keyword Linking with your vocabulary might look like:</p>
+<ul>
+<li>"langid" - the <a href="enhancer/engines/langidengine.html">Language Identification Engine</a> - to detect the language of the parsed content, a prerequisite of the Keyword Linking Engine.</li>
+<li>"{name}Keyword" - the <a href="enhancer/engines/keywordlinkingengine.html">Keyword Linking Engine</a> for your vocabulary as configured above.</li>
+</ul>
+<p>Both the <a href="enhancer/chains/weightedchain.html">Weighted Chain</a> and the <a href="enhancer/chains/listchain.html">List Chain</a> can be used for the configuration of such a chain.</p>
+<h3 id="how-to-use-enhancementchains">How to use EnhancementChains</h3>
+<p>In the default configuration the Stanbol Enhancer provides two Enhancement chains: (1) a "default" chain that includes all currently active <a href="enhancer/engines">Enhancement Engine</a>s and (2) the "language" Chain that is intended to be used to detect the language of parsed content.</p>
+<p>As soon as Stanbol users start to add their own vocabularies to the Stanbol Entityhub and configure a <a href="enhancer/engines/namedentitytaggingengine.html">Named Entity Tagging Engine</a> or <a href="enhancer/engines/keywordlinkingengine.html">Keyword Linking Engine</a> for them, the default chain - which includes all active engines - quickly becomes impractical. Such users will most likely want to deactivate the "default" chain and configure their own, as described above. This section provides more information on how to do that.</p>
+<p><strong>Deactivate the Chain of all active Enhancement Engines</strong></p>
+<p>Users that add additional Enhancement Engines might need to deactivate the Enhancement Chain that includes all active engines. This can be done in the configuration tab of the Felix web console - <a href="http://localhost:8080/system/console/configMgr">http://{stanbol-host}/system/console/configMgr</a>. Open the configuration dialog of the "Apache Stanbol Enhancer Chain: Default Chain" component and deactivate it.</p>
+<p><strong>Change the Enhancement Chain bound to "/enhancer"</strong></p>
+<p>The Enhancement Chain bound to </p>
+<div class="codehilite"><pre><span class="n">http:</span><span class="sr">//</span><span class="p">{</span><span class="n">stanbol</span><span class="o">-</span><span class="n">host</span><span class="p">}</span><span class="o">/</span><span class="n">enhancer</span>
+</pre></div>
+
+
+<p>is determined by the following rules:</p>
+<ol>
+<li>the Chain with the name "default" is used. If more than one Chain is present with that name, rules (2) and (3) below resolve the conflict. If none exists,</li>
+<li>the Chain with the highest "service.ranking". If several have the same ranking,</li>
+<li>the Chain with the lowest "service.id".</li>
+</ol>
+<p>So you can change this by configuring the names and/or the "service.ranking" of the Enhancement Chains. NOTE also that (2) and (3) are used to resolve name conflicts of chains: if you configure two Enhancement Chains with the same name, only the one with the highest "service.ranking" and lowest "service.id" will be accessible via the RESTful API.</p>
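The selection rules can be sketched as a sort by (ranking descending, id ascending) followed by a preference for the name "default". The chain names, rankings and ids below are made-up example values.

```shell
# Each line: name service.ranking service.id (example values only)
chains='custom 10 7
default 0 9
default 5 3'

# Order by highest ranking, then lowest id; prefer a chain named
# "default", otherwise fall back to the best-ranked chain overall.
selected=$(printf '%s\n' "$chains" |
  sort -k2,2nr -k3,3n |
  awk 'NR==1{first=$0} $1=="default"{print; f=1; exit} END{if(!f) print first}')
echo "$selected"
```

Here the chain "default 5 3" wins over "default 0 9" because of its higher ranking, even though the "custom" chain has the highest ranking overall.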
+<h2 id="examples">Examples</h2>
+<p>While this usage scenario provides the basic information about how to index/use custom vocabularies there are a lot of additional possibilities to configure the indexing process and the enhancement engines.</p>
+<p>If you are interested in the more advanced options the following resources/examples might be of interest to you.</p>
+<ul>
+<li><a href="https://github.com/apache/stanbol/tree/17db70cb87ae5bbb905c1dbe76fbe4c0ca1dd90d/entityhub/indexing/genericrdf">Readme</a> of the generic RDF indexing tool (see also "{stanbol-source-root}/entityhub/indexing/genericrdf" if you have obtained the source code of Apache Stanbol).</li>
+<li><a href="https://github.com/apache/stanbol/tree/17db70cb87ae5bbb905c1dbe76fbe4c0ca1dd90d/demos/ehealth">eHealth</a> example: This provides an indexing and enhancement engine configuration for 4 datasets of the life science domain. It goes into some of the details - such as a customized Solr schema.xml configuration for the Apache Stanbol Entityhub; Keyword Linking Engine configurations optimized for extracting alpha-numeric IDs; using LD-Path to merge information of different datasets by following owl:sameAs relations; ... (see also "{stanbol-trunk}/demos/ehealth" if you have checked out the Apache Stanbol trunk). In addition this example may also provide some information on how to automate some of the steps described here by using shell scripts and Maven.</li>
+<li>In addition to the Generic RDF Indexing Tool there are also two other specialized versions for <a href="http://dbpedia.org/">dbpedia</a> ("{stanbol-trunk}/entityhub/indexing/dbpedia") and <a href="http://dblp.uni-trier.de/">DBLP</a> ("{stanbol-trunk}/entityhub/indexing/dblp"). While you will not want to use these versions to index your own vocabularies, their default configurations might still provide some valuable information.</li>
+</ul>
+<h2 id="demos-and-resources">Demos and Resources</h2>
+<p>The IKS development server runs a rather advanced configuration of Stanbol with a lot of custom datasets and corresponding Enhancement Chain configurations. You can access this Stanbol instance at <a href="http://dev.iks-project.eu:8081/">http://dev.iks-project.eu:8081/</a>. In addition this server also hosts a set of <a href="http://dev.iks-project.eu/downloads/stanbol-indices/">prebuilt indexes</a>.</p>
   </div>
   
   <div id="footer">

Modified: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/tutorial.html
==============================================================================
--- websites/staging/stanbol/trunk/content/stanbol/docs/trunk/tutorial.html (original)
+++ websites/staging/stanbol/trunk/content/stanbol/docs/trunk/tutorial.html Sat Jun 23 08:01:35 2012
@@ -133,7 +133,7 @@
 <h2 id="advanced-explore-apache-stanbol-full-launcher">Advanced: Explore Apache Stanbol "full" launcher</h2>
 <p>The full (including experimental) features of Apache Stanbol can be accessed via Apache Stanbol's "full launcher". See the <a href="components.html">list of all available components</a> and their features.</p>
 <p>To start the full launcher, you just have to execute its JAR via the following command:</p>
-<div class="codehilite"><pre><span class="c">% java -Xmx1g -jar full/target/org.apache.stanbol.launchers.full-{snapshot-version}-SNAPSHOT.jar</span>
+<div class="codehilite"><pre><span class="c">% java -Xmx1g -XX:MaxPermSize=256m -jar full/target/org.apache.stanbol.launchers.full-{snapshot-version}-SNAPSHOT.jar</span>
 </pre></div>
   </div>