Posted to commits@stanbol.apache.org by bu...@apache.org on 2013/10/03 15:02:37 UTC

svn commit: r881015 - in /websites/staging/stanbol/trunk/content: ./ docs/trunk/components/enhancer/engines/fstengine-config-fstfolder.png docs/trunk/components/enhancer/engines/lucenefstlinking.html

Author: buildbot
Date: Thu Oct  3 13:02:37 2013
New Revision: 881015

Log:
Staging update by buildbot for stanbol

Modified:
    websites/staging/stanbol/trunk/content/   (props changed)
    websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/fstengine-config-fstfolder.png
    websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/lucenefstlinking.html

Propchange: websites/staging/stanbol/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Thu Oct  3 13:02:37 2013
@@ -1 +1 @@
-1528830
+1528838

Modified: websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/fstengine-config-fstfolder.png
==============================================================================
Binary files - no diff available.

Modified: websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/lucenefstlinking.html
==============================================================================
--- websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/lucenefstlinking.html (original)
+++ websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/lucenefstlinking.html Thu Oct  3 13:02:37 2013
@@ -120,12 +120,12 @@ Configurations can be created by using t
 <p>This is the full list of supported Field encodings:</p>
 <ul>
 <li>SolrYard: This supports the encoding used by the Stanbol Entityhub SolrYard implementation to encode RDF data types and language literals. If you configure the FST Linking Engine for a Solr index built for the SolrYard you need to use this encoding.</li>
-<li>MinusPrefix: {lang}-{field} (e.g. "en-name")</li>
-<li>UnderscorePrefix: {lang}_{field} (e.g. "en_name")</li>
-<li>AtPrefix: {lang}@{field} (e.g. "en@name")</li>
-<li>MinusSuffix: {field}-{lang} (e.g. "name-en")</li>
-<li>UnderscoreSuffix: {field}-{lang} (e.g. "name_en")</li>
-<li>AtSuffix: {field}-{lang} (e.g. "name@en")</li>
+<li>MinusPrefix: <code>{lang}-{field}</code> (e.g. "en-name")</li>
+<li>UnderscorePrefix: <code>{lang}_{field}</code> (e.g. "en_name")</li>
+<li>AtPrefix: <code>{lang}@{field}</code> (e.g. "en@name")</li>
+<li>MinusSuffix: <code>{field}-{lang}</code> (e.g. "name-en")</li>
+<li>UnderscoreSuffix: <code>{field}_{lang}</code> (e.g. "name_en")</li>
+<li>AtSuffix: <code>{field}@{lang}</code> (e.g. "name@en")</li>
 <li>None: In this case no prefix/suffix rewriting of configured <code>field</code> and <code>store</code> values is done. This means that the FST Configuration MUST define the exact field names in the Solr index for every configured language.</li>
 </ul>
 <h4 id="fst-tagging-configuration">FST Tagging Configuration</h4>
@@ -147,7 +147,7 @@ Configurations can be created by using t
 <ul>
 <li><strong>field</strong>: The indexed field in the configured Solr index. In multilingual scenarios this might be the 'base name' of the field that is extended by a prefix or suffix to get the actual field name in the Solr index (see also the field encoding configuration)</li>
 <li><strong>stored</strong> (default: <em>field</em> value) : The field in the Solr index with the stored label information. This parameter is optional. If not present <code>stored</code> is assumed to be equal to <code>field</code>.</li>
-<li><strong>fst</strong> (default based on <em>field</em> value): Optionally allows to manually specify the base file name of the FST models. Those files are assumed within the data directory of the configured Solr index under <code>fst/{fst}.{lang}.fst</code>. By default the configured <code>field</code> name is used (with non alpha-numeric chars replaced by '_').If runtime creation is enabled those files will be created if not present.</li>
+<li><strong>fst</strong> (default based on <em>field</em> value): This parameter specifies the name of the FST file stored within the FST directory (as configured by the [FST storage location]). The default name is generated from the <code>field</code> value, with non-alphanumeric characters replaced by '_'.</li>
 <li><strong>generate</strong> (default: false): If enabled the Engine will generate missing FST models and will also be able to update FST models after changes to the Solr Index. <strong>NOTE</strong> that the creation of FST models is an expensive operation (both CPU and memory wise), which is why the default is <code>false</code>. The FST engine uses a pool of low priority threads to create FST models. The size of the pool can be configured by using the <code>enhancer.engines.linking.lucenefst.fstThreadPoolSize</code> parameter.</li>
 </ul>
 <p>A more advanced Configuration might look like:</p>
@@ -187,10 +187,11 @@ Configurations can be created by using t
 <li><code>solr-server-name</code>: the name of the <a href="/docs/trunk/utils/commons-solr#referencedsolrserver">ReferencedSolrServer</a> or <a href="/docs/trunk/utils/commons-solr#managedsolrserver">ManagedSolrServer</a> holding the SolrCore (see also [Configuration of the Solr Index])</li>
 <li><code>solr-core-name</code> : the name of the SolrCore</li>
 </ul>
-<p>The default value of this property is <code>${solr-data-dir}/fst</code>. To manage FST models within the Stanbol folder you can us e.g. <code>${sling.home}/fst/${solr-server-name}/solr-core-name</code>.</p>
+<p>The default value of this property is '<code>${solr-data-dir}/fst</code>'. To manage FST models within the Stanbol folder you can use e.g. '<code>${sling.home}/fst/${solr-server-name}/${solr-core-name}</code>'.</p>
 <h3 id="entity-cache-configuration">Entity Cache Configuration</h3>
 <p>While FST tagging is done fully in-memory, the FST linking engine needs to read information about matching Entities from the Solr index. This requires disk I/O and is typically the part of the process that consumes the most time. The Entity Cache tries to prevent such disk-level I/O by caching SolrDocuments containing only the fields required for the linking process (labels, types and, if available, entity rankings). To further reduce memory requirements only labels in languages requested by processed ContentItems are stored in the cache. The Cache uses LRU semantics and is based on the Solr cache implementation.</p>
-<p>The size of the cache can be configured by using the <code>enhancer.engines.linking.lucenefst.entityCacheSize</code> parameter. The default size is ~65k entities. Increasing the maximum size of the cache will improve performance. For small and medium sized vocabularies the cache can be configured </p>
+<p>The size of the cache can be configured by using the <code>enhancer.engines.linking.lucenefst.entityCacheSize</code> parameter. The default size is ~65k entities. Increasing the maximum size of the cache will improve performance. </p>
+<p><strong>TIP:</strong> For small and medium sized vocabularies the cache can be configured to be &gt;= the number of Entities in the Vocabulary. In this case the FST linking engine will operate fully in-memory. For such scenarios linking was up to 100 times faster than with the <a href="entityhublinking">Entityhub Linking Engine</a>.</p>
 <h3 id="text-processing-configuration">Text Processing Configuration</h3>
 <p>With the extension of the SolrTextTagger with a <a href="https://github.com/OpenSextant/SolrTextTagger/pull/7">TaggingAttribute</a> the FST linking engine can support the exact same text processing functionality as the other Entity Linking Engine.</p>
 <p>For the configuration please see the <a href="entitylinking#text-processing-configuration">Text Processing configuration</a> section of the Entity Linking Engine.</p>
@@ -200,9 +201,9 @@ Configurations can be created by using t
 <li><s><strong>Label Field</strong> <em>(enhancer.engines.linking.labelField)</em></s>: The label field is <strong>IGNORED</strong> as the field holding the labels is already provided by the [FST Tagging Configuration]. That means that the field defined by the <em>stored</em> parameter is used. If the <em>stored</em> parameter is not present it falls back to the <em>field</em> parameter.</li>
 <li><s><strong>Type Field</strong> <em>(enhancer.engines.linking.typeField)</em></s>: This configuration gets <strong>IGNORED</strong> in favor of the <code>enhancer.engines.linking.lucenefst.typeField</code>. See the [Additional Entity Information] section for details. </li>
 <li><s><strong>Redirect Field</strong> <em>(enhancer.engines.linking.redirectField)</em></s>: Not implemented. <strong>NOTE</strong> This might not be possible to implement efficiently, as redirects would already need to be considered when building the FST models.</li>
-<li><s><strong>Use EntityRankings (enhancer.engines.linking.useEntityRankings)_</s>: This configuration gets </strong>IGNORED__. EntityRanking based sorting is enabled as soon as the <em>Entity Ranking Field</em> is configured.</li>
+<li><s><strong>Use EntityRankings</strong> <em>(enhancer.engines.linking.useEntityRankings)</em></s>: This configuration gets <strong>IGNORED</strong>. EntityRanking based sorting is enabled as soon as the <em>Entity Ranking Field</em> is configured.</li>
 <li><s><strong>Lemma based Matching</strong> <em>(enhancer.engines.linking.lemmaMatching)</em></s>: Not Yet implemented</li>
-<li><s><strong>Min Match Score</strong> <em>(enhancer.engines.linking.minMatchScore)</em></s>: Not Yet Implemented. Currently all linked Entities are added regardless of their score. However the way the Tagging is done makes it very unlikely to have suggestions with <code>fise:confidence</code> values less as 0.5.</li>
+<li><s><strong>Min Match Score</strong> <em>(enhancer.engines.linking.minMatchScore)</em></s>: Not Yet Implemented. The FST linking engine is based on the Lucene Analyzer chains configured for the <em>index</em> and <em>store</em> field of the FST configuration. An Entity is only suggested if the Tokens match after the Analyzers were applied.</li>
 </ul>
 <p>In addition the following properties are <strong>IGNORED</strong> as they are not relevant for the FST Linking Engine:</p>
 <ul>