Posted to commits@stanbol.apache.org by rw...@apache.org on 2014/05/23 09:57:18 UTC

svn commit: r1597023 - in /stanbol/site/trunk/content/docs/trunk: components/enhancer/ components/enhancer/chains/ components/enhancer/engines/ components/entityhub/ utils/

Author: rwesten
Date: Fri May 23 07:57:18 2014
New Revision: 1597023

URL: http://svn.apache.org/r1597023
Log:
STANBOL-488, STANBOL-336, STANBOL-1223, STANBOL-1165: improvements, corrections and clarifications

Modified:
    stanbol/site/trunk/content/docs/trunk/components/enhancer/chains/weightedchain.mdtext
    stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entityhubdereference.mdtext
    stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext
    stanbol/site/trunk/content/docs/trunk/components/enhancer/enhancementproperties.mdtext
    stanbol/site/trunk/content/docs/trunk/components/entityhub/managedsite.mdtext
    stanbol/site/trunk/content/docs/trunk/utils/marmotta-kiwi-repository-service.mdtext

Modified: stanbol/site/trunk/content/docs/trunk/components/enhancer/chains/weightedchain.mdtext
URL: http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/chains/weightedchain.mdtext?rev=1597023&r1=1597022&r2=1597023&view=diff
==============================================================================
--- stanbol/site/trunk/content/docs/trunk/components/enhancer/chains/weightedchain.mdtext (original)
+++ stanbol/site/trunk/content/docs/trunk/components/enhancer/chains/weightedchain.mdtext Fri May 23 07:57:18 2014
@@ -16,9 +16,36 @@ The syntax to define an Engine as option
     <name>;optional
     <name>;optional=true
 
+The following figure shows the configuration dialog of a WeightedChain configured with two required engines and one optional engine.
 
 ![Configuration dialog for the WeightedCahin](enhancer-weightedchain-config.png "Screenshot of the configuration dialog for a WeightedChain with two required and one optional engine")
 
+## Enhancement Properties Support
+
+__since `0.12.1`__
+
+Starting from `0.12.1` the Weighted Chain allows configuring [EnhancementProperties](../enhancementproperties):
+
+* __chain and engine__ scoped properties are defined as parameters to the engines with the syntax `{engine-name}; {property-name-1}={value-1},{value-2}; {property-name-2}={value-1};` 
+
+* __chain__ scoped properties can be configured with the OSGi property key `stanbol.enhancer.chain.chainproperties` using the syntax `{property-name-1}={value-1},{value-2}`. NOTE that `;` is NOT supported as a separator between multiple properties, because OSGi configurations already define a way to provide multiple values.
+
+All EnhancementProperties configured with a [Chain](chains) are written as RDF to the [ExecutionPlan](chains/executionplan). _Chain_ scoped properties are directly added to the `ep:ExecutionPlan` instance while _chain and engine_ scoped properties are added to the `ep:ExecutionNode` of the corresponding engine.
+
+The following figure and listing provide an example.
+
+![WeightedChain including some Enhancement Properties](enhancer-weightedchain-enhprop-config.png)
+
+The figure shows that for the `dbpedia-fst` engine the maximum number of suggestions is set to `10` and the minimum confidence value is set to `0.8`. For the `dbpedia-dereference` engine the dereferenced languages are set to English, German and Spanish. Finally, a _chain_ scoped property sets the maximum number of suggestions for the whole chain to `5`. However, this has no effect on the `dbpedia-fst` engine, because its own configuration overrides the chain-wide property.
+
+The following listing shows the exact same configuration in the `.cfg` format.
+
+    stanbol.enhancer.chain.name="dbpedia-linking"
+    stanbol.enhancer.chain.weighted.chain=["tika;optional","opennlp-sentence","opennlp-token","opennlp-pos","opennlp-chunker",
+        "dbpedia-fst;\ enhancer.max-suggestions\=10;\ enhancer.min-confidence\=0.8",
+        "dbpedia-dereference;\ enhancer.engines.dereference.languages\=en,de,es"]
+    stanbol.enhancer.chain.chainproperties=["enhancer.max-suggestions\=5"]
+
 ## Calculation of the ExecutionPlan
 
 It is important to note that the ordering of the list has no influence on the ExecutionPlan because the order of execution of the configured [EnhancementEngines](../engines) is calculated only by using the value of the "org.apache.stanbol.enhancer.engine.order" property provided by the EnhancementEngine:
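
The engine-entry syntax and the override rule described in the weightedchain.mdtext change above can be sketched in a few lines of Python. This is only an illustration, not Stanbol's actual parser: the function names `parse_engine_entry` and `effective_properties` are hypothetical, and a bare `optional` flag is simply treated like `optional=true` and returned alongside the other properties.

```python
def parse_engine_entry(entry):
    """Split '{engine-name}; {prop}={v1},{v2}; ...' into (name, properties)."""
    parts = [p.strip() for p in entry.split(";")]
    name, props = parts[0], {}
    for part in parts[1:]:
        if not part:
            # Tolerate a trailing ';' as shown in the documented syntax.
            continue
        key, sep, value = part.partition("=")
        # Assumption: a bare key such as 'optional' means 'optional=true'.
        values = [v.strip() for v in value.split(",")] if sep else ["true"]
        props[key.strip()] = values
    return name, props


def effective_properties(chain_props, engine_props):
    """Engine-scoped values take precedence over chain-scoped ones."""
    merged = dict(chain_props)
    merged.update(engine_props)
    return merged


name, props = parse_engine_entry(
    "dbpedia-fst; enhancer.max-suggestions=10; enhancer.min-confidence=0.8")
chain_props = {"enhancer.max-suggestions": ["5"]}
print(name, effective_properties(chain_props, props))
```

For the `dbpedia-fst` entry the engine-level value `10` wins over the chain-level `5`, which matches the behaviour described for the example configuration.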

Modified: stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entityhubdereference.mdtext
URL: http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entityhubdereference.mdtext?rev=1597023&r1=1597022&r2=1597023&view=diff
==============================================================================
--- stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entityhubdereference.mdtext (original)
+++ stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entityhubdereference.mdtext Fri May 23 07:57:18 2014
@@ -38,7 +38,9 @@ The Shared Thread Pool is a singelton Co
 
 ![Shared Thread Pool Configuration](entityhub-dereference-engine-shared-threadpool-config.png)
 
-### Field Mapping Support
+### Advanced Dereference Configurations
+
+#### Entityhub Field Mapping Support
 
 The _enhancer.engines.dereference.fields_ configuration does support the Entityhub Field Mapping language.
 
@@ -47,12 +49,31 @@ FieldMappings do use the following synta
     :::text
     [!]FieldPattern [| Filter] [> Mapping]
 
-* an optional Exclusion indicated by '!' as the first character of the mapping used to exclude fields that are matched by the pattern.
+* an optional Exclusion indicated by '!' as the first character of the mapping used to exclude fields that are matched by the `FieldPattern` part (e.g. `!foaf:*` will exclude all properties of the FOAF namespace). Exclusions are only useful if a wildcard is used (e.g. `foaf:*` together with `!foaf:mbox`).
 * the required _FieldPattern_ supports the definition of prefixes such as `http://xmlns.com/foaf/0.1/*` or `foaf:*`
 * the optional _Filter_ part allows to filter specific languages (e.g. `@=null;en;de;` will only dereference English and German literals as well as literals with no language tag), typed literals (e.g. `d=xsd:dateTime;xsd:date`) or URI values (e.g. `d=entityhub:ref`). Filters will also try to convert values to the parsed data type (e.g. `d=xsd:double` would convert `xsd:float` values to `xsd:doule`. Also string literals that can be parsed as double would be converted).
 * an optional _Mapping_ can be used to copy values to an other field (e.g. `foaf:name > schema:name` would copy all FOAF names to the schema.org name field)
 
-__NOTE__: Field Mappings configured for the EntityhubDerefereceEngine are overridden by Field Mappings parsed as [Enhancement Properties](../enhancementproperties).
+__NOTE__ that Field Mappings configured for the EntityhubDereferenceEngine are overridden by Field Mappings parsed as [Enhancement Properties](../enhancementproperties).
+
+### LDPath support
+
+The [LD Path Language](http://marmotta.apache.org/ldpath/language.html) is an alternative to most of the features supported by the Entityhub Field Mapping language. In particular, _Filters_ and _Mappings_ SHOULD be expressed using LD Path.
+
+The only advantage of the Field Mapping language is that it supports wildcards and exclusions. So in cases where one wants to dereference all properties of a specific namespace, this can only be specified by using the Field Mapping language.
+
+The following example shows a configuration that dereferences all schema.org properties and also uses LD Path to align some non-schema.org properties:
+
+    :::text
+    enhancer.engines.dereference.fields="schema:*"
+    enhancer.engines.dereference.ldpath=["@prefix schema : <http://schema.org/>;",
+        "@prefix dct : <http://purl.org/dc/terms/>;",
+        "schema:name = (rdfs:label | dct:title | dc:title | foaf:name | skos:prefLabel);",
+        "schema:alternateName = skos:altLabel;",
+        "schema:image = foaf:depiction;",
+        "schema:homepage = foaf:homepage;"]
+        
+_NOTE_: when used in an OSGi `*.cfg` file one needs to escape spaces and `=` with `\` and remove all line breaks.
 
 ## Supported Enhancement Properties 
 
@@ -61,3 +82,12 @@ The following Enhancement Properties are
 * __Dereference Languages__ _(enhancer.engines.dereference.languages)_: A set of languages that are dereferenced. Even if _'Dereference only Content Language Literals'_ is active explicitly configured languages will still get dereferenced. * __Dereferenced Fields__ _(enhancer.engines.dereference.fields)_: The dereferenced fields - in RDF terminology 'properties' - to be dereferenced. QNames (e.g. `rdf:label`) can be used for the configuration. This Engine supports the use of FieldMappings for the configuration. Dereferenced Fields parsed as EnhancementProperty will override values configured for the Engine.
 * __Dereference LD Path__ _(enhancer.engines.dereference.ldpath)_: The [LD Path Language](http://marmotta.apache.org/ldpath/language.html) allows to define powerful selectors for dereferenced Entities. An LD Path program parsed as EnhancementProperty will be executed in addition to those configured for the engine.
 
+As an example, the following query parameters instruct all Entityhub Dereference engines used to process the request to dereference only English and German literals.
+
+    :::bash
+    curl -X POST -H "Accept: text/turtle" -H "Content-type: text/plain" \
+        --data "The Eifeltower is located in Paris." \
+        "http://localhost:8080/enhancer?enhancer.engines.dereference.languages=en"\
+    "&enhancer.engines.dereference.languages=de"
+
+
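
As the curl example in the entityhubdereference.mdtext change shows, a multi-valued Enhancement Property is sent by repeating the query parameter once per value. A minimal Python sketch of building such a request URL, assuming the same local endpoint as the curl example (the helper name `enhancer_url` is hypothetical):

```python
from urllib.parse import urlencode


def enhancer_url(base, properties):
    """Append Enhancement Properties to the enhancer URL as query parameters."""
    # doseq=True emits one 'key=value' pair per list element, so a
    # multi-valued property is repeated instead of being joined into one value.
    return base + "?" + urlencode(properties, doseq=True)


url = enhancer_url(
    "http://localhost:8080/enhancer",
    {"enhancer.engines.dereference.languages": ["en", "de"]},
)
print(url)
```

Building the URL programmatically also avoids the shell-quoting problems that an unquoted `&` causes on the command line.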

Modified: stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext
URL: http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext?rev=1597023&r1=1597022&r2=1597023&view=diff
==============================================================================
--- stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext (original)
+++ stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext Fri May 23 07:57:18 2014
@@ -183,11 +183,8 @@ Enhancement Engines in this category can
 	* create Entity suggestions (fise:EntityAnnotations) for the processed fise:TextAnnotations
 	* accesses a remote service
 
-* _Solr More-like-This Disambiguation Engine:_ __under development_ (see [STANBOL-723](https://issues.apache.org/jira/browse/STANBOL-723))
+* __Solr More-like-This Disambiguation Engine:__ (see [STANBOL-723](https://issues.apache.org/jira/browse/STANBOL-723))
 	* disambiguates Entities managed by the Stanbol Entityhub by using Solr MLT queries
-	* only available via the [disambiguation-engine](http://svn.apache.org/repos/asf/stanbol/branches/disambiguation-engine/) branch
-	* adjusts the fise:confidence of existing fise:EntityAnnotations
-
 
 
 ## Postprocessing / Other

Modified: stanbol/site/trunk/content/docs/trunk/components/enhancer/enhancementproperties.mdtext
URL: http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/enhancementproperties.mdtext?rev=1597023&r1=1597022&r2=1597023&view=diff
==============================================================================
--- stanbol/site/trunk/content/docs/trunk/components/enhancer/enhancementproperties.mdtext (original)
+++ stanbol/site/trunk/content/docs/trunk/components/enhancer/enhancementproperties.mdtext Fri May 23 07:57:18 2014
@@ -136,6 +136,7 @@ Starting with `0.12.1` Enhancement Prope
 
 The following shows the curl request generating the equivalent of the example used in the above section:
 
+    :::bash
     curl -X POST -H "Accept: text/turtle" -H "Content-type: text/plain" \
         --data "The Eifeltower is located in Paris." 
         http://localhost:8080/enhancer?enhancer.max-suggestions=5&\

Modified: stanbol/site/trunk/content/docs/trunk/components/entityhub/managedsite.mdtext
URL: http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/entityhub/managedsite.mdtext?rev=1597023&r1=1597022&r2=1597023&view=diff
==============================================================================
--- stanbol/site/trunk/content/docs/trunk/components/entityhub/managedsite.mdtext (original)
+++ stanbol/site/trunk/content/docs/trunk/components/entityhub/managedsite.mdtext Fri May 23 07:57:18 2014
@@ -91,7 +91,7 @@ where 'sparqlQuery.txt' refers to a file
 
 With [STANBOL-1169](https://issues.apache.org/jira/browse/STANBOL-1169) (since version `0.12.1`) a Sesame Repository registered as OSGI service can be used as Entityhub Yard.
 
-The following figure shows a Apache Marmotta Kiwi Repository registered as OSGI service. 
+The following figure shows an [Apache Marmotta Kiwi Repository](/docs/trunk/utils/marmotta-kiwi-repository-service) registered as an OSGi service. 
 
 ![Marmotta Kiwi Repository Service](marmotta-kiwi-repository-service.png)
 

Modified: stanbol/site/trunk/content/docs/trunk/utils/marmotta-kiwi-repository-service.mdtext
URL: http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/utils/marmotta-kiwi-repository-service.mdtext?rev=1597023&r1=1597022&r2=1597023&view=diff
==============================================================================
--- stanbol/site/trunk/content/docs/trunk/utils/marmotta-kiwi-repository-service.mdtext (original)
+++ stanbol/site/trunk/content/docs/trunk/utils/marmotta-kiwi-repository-service.mdtext Fri May 23 07:57:18 2014
@@ -16,14 +16,18 @@ configuration. The following figure show
 * `org.openrdf.repository.Repository.id`: The id of the Repository. Intended to be used by
 other components to track a specific repository instance.
 * `marmotta.kiwi.dialect`: The KiWi Database dialect. Currently Marmotta supports the
-H2Dialect, PostgreSQLDialect and MySQLDialect. Note that the selected dialect will select
+`H2Dialect`, `PostgreSQLDialect` and `MySQLDialect`. Note that the selected dialect will select
 different database driver. If those are not available the activation will throw an
 exception. PostgreSQL driver are embedded. H2 drivers are included in the default
-Bundlelist used by Stanbol.
+[Marmotta Kiwi Bundlelist](http://svn.apache.org/repos/asf/stanbol/branches/release-0.12/launchers/bundlelists/marmotta/kiwi/src/main/bundles/list.xml) used by Stanbol. For MySQL the corresponding dependency needs to be uncommented in
+the [Marmotta Kiwi Bundlelist](http://svn.apache.org/repos/asf/stanbol/branches/release-0.12/launchers/bundlelists/marmotta/kiwi/src/main/bundles/list.xml).
 * `marmotta.kiwi.dburl`: This property can be used to directly configure the DB URL. If
-present this is preferred over the configuration of the `marmotta.kiwi.host`, 
-`marmotta.kiwi.port`, `marmotta.kiwi.database` and `marmotta.kiwi.options` parameters.
-* `marmotta.kiwi.user` and `marmotta.kiwi.password` for the database
+present this is preferred over the configuration of the `host`, `port`, `database` and `options` parameters.
+* `marmotta.kiwi.host`: The host of the database (a file path in case of H2)
+* `marmotta.kiwi.port`: The port of the database (ignored in case of H2)
+* `marmotta.kiwi.user`: The database user
+* `marmotta.kiwi.password`: The password for the configured user
+* `marmotta.kiwi.options`: Additional database options
 * `marmotta.kiwi.cluster`: defines the name of the cluster. Different KiWi Repositories
 might use clusters with different names. If not present or empty clustering will be
 deactivated.
@@ -53,6 +57,8 @@ registered as OSGI service with the para
 The marked `org.openrdf.repository.Repository.id` property is of special interest as it
 can be used to track for a Sesame Repository with a specific name. As an Example the
 Repository with the name `dummy` can be tracked with the Filter
-`(&(objectClass=org.openrdf.repository.Repository)(org.openrdf.repository.Repository.id=dummy))`
+
+    :::text
+    (&(objectClass=org.openrdf.repository.Repository)(org.openrdf.repository.Repository.id=dummy))
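
The filter shown in the marmotta-kiwi-repository-service.mdtext change can also be built for an arbitrary repository id. The following Python sketch is only an illustration: the helper names are hypothetical, and the escaping assumes the OSGi filter syntax, in which `\`, `*`, `(` and `)` inside a value are preceded by a backslash.

```python
def escape_filter_value(value):
    """Escape the characters that are special inside an OSGi filter value."""
    # The backslash must be handled first so that the escapes added
    # for the other characters are not escaped a second time.
    for ch in ("\\", "*", "(", ")"):
        value = value.replace(ch, "\\" + ch)
    return value


def repository_filter(repo_id):
    """Build a filter that tracks a Sesame Repository service by its id."""
    return (
        "(&(objectClass=org.openrdf.repository.Repository)"
        "(org.openrdf.repository.Repository.id=%s))" % escape_filter_value(repo_id)
    )


print(repository_filter("dummy"))
```

For the id `dummy` this reproduces the filter given in the documentation.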