Posted to dev@stanbol.apache.org by "Rochbenritter ." <mo...@gmail.com> on 2014/09/30 16:33:22 UTC

Mapping issues when creating new sites (adding owl file)

Hi All,

We followed the instructions from
https://stanbol.apache.org/docs/trunk/customvocabulary.html to create a new
site for the CEO ontology (
http://www.ebusiness-unibw.org/ontologies/consumerelectronics/v1).

We managed to process the ontology and upload it as a site to the Apache
Stanbol server, but it is not working entirely correctly. When an entry from
the CEO ontology refers to an rdf:type defined in the GoodRelations
ontology, the individual entry can't be found. We assume additional
mapping is needed (e.g. "LCD" and "Ambilight 1" are found, but "Philips"
and "Samsung" are not).

You can see our configuration in this Google-Drive folder
https://drive.google.com/folderview?id=0B59O-GwTGmjjWVNPUGhYeGJmVXM&usp=drive_web

How can we fix this problem?

Cheers,

Re: Mapping issues when creating new sites (adding owl file)

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Rochbenritter,

The reason Samsung is not found is that the ontology defines the labels
as xsd:string typed literals.

Here is the excerpt from the ontology:

<gr:BusinessEntity rdf:ID="Samsung">
    <gr:legalName rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Samsung Group</gr:legalName>
    <rdfs:seeAlso rdf:resource="http://www.samsung.com/"/>
    <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Samsung</rdfs:label>
    <rdfs:comment xml:lang="en">The business entity Samsung Group.</rdfs:comment>
    <belongsToModule xml:lang="en">MP3Player, TV, Printer, DigitalCamera, Camcorder</belongsToModule>
</gr:BusinessEntity>

The expected form would be something like

    <rdfs:label xml:lang="en">Samsung</rdfs:label>

or simply

    <rdfs:label>Samsung</rdfs:label>
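
To check which entities in an ontology file are affected, a small script
can list every rdfs:label that carries the xsd:string datatype. The
following is only a sketch using the Python standard library: the SAMPLE
document and the function names are made up for illustration, it only looks
at direct children of the RDF root, and it ignores inherited xml:lang
values.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Illustrative sample modelled on the excerpt above (not the full ceo.owl).
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:gr="http://purl.org/goodrelations/v1#">
  <gr:BusinessEntity rdf:ID="Samsung">
    <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Samsung</rdfs:label>
    <rdfs:comment xml:lang="en">The business entity Samsung Group.</rdfs:comment>
  </gr:BusinessEntity>
</rdf:RDF>"""


def literal_kind(element):
    """Classify a property element as typed, language-tagged or plain."""
    datatype = element.get(f"{{{RDF}}}datatype")
    if datatype is not None:
        return "typed:" + datatype
    lang = element.get(XML_LANG)
    if lang:
        return "text@" + lang
    return "plain"


def typed_string_labels(rdf_xml):
    """Return (entity id, label) pairs whose rdfs:label is typed xsd:string."""
    root = ET.fromstring(rdf_xml)
    hits = []
    for entity in root:
        name = entity.get(f"{{{RDF}}}ID") or entity.get(f"{{{RDF}}}about")
        for label in entity.findall(f"{{{RDFS}}}label"):
            if label.get(f"{{{RDF}}}datatype") == XSD_STRING:
                hits.append((name, label.text))
    return hits


if __name__ == "__main__":
    for name, label in typed_string_labels(SAMPLE):
        print(name, label)
```

Running the same function over the contents of the real ontology file
should list the entities that are likely affected.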

Because those labels are typed as xsd:string, the indexing tool indexes
them like this:

    <arr name="str/gr:legalName/"><str>Samsung Group</str></arr>
    <arr name="str/rdfs:label/"><str>Samsung</str></arr>

in contrast to natural language text fields, whose names start with '@'.
Here is the example for the rdfs:comment field:

    <arr name="@en/rdfs:comment/"><str>The business entity Samsung Group.</str></arr>

This is also why those entities are missing from the EntityLinking
results: xsd:string values are currently not considered by the Entity
Linking engines.
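
The naming pattern can be summarised in a few lines of Python. This is not
the Entityhub implementation, only a sketch modelled on the field names
quoted in this mail; index_field_name and is_considered_for_linking are
hypothetical helpers, and other datatypes are deliberately left out because
their prefixes are not shown here.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def index_field_name(prop, lang=None, datatype=None):
    """Build a field name following the pattern seen in this thread.

    Hypothetical helper: only xsd:string and natural language text are
    covered, since those are the only prefixes quoted in the mail.
    """
    if datatype == XSD_STRING:
        return f"str/{prop}/"
    if datatype is not None:
        raise ValueError("prefix for this datatype is not covered by the sketch")
    # Natural language text: '@' followed by the (possibly empty) language.
    return f"@{lang or ''}/{prop}/"


def is_considered_for_linking(field_name):
    """Per the behaviour described above, only '@' fields are used."""
    return field_name.startswith("@")


if __name__ == "__main__":
    for args in (
        {"prop": "rdfs:label", "datatype": XSD_STRING},
        {"prop": "rdfs:comment", "lang": "en"},
    ):
        name = index_field_name(**args)
        print(name, is_considered_for_linking(name))
```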

IMO EntityLinking should also consider xsd:string values, so I clearly
consider this an issue of the Stanbol Entity Linking engines. I will
analyze the implementations of both the Entityhub Linking Engine and the
Lucene FST Linking Engine and see how to solve this issue.

As a workaround I see two possible solutions:

(a) remove all rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
attributes from the ceo.owl file
(b) add the following mappings to the "./indexing/config/mappings.txt" file

rdfs:label | d=entityhub:text
gr:legalName | d=entityhub:text

NOTE: there is already a line "rdfs:label". You should replace this
with "rdfs:label | d=entityhub:text"
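
If you prefer to script the change for (b), something along these lines
could do it. It is only a sketch: patch_mappings is a hypothetical helper
that works on the file content as a string, it assumes each mapping sits on
its own line, and it does nothing beyond replacing or appending the two
lines given above.

```python
def patch_mappings(text,
                   fields=("rdfs:label", "gr:legalName"),
                   suffix=" | d=entityhub:text"):
    """Return mappings text where each field carries the text conversion.

    A bare line such as "rdfs:label" is replaced, a line that already
    starts with the field and a '|' is kept as-is, and fields that do not
    occur at all are appended at the end.
    """
    seen = set()
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in fields:
            # Bare mapping: replace it with the converting variant.
            out.append(stripped + suffix)
            seen.add(stripped)
            continue
        out.append(line)
        # Remember fields that already have some "| ..." configuration.
        head = stripped.split("|")[0].strip()
        if head in fields:
            seen.add(head)
    for field in fields:
        if field not in seen:
            out.append(field + suffix)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(patch_mappings("rdfs:label\nrdfs:comment\n"), end="")
```

To apply it, read "./indexing/config/mappings.txt", pass the content
through patch_mappings and write the result back. Keep a backup of the
original file and run the indexing tool again afterwards.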

For your understanding: "{field} | d=entityhub:text" tells the indexing
tool to convert values of that field to the natural language text
datatype. Doing so results in a Solr index that contains both the
xsd:string and the text version:

<arr name="str/rdfs:label/"><str>Samsung</str></arr>
<arr name="@/rdfs:label/"><str>Samsung</str></arr>
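
Workaround (a) can also be scripted instead of editing ceo.owl by hand. The
sketch below uses a plain regular expression on the file content. It
assumes the attribute is always written with the "rdf:" prefix and the full
datatype URI, as in the excerpt above; an ontology that uses an XML entity
such as "&xsd;string" would not be matched. strip_xsd_string_datatype is a
hypothetical helper name.

```python
import re

# Matches the attribute together with the whitespace in front of it,
# accepting either single or double quotes around the datatype URI.
XSD_STRING_ATTR = re.compile(
    r"""\s+rdf:datatype\s*=\s*(["'])http://www\.w3\.org/2001/XMLSchema#string\1"""
)


def strip_xsd_string_datatype(rdf_xml):
    """Remove every rdf:datatype attribute that points to xsd:string."""
    return XSD_STRING_ATTR.sub("", rdf_xml)


if __name__ == "__main__":
    line = (
        '<rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string"'
        ">Samsung</rdfs:label>"
    )
    print(strip_xsd_string_datatype(line))
```

Other datatypes are left untouched, so numeric values keep their type.
Write the result to a copy of the file and index that copy, so the original
ontology stays unchanged.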

Thanks for your report. Until now I was completely unaware that
xsd:string values were not considered by the EntityLinking engines.
best
Rupert




-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                              ++43-699-11108907
| A-5500 Bischofshofen
| REDLINK.CO ..........................................................................
| http://redlink.co/