Posted to dev@stanbol.apache.org by Phillip Rhodes <mo...@gmail.com> on 2013/11/27 01:40:30 UTC

Controlling what properties are returned in enhancement results

Stanbol devs:

Sorry for the n00bie question, but I haven't been able to find the
answer to this question yet, from reading over the Stanbol docs.

Given a particular RDF Resource and some associated properties, for example:


<rdf:Description rdf:about="http://customers.fogbeam.com/Boxer_Steel"
          rdf:type="http://schema.fogbeam.com/Customer" >
          <rdfs:label>Boxer Steel</rdfs:label>
          <rdfs:label>CUS729897</rdfs:label>
          <dc:title>Boxer Steel</dc:title>
          <dc:description>Boxer Steel is a boutique "mini-mill" steel products provider
               located in Pittsburgh, Pennsylvania and operating since 1973</dc:description>
</rdf:Description>

what determines which properties I get in my graph entries returned
from the enhancer?  By default, it appears that I get back both
rdfs:label values, but not the dc:title or dc:description properties
if I use this example.


Is this configurable, or is it fixed?

I see that entries that come back matched from dbpedia include far
more properties, so I'm guessing there is a way to do this, I just
haven't found it yet...


Thanks,


Phil

---

This message optimized for indexing by NSA PRISM

Re: Controlling what properties are returned in enhancement results

Posted by Rupert Westenthaler <ru...@gmail.com>.
Small correction

On Wed, Nov 27, 2013 at 11:27 AM, Rupert Westenthaler
<ru...@gmail.com> wrote:
> * Keyword Linking Engine [3]: Dereferencing can be enabled or disabled.
> If enabled, only the following fields are included in the
> EnhancementStructure: {label-field}, {type-field}, rdfs:comment,
> geo:lat, geo:long, foaf:depiction and dbpedia-ontology:thumbnail.

By setting the "org.apache.stanbol.enhancer.engines.keywordextraction.dereferenceFields"
property, it is possible to configure additional fields to be
dereferenced by the Keyword Linking Engine.

best
Rupert

>
> [1] https://issues.apache.org/jira/browse/STANBOL-336
> [1b] https://issues.apache.org/jira/browse/STANBOL-1223
> [2] http://stanbol.staging.apache.org/docs/trunk/components/enhancer/engines/namedentitytaggingengine
> [3] http://stanbol.staging.apache.org/docs/trunk/components/enhancer/engines/keywordlinkingengine
> [4] http://stanbol.staging.apache.org/docs/trunk/components/enhancer/engines/entitylinking
> [5] http://stanbol.staging.apache.org/docs/trunk/components/enhancer/engines/entityhublinking
> [6] http://stanbol.staging.apache.org/docs/trunk/components/enhancer/engines/lucenefstlinking



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Controlling what properties are returned in enhancement results

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Phillip

The process of including information for Entities in the enhancement
results is called "dereferencing". Currently the different Entity
Linking Engines implement it in different ways and with different
sets of features.

* Named Entity Linking Engine [2]: Dereferencing can be enabled or
disabled. If enabled, it copies all information present for the
Entity to the Enhancement result.
* Keyword Linking Engine [3]: Dereferencing can be enabled or disabled.
If enabled, only the following fields are included in the
EnhancementStructure: {label-field}, {type-field}, rdfs:comment,
geo:lat, geo:long, foaf:depiction and dbpedia-ontology:thumbnail.
* All Entity Linking Engines [4] (including [5]): Dereferencing can be
enabled or disabled, AND the fields that are dereferenced can be
configured (via the 'enhancer.engines.linking.dereferenceFields'
property).
* FST Linking Engine [6]: Cannot support dereferencing, as it does not
have access to the Entity data (other than the {label-field},
{type-field} and {entity-ranking}).
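
To make the difference between "copy everything" and "copy only
configured fields" concrete, here is a minimal Python sketch. It is not
Stanbol code: the function name `dereference`, the dict-based entity and
the prefixed property names are all hypothetical, and the data is just
the Boxer Steel example from the question.

```python
# Hypothetical illustration only; this is NOT how Stanbol implements dereferencing.
from typing import Dict, Iterable, List, Optional

Entity = Dict[str, List[str]]

# The example resource from the question, as a plain dict (assumed representation).
BOXER_STEEL: Entity = {
    "rdf:type": ["http://schema.fogbeam.com/Customer"],
    "rdfs:label": ["Boxer Steel", "CUS729897"],
    "dc:title": ["Boxer Steel"],
    "dc:description": [
        'Boxer Steel is a boutique "mini-mill" steel products provider '
        "located in Pittsburgh, Pennsylvania and operating since 1973"
    ],
}


def dereference(entity: Entity, fields: Optional[Iterable[str]] = None) -> Entity:
    """Copy properties of an entity into a new result dict.

    fields=None copies every property; otherwise only the listed
    properties are copied.
    """
    if fields is None:
        return {prop: list(values) for prop, values in entity.items()}
    wanted = set(fields)
    return {prop: list(values) for prop, values in entity.items() if prop in wanted}


if __name__ == "__main__":
    # All properties versus a configured subset.
    print(sorted(dereference(BOXER_STEEL)))
    print(sorted(dereference(BOXER_STEEL, ["rdfs:label", "dc:title"])))
```

With a field list that omits dc:title and dc:description, those
properties are simply absent from the result, which matches the
behaviour described in the question.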

STANBOL-336 [1], which I am currently working on, will change this and
provide separate EnhancementEngines that dereference suggested
Entities. The EntityhubDereferenceEngine [1b] will allow configuring
fields, field prefixes, as well as LDPath statements to execute for
Entities that need to be dereferenced, so it will easily cover the use
case you describe.
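
As a rough picture of what selecting properties by fields and field
prefixes could look like, here is a small Python sketch. It is an
illustration under assumptions, not the EntityhubDereferenceEngine: the
function `select_properties`, the full-URI keys, the Dublin Core
elements namespace chosen for "dc:", and the matching rule (exact match,
or `startswith` on a namespace prefix) are all hypothetical, and LDPath
is left out entirely.

```python
# Hypothetical sketch; the real engine's configuration and matching rules may differ.
from typing import Dict, Iterable, List

RDFS = "http://www.w3.org/2000/01/rdf-schema#"
DC = "http://purl.org/dc/elements/1.1/"  # assumed namespace for the "dc:" prefix

ENTITY: Dict[str, List[str]] = {
    RDFS + "label": ["Boxer Steel", "CUS729897"],
    DC + "title": ["Boxer Steel"],
    DC + "description": ["Boxer Steel is a boutique mini-mill ..."],
    "http://schema.fogbeam.com/accountManager": ["(made-up property)"],
}


def select_properties(
    entity: Dict[str, List[str]],
    fields: Iterable[str] = (),
    prefixes: Iterable[str] = (),
) -> Dict[str, List[str]]:
    """Keep properties that match a field exactly or start with a prefix."""
    exact = set(fields)
    prefix_tuple = tuple(prefixes)  # str.startswith accepts a tuple of prefixes
    return {
        prop: list(values)
        for prop, values in entity.items()
        if prop in exact or prop.startswith(prefix_tuple)
    }


if __name__ == "__main__":
    # One exact field plus everything in the (assumed) dc: namespace.
    selected = select_properties(ENTITY, fields=[RDFS + "label"], prefixes=[DC])
    for prop in sorted(selected):
        print(prop)
```

Matching on a namespace prefix means dc:title and dc:description are
both picked up without listing each field separately.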

best
Rupert



[1] https://issues.apache.org/jira/browse/STANBOL-336
[1b] https://issues.apache.org/jira/browse/STANBOL-1223
[2] http://stanbol.staging.apache.org/docs/trunk/components/enhancer/engines/namedentitytaggingengine
[3] http://stanbol.staging.apache.org/docs/trunk/components/enhancer/engines/keywordlinkingengine
[4] http://stanbol.staging.apache.org/docs/trunk/components/enhancer/engines/entitylinking
[5] http://stanbol.staging.apache.org/docs/trunk/components/enhancer/engines/entityhublinking
[6] http://stanbol.staging.apache.org/docs/trunk/components/enhancer/engines/lucenefstlinking




-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen