Posted to dev@stanbol.apache.org by seralf <se...@gmail.com> on 2012/05/10 12:00:03 UTC

default / customized fields in KeywordLinkingEngine

Hi, I'm trying to use the KeywordLinkingEngine with a customized Solr
configuration. Basically I need to understand two different things:

   1. What are the default fields that are indexed and then used in the
   retrieval process? I looked at DEREFERENCE_FIELDS in the source, but I'm
   not sure whether this is the right place to look.
   2. If I know which field is used as the basis for the textual
   enhancement, I could simply copy the values of other fields into it in
   the config. So I wonder whether I could define new fields and then use
   them in the process.

Thanks in advance for any suggestions.

Alfredo Serafini

Re: default / customized fields in KeywordLinkingEngine

Posted by seralf <se...@gmail.com>.

Thank you, I have a much clearer idea now. I have to re-check what I'm
doing and then I'll try to follow your suggestion, as I had misunderstood
some points, sorry :-)

Thanks very much for the explanation!

Alfredo Serafini

Re: default / customized fields in KeywordLinkingEngine

Posted by Rupert Westenthaler <ru...@gmail.com>.

On 10.05.2012, at 14:18, seralf wrote:

> Thanks for 1).
> 
> For point 2) I was not very clear, sorry.
> In my test I have a rather unusual use case: I am trying to provide results
> for two different cases on the same rdfs:label field, where I have to use
> different tokenization approaches (if that works).
> So my idea is to create a parallel field with a different tokenization
> approach and then copy it to the _text field. This is very common in Solr,
> but I am just starting with Stanbol, so I have some doubts: for example, I'm
> not sure whether the _text field is always the field used for the matches.
> I hope I was clearer this time, but I know I'm probably trying to do
> something strange :-)
> 
> 
Let me restate this to make sure we do not misunderstand each other.

You have two types of Entities in your vocabulary that both use rdfs:label,
but you would like to use two different fields so that you can apply
different Solr field configurations (e.g. Tokenizers).

Copying the values of rdfs:label to another field is easily possible with the Entityhub indexing tool.

If those two types of Entities have some distinguishing feature (e.g. a different rdf:type) you could use the

    org.apache.stanbol.entityhub.indexing.core.processor.LdpathProcessor

with an LDPath program like

    @prefix my : <http://www.example.com/my#>;
    my:label1 = .[rdf:type is my:type1]/rdfs:label;
    my:label2 = .[rdf:type is my:type2]/rdfs:label;

this would ensure that 

* labels of Entities of type my:type1 are indexed in my:label1 and 
* labels of Entities of type my:type2 are indexed in my:label2
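
To make the effect of such a mapping concrete, here is a minimal sketch in plain Python. It is not Stanbol or LDPath API: the function name, the dictionary layout and the example labels are made up for illustration, and only the field and type names are taken from the program above.

```python
# Plain-Python illustration of the routing described above.
# NOT Stanbol/LDPath code; route_labels and the sample entities are hypothetical.

def route_labels(entity):
    """Copy the rdfs:label values of an entity into a type-specific field."""
    # Assumed mapping: rdf:type value -> target label field
    field_for_type = {"my:type1": "my:label1", "my:type2": "my:label2"}
    indexed = {}
    for rdf_type in entity.get("rdf:type", []):
        target = field_for_type.get(rdf_type)
        if target is not None:
            indexed.setdefault(target, []).extend(entity.get("rdfs:label", []))
    return indexed

entities = [
    {"rdf:type": ["my:type1"], "rdfs:label": ["Salzburg"]},
    {"rdf:type": ["my:type2"], "rdfs:label": ["SBG-01"]},
]
routed = [route_labels(e) for e in entities]
print(routed)
```

Each entity ends up with its labels in exactly one of the two fields, so each field can get its own Solr field type and tokenizer.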

The default "indexing.properties" file of the Entityhub Indexing tool also contains an example of how to configure the LdpathProcessor.

Note also that if you keep using the FiledMapperProcessor, then rdfs:label will still contain the labels of all Entities.

For extraction you would need to configure two KeywordLinkingEngines (one for my:label1 and one for my:label2).
The dereferenced Entities included by those two engine configurations would, however, lack the rdfs:label field. So if you would like to have the rdfs:label values in the Enhancement metadata, I would need to implement the possibility to configure the list of included properties.

Regarding the *_text* field:

By default this field is configured so that any text value of a property is copied to it. So it contains not only the rdfs:labels, but also all other textual values of any outgoing relation of an entity.
Also note that this field can NOT be used with the KeywordLinkingEngine, because it is only indexed and does not store the values.
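
The difference between "indexed" and "stored" can be shown with a toy model in plain Python. This is not Solr API; the ToyIndex class and its methods are hypothetical and only mimic the behaviour described above: an indexed-only field can be searched, but its original values cannot be read back.

```python
# Toy model of "indexed" versus "stored" fields. NOT Solr code;
# ToyIndex and its methods are hypothetical names used for illustration.

class ToyIndex:
    def __init__(self, stored_fields):
        self.stored_fields = set(stored_fields)
        self.postings = {}  # (field, token) -> set of document ids
        self.stored = {}    # document id -> {field: original value}

    def add(self, doc_id, field, value):
        # Every field is indexed: its tokens become searchable.
        for token in value.lower().split():
            self.postings.setdefault((field, token), set()).add(doc_id)
        # Only stored fields keep the original value for later retrieval.
        if field in self.stored_fields:
            self.stored.setdefault(doc_id, {})[field] = value

    def search(self, field, token):
        return self.postings.get((field, token.lower()), set())

    def value(self, doc_id, field):
        return self.stored.get(doc_id, {}).get(field)

index = ToyIndex(stored_fields=["rdfs:label"])
index.add("e1", "rdfs:label", "Apache Stanbol")
index.add("e1", "_text", "Apache Stanbol")  # copied value, indexed only

hits = index.search("_text", "stanbol")   # the entity is found via _text
label = index.value("e1", "rdfs:label")   # the stored label can be read back
text = index.value("e1", "_text")         # nothing was stored for _text
print(hits, label, text)
```

A component that has to read the matched values back, as the engine presumably does with labels, gets nothing from a field like `_text` in this model.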

I hope this helps.
best
Rupert

[1] https://issues.apache.org/jira/browse/STANBOL-596

Re: default / customized fields in KeywordLinkingEngine

Posted by seralf <se...@gmail.com>.

Thanks for 1).

For point 2) I was not very clear, sorry.
In my test I have a rather unusual use case: I am trying to provide results
for two different cases on the same rdfs:label field, where I have to use
different tokenization approaches (if that works).
So my idea is to create a parallel field with a different tokenization
approach and then copy it to the _text field. This is very common in Solr,
but I am just starting with Stanbol, so I have some doubts: for example, I'm
not sure whether the _text field is always the field used for the matches.
I hope I was clearer this time, but I know I'm probably trying to do
something strange :-)


Re: default / customized fields in KeywordLinkingEngine

Posted by Rupert Westenthaler <ru...@gmail.com>.

Hi

On Thu, May 10, 2012 at 12:00 PM, seralf <se...@gmail.com> wrote:
> Hi, I'm trying to use the KeywordLinkingEngine with a customized Solr
> configuration. Basically I need to understand two different things:
>
>   1. What are the default fields that are indexed and then used in the
>   retrieval process? I looked at DEREFERENCE_FIELDS in the source, but I'm
>   not sure whether this is the right place to look.

Currently they are hard-coded in the "DEREFERENCE_FIELDS" constant, which
defines the fields required by the Web UI of the enhancer. At the moment it
includes:

* rdfs:comment
* geo:lat/geo:long
* foaf:depiction
* dbp-ont:thumbnail

Note, however, that in addition to these the following fields are also
included:

* nameField (the field configured to be used as label for extraction -
default: rdfs:label)
* redirectField (the field used to follow redirections - default: rdfs:seeAlso)
* typeField (the field used to determine the type of Entities -
default: rdf:type)
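
Put together, the set of fields included for a dereferenced Entity is the fixed list plus the three configured fields. A minimal sketch in plain Python, assuming nothing about the real implementation: the function and parameter names are hypothetical, and only the field names and defaults come from the lists above.

```python
# Illustration only; dereferenced_fields is a hypothetical helper, not Stanbol code.
# geo:lat/geo:long from the list above is written as two separate entries.
DEREFERENCE_FIELDS = (
    "rdfs:comment",
    "geo:lat",
    "geo:long",
    "foaf:depiction",
    "dbp-ont:thumbnail",
)

def dereferenced_fields(name_field="rdfs:label",
                        redirect_field="rdfs:seeAlso",  # seeAlso is defined in the RDFS vocabulary
                        type_field="rdf:type"):
    """Return the fixed fields plus the three configured ones, without duplicates."""
    fields = list(DEREFERENCE_FIELDS)
    for extra in (name_field, redirect_field, type_field):
        if extra not in fields:
            fields.append(extra)
    return fields

default_fields = dereferenced_fields()
custom_fields = dereferenced_fields(name_field="my:label1")
print(default_fields)
print(custom_fields)
```

In this sketch a custom name field such as my:label1 replaces rdfs:label in the result. That matches the remark elsewhere in this thread that Entities dereferenced by an engine configured for a custom label field would miss rdfs:label.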

If you want this to be configurable I can easily add this feature. I am
not sure why I did not enable that from the beginning.

>   2. If I know which field is used as the basis for the textual
>   enhancement, I could simply copy the values of other fields into it in
>   the config. So I wonder whether I could define new fields and then use
>   them in the process.
>

Sorry, I do not understand what you mean by that.

> Thanks in advance for any suggestions.
>
> Alfredo Serafini

best
Rupert



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen