Posted to dev@stanbol.apache.org by "Sawhney, Tarandeep Singh" <ts...@innodata.com> on 2013/07/18 09:48:38 UTC

New reference site with additional DBpedia triples

Hi All,

In the Stanbol local cache, the DBpedia reference site contains only a
limited set of triples.

We need more triples for entities that are present in the DBpedia
reference site. For example, the entity "India" has limited triples, so when
we enhance text that mentions India, we get only the information that is
available in the DBpedia reference site.

We followed the steps below to add more RDF data for the entity "India" by
creating our own reference site.

1 - Downloaded the RDF data for "India" from [1].

2 - Generated indexes for this RDF data as suggested in article [2], with
*Demo* as the reference site name.

3 - Initialized the indexes within the Stanbol instance as per [2].

4 - Configured a new EntityLinking engine, *demoLinkingEngine*, with *Demo*
as the referenced site, as per [3].
     I added *dbp-ont:capital* to the "Fields used for dereferencing"
option.

5 - Configured a new weighted chain (*demoChain*).

6 - Now I am trying to enhance *"India is a country."* India is returned
as the dereferenced entity, but I am unable to get any new information
for *dbp-ont:capital*, which exists in my new reference site *Demo* and
which in this case should give us the URI value of "New Delhi".

[1] http://dbpedia.org/page/India
[2] http://stanbol.apache.org/docs/trunk/customvocabulary.html
[3] http://stanbol.apache.org/docs/trunk/components/enhancer/engines/entityhublinking

Can you please let me know if I am doing something wrong here or missing
some configuration?
Please let me know if you need more information on how we are
trying to do this.

best regards
tarandeep

-- 

"This e-mail and any attachments transmitted with it are for the sole use 
of the intended recipient(s) and may contain confidential , proprietary or 
privileged information. If you are not the intended recipient, please 
contact the sender by reply e-mail and destroy all copies of the original 
message. Any unauthorized review, use, disclosure, dissemination, 
forwarding, printing or copying of this e-mail or any action taken in 
reliance on this e-mail is strictly prohibited and may be unlawful."

Re: New reference site with additional DBpedia triples

Posted by "Sawhney, Tarandeep Singh" <ts...@innodata.com>.
Hi Rafa,

We are downloading the RDF for a given entity directly from http://dbpedia.org in XML.

For example, for the entity "Adidas", we went to "http://dbpedia.org/page/Adidas"
and downloaded the RDF using the hyperlink at the bottom of the page.

We then run the indexer as per the steps in the Stanbol documentation.

Please let me know if you meant something else with your question.

best regards
tarandeep

On Jul 18, 2013 7:11 PM, "Rafa Haro" <rh...@zaizi.com> wrote:

> Hi Tarandeep,
>
> How are you building your RDF dataset?
>
> Cheers,
>
> Rafa Haro
>
>
> --
>
> ------------------------------
> This message should be regarded as confidential. If you have received this
> email in error please notify the sender and destroy it immediately.
> Statements of intent shall only become binding when confirmed in hard copy
> by an authorised signatory.
>
> Zaizi Ltd is registered in England and Wales with the registration number
> 6440931. The Registered Office is Brook House, 229 Shepherds Bush Road,
> London W6 7AN.


Re: New reference site with additional DBpedia triples

Posted by Rafa Haro <rh...@zaizi.com>.
Hi Tarandeep,

How are you building your RDF dataset?

Cheers,

Rafa Haro

On 18/07/13 15:24, Sawhney, Tarandeep Singh wrote:
> Hi Rafa,
>
> Thanks for the pointer towards resolving the issue. We tried the step
> mentioned below, and the problem now seems to be in how we are indexing.
>
> As you suggested, we verified whether our reference site has the required
> indexes for the additional fields. We found that the indexer had created all
> indexes except those for fields in the "dbpedia-owl"
> (http://dbpedia.org/ontology/) namespace.
>
> For example, please look at the RDF at http://dbpedia.org/page/Adidas
>
> When we query our reference site for the entity "Adidas", we can see
> all indexes except those for fields in the *dbpedia-owl* namespace.
>
> Below is the LDPath program we used when querying our reference site.
>
> name = rdfs:label[@en] :: xsd:string;
> comment = rdfs:comment[@en] :: xsd:string;
> categories = dc:subject :: xsd:anyURI;
> homepage = foaf:homepage :: xsd:anyURI;
> location = fn:concat("[",geo:lat,",",geo:long,"]") :: xsd:string;
> foafname = foaf:name :: xsd:string;
> abstract = dbp-ont:abstract[@en] :: xsd:string;
> type = rdf:type :: xsd:anyURI;
> foundedBy = dbp-ont:foundedBy :: xsd:anyURI;
>
> We can see data for everything except "abstract" and "foundedBy", since
> they are in the *dbpedia-owl* namespace (http://dbpedia.org/ontology/).
>
> So the indexer did not create indexes for these two fields; therefore
> they are not in the reference site, and we can't see them when the entity
> is linked.
>
> We have not changed any default settings while running the indexer.
>
> Can you please help us find out what we are missing?
>
> best regards
> tarandeep
>
>
> On Thu, Jul 18, 2013 at 4:22 PM, Rafa Haro <rh...@zaizi.com> wrote:
>
>> Hi Tarandeep,
>>
>> Thanks, it's much clearer now :-). Have you checked whether the information
>> you need (for example dbp-ont:capital) is actually in the index? You can
>> check this by, for example, looking up the "India" entity directly in your
>> custom DBpedia site in the EntityHub.
>>
>> Regards
>>
>> On 18/07/13 12:30, Sawhney, Tarandeep Singh wrote:
>>
>>> No problem Rafa, maybe I didn't explain it in enough detail.
>>>
>>> We are using a custom ontology to extract custom entities from text, and
>>> we then want to link them with DBpedia entities (in the local DBpedia
>>> reference site).
>>>
>>> We found that the DBpedia reference site doesn't have enough of the data
>>> we need, so we decided to download additional data for selected entities
>>> (related to fashion brands, fashion designers, company names) directly
>>> from dbpedia.org.
>>>
>>> We then indexed these individual RDF files and created the indexes as a
>>> new reference site.
>>>
>>> We then did not use the DBpedia reference site; instead, we used our new
>>> reference site, which has the DBpedia data we need, with our new
>>> Entityhub linking engine.
>>>
>>> But after following the steps I mentioned in my earlier email, custom
>>> entities are dereferenced from my new reference site during enhancement,
>>> but I don't see the additional data I need, which exists in the local
>>> cache.
>>>
>>> I hope this explains what we are trying to do. Please let me know if more
>>> information is required.
>>>
>>> Best regards
>>> tarandeep
>>>
>>>
>>>
>>>
>>> On Thu, Jul 18, 2013 at 3:21 PM, Rafa Haro <rh...@zaizi.com> wrote:
>>>
>>>> Hi Tarandeep,
>>>>
>>>> On 18/07/13 11:18, Sawhney, Tarandeep Singh wrote:
>>>>
>>>>> Hi Rafa,
>>>>>
>>>>> Thanks for your response.
>>>>>
>>>>> Yes, we have also tried the whole URI of the property
>>>>> (http://dbpedia.org/ontology/capital), but it didn't help.
>>>>>
>>>>> Yes, we are using the EntityHub cache to locally store all the
>>>>> additional information we pulled from dbpedia.org.
>>>>>
>>>>> The documentation at
>>>>> http://stanbol.apache.org/docs/trunk/customvocabulary.html
>>>>> says:
>>>>>
>>>>> *Optionally, if your data do use namespaces that are not present in
>>>>> prefix.cc (or the server used for indexing does not have internet
>>>>> connectivity) you can manually define required prefixes by
>>>>> creating/using the a indexing/config/namespaceprefix.mappings file*
>>>>>
>>>>> Can we get some input on whether changes to this file are required
>>>>> when using DBpedia data?
>>>>
>>>> This file can be used at 'indexing time', when you use the indexing tool
>>>> to create the index for the DBpedia site. I have just seen that dbp-ont
>>>> is already included as a prefix. What is not clear to me right now is
>>>> whether you are generating your own DBpedia index including all the
>>>> DBpedia ontology properties (that would be an enormous index), or
>>>> generating an index each time you need a new entity, or even trying to
>>>> retrieve the entities from DBpedia in a 'live' way :-). Sorry, I'm
>>>> confused about your workflow.
>>>>
>>>>> Also, it looks like we are missing some configuration in the overall
>>>>> process, so if the dev community can help, it will be much
>>>>> appreciated.
>>>>>
>>>>> best regards
>>>>> tarandeep
>>>>>
>>>>>
>>>>> On Thu, Jul 18, 2013 at 1:38 PM, Rafa Haro <rh...@zaizi.com> wrote:
>>>>>
>>>>>> Hi Tarandeep,
>>>>>>
>>>>>> Have you tried using the whole URI of the property
>>>>>> (http://dbpedia.org/ontology/capital)?
>>>>>>
>>>>>> Anyway, it may be a better idea to change your workflow. I suppose
>>>>>> that what you describe for the "India" entity could happen with more
>>>>>> entities, because the default DBpedia site in Stanbol doesn't contain
>>>>>> information about dbp-ont properties. I would suggest using the
>>>>>> EntityHub cache to locally store entities with all the information
>>>>>> you need directly from DBpedia. So you could retrieve the entities
>>>>>> directly from any DBpedia endpoint and store them in the EntityHub
>>>>>> cache, so that you can use them later at your convenience. The
>>>>>> workflow could be the following:
>>>>>>
>>>>>> 1. Enhance a document using the Stanbol DBpedia site for linking.
>>>>>> 2. For each extracted entity:
>>>>>>    2.1. If the entity is already stored in the EntityHub, get it
>>>>>>         using LDPath for dereferencing.
>>>>>>    2.2. If not, retrieve the entity from a DBpedia endpoint as RDF
>>>>>>         data and store it in the EntityHub. Then retrieve it.
>>>>>>
>>>>>> I would say that this is currently possible in Stanbol, but maybe
>>>>>> someone else on the list can shed more light on the issue.
>>>>>>
>>>>>> Regards
>>>>>>

Re: New reference site with additional DBpedia triples

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Sawhney

If you want to index ALL RDF data with the Entityhub Indexing Tool,
you can use a "mappings.txt" file with a single line "*".
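
A minimal sketch of the indexing configuration discussed in this thread,
written in Python purely for illustration. The indexing/config directory and
the namespaceprefix.mappings file name come from the Stanbol documentation
quoted in this thread, and the prefix entries are the ones reported in the
message quoted below. The tab-separated prefix/namespace layout is an
assumption based on those entries, and the helper name
write_indexing_config and the scratch directory are hypothetical.

```python
import tempfile
from pathlib import Path


def write_indexing_config(config_dir, field_patterns, prefixes=None):
    """Write mappings.txt and, optionally, namespaceprefix.mappings.

    field_patterns: lines for mappings.txt, e.g. ["*"] to index every field.
    prefixes: optional {prefix: namespace URI} for namespaceprefix.mappings.
    """
    config_dir = Path(config_dir)
    config_dir.mkdir(parents=True, exist_ok=True)

    # One field pattern per line; a single "*" selects all RDF data.
    mappings = config_dir / "mappings.txt"
    mappings.write_text("\n".join(field_patterns) + "\n", encoding="utf-8")

    if prefixes:
        # Assumed layout: "<prefix><TAB><namespace>", one entry per line.
        lines = [f"{prefix}\t{ns}" for prefix, ns in prefixes.items()]
        ns_file = config_dir / "namespaceprefix.mappings"
        ns_file.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return mappings


# Hypothetical usage against a scratch directory, not a real indexer setup.
demo_dir = Path(tempfile.mkdtemp()) / "indexing" / "config"
write_indexing_config(
    demo_dir,
    ["dbpedia-owl:*", "dbpprop:*"],
    {
        "dbpedia-owl": "http://dbpedia.org/ontology/",
        "dbpprop": "http://dbpedia.org/property/",
    },
)
```

Note that this sketch overwrites both files. In a real indexing directory
you would more likely add these lines to the files that are already there.
Passing ["*"] as field_patterns gives the single-line variant described above.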

Given your use case - managing a Entityhub Site with some selected
DBpedia entities - you should consider to configure a ManagedSite [1]
and using the RESTful CRUD interface to create/update Entities
downloaded from DBpedia. The Entityhub Indexing Tool is intended to be
used for dataset that do not change often (e.g. a domain taxonomy that
is released every few month, a Dataset like dbpedia that is released
once every year ...)
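
As a rough illustration of the ManagedSite approach, the Python sketch
below prepares (but does not send) an HTTP request that uploads RDF for
one entity. Port 9090 is taken from earlier in this thread; the site id
"demo", the "/entityhub/site/{site}/entity" path, the "update" parameter
and the use of POST are assumptions that should be checked against the
ManagedSite documentation [1]. The function name is hypothetical.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Assumptions (not confirmed in this thread): host/port, site id, the
# "/entity" resource path and the "update" query parameter.
STANBOL_BASE = "http://localhost:9090"
SITE_ID = "demo"  # hypothetical ManagedSite id


def build_entity_upload(rdf_xml: bytes, site_id: str = SITE_ID,
                        base: str = STANBOL_BASE) -> Request:
    """Prepare a request that creates or updates entities from RDF/XML."""
    query = urlencode({"update": "true"})
    url = f"{base}/entityhub/site/{quote(site_id)}/entity?{query}"
    return Request(
        url,
        data=rdf_xml,
        method="POST",
        headers={"Content-Type": "application/rdf+xml"},
    )


if __name__ == "__main__":
    # Placeholder payload; real data would be the RDF downloaded from
    # DBpedia for the selected entity.
    payload = b"<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'/>"
    req = build_entity_upload(payload)
    print(req.get_method(), req.full_url)
    # Sending is left out on purpose: urllib.request.urlopen(req)
```

Building the request separately from sending it makes it easy to inspect
the URL and headers before touching a running Stanbol instance.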

best
Rupert


[1] http://stanbol.apache.org/docs/trunk/components/entityhub/managedsite.html

On Fri, Jul 19, 2013 at 10:07 AM, Sawhney, Tarandeep Singh
<ts...@innodata.com> wrote:
> Hi Rafa
>
> Finally we have been able to get it working
>
> We modified the file "namespaceprefix.mappings" with the two entries below:
>
> dbpedia-owl\thttp://dbpedia.org/ontology/\n
> dbpprop\thttp://dbpedia.org/property/\n
>
> We also modified the file "mappings.txt" with the two entries below:
>
> dbpedia-owl:*
> dbpprop:*
>
> We then ran the indexer again and were able to get indexes created
> for the additional data we wanted from the DBpedia RDF.
>
> We then ran into an issue similar to the already reported JIRA issue
> https://issues.apache.org/jira/browse/STANBOL-519
> We followed the steps to resolve it, and now everything is working.
>
> Thanks a lot for your prompt responses and for your valuable pointers,
> which helped us isolate the problem area.
>
> best regards,
> tarandeep
>
>



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: New reference site with additonal DBpedia triples

Posted by "Sawhney, Tarandeep Singh" <ts...@innodata.com>.
Hi Rafa

Finally we have been able to get it working

We modified the file "namespaceprefix.mappings" with the two entries below:

dbpedia-owl\thttp://dbpedia.org/ontology/\n
dbpprop\thttp://dbpedia.org/property/\n

We also modified the file "mappings.txt" with the two entries below:

dbpedia-owl:*
dbpprop:*

We then ran the indexer again and were able to get indexes created
for the additional data we wanted from the DBpedia RDF.
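
For reference, the two sets of entries above can be generated with a
small Python sketch like the one below. It only builds the text; the
helper names are hypothetical, and it assumes that each
"namespaceprefix.mappings" entry is a prefix and a namespace separated
by a tab, as the \t and \n in the entries above suggest.

```python
# Prefix-to-namespace pairs taken from the message above.
NAMESPACES = {
    "dbpedia-owl": "http://dbpedia.org/ontology/",
    "dbpprop": "http://dbpedia.org/property/",
}


def namespace_prefix_lines(namespaces):
    """Return one 'prefix<TAB>namespace' entry per line."""
    return "".join(f"{prefix}\t{ns}\n" for prefix, ns in namespaces.items())


def field_mapping_lines(prefixes):
    """Return one wildcard field mapping per prefix, e.g. 'dbpedia-owl:*'."""
    return "".join(f"{prefix}:*\n" for prefix in prefixes)


if __name__ == "__main__":
    # Text to add to namespaceprefix.mappings
    print(namespace_prefix_lines(NAMESPACES), end="")
    # Text to add to mappings.txt
    print(field_mapping_lines(NAMESPACES), end="")
```

The generated lines would then be added by hand to the two files in the
indexing tool's configuration before re-running the indexer.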

We then ran into an issue similar to the already reported JIRA issue
https://issues.apache.org/jira/browse/STANBOL-519
We followed the steps to resolve it, and now everything is working.
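
One way to double-check that the dbpedia-owl fields are now in the index
is to repeat the LDPath query tried earlier in this thread. The Python
sketch below only builds the request for the site's "find" endpoint and
does not send it. The site name "Demo" and port 9090 come from this
thread; the form parameter names "name" and "ldpath" and the function
name are assumptions and may need adjusting.

```python
from urllib.parse import parse_qs, urlencode

# LDPath program using the prefix declaration suggested in this thread.
LDPATH_PROGRAM = "\n".join([
    "@prefix dbp-ont : <http://dbpedia.org/ontology/>;",
    "name = rdfs:label[@en] :: xsd:string;",
    "abstract = dbp-ont:abstract[@en] :: xsd:string;",
    "foundedBy = dbp-ont:foundedBy :: xsd:anyURI;",
])


def build_find_request(site, name, ldpath=LDPATH_PROGRAM,
                       base="http://localhost:9090"):
    """Return (url, form_body) for a label search that applies an LDPath program."""
    url = f"{base}/entityhub/site/{site}/find"
    # "name" and "ldpath" are assumed parameter names.
    body = urlencode({"name": name, "ldpath": ldpath}).encode("utf-8")
    return url, body


if __name__ == "__main__":
    url, body = build_find_request("Demo", "Adidas")
    print(url)
    # Decode the form body again to show what would be submitted.
    print(parse_qs(body.decode("utf-8"))["ldpath"][0])
```

If the "abstract" and "foundedBy" fields come back empty for an entity
such as "Adidas", the data is most likely missing from the index rather
than from the LDPath program.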

Thanks a lot for your prompt responses and for your valuable pointers,
which helped us isolate the problem area.

best regards,
tarandeep


On Thu, Jul 18, 2013 at 10:36 PM, Sawhney, Tarandeep Singh <
tsawhney@innodata.com> wrote:

> Thanks Rafa
>
> I am querying my reference site through
> http://localhost:9090/entityhub/site/<MyReferencesiteName>/find
>
> and now entered LDPath as below including dbp-ont prefix
>
> @prefix dbp-ont : <http://dbpedia.org/ontology/>;
> name = rdfs:label[@en] :: xsd:string;
> comment = rdfs:comment[@en] :: xsd:string;
> categories = dc:subject :: xsd:anyURI;
> homepage = foaf:homepage :: xsd:anyURI;
> location = fn:concat("[",geo:lat,",",geo:long,"]") :: xsd:string;
> foafname = foaf:name :: xsd:string;
> abstract = dbp-ont:abstract[@en] :: xsd:string;
> type = rdf:type :: xsd:anyURI;
> foundedBy = dbp-ont:foundedBy :: xsd:anyURI;
>
> But it still doesn't return the data I have added for the "foundedBy"
> and "abstract" fields.
>
> Is this what you wanted me to try, or did you mean something else?
>
> I am still not sure where the problem lies :-( : whether the indexes were
> not created, or whether accessing them has an issue.
>
> best regards,
> tarandeep
>
>
> On Thu, Jul 18, 2013 at 8:26 PM, Rafa Haro <rh...@zaizi.com> wrote:
>
>> Hi Tarandeep,
>>
>> Have you included dbp-ont as a prefix for http://dbpedia.org/ontology/ in
>> your LDPath program? According to the LDPath documentation
>> (https://code.google.com/p/ldpath/wiki/PathLanguage),
>> dbp-ont is not included as a default prefix, so you might need to start
>> your program with:
>>
>> @prefix dbp-ont : <http://dbpedia.org/ontology/>;
>>
>> Hope that helps.
>>
>> Cheers,
>>
>> Rafa
>>
>> On 18/07/13 15:24, Sawhney, Tarandeep Singh wrote:
>>
>>> Hi Rafa,
>>>
>>> Thanks for giving the pointer to resolve the issue. We tried the step
>>> described below, but now the issue seems to be in how we are indexing.
>>>
>>> As you suggested, we verified whether we have the required indexes for
>>> the additional fields in our reference site. We found that the indexer
>>> had created all indexes except for fields in the
>>> "dbpedia-owl: http://dbpedia.org/ontology/" namespace.
>>>
>>> For example, please look at the RDF at http://dbpedia.org/page/Adidas
>>>
>>> When we query our reference site for the entity "Adidas", we are able to
>>> see all indexes except for fields in the dbpedia-owl namespace.
>>>
>>> Below are the LDPaths we used while querying our reference site.
>>>
>>> name = rdfs:label[@en] :: xsd:string;
>>> comment = rdfs:comment[@en] :: xsd:string;
>>> categories = dc:subject :: xsd:anyURI;
>>> homepage = foaf:homepage :: xsd:anyURI;
>>> location = fn:concat("[",geo:lat,",",geo:long,"]") :: xsd:string;
>>> foafname = foaf:name :: xsd:string;
>>> abstract = dbp-ont:abstract[@en] :: xsd:string;
>>> type = rdf:type :: xsd:anyURI;
>>> foundedBy = dbp-ont:foundedBy :: xsd:anyURI;
>>>
>>> We can see data for everything except "abstract" and "foundedBy", since
>>> they are in the dbpedia-owl namespace (http://dbpedia.org/ontology/).
>>>
>>> So it seems the indexes for these two fields were not created by the
>>> indexer; therefore we don't have them in the reference site and can't
>>> see them when an entity is linked.
>>>
>>> We have not changed any default settings while running indexer
>>>
>>> Can you please provide further help in order to find out what we are
>>> missing
>>>
>>> best regards
>>> tarandeep
>>>
>>>
>>> On Thu, Jul 18, 2013 at 4:22 PM, Rafa Haro <rh...@zaizi.com> wrote:
>>>
>>>> Hi Tarandeep,
>>>>
>>>> Thanks, it's much clearer now :-). Have you checked whether the
>>>> information you need (for example dbp-ont:capital) is actually in the
>>>> index? You can check it, for example, by looking up the "India" entity
>>>> directly in your custom DBpedia site in the Entityhub.
>>>>
>>>> Regards
>>>>
>>>> On 18/07/13 12:30, Sawhney, Tarandeep Singh wrote:
>>>>
>>>>> No problems Rafa, maybe I didn't explain with enough detail/clarity.
>>>>>
>>>>> We are using a custom ontology to extract custom entities from text, and
>>>>> then we want to link them with DBpedia entities (in the local DBpedia
>>>>> reference site).
>>>>>
>>>>> We found that the DBpedia reference site doesn't have enough of the data
>>>>> we need, so we decided to download additional data for selected entities
>>>>> (related to fashion brands, fashion designers, company names) directly
>>>>> from dbpedia.org.
>>>>>
>>>>> We then indexed these individual RDF files and created indexes for a new
>>>>> reference site.
>>>>>
>>>>> We then used our new reference site, which has the DBpedia data we need,
>>>>> instead of the DBpedia reference site, together with our new Entityhub
>>>>> linking engine.
>>>>>
>>>>> But after we followed the steps I mentioned in my earlier email, during
>>>>> enhancement the custom entities are dereferenced from my new reference
>>>>> site, but I don't see the additional data that I need, even though it
>>>>> exists in the local cache.
>>>>>
>>>>> Hope this explains what we are trying to do; please let me know if more
>>>>> information is required.
>>>>>
>>>>> Best regards
>>>>> tarandeep
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jul 18, 2013 at 3:21 PM, Rafa Haro <rh...@zaizi.com> wrote:
>>>>>
>>>>>> Hi Tarandeep,
>>>>>>
>>>>>> On 18/07/13 11:18, Sawhney, Tarandeep Singh wrote:
>>>>>>
>>>>>>> Hi Rafa
>>>>>>>
>>>>>>> Thanks for your response
>>>>>>>
>>>>>>> Yes, we have also tried the whole URI of the property
>>>>>>> (http://dbpedia.org/ontology/capital), but it didn't help.
>>>>>>>
>>>>>>> Yes, we are using the Entityhub cache to locally store all the
>>>>>>> additional information we pulled from dbpedia.org.
>>>>>>>
>>>>>>> The documentation at
>>>>>>> http://stanbol.apache.org/docs/trunk/customvocabulary.html
>>>>>>> says:
>>>>>>>
>>>>>>> "Optionally, if your data do use namespaces that are not present in
>>>>>>> prefix.cc (or the server used for indexing does not have internet
>>>>>>> connectivity) you can manually define required prefixes by
>>>>>>> creating/using the a indexing/config/namespaceprefix.mappings file"
>>>>>>>
>>>>>>> Can we get some input on whether changes to this file are required
>>>>>>> when using DBpedia data?
>>>>>>>
>>>>>> This file can be used at 'indexing time', when you use the indexing
>>>>>> tool for creating the index for the DBpedia site. I have just seen that
>>>>>> dbp-ont is already included as a prefix. What is not clear to me right
>>>>>> now is whether you are generating your own DBpedia index including all
>>>>>> the DBpedia ontology properties (that would be an enormous index),
>>>>>> whether you are generating an index each time you need a new entity, or
>>>>>> whether you are trying to retrieve the entities from DBpedia in a
>>>>>> 'live' way :-). Sorry, I'm confused about your workflow.
>>>>>>
>>>>>>> Also, it looks like we are missing some configuration in the overall
>>>>>>> process, so if the dev community can please provide help, it will be
>>>>>>> much appreciated.
>>>>>>>
>>>>>>> best regards
>>>>>>> tarandeep
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jul 18, 2013 at 1:38 PM, Rafa Haro <rh...@zaizi.com> wrote:
>>>>>>>
>>>>>>>> Hi Tarandeep,
>>>>>>>>
>>>>>>>> Have you tried using the whole URI of the property
>>>>>>>> (http://dbpedia.org/ontology/capital)?
>>>>>>>>
>>>>>>>> Anyway, maybe it is a better idea to change your workflow, because I
>>>>>>>> suppose that what happened with the "India" entity could happen with
>>>>>>>> more entities, since the default DBpedia site in Stanbol doesn't
>>>>>>>> contain information about dbp-ont properties. I would suggest using
>>>>>>>> the Entityhub cache to locally store entities with all the
>>>>>>>> information you need directly from DBpedia. So, maybe you can
>>>>>>>> retrieve the entities directly from any DBpedia endpoint and store
>>>>>>>> them in the Entityhub cache, so that you can use them later at your
>>>>>>>> convenience. The workflow could be the following:
>>>>>>>>
>>>>>>>> 1. Enhance a document using the Stanbol DBpedia site for linking.
>>>>>>>> 2. For each extracted entity:
>>>>>>>>    2.1. If the entity is already stored in the Entityhub, get it
>>>>>>>>         using LDPath for dereferencing.
>>>>>>>>    2.2. If not, retrieve the entity from a DBpedia endpoint as RDF
>>>>>>>>         data and store it in the Entityhub. Then retrieve it.
>>>>>>>>
>>>>>>>> I would say that this is currently possible in Stanbol, but maybe
>>>>>>>> someone else on the list can shed more light on the issue.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>>
>>>>>>>> On 18/07/13 09:48, Sawhney, Tarandeep Singh wrote:
>>>>>>>>
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> In the stanbol local cache we have limited triples in dbpedia
>>>>>>>>> reference site.
>>>>>>>>>
>>>>>>>>> We have a need to get more triples for entities which are present
>>>>>>>>> in dbpedia reference site. For example entity "India" has limited
>>>>>>>>> triples, so when we enhance text which has india, it gets us only
>>>>>>>>> information which is there in dbpedia reference site.
>>>>>>>>>
>>>>>>>>> We have followed below mentioned steps to add more RDF data for
>>>>>>>>> entity "India" by creating our own reference site.
>>>>>>>>>
>>>>>>>>> 1 - Downloaded rdf-data for 'India' from [1].
>>>>>>>>>
>>>>>>>>> 2 - Generated indexes for this rdf-data as suggested in article [2]
>>>>>>>>> with Demo as a reference site name.
>>>>>>>>>
>>>>>>>>> 3 - Initialized indexes within stanbol instance as per [2].
>>>>>>>>>
>>>>>>>>> 4 - Configured new EntityLinking engine, 'demoLinkingEngine' with
>>>>>>>>> Demo as referenced site as per [3].
>>>>>>>>>     I have added dbp-ont:capital in the "Fields used for
>>>>>>>>>     dereferencing" option.
>>>>>>>>>
>>>>>>>>> 5 - Configured new weighted chain (demoChain).
>>>>>>>>>
>>>>>>>>> 6 - Now i am trying to enhance "India is a country." I am getting
>>>>>>>>> India as de-reference entity but unable to get any new information
>>>>>>>>> related to dbp-ont:capital which exists in my new reference site
>>>>>>>>> Demo, which in this case should give us URI value of "New Delhi"
>>>>>>>>>
>>>>>>>>> [1] http://dbpedia.org/page/India
>>>>>>>>> [2] http://stanbol.apache.org/********docs/trunk/customvocabulary.
>>>>>>>>> ******<http://stanbol.apache.org/******docs/trunk/customvocabulary.****>
>>>>>>>>> **html<http://stanbol.apache.**org/****docs/trunk/**
>>>>>>>>> customvocabulary.****html<http://stanbol.apache.org/****docs/trunk/customvocabulary.****html>
>>>>>>>>> >
>>>>>>>>> <http://stanbol.apache.**org/****docs/trunk/****
>>>>>>>>> customvocabulary.**html<http:/**/stanbol.apache.org/**docs/**
>>>>>>>>> trunk/customvocabulary.**html<http://stanbol.apache.org/**docs/trunk/customvocabulary.**html>
>>>>>>>>> >
>>>>>>>>> <http://stanbol.apache.**org/****docs/trunk/****
>>>>>>>>> customvocabulary.**html<
>>>>>>>>> http://stanbol.apache.**org/**docs/trunk/**customvocabulary.**
>>>>>>>>> html<http://stanbol.apache.**org/docs/trunk/**
>>>>>>>>> customvocabulary.html<http://stanbol.apache.org/docs/trunk/customvocabulary.html>
>>>>>>>>> >
>>>>>>>>> [3]
>>>>>>>>> http://stanbol.apache.org/********docs/trunk/components/****<http://stanbol.apache.org/******docs/trunk/components/****>
>>>>>>>>> <h**ttp://stanbol.apache.org/******docs/trunk/components/****<http://stanbol.apache.org/****docs/trunk/components/****>
>>>>>>>>> >
>>>>>>>>> enhancer/engines/**<http://**s**tanbol.apache.org/**docs/**<http://stanbol.apache.org/**docs/**>
>>>>>>>>> trunk/components/**enhancer/****engines/**<http://stanbol.**
>>>>>>>>> apache.org/**docs/trunk/**components/**enhancer/engines/****<http://stanbol.apache.org/**docs/trunk/components/**enhancer/engines/**>
>>>>>>>>> >
>>>>>>>>> entityhublinking<http://****stan**bol.apache.org/docs/**trunk/**<http://bol.apache.org/docs/trunk/**>
>>>>>>>>> <http://stanbol.**apache.org/docs/trunk/**<http://stanbol.apache.org/docs/trunk/**>
>>>>>>>>> >
>>>>>>>>> components/enhancer/engines/******entityhublinking<http://**
>>>>>>>>> stanbol.apache.org/docs/trunk/****components/enhancer/engines/****<http://stanbol.apache.org/docs/trunk/**components/enhancer/engines/**>
>>>>>>>>> entityhublinking<http://**stanbol.apache.org/docs/trunk/**
>>>>>>>>> components/enhancer/engines/**entityhublinking<http://stanbol.apache.org/docs/trunk/components/enhancer/engines/entityhublinking>
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>> Can you please let me know if i am doing something wrong here or
>>>>>>>>> missing
>>>>>>>>> some configurations.
>>>>>>>>> Please let me know in case you need some more information on how we
>>>>>>>>> are
>>>>>>>>> trying to do it
>>>>>>>>>
>>>>>>>>> best regards
>>>>>>>>> tarandeep
>>>>>>>>>
>>>>>>>>>


Re: New reference site with additonal DBpedia triples

Posted by "Sawhney, Tarandeep Singh" <ts...@innodata.com>.
Thanks, Rafa.

I am querying my reference site through
http://localhost:9090/entityhub/site/<MyReferencesiteName>/find

and have now entered the LDPath program below, including the dbp-ont prefix:

@prefix dbp-ont : <http://dbpedia.org/ontology/>;
name = rdfs:label[@en] :: xsd:string;
comment = rdfs:comment[@en] :: xsd:string;
categories = dc:subject :: xsd:anyURI;
homepage = foaf:homepage :: xsd:anyURI;
location = fn:concat("[",geo:lat,",",geo:long,"]") :: xsd:string;
foafname = foaf:name :: xsd:string;
abstract = dbp-ont:abstract[@en] :: xsd:string;
type = rdf:type :: xsd:anyURI;
foundedBy = dbp-ont:foundedBy :: xsd:anyURI;

But it still does not return the data I added for the "foundedBy" and
"abstract" fields.
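
For reference, the query described above can be sketched as a small Python helper that only builds the request and does not send it. The form parameter names (name, limit, ldpath) are assumptions about the Entityhub find endpoint and should be checked against the Stanbol documentation; the site name "Demo" and the search term are taken from this thread.

```python
from urllib.parse import urlencode

# LDPath program from this message, reduced to the two fields that come back empty.
LDPATH_PROGRAM = (
    "@prefix dbp-ont : <http://dbpedia.org/ontology/>;\n"
    "abstract = dbp-ont:abstract[@en] :: xsd:string;\n"
    "foundedBy = dbp-ont:foundedBy :: xsd:anyURI;\n"
)


def build_find_request(base_url, site, name, ldpath, limit=10):
    """Return (url, form_body) for a find query against a referenced site.

    Assumption: the endpoint accepts form-encoded 'name', 'limit' and
    'ldpath' parameters. Nothing is sent over the network here.
    """
    url = f"{base_url.rstrip('/')}/entityhub/site/{site}/find"
    body = urlencode({"name": name, "limit": limit, "ldpath": ldpath})
    return url, body


if __name__ == "__main__":
    url, body = build_find_request(
        "http://localhost:9090", "Demo", "Adidas", LDPATH_PROGRAM
    )
    print(url)
    print(body)
```

The returned body can be POSTed with any HTTP client using the content type application/x-www-form-urlencoded.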

Is this what you wanted me to try, or did you mean something else?

I am still not sure where the problem lies :-(  Either the indexes were not
created, or there is a problem with how we access them.
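
One way to tell the two cases apart is to look at the stored entity itself rather than at the LDPath result. The sketch below assumes the entity has already been fetched from the reference site and parsed into a mapping from property URI to values; that shape, and the helper names, are illustrative assumptions and not part of Stanbol.

```python
DBP_ONT = "http://dbpedia.org/ontology/"


def fields_in_namespace(representation, namespace=DBP_ONT):
    """Return the sorted property URIs of an entity that start with namespace."""
    return sorted(key for key in representation if key.startswith(namespace))


def diagnose(representation, expected_fields):
    """Report whether the expected property URIs are stored for the entity."""
    present = set(fields_in_namespace(representation))
    missing = sorted(set(expected_fields) - present)
    if missing:
        # The properties never reached the index: look at the indexing step.
        return "not indexed: " + ", ".join(missing)
    # The data is stored: look at the LDPath program or the engine configuration.
    return "indexed; check the LDPath program or the engine configuration"
```

If the dbpedia.org/ontology properties are absent from the stored entity, the problem is on the indexing side; if they are present, the problem is in how they are queried.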

best regards,
tarandeep




Re: New reference site with additonal DBpedia triples

Posted by Rafa Haro <rh...@zaizi.com>.
Hi Tarandeep,

Have you included dbp-ont as a prefix for http://dbpedia.org/ontology/ in
your LDPath program? According to the LDPath documentation
(https://code.google.com/p/ldpath/wiki/PathLanguage), dbp-ont is not
included as a default prefix, so you might need to start your program with:

@prefix dbp-ont : <http://dbpedia.org/ontology/>;
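
As an illustration of that advice, the following Python sketch prepends a declaration for every prefix that a program uses but does not declare. The set of prefixes treated as already known is an assumption for this example and depends on the LDPath setup in use; the helper name is hypothetical.

```python
import re

# Assumption: prefixes the LDPath runtime resolves without a declaration.
ASSUMED_KNOWN = {"rdf", "rdfs", "owl", "skos", "foaf", "dc", "xsd", "fn", "geo"}

# "@prefix name : <uri>;" declarations already present in the program.
DECLARED = re.compile(r"@prefix\s+([A-Za-z][\w-]*)\s*:")
# "name:local" usages; the lookahead skips "::" and "http://".
USED = re.compile(r"(?<![\w-])([A-Za-z][\w-]*):(?=[A-Za-z])")


def add_missing_prefixes(program, namespaces, known=ASSUMED_KNOWN):
    """Return program with @prefix lines added for undeclared prefixes.

    namespaces maps a prefix to its namespace URI. A prefix that is used,
    not declared, not in known and not in namespaces raises KeyError.
    """
    declared = set(DECLARED.findall(program))
    used = set(USED.findall(program))
    missing = sorted(used - declared - set(known))
    unresolved = [p for p in missing if p not in namespaces]
    if unresolved:
        raise KeyError("no namespace given for: " + ", ".join(unresolved))
    header = "".join(f"@prefix {p} : <{namespaces[p]}>;\n" for p in missing)
    return header + program
```

Running the helper on a program that already declares dbp-ont leaves it unchanged, so it can be applied repeatedly.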

Hope that helps.

Cheers,

Rafa

El 18/07/13 15:24, Sawhney, Tarandeep Singh escribió:
> Hi Rafa,
>
> Thanks for giving the pointer to resolve the issue. We tried below
> mentioned step but now the issue seems to be in how we are indexing.
>
> As you suggested we verified if we have the required indexes for additional
> fields in our reference site. We found out, that indexer had created all
> indexes except for fields in "dbpedia-owl<http://dbpedia.org/ontology/foundedBy>
> *:http://dbpedia.org/ontology/" *namespace
> *
> *
> For example, Please look at RDF ---> http://dbpedia.org/page/Adidas
>
> When we query our reference site for entity "Adidas", we are able to see
> all indexes except for fields in *dbpedia-owl* namespace
> *
> *
> Below are the LDPaths we used while querying our reference site.
>
> *
> name = rdfs:label[@en] :: xsd:string;
> comment = rdfs:comment[@en] :: xsd:string;
> categories = dc:subject :: xsd:anyURI;
> homepage = foaf:homepage :: xsd:anyURI;
> location = fn:concat("[",geo:lat,",",geo:long,"]") :: xsd:string;
> foafname = foaf:name :: xsd:string;
> abstract = dbp-ont:abstract[@en] :: xsd:string;
> type = rdf:type :: xsd:anyURI;
> foundedBy = dbp-ont:foundedBy :: xsd:anyURI;
> *
>
> We can see data for everything except for "abstract" and "foundedBy" since
> they are in namespace *dbpedia-owl <http://dbpedia.org/ontology/foundedBy>*
>
> So it means indexes for above two fields were not created by indexer and
> therefore we dont have them in the reference site and hence cant see them
> when entity is linked
>
> We have not changed any default settings while running indexer
>
> Can you please provide further help in order to find out what we are missing
>
> best regards
> tarandeep
>
>
> On Thu, Jul 18, 2013 at 4:22 PM, Rafa Haro <rh...@zaizi.com> wrote:
>
>> Hi Tarandeep,
>>
>> Thanks, it's quite more clear now :-). Have you check if the information
>> you need (for example dbp-ont:capital) is actually in the index?. You can
>> check it for example looking for "India" entity directly in your custom
>> DBpedia site in the EntityHub.
>>
>> Regards
>>
>> El 18/07/13 12:30, Sawhney, Tarandeep Singh escribió:
>>
>>> No problems Rafa, may be i didnt explain with details/clarity.
>>>
>>> We are using custom ontology to extract custom entities from text and then
>>> we want to them to link with DBpedia entities (in local dbpedia reference
>>> site).
>>>
>>> We found dbpedia reference doesnt have enough data that we need, so
>>> decided
>>> to download additional data for selected entities (related to fashion
>>> brands, fashion designers, company names) directly from dbpedia.org
>>>
>>> We then indexed these individual RDF files and created indexes with new
>>> reference site
>>>
>>> We then did not use DBpedia reference site, instead used our new reference
>>> site which has dbpedia data that we need with our new Entityhub linking
>>> engine
>>>
>>> But after we followed steps i mentioned in my earlier email, during
>>> enhancement, custom entities are getting de-referenced from my new
>>> reference site but i dont see additional data that i needed which exists
>>> in
>>> local cache.
>>>
>>> Hope this explains what we are trying to do, please let me know if some
>>> more information is required.
>>>
>>> Best regards
>>> tarandeep
>>>
>>>
>>>
>>>
>>> On Thu, Jul 18, 2013 at 3:21 PM, Rafa Haro <rh...@zaizi.com> wrote:
>>>
>>>   Hi Tarandeep,
>>>> El 18/07/13 11:18, Sawhney, Tarandeep Singh escribió:
>>>>
>>>>   Hi Rafa
>>>>> Thanks for your response
>>>>>
>>>>> Yes, we have tried the whole URI of the property (
>>>>> http://dbpedia.org/ontology/******capital<http://dbpedia.org/ontology/****capital>
>>>>> <http://dbpedia.org/**ontology/**capital<http://dbpedia.org/ontology/**capital>
>>>>> )<http://dbpedia.org/****ontology/capital<http://dbpedia.org/**ontology/capital>
>>>>> <http://**dbpedia.org/ontology/capital<http://dbpedia.org/ontology/capital>
>>>>> )>
>>>>>
>>>>> also
>>>>> but it didn't help
>>>>>
>>>>> Yes we are using EntityHub cache to locally store with all the
>>>>> additional
>>>>> information we pulled from Dbpedia.org
>>>>>
>>>>> In the documentation provided at
>>>>> http://stanbol.apache.org/****docs/trunk/customvocabulary.****html<http://stanbol.apache.org/**docs/trunk/customvocabulary.**html>
>>>>> <http://stanbol.apache.**org/docs/trunk/**customvocabulary.html<http://stanbol.apache.org/docs/trunk/customvocabulary.html>
>>>>>
>>>>> it is mentioned --->
>>>>>
>>>>> *Optionally, if your data do use namespaces that are not present in
>>>>>
>>>>> prefix.cc (or the server used for indexing does not have internet
>>>>> connectivity) you can manually define required prefixes by
>>>>> creating/using
>>>>> the a indexing/config/****namespaceprefix.mappings file
>>>>>
>>>>> *
>>>>> *
>>>>>
>>>>> *
>>>>> Can we get some inputs on if some changes to this file are required
>>>>> while
>>>>> using DBpedia data
>>>>>
>>>>>   This file can be used at 'indexing time' when you use the indexing tool
>>>> for creating the index for the DBpedia site. I have just seen that
>>>> dbp-ont
>>>> is already included as prefix. What I don't have clear right now is if
>>>> you
>>>> are generating your own dbpedia index including all the dbpedia ontology
>>>> properties (that should be a enormous index) or if you are generating an
>>>> index each time you need a new entity or even you are trying to retrieve
>>>> the entities from dbpedia in a 'live' way :-). Sorry I'm confused about
>>>> your workflow.
>>>>
>>>>
>>>>   Also, looks like we are missing on some configurations in the overall
>>>>> process, so if dev community can please provide help, it will be much
>>>>> appreciated
>>>>>
>>>>> best regards
>>>>> tarandeep
>>>>>
>>>>>

Re: New reference site with additonal DBpedia triples

Posted by "Sawhney, Tarandeep Singh" <ts...@innodata.com>.
Hi Rafa,

Thanks for the pointer. We tried the step you suggested, and the problem
now appears to be in how we are indexing.

As you suggested, we checked whether our reference site contains the
additional fields. The indexer created index entries for every field except
those in the "dbpedia-owl" namespace (http://dbpedia.org/ontology/).

For example, take the RDF for Adidas: http://dbpedia.org/page/Adidas

When we query our reference site for the entity "Adidas", we see all
fields except those in the dbpedia-owl namespace.

Below is the LDPath program we used to query our reference site.

name = rdfs:label[@en] :: xsd:string;
comment = rdfs:comment[@en] :: xsd:string;
categories = dc:subject :: xsd:anyURI;
homepage = foaf:homepage :: xsd:anyURI;
location = fn:concat("[",geo:lat,",",geo:long,"]") :: xsd:string;
foafname = foaf:name :: xsd:string;
abstract = dbp-ont:abstract[@en] :: xsd:string;
type = rdf:type :: xsd:anyURI;
foundedBy = dbp-ont:foundedBy :: xsd:anyURI;
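A note on the prefixes: dbp-ont (used in the LDPath program above) and
dbpedia-owl (shown on dbpedia.org pages) both stand for the namespace
http://dbpedia.org/ontology/. The Python sketch below expands prefixed names
to full URIs so they can be compared with the raw data. The PREFIXES table
and the expand() helper are hypothetical and written by hand for this
example; they are not part of Stanbol.

```python
# Hypothetical, hand-written prefix table for illustration only.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dbp-ont": "http://dbpedia.org/ontology/",
    "dbpedia-owl": "http://dbpedia.org/ontology/",
}


def expand(qname, prefixes=PREFIXES):
    """Expand 'prefix:local' to a full URI.

    Raises ValueError if there is no ':' and KeyError for unknown prefixes.
    """
    prefix, sep, local = qname.partition(":")
    if not sep:
        raise ValueError(f"not a prefixed name: {qname!r}")
    return prefixes[prefix] + local


for name in ("dbp-ont:foundedBy", "dbpedia-owl:foundedBy", "dbp-ont:abstract"):
    print(name, "->", expand(name))
```

Both spellings of foundedBy expand to the same URI, so the difference in
prefix alone should not explain the missing fields.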

We can see data for every field except "abstract" and "foundedBy", which
are both in the dbpedia-owl namespace (http://dbpedia.org/ontology/).

This suggests the indexer did not index these two fields, so they are
missing from the reference site and do not show up when an entity is linked.
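One way to narrow this down is to confirm that the source data handed to the
indexer actually contains predicates from http://dbpedia.org/ontology/. The
Python sketch below counts such predicates in N-Triples input. It assumes the
data is available as N-Triples with one simple triple per line; the regular
expression is a rough match, not a full N-Triples parser, and the two sample
lines are made up for illustration.

```python
import re
from collections import Counter

# Rough match for the predicate of a simple N-Triples line: <s> <p> object .
# Blank-node subjects (_:b0) are accepted too. Not a full parser.
TRIPLE = re.compile(r'^\s*(?:<[^>]*>|_:\S+)\s+<([^>]*)>\s+')

DBPEDIA_ONTOLOGY = "http://dbpedia.org/ontology/"


def predicates_in(lines, namespace=DBPEDIA_ONTOLOGY):
    """Count predicates from `namespace` in an iterable of N-Triples lines."""
    counts = Counter()
    for line in lines:
        match = TRIPLE.match(line)
        if match and match.group(1).startswith(namespace):
            counts[match.group(1)] += 1
    return counts


# Illustrative sample lines, not real indexer input.
sample = [
    '<http://dbpedia.org/resource/Adidas> '
    '<http://www.w3.org/2000/01/rdf-schema#label> "Adidas"@en .',
    '<http://dbpedia.org/resource/Adidas> '
    '<http://dbpedia.org/ontology/foundedBy> '
    '<http://dbpedia.org/resource/Adolf_Dassler> .',
]
print(predicates_in(sample))
```

Passing an open file object instead of the sample list works the same way.
If the count is empty for the real input, the fields were never in the data
given to the indexer; if it is not, the indexing configuration is the next
place to look.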

We did not change any default settings when running the indexer.

Can you please help us find out what we are missing?

best regards
tarandeep




Re: New reference site with additonal DBpedia triples

Posted by Rafa Haro <rh...@zaizi.com>.
Hi Tarandeep,

Thanks, it's much clearer now :-). Have you checked whether the information
you need (for example dbp-ont:capital) is actually in the index? You can
check, for example, by looking up the "India" entity directly in your
custom DBpedia site in the EntityHub.
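As a rough illustration of such a lookup, the Python sketch below builds the
request URL for a single entity in a custom site. The base URL
(http://localhost:8080), the site name "Demo" and the path pattern
/entityhub/site/<site>/entity?id=<uri> are assumptions here and should be
checked against the Entityhub documentation for your Stanbol version. The
entity_lookup_url() helper is hypothetical.

```python
from urllib.parse import urlencode


def entity_lookup_url(base, site, entity_uri):
    """Build a lookup URL for one entity in a referenced site.

    The path pattern is an assumption about the Entityhub REST interface.
    """
    query = urlencode({"id": entity_uri})
    return f"{base.rstrip('/')}/entityhub/site/{site}/entity?{query}"


url = entity_lookup_url(
    "http://localhost:8080",              # assumed host and port
    "Demo",                               # site name used in this thread
    "http://dbpedia.org/resource/India",
)
print(url)
```

Opening the resulting URL in a browser, or fetching it with an HTTP client,
should show whether http://dbpedia.org/ontology/capital is among the stored
fields for the entity.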

Regards




Re: New reference site with additonal DBpedia triples

Posted by "Sawhney, Tarandeep Singh" <ts...@innodata.com>.
No problem, Rafa. Maybe I didn't explain it clearly or in enough detail.

We are using a custom ontology to extract custom entities from text, and we
then want to link them with DBpedia entities (in the local DBpedia reference
site).

We found that the DBpedia reference site doesn't have enough of the data we
need, so we decided to download additional data for selected entities
(related to fashion brands, fashion designers, and company names) directly
from dbpedia.org.

We then indexed these individual RDF files and created a new reference site
from the resulting indexes.

Instead of the DBpedia reference site, we then used our new reference site,
which has the DBpedia data we need, with our new EntityHub linking engine.

However, after following the steps I mentioned in my earlier email, custom
entities are dereferenced from my new reference site during enhancement,
but I don't see the additional data I need, even though it exists in the
local cache.

I hope this explains what we are trying to do. Please let me know if more
information is required.

Best regards
tarandeep




On Thu, Jul 18, 2013 at 3:21 PM, Rafa Haro <rh...@zaizi.com> wrote:

> Hi Tarandeep,
>
> On 18/07/13 11:18, Sawhney, Tarandeep Singh wrote:
>
>> Hi Rafa
>>
>> Thanks for your response
>>
>> Yes, we have also tried the whole URI of the property
>> (http://dbpedia.org/ontology/capital), but it didn't help.
>>
>> Yes, we are using the EntityHub cache to locally store all the additional
>> information we pulled from dbpedia.org.
>>
>> The documentation at
>> http://stanbol.apache.org/docs/trunk/customvocabulary.html
>> says:
>>
>> "Optionally, if your data do use namespaces that are not present in
>> prefix.cc (or the server used for indexing does not have internet
>> connectivity) you can manually define required prefixes by creating/using
>> the a indexing/config/namespaceprefix.mappings file"
>>
>> Can you tell us whether any changes to this file are required when
>> using DBpedia data?
>>
> This file can be used at 'indexing time', when you use the indexing tool
> to create the index for the DBpedia site. I have just seen that dbp-ont
> is already included as a prefix. What is not clear to me right now is
> whether you are generating your own DBpedia index including all the DBpedia
> ontology properties (that would be an enormous index), generating an index
> each time you need a new entity, or trying to retrieve the entities from
> DBpedia in a 'live' way :-). Sorry, I'm confused about your workflow.
>
>
>> Also, it looks like we are missing some configuration in the overall
>> process, so any help from the dev community would be much
>> appreciated.
>>
>> best regards
>> tarandeep
>>
>>
>> On Thu, Jul 18, 2013 at 1:38 PM, Rafa Haro <rh...@zaizi.com> wrote:
>>
>>  Hi Tarandeep,
>>>
>>> Have you tried using the whole URI of the property
>>> (http://dbpedia.org/ontology/capital)?
>>>
>>> Anyway, it may be better to change your workflow, because I suppose
>>> that what you describe for the "India" entity could happen with other
>>> entities too: the default DBpedia site in Stanbol doesn't contain
>>> information about dbp-ont properties. I would suggest using the
>>> EntityHub cache to locally store entities with all the information you
>>> need directly from DBpedia. So you could retrieve the entities directly
>>> from any DBpedia endpoint and store them in the EntityHub cache, so that
>>> you can use them later at your convenience.
>>> The workflow could be the following:
>>>
>>> 1. Enhance a document using the Stanbol DBpedia site for linking.
>>> 2. For each extracted entity:
>>>    2.1. If the entity is already stored in the EntityHub, get it using
>>>         LDPath for dereferencing.
>>>    2.2. If not, retrieve the entity from a DBpedia endpoint as RDF data
>>>         and store it in the EntityHub. Then retrieve it.
>>>
>>> I would say that this is currently possible in Stanbol, but maybe someone
>>> else on the list can shed more light on the issue.
>>>
>>> Regards
>>>
>>> On 18/07/13 09:48, Sawhney, Tarandeep Singh wrote:
>>>
>>>  Hi All,
>>>>
>>>> In the Stanbol local cache we have limited triples in the DBpedia
>>>> reference site.
>>>>
>>>> We need more triples for entities which are present in the DBpedia
>>>> reference site. For example, the entity "India" has limited triples, so
>>>> when we enhance text that mentions India, we only get the information
>>>> that is in the DBpedia reference site.
>>>>
>>>> We followed the steps below to add more RDF data for the entity
>>>> "India" by creating our own reference site.
>>>>
>>>> 1 - Downloaded RDF data for 'India' from [1].
>>>>
>>>> 2 - Generated indexes for this RDF data as suggested in article [2],
>>>> with *Demo* as the reference site name.
>>>>
>>>> 3 - Initialized the indexes within the Stanbol instance as per [2].
>>>>
>>>> 4 - Configured a new EntityLinking engine, *demoLinkingEngine*, with
>>>> *Demo* as the referenced site, as per [3].
>>>>     I added *dbp-ont:capital* to the "Fields used for dereferencing"
>>>>     option.
>>>>
>>>> 5 - Configured a new weighted chain (*demoChain*).
>>>>
>>>> 6 - Now I am trying to enhance *"India is a country."* I get India as
>>>> the dereferenced entity, but I am unable to get any new information for
>>>> *dbp-ont:capital*, which exists in my new reference site *Demo* and in
>>>> this case should give us the URI of "New Delhi".
>>>>
>>>> [1] http://dbpedia.org/page/India
>>>> [2] http://stanbol.apache.org/docs/trunk/customvocabulary.html
>>>> [3] http://stanbol.apache.org/docs/trunk/components/enhancer/engines/entityhublinking
>>>>
>>>> Can you please let me know if I am doing something wrong here or missing
>>>> some configuration?
>>>> Please let me know if you need more information on how we are
>>>> trying to do it.
>>>>
>>>> best regards
>>>> tarandeep
>>>>
>>>>


Re: New reference site with additonal DBpedia triples

Posted by Rafa Haro <rh...@zaizi.com>.
Hi Tarandeep,

On 18/07/13 11:18, Sawhney, Tarandeep Singh wrote:
> Hi Rafa,
>
> Thanks for your response.
>
> Yes, we have also tried the whole URI of the property
> (http://dbpedia.org/ontology/capital), but it didn't help.
>
> Yes, we are using the EntityHub cache to locally store all the additional
> information we pulled from dbpedia.org.
>
> The documentation at
> http://stanbol.apache.org/docs/trunk/customvocabulary.html
> says:
>
> *Optionally, if your data do use namespaces that are not present in
> prefix.cc (or the server used for indexing does not have internet
> connectivity) you can manually define required prefixes by creating/using
> the indexing/config/namespaceprefix.mappings file*
>
> Can we get some input on whether any changes to this file are required when
> using DBpedia data?
This file can be used at 'indexing time', when you use the indexing tool 
to create the index for the DBpedia site. I have just seen that 
dbp-ont is already included as a prefix. What is not clear to me right now 
is whether you are generating your own DBpedia index including all the 
DBpedia ontology properties (that would be an enormous index), whether you 
are generating an index each time you need a new entity, or whether you are 
trying to retrieve the entities from DBpedia in a 'live' way :-). Sorry, 
I'm confused about your workflow.
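
To make the prefix question concrete: as far as I remember the custom
vocabulary documentation, each line of namespaceprefix.mappings is a prefix,
a tab, and a namespace ending in '/' or '#'. Please verify that against the
docs, I am treating it as an assumption here. Under that assumption, this
small Python sketch (the helper names are made up for illustration and are
not part of Stanbol) shows how a prefixed field such as dbp-ont:capital
would expand to the full property URI:

```python
from typing import Dict


def parse_prefix_mappings(text: str) -> Dict[str, str]:
    """Parse lines of the assumed form '<prefix><TAB><namespace>'."""
    mappings: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip empty lines
        prefix, namespace = line.split(None, 1)
        mappings[prefix] = namespace.strip()
    return mappings


def expand_field(field: str, mappings: Dict[str, str]) -> str:
    """Expand 'prefix:local' to a full URI; full URIs pass through unchanged."""
    if field.startswith(("http://", "https://")):
        return field
    prefix, sep, local = field.partition(":")
    if not sep or prefix not in mappings:
        raise ValueError(f"unknown or missing prefix in field: {field!r}")
    return mappings[prefix] + local


# Hypothetical file content with a single mapping for the DBpedia ontology.
SAMPLE = "dbp-ont\thttp://dbpedia.org/ontology/\n"

if __name__ == "__main__":
    print(expand_field("dbp-ont:capital", parse_prefix_mappings(SAMPLE)))
```

If the prefix resolves like this, then dbp-ont:capital and the full URI name
the same field, so switching between the two forms would not be expected to
change the result.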

>
> Also, it looks like we are missing some configuration in the overall
> process, so any help from the dev community would be much
> appreciated.
>
> best regards
> tarandeep
>
>
> On Thu, Jul 18, 2013 at 1:38 PM, Rafa Haro <rh...@zaizi.com> wrote:
>
>> Hi Tarandeep,
>>
>> Have you tried using the whole URI of the property
>> (http://dbpedia.org/ontology/capital)?
>>
>> Anyway, maybe it is a better idea to change your workflow, because I
>> suppose that your "India" example is something that could
>> happen to you with more entities, since the default DBpedia site in
>> Stanbol doesn't contain information about dbp-ont properties. I would
>> suggest using the EntityHub cache to locally store entities with all the
>> information you need directly from DBpedia. So, maybe you can try to
>> retrieve the entities directly from any DBpedia endpoint and store them in
>> the EntityHub cache, so that you can use them later at your convenience.
>> Maybe the workflow could be the following:
>>
>> 1. Enhance a document using the Stanbol DBpedia site for linking.
>> 2. For each extracted entity:
>>          2.1. If the entity is already stored in the EntityHub, get it using
>> LDPath for dereferencing.
>>          2.2. If not, retrieve the entity from a DBpedia endpoint as RDF data
>> and store it in the EntityHub. Then retrieve it.
>>
>> I would say that this is currently possible in Stanbol, but maybe someone
>> else on the list can shed more light on the issue.
>>
>> Regards
>>



Re: New reference site with additonal DBpedia triples

Posted by "Sawhney, Tarandeep Singh" <ts...@innodata.com>.
Hi Rafa,

Thanks for your response.

Yes, we have also tried the whole URI of the property
(http://dbpedia.org/ontology/capital), but it didn't help.

Yes, we are using the EntityHub cache to locally store all the additional
information we pulled from dbpedia.org.

The documentation at
http://stanbol.apache.org/docs/trunk/customvocabulary.html
says:

*Optionally, if your data do use namespaces that are not present in
prefix.cc (or the server used for indexing does not have internet
connectivity) you can manually define required prefixes by creating/using
the indexing/config/namespaceprefix.mappings file*

Can we get some input on whether any changes to this file are required when
using DBpedia data?

Also, it looks like we are missing some configuration in the overall
process, so any help from the dev community would be much
appreciated.

best regards
tarandeep


On Thu, Jul 18, 2013 at 1:38 PM, Rafa Haro <rh...@zaizi.com> wrote:

> Hi Tarandeep,
>
> Have you tried using the whole URI of the property
> (http://dbpedia.org/ontology/capital)?
>
> Anyway, maybe it is a better idea to change your workflow, because I
> suppose that your "India" example is something that could
> happen to you with more entities, since the default DBpedia site in
> Stanbol doesn't contain information about dbp-ont properties. I would
> suggest using the EntityHub cache to locally store entities with all the
> information you need directly from DBpedia. So, maybe you can try to
> retrieve the entities directly from any DBpedia endpoint and store them in
> the EntityHub cache, so that you can use them later at your convenience.
> Maybe the workflow could be the following:
>
> 1. Enhance a document using the Stanbol DBpedia site for linking.
> 2. For each extracted entity:
>         2.1. If the entity is already stored in the EntityHub, get it using
> LDPath for dereferencing.
>         2.2. If not, retrieve the entity from a DBpedia endpoint as RDF data
> and store it in the EntityHub. Then retrieve it.
>
> I would say that this is currently possible in Stanbol, but maybe someone
> else on the list can shed more light on the issue.
>
> Regards
>


Re: New reference site with additonal DBpedia triples

Posted by Rafa Haro <rh...@zaizi.com>.
Hi Tarandeep,

Have you tried using the whole URI of the property 
(http://dbpedia.org/ontology/capital)?

Anyway, maybe it is a better idea to change your workflow, because I 
suppose that your "India" example is something that could 
happen to you with more entities, since the default DBpedia site in 
Stanbol doesn't contain information about dbp-ont properties. I would 
suggest using the EntityHub cache to locally store entities with all the 
information you need directly from DBpedia. So, maybe you can try to 
retrieve the entities directly from any DBpedia endpoint and store them in 
the EntityHub cache, so that you can use them later at your 
convenience. Maybe the workflow could be the following:

1. Enhance a document using the Stanbol DBpedia site for linking.
2. For each extracted entity:
         2.1. If the entity is already stored in the EntityHub, get it 
using LDPath for dereferencing.
         2.2. If not, retrieve the entity from a DBpedia endpoint as RDF 
data and store it in the EntityHub. Then retrieve it.
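
The steps above can be sketched roughly as follows. This is only a sketch in
Python, not Stanbol code: resolve_entity and its three callables (lookup,
fetch, store) are hypothetical names. The two URL helpers assume a launcher
at http://localhost:8080 exposing the Entityhub entity lookup at
/entityhub/entity?id=..., and DBpedia serving RDF/XML at
http://dbpedia.org/data/<name>.rdf; please check both against your
installation.

```python
from typing import Callable, Optional
from urllib.parse import urlencode

# Assumed base URL of a local Stanbol launcher; adjust to your setup.
STANBOL = "http://localhost:8080"


def entityhub_lookup_url(entity_uri: str, base: str = STANBOL) -> str:
    """Build the (assumed) Entityhub lookup URL for a locally stored entity."""
    return f"{base}/entityhub/entity?" + urlencode({"id": entity_uri})


def dbpedia_data_url(entity_uri: str) -> str:
    """Map a DBpedia resource URI to its (assumed) RDF/XML data document."""
    name = entity_uri.rsplit("/", 1)[-1]
    return f"http://dbpedia.org/data/{name}.rdf"


def resolve_entity(
    entity_uri: str,
    lookup: Callable[[str], Optional[str]],
    fetch: Callable[[str], str],
    store: Callable[[str, str], None],
) -> Optional[str]:
    """Steps 2.1 and 2.2: use the local copy if present, else fetch and store."""
    cached = lookup(entity_uri)       # 2.1: already in the EntityHub?
    if cached is not None:
        return cached
    rdf = fetch(entity_uri)           # 2.2: get the RDF from DBpedia
    store(entity_uri, rdf)            #      and store it locally
    return lookup(entity_uri)         #      then retrieve it


if __name__ == "__main__":
    uri = "http://dbpedia.org/resource/India"
    print(entityhub_lookup_url(uri))
    print(dbpedia_data_url(uri))
```

In a real setup, lookup would issue a GET against the lookup URL, fetch would
download the DBpedia document, and store would send it to the Entityhub.
Keeping them as parameters makes the control flow easy to try out without a
running server.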

I would say that this is currently possible in Stanbol, but maybe 
someone else on the list can shed more light on the issue.

Regards


