Posted to dev@stanbol.apache.org by co...@linkeddatatools.com on 2014/11/10 19:54:57 UTC

Searching RDF graph using SKOS thesaurus

Hi, I posted a similar message to the IKS mailing list, but understand
from the response that that mailing list is no longer administered.

Stanbol is a great tool and I'm having some success with it,
particularly the entity extractor tool.

I have a requirement, and I am not sure of the best way to approach it
or whether a best practice for this sort of problem has already been
established.

I have an RDF graph - one in accordance with the FOAF ontology - and I  
have a controlled vocabulary in the form of a SKOS RDF graph, which  
contains a set of literal string terms and their semantic equivalents  
(e.g. 'President' <-> 'Managing Director' <-> 'Chief Executive' <->  
'MD' <-> etc.).

I would like to search the literal strings in the FOAF graph for the  
occurrence of the string literals, and their equivalents as defined by  
the SKOS thesaurus.

I can suggest one approach to this problem, but I fear it may be quite  
inefficient and take a long time, namely:

- Query the RDF graph using SPARQL for all string literals.
- Pass each string literal to the Stanbol Entity Extractor, having  
uploaded the SKOS thesaurus to the Stanbol Entity Hub.

Now this seems quite long-winded. Further, I'm not even clear from
the documentation whether the Stanbol Entity Extractor is capable of  
using SKOS vocabularies to map string literals to entities. Is Stanbol  
capable of extracting entities using a SKOS vocabulary?

This seems a fairly common thing to do (semantic search of an RDF  
graph using a thesaurus) - is there some better way of solving this  
problem using an already established strategy?


Many thanks!

Linked Data Tools

Re: Searching RDF graph using SKOS thesaurus

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Mark,

Sorry for the late reply, but with ApacheCon last week I need to
catch up on a lot of things ...

If you are looking to manage your RDF datasets and run SPARQL
queries on them, I suggest you use Apache Marmotta or Apache Jena.
Marmotta also implements the Linked Data Platform (LDP), which defines
how to manage RDF data.

Managing RDF vocabularies is not within the focus of Apache Stanbol.
There are two reasons why the Entityhub supports it:

1. Triple stores are not fast enough for the queries required for
Entity Extraction.
2. At the time the Entityhub was implemented there was no LDP, so
managing user vocabularies seemed like a nice feature to have.
Nowadays I would recommend using LDP and using the Entityhub just as a
secondary index.

So in case you want to extract persons and roles from text documents,
here is how you can do it:

This assumes that you manage your vocabularies outside of Apache
Stanbol (e.g. in Apache Marmotta).
You can index your FOAF vocabulary with the person data and the SKOS
thesaurus with the roles in a single Site or in multiple Sites. You will
want to configure a ManagedSite [1] with a Solr Yard as backend. You can
update single entities (as they change) or update the whole RDF
graph. Triple stores provide services to export a single resource
and/or a whole graph. The ManagedSite also allows you to update a single
Entity and/or to delete everything and then re-import the whole RDF
graph. So what you will need is a component that performs such updates
when you need them.

Additional notes:
* Entity extraction uses rdfs:label by default, and the default
configuration for a ManagedSite maps some properties to
rdfs:label. Those defaults should be fine for your use case.
* For the Persons you might also need a foaf:name field holding the
concatenation of foaf:firstName and foaf:lastName in your ontology
(e.g. <foaf:name>John Smith</foaf:name>). If you also want to extract
persons based on the lastName you will need to add the corresponding
mapping to the configuration of the ManagedSite.

With this in place you will get Persons and Roles extracted from parsed texts.
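
As a rough illustration of the foaf:name note above, here is a minimal
Python sketch that fills in foaf:name from foaf:firstName and
foaf:lastName before the data is indexed. The dict-based person record
and the add_full_name helper are assumptions made for this example;
they are not part of Stanbol or of the original data.

```python
FOAF = "http://xmlns.com/foaf/0.1/"


def add_full_name(person: dict) -> dict:
    """Return a copy of `person` with foaf:name filled in when it is missing.

    `person` is a hypothetical flat record mapping property URIs to strings.
    """
    enriched = dict(person)
    if FOAF + "name" not in enriched:
        parts = [
            enriched.get(FOAF + "firstName", ""),
            enriched.get(FOAF + "lastName", ""),
        ]
        # Join only the parts that are present, so a missing first or
        # last name does not leave stray whitespace.
        full_name = " ".join(part for part in parts if part)
        if full_name:
            enriched[FOAF + "name"] = full_name
    return enriched


john = add_full_name({FOAF + "firstName": "John", FOAF + "lastName": "Smith"})
print(john[FOAF + "name"])
```

An existing foaf:name is left untouched, so running the helper twice on
the same record is harmless.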

best
Rupert

[1] http://stanbol.apache.org/docs/trunk/components/entityhub/managedsite.html

On Thu, Nov 13, 2014 at 8:12 AM,  <co...@linkeddatatools.com> wrote:
> Hi, I have read the documentation as you directed, and have one further
> question just to clarify. I am asking this as I am less experienced with
> this sort of problem.
>
> I believe the example FOAF graph I have given as a way of storing the data
> is probably inefficient, as the way I am storing the role someone has in the
> FOAF graph means I have to 'look up' the entity every time someone does a
> search to match both the literal string in the FOAF graph, and the literal
> string that someone tries to search for.
>
> Would it be better to do something more similar to this example, where I
> define a URI for 'Managing Director' myself:
>
> Example graph, showing person with role:
>
> <foaf:Person rdf:ID="johnsmith">
>  <foaf:firstName>John</foaf:firstName>
>  <foaf:lastName>Smith</foaf:lastName>
>  <ex:role rdf:resource="http://www.linkeddatatools.com/role/managingdirector"/>
> </foaf:Person>
>
> And then in my SKOS vocabulary I would define:
>
> ex:managingdirector rdf:type skos:Concept;
>  skos:prefLabel "Managing Director"@en;
>  skos:altLabel "MD"@en;
>  skos:altLabel "President"@en;
>  skos:altLabel "CEO"@en.
>
> I would then, rather than merge both graphs, upload the SKOS vocabulary to
> my EntityHub site. When a user made a search, I would carry out entity
> extraction on the search string (e.g. user searches for 'President') which
> would then return the http://www.linkeddatatools.com/role/managingdirector
> entity match.
>
> I would then use SPARQL to query the FOAF graph for those with
> http://www.linkeddatatools.com/role/managingdirector as ex:role.
>
> Is this not more efficient? Further, would I need to use EntityHub for this,
> or would it be better simply to query my SKOS vocabulary myself using
> SPARQL?
>
> Thanks for your patience and input on this, as I say I am relatively new to
> this sort of problem and really do value any advice.
>
>
> Best wishes
>
> Mark



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                              ++43-699-11108907
| A-5500 Bischofshofen
| REDLINK.CO ..........................................................................
| http://redlink.co/

Re: Searching RDF graph using SKOS thesaurus

Posted by co...@linkeddatatools.com.
Hi, I have read the documentation as you directed, and have one  
further question just to clarify. I am asking this as I am less  
experienced with this sort of problem.

I believe the example FOAF graph I have given as a way of storing the  
data is probably inefficient, as the way I am storing the role someone  
has in the FOAF graph means I have to 'look up' the entity every time  
someone does a search to match both the literal string in the FOAF  
graph, and the literal string that someone tries to search for.

Would it be better to do something more similar to this example, where  
I define a URI for 'Managing Director' myself:

Example graph, showing person with role:

<foaf:Person rdf:ID="johnsmith">
  <foaf:firstName>John</foaf:firstName>
  <foaf:lastName>Smith</foaf:lastName>
  <ex:role rdf:resource="http://www.linkeddatatools.com/role/managingdirector"/>
</foaf:Person>

And then in my SKOS vocabulary I would define:

ex:managingdirector rdf:type skos:Concept;
  skos:prefLabel "Managing Director"@en;
  skos:altLabel "MD"@en;
  skos:altLabel "President"@en;
  skos:altLabel "CEO"@en.

I would then, rather than merge both graphs, upload the SKOS  
vocabulary to my EntityHub site. When a user made a search, I would  
carry out entity extraction on the search string (e.g. user searches  
for 'President') which would then return the  
http://www.linkeddatatools.com/role/managingdirector entity match.

I would then use SPARQL to query the FOAF graph for those with  
http://www.linkeddatatools.com/role/managingdirector as ex:role.

Is this not more efficient? Further, would I need to use EntityHub for  
this, or would it be better simply to query my SKOS vocabulary myself  
using SPARQL?
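
To make the two-step lookup described above concrete, here is a minimal
in-memory Python sketch: a search string is first resolved to a SKOS
concept URI through its labels, and the people holding that role URI are
then looked up. The dictionaries stand in for the SKOS vocabulary and
the FOAF graph, and the function names are hypothetical; a real setup
would use the Entityhub or a SPARQL endpoint instead.

```python
from collections import defaultdict

ROLE_NS = "http://www.linkeddatatools.com/role/"

# Stand-in for the SKOS vocabulary: concept URI -> prefLabel and altLabels.
SKOS_LABELS = {
    ROLE_NS + "managingdirector": ["Managing Director", "MD", "President", "CEO"],
}

# Stand-in for the FOAF graph: person id -> role concept URI (ex:role).
PERSON_ROLES = {"johnsmith": ROLE_NS + "managingdirector"}


def build_label_index(labels_by_concept):
    """Invert the vocabulary into a case-insensitive label -> concepts index."""
    index = defaultdict(set)
    for concept, labels in labels_by_concept.items():
        for label in labels:
            index[label.casefold()].add(concept)
    return index


def find_people_by_role_label(query, label_index, person_roles):
    """Step 1: label -> concept URIs. Step 2: concept URIs -> people."""
    concepts = label_index.get(query.casefold(), set())
    return sorted(p for p, role in person_roles.items() if role in concepts)


index = build_label_index(SKOS_LABELS)
print(find_people_by_role_label("President", index, PERSON_ROLES))
```

Because the people are matched on the concept URI rather than on a
literal, the label index only has to be rebuilt when the vocabulary
changes, not on every search.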

Thanks for your patience and input on this; as I say, I am relatively
new to this sort of problem and really do value any advice.


Best wishes

Mark



Re: Searching RDF graph using SKOS thesaurus

Posted by co...@linkeddatatools.com.
Hi Rafa/Reto,

Thanks very much for your replies - so I will look into:

- Merging both graphs.
- Uploading to a Stanbol Entity site.
- Using entityhub/site/find/ in the documentation to return the  
subjects that match an ex:role with that SKOS label.

I'm not clear from the replies how I would use SPARQL. If you have any
further guidance, or if this is a better option, please let me know.


Again thanks,

Mark


Re: Searching RDF graph using SKOS thesaurus

Posted by Rafa Haro <rh...@apache.org>.
Hi Mark, Reto,


On 12 November 2014 at 11:45:43, Reto Gmür (reto@apache.org) wrote:

On Wed, Nov 12, 2014 at 10:16 AM, Rafa Haro <rh...@apache.org> wrote:  

> Hi Mark,  
>  
> You can solve your problem in Stanbol if you link or merge together both  
> graphs in a single one and you create a site with it. After indexing the  
> merged graph, you can use the EntityHub API and specifically the find  
> (/entityhub/site/find) service to search for your label and then move to  
> all the nodes associated to that skos label using an LDPath expression.  
> Please take a look at the EntityHub REST API documentation.
>  

Just for completeness: After merging the two graphs (or even without) you
can also use SPARQL.


Yep, that’s true :-). I probably forgot to mention that if you are planning to enrich documents using both graphs, the LDPath approach is also available.

Cheers,
Rafa
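
For readers who want to try the find service mentioned in the quoted
message above, here is a hedged Python sketch that only builds the HTTP
request and does not send it. It assumes the Entityhub find endpoint
accepts form-encoded "name", "field" and "lang" parameters, as I
understand its REST documentation to describe; check the documentation
for your Stanbol version. The base URL and the site id "roles" are
hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

SKOS_ALT_LABEL = "http://www.w3.org/2004/02/skos/core#altLabel"


def build_find_request(base_url, site_id, label, field, lang="en"):
    """Build (but do not send) a POST request to /entityhub/site/{id}/find.

    The parameter names are an assumption based on the Entityhub REST
    documentation; `site_id` is whatever id the site was configured with.
    """
    body = urlencode({"name": label, "field": field, "lang": lang}).encode("utf-8")
    return Request(
        f"{base_url}/entityhub/site/{site_id}/find",
        data=body,
        headers={"Accept": "application/json"},
        method="POST",
    )


request = build_find_request(
    "http://localhost:8080", "roles", "President", SKOS_ALT_LABEL
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would send it to a running
Stanbol instance.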


Cheers,  
Reto  



>  
> Hope that helps. Cheers,  
> Rafa  
>  
>  
> On 11 November 2014 at 20:34:01, contact@linkeddatatools.com (
> contact@linkeddatatools.com) wrote:
>  
> Hi, here is an example of what I'm trying to achieve. Does Fusepool,  
> or another solution, achieve this goal?  
>  
> I have an RDF graph in a graph store:  
>  
> ==============================  
>  
> <foaf:Person rdf:ID="johnsmith">  
> <foaf:firstName>John</foaf:firstName>  
> <foaf:lastName>Smith</foaf:lastName>  
> <ex:role>Managing Director</ex:role>  
> </foaf:Person>  
>  
> ==============================  
>  
> I have the following SKOS vocabulary:  
>  
> ==============================  
>  
> ex:role rdf:type skos:Concept;  
> skos:prefLabel "Managing Director"@en;  
> skos:altLabel "MD"@en;  
> skos:altLabel "President"@en;  
> skos:altLabel "CEO"@en.  
>  
> ==============================  
>  
> If I search for anyone with the role 'President', I want to return  
> John Smith (rdf:ID="johnsmith") - because 'President' is an  
> alternative label for 'Managing Director'.  
>  
> Is this possible using an already established best practice, or framework?  
>  
> Please let me know if any further examples are required.  
>  
>  
> Best wishes  
>  
> Mark  
>  
> Quoting Reto Gmür <re...@apache.org>:  
>  
> > Hi Linked Data Tools  
> >  
> > One difficulty might arise because ContentHub has the index and the
> > facets in Lucene only and other metadata in an RDF graph. So for example if
> > contenthub provides a facet "Paris" you only have the label without any
> > association to the URI, so it won't be possible to get additional
> > properties of the resource. This is why in the fusepool project we've
> > chosen to build a store that stores all the data in an RDF graph and
> > builds a Lucene index on top of it. The code is here
> > https://github.com/fusepool/fusepool-ecs, it's Apache licensed and btw.
> > fusepool would be happy to donate it to the stanbol project.
> >  
> > Cheers,  
> > Reto  

Re: Searching RDF graph using SKOS thesaurus

Posted by Reto Gmür <re...@apache.org>.
On Wed, Nov 12, 2014 at 10:16 AM, Rafa Haro <rh...@apache.org> wrote:

> Hi Mark,
>
> You can solve your problem in Stanbol if you link or merge together both
> graphs in a single one and you create a site with it. After indexing the
> merged graph, you can use the EntityHub API and specifically the find
> (/entityhub/site/find) service to search for your label and then move to
> all the nodes associated to that skos label using an LDPath expression.
> Please take a look at the EntityHub REST API documentation.
>

Just for completeness: After merging the two graphs (or even without) you
can also use SPARQL.

Cheers,
Reto
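
As a sketch of what such a SPARQL query might look like for the example
data quoted below, where ex:role holds a plain literal, the following
Python helper builds a query string that joins the role literal against
every label of the matching SKOS concept. The ex: namespace URI is a
placeholder assumption (the thread never defines it), and
build_role_query is a hypothetical helper, not part of any library.

```python
# Placeholder namespace for the ex: prefix; the thread does not define it.
EX_NS = "http://example.org/ns#"


def build_role_query(label: str, lang: str = "en") -> str:
    """Return a SPARQL 1.1 query for people whose role matches `label`
    or any other label of the same SKOS concept."""
    # Escape backslashes and double quotes so the label is a valid literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return f"""PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX ex: <{EX_NS}>
SELECT DISTINCT ?person WHERE {{
  ?concept skos:prefLabel|skos:altLabel "{escaped}"@{lang} ;
           skos:prefLabel|skos:altLabel ?label .
  ?person ex:role ?roleText .
  FILTER (str(?roleText) = str(?label))
}}"""


print(build_role_query("President"))
```

The str() comparison is there because the role literal in the FOAF
example carries no language tag while the SKOS labels are tagged @en.
The query needs both graphs to be reachable from the same endpoint, for
example after merging them.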



>
> Hope that helps. Cheers,
> Rafa
>
>
> On 11 November 2014 at 20:34:01, contact@linkeddatatools.com (
> contact@linkeddatatools.com) wrote:
>
> Hi, here is an example of what I'm trying to achieve. Does Fusepool,
> or another solution, achieve this goal?
>
> I have an RDF graph in a graph store:
>
> ==============================
>
> <foaf:Person rdf:ID="johnsmith">
> <foaf:firstName>John</foaf:firstName>
> <foaf:lastName>Smith</foaf:lastName>
> <ex:role>Managing Director</ex:role>
> </foaf:Person>
>
> ==============================
>
> I have the following SKOS vocabulary:
>
> ==============================
>
> ex:role rdf:type skos:Concept;
> skos:prefLabel "Managing Director"@en;
> skos:altLabel "MD"@en;
> skos:altLabel "President"@en;
> skos:altLabel "CEO"@en.
>
> ==============================
>
> If I search for anyone with the role 'President', I want to return
> John Smith (rdf:ID="johnsmith") - because 'President' is an
> alternative label for 'Managing Director'.
>
> Is this possible using an already established best practice, or framework?
>
> Please let me know if any further examples are required.
>
>
> Best wishes
>
> Mark
>
> Quoting Reto Gmür <re...@apache.org>:
>
> > Hi Linked Data Tools
> >
> > One difficulty might arise because ContentHub has the index and the
> facets
> > in lucene only and other metadata in an RDF graph. So for example if
> > contenthub provides a facet "Paris" you only have the label without any
> > association to the URI, so it won't be possible to get additional
> > properties of the resource. This is way in the fusepool project we've
> > chosen to build a store that stores all the data in an RDF graph and
> builds
> > a lucene index on top of it. The code is here
> > https://github.com/fusepool/fusepool-ecs, its apache licensed and btw.
> > fusepool would be happy to donate it to the stanbol project.
> >
> > Cheers,
> > Reto
> >
> > On Mon, Nov 10, 2014 at 7:54 PM, <co...@linkeddatatools.com> wrote:
> >
> >> Hi, I posted a similar message to the IKS mailing list, but understand
> >> from the response that this mailing list is no longer administrated.
> >>
> >> Stanbol is a great tool and I'm having some success with it;
> particularly
> >> the entity extractor tool.
> >>
> >> I have a requirement and, I am not sure the best way to approach this
> and
> >> whether a best practice for this sort of problem has already been
> >> established.
> >>
> >> I have an RDF graph - one in accordance with the FOAF ontology - and I
> >> have a controlled vocabulary in the form of a SKOS RDF graph, which
> >> contains a set of literal string terms and their semantic equivalents
> (e.g.
> >> 'President' <-> 'Managing Director' <-> 'Chief Executive' <-> 'MD' <->
> >> etc.).
> >>
> >> I would like to search the literal strings in the FOAF graph for the
> >> occurrence of the string literals, and their equivalents as defined by
> the
> >> SKOS thesaurus.
> >>
> >> I can suggest one approach to this problem, but I fear it may be quite
> >> inefficient and take a long time, namely:
> >>
> >> - Query the RDF graph using SPARQL for all string literals.
> >> - Pass each string literal to the Stanbol Entity Extractor, having
> >> uploaded the SKOS thesaurus to the Stanbol Entity Hub.
> >>
> >> Now this seems quite a long winded. Further, I'm not even clear from the
> >> documentation whether the Stanbol Entity Extractor is capable of using
> SKOS
> >> vocabularies to map string literals to entities. Is Stanbol capable of
> >> extracting entities using a SKOS vocabulary?
> >>
> >> This seems a fairly common thing to do (semantic search of an RDF graph
> >> using a thesaurus) - is there some better way of solving this problem
> using
> >> an already established strategy?
> >>
> >>
> >> Many thanks!
> >>
> >> Linked Data Tools
> >>
> >
>
>

Re: Searching RDF graph using SKOS thesaurus

Posted by Rafa Haro <rh...@apache.org>.
Hi Mark,

You can solve your problem in Stanbol if you link or merge both graphs into a single one and create a site from it. After indexing the merged graph, you can use the EntityHub API, specifically the find service (/entityhub/site/find), to search for your label and then move to all the nodes associated with that SKOS label using an LDPath expression. Please take a look at the EntityHub REST API documentation.
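
As a rough sketch of the find call described above, the following Python builds the POST request without sending it. The parameter names (name, field, lang, limit, ldpath) and the path with a site id in it are recalled from the EntityHub REST documentation and should be checked there; the site id "roles", the base URL, and the LDPath program are purely illustrative assumptions.

```python
from urllib.parse import urlencode
from urllib.request import Request

SKOS_ALT_LABEL = "http://www.w3.org/2004/02/skos/core#altLabel"


def build_find_request(base_url, site_id, label, field,
                       ldpath=None, lang="en", limit=10):
    """Build (but do not send) a POST to the find service of one EntityHub site."""
    params = {"name": label, "field": field, "lang": lang, "limit": str(limit)}
    if ldpath:
        # Optional LDPath program evaluated for each result (assumed parameter).
        params["ldpath"] = ldpath
    url = "%s/entityhub/site/%s/find" % (base_url.rstrip("/"), site_id)
    return Request(url, data=urlencode(params).encode("utf-8"),
                   headers={"Accept": "application/json"}, method="POST")


# Illustrative LDPath program: collect every label of the matched concept.
LABELS_PROGRAM = (
    "labels = <http://www.w3.org/2004/02/skos/core#prefLabel> | "
    "<http://www.w3.org/2004/02/skos/core#altLabel> :: xsd:string;"
)

# Hypothetical site id and host; pass the request to urllib.request.urlopen to send it.
request = build_find_request("http://localhost:8080", "roles", "President",
                             SKOS_ALT_LABEL, ldpath=LABELS_PROGRAM)
```

How the LDPath expression reaches the person nodes depends on how the two graphs were linked, so the program above only shows the general shape.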

Hope that helps. Cheers,
Rafa


On 11 November 2014 at 20:34:01, contact@linkeddatatools.com (contact@linkeddatatools.com) wrote:

Hi, here is an example of what I'm trying to achieve. Does Fusepool,  
or another solution, achieve this goal?  

I have an RDF graph in a graph store:  

==============================  

<foaf:Person rdf:ID="johnsmith">  
<foaf:firstName>John</foaf:firstName>  
<foaf:lastName>Smith</foaf:lastName>  
<ex:role>Managing Director</ex:role>  
</foaf:Person>  

==============================  

I have the following SKOS vocabulary:  

==============================  

ex:role rdf:type skos:Concept;  
skos:prefLabel "Managing Director"@en;  
skos:altLabel "MD"@en;  
skos:altLabel "President"@en;  
skos:altLabel "CEO"@en.  

==============================  

If I search for anyone with the role 'President', I want to return  
John Smith (rdf:ID="johnsmith") - because 'President' is an  
alternative label for 'Managing Director'.  

Is this possible using an already established best practice, or framework?  

Please let me know if any further examples are required.  


Best wishes  

Mark  

Quoting Reto Gmür <re...@apache.org>:  

> Hi Linked Data Tools  
>  
> One difficulty might arise because ContentHub has the index and the facets  
> in lucene only and other metadata in an RDF graph. So for example if  
> contenthub provides a facet "Paris" you only have the label without any  
> association to the URI, so it won't be possible to get additional  
> properties of the resource. This is way in the fusepool project we've  
> chosen to build a store that stores all the data in an RDF graph and builds  
> a lucene index on top of it. The code is here  
> https://github.com/fusepool/fusepool-ecs, its apache licensed and btw.  
> fusepool would be happy to donate it to the stanbol project.  
>  
> Cheers,  
> Reto  
>  
> On Mon, Nov 10, 2014 at 7:54 PM, <co...@linkeddatatools.com> wrote:  
>  
>> Hi, I posted a similar message to the IKS mailing list, but understand  
>> from the response that this mailing list is no longer administrated.  
>>  
>> Stanbol is a great tool and I'm having some success with it; particularly  
>> the entity extractor tool.  
>>  
>> I have a requirement and, I am not sure the best way to approach this and  
>> whether a best practice for this sort of problem has already been  
>> established.  
>>  
>> I have an RDF graph - one in accordance with the FOAF ontology - and I  
>> have a controlled vocabulary in the form of a SKOS RDF graph, which  
>> contains a set of literal string terms and their semantic equivalents (e.g.  
>> 'President' <-> 'Managing Director' <-> 'Chief Executive' <-> 'MD' <->  
>> etc.).  
>>  
>> I would like to search the literal strings in the FOAF graph for the  
>> occurrence of the string literals, and their equivalents as defined by the  
>> SKOS thesaurus.  
>>  
>> I can suggest one approach to this problem, but I fear it may be quite  
>> inefficient and take a long time, namely:  
>>  
>> - Query the RDF graph using SPARQL for all string literals.  
>> - Pass each string literal to the Stanbol Entity Extractor, having  
>> uploaded the SKOS thesaurus to the Stanbol Entity Hub.  
>>  
>> Now this seems quite a long winded. Further, I'm not even clear from the  
>> documentation whether the Stanbol Entity Extractor is capable of using SKOS  
>> vocabularies to map string literals to entities. Is Stanbol capable of  
>> extracting entities using a SKOS vocabulary?  
>>  
>> This seems a fairly common thing to do (semantic search of an RDF graph  
>> using a thesaurus) - is there some better way of solving this problem using  
>> an already established strategy?  
>>  
>>  
>> Many thanks!  
>>  
>> Linked Data Tools  
>>  
>  


Re: Searching RDF graph using SKOS thesaurus

Posted by co...@linkeddatatools.com.
Hi, here is an example of what I'm trying to achieve. Does Fusepool,  
or another solution, achieve this goal?

I have an RDF graph in a graph store:

==============================

<foaf:Person rdf:ID="johnsmith">
     <foaf:firstName>John</foaf:firstName>
     <foaf:lastName>Smith</foaf:lastName>
     <ex:role>Managing Director</ex:role>
</foaf:Person>

==============================

I have the following SKOS vocabulary:

==============================

ex:role rdf:type skos:Concept;
   skos:prefLabel "Managing Director"@en;
   skos:altLabel "MD"@en;
   skos:altLabel "President"@en;
   skos:altLabel "CEO"@en.

==============================

If I search for anyone with the role 'President', I want to return  
John Smith (rdf:ID="johnsmith") - because 'President' is an  
alternative label for 'Managing Director'.
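
To make the expected behaviour concrete, here is a small in-memory sketch in plain Python. It is not Stanbol or Fusepool code; the dictionaries simply mirror the two snippets above, and the function names are made up for illustration.

```python
from collections import defaultdict

# The FOAF data and the SKOS concept from the example, as plain dictionaries.
PEOPLE = {
    "johnsmith": {"firstName": "John", "lastName": "Smith",
                  "role": "Managing Director"},
}
CONCEPTS = {
    "ex:role": {"prefLabel": "Managing Director",
                "altLabels": ["MD", "President", "CEO"]},
}


def build_label_index(concepts):
    """Map every label (case-folded) to all labels of the same concept."""
    index = defaultdict(set)
    for concept in concepts.values():
        labels = [concept["prefLabel"]] + concept["altLabels"]
        folded = {label.casefold() for label in labels}
        for label in folded:
            index[label].update(folded)
    return index


def find_people_by_role(term, people, index):
    """Return ids of people whose role equals the term or one of its equivalents."""
    key = term.casefold()
    equivalents = index.get(key, {key})
    return sorted(pid for pid, person in people.items()
                  if person["role"].casefold() in equivalents)


index = build_label_index(CONCEPTS)
matches = find_people_by_role("President", PEOPLE, index)
```

Here a search for 'President' is expanded to all labels of the concept before the roles are compared, which is the behaviour I would like from a framework.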

Is this possible using an already established best practice or framework?

Please let me know if any further examples are required.


Best wishes

Mark

Quoting Reto Gmür <re...@apache.org>:

> Hi Linked Data Tools
>
> One difficulty might arise because ContentHub has the index and the facets
> in lucene only and other metadata in an RDF graph. So for example if
> contenthub provides a facet "Paris" you only have the label without any
> association to the URI, so it won't be possible to get additional
> properties of the resource. This is way in the fusepool project we've
> chosen to build a store that stores all the data in an RDF graph and builds
> a lucene index on top of it. The code is here
> https://github.com/fusepool/fusepool-ecs, its apache licensed and btw.
> fusepool would be happy to donate it to the stanbol project.
>
> Cheers,
> Reto
>
> On Mon, Nov 10, 2014 at 7:54 PM, <co...@linkeddatatools.com> wrote:
>
>> Hi, I posted a similar message to the IKS mailing list, but understand
>> from the response that this mailing list is no longer administrated.
>>
>> Stanbol is a great tool and I'm having some success with it; particularly
>> the entity extractor tool.
>>
>> I have a requirement and, I am not sure the best way to approach this and
>> whether a best practice for this sort of problem has already been
>> established.
>>
>> I have an RDF graph - one in accordance with the FOAF ontology - and I
>> have a controlled vocabulary in the form of a SKOS RDF graph, which
>> contains a set of literal string terms and their semantic equivalents (e.g.
>> 'President' <-> 'Managing Director' <-> 'Chief Executive' <-> 'MD' <->
>> etc.).
>>
>> I would like to search the literal strings in the FOAF graph for the
>> occurrence of the string literals, and their equivalents as defined by the
>> SKOS thesaurus.
>>
>> I can suggest one approach to this problem, but I fear it may be quite
>> inefficient and take a long time, namely:
>>
>> - Query the RDF graph using SPARQL for all string literals.
>> - Pass each string literal to the Stanbol Entity Extractor, having
>> uploaded the SKOS thesaurus to the Stanbol Entity Hub.
>>
>> Now this seems quite a long winded. Further, I'm not even clear from the
>> documentation whether the Stanbol Entity Extractor is capable of using SKOS
>> vocabularies to map string literals to entities. Is Stanbol capable of
>> extracting entities using a SKOS vocabulary?
>>
>> This seems a fairly common thing to do (semantic search of an RDF graph
>> using a thesaurus) - is there some better way of solving this problem using
>> an already established strategy?
>>
>>
>> Many thanks!
>>
>> Linked Data Tools
>>
>


Re: Searching RDF graph using SKOS thesaurus

Posted by Reto Gmür <re...@apache.org>.
Hi Linked Data Tools

One difficulty might arise because ContentHub keeps the index and the facets
in Lucene only and other metadata in an RDF graph. So, for example, if
ContentHub provides a facet "Paris" you only have the label without any
association to the URI, so it won't be possible to get additional
properties of the resource. This is why in the Fusepool project we've
chosen to build a store that stores all the data in an RDF graph and builds
a Lucene index on top of it. The code is here:
https://github.com/fusepool/fusepool-ecs. It's Apache licensed and, by the
way, Fusepool would be happy to donate it to the Stanbol project.
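
To illustrate the idea only (this is not the fusepool-ecs code), here is a toy Python sketch in which a plain dictionary stands in for the Lucene index and a list of tuples stands in for the RDF graph. The example.org URIs and the function names are placeholders.

```python
from collections import defaultdict

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Placeholder data: all facts live in the "graph" as (subject, predicate, object).
TRIPLES = [
    ("http://example.org/Paris", RDFS_LABEL, "Paris"),
    ("http://example.org/Paris", "http://example.org/country", "France"),
]


def build_text_index(triples):
    """Index label tokens to subject URIs, so a text hit keeps its URI."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == RDFS_LABEL:
            for token in obj.lower().split():
                index[token].add(subject)
    return index


def describe(label, triples, index):
    """Look the label up in the index, then read all properties from the graph."""
    return {
        uri: [(p, o) for s, p, o in triples if s == uri]
        for uri in index.get(label.lower(), ())
    }


text_index = build_text_index(TRIPLES)
paris = describe("Paris", TRIPLES, text_index)
```

Because the index returns URIs rather than bare labels, the additional properties of the resource stay reachable from a text match.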

Cheers,
Reto

On Mon, Nov 10, 2014 at 7:54 PM, <co...@linkeddatatools.com> wrote:

> Hi, I posted a similar message to the IKS mailing list, but understand
> from the response that this mailing list is no longer administrated.
>
> Stanbol is a great tool and I'm having some success with it; particularly
> the entity extractor tool.
>
> I have a requirement and, I am not sure the best way to approach this and
> whether a best practice for this sort of problem has already been
> established.
>
> I have an RDF graph - one in accordance with the FOAF ontology - and I
> have a controlled vocabulary in the form of a SKOS RDF graph, which
> contains a set of literal string terms and their semantic equivalents (e.g.
> 'President' <-> 'Managing Director' <-> 'Chief Executive' <-> 'MD' <->
> etc.).
>
> I would like to search the literal strings in the FOAF graph for the
> occurrence of the string literals, and their equivalents as defined by the
> SKOS thesaurus.
>
> I can suggest one approach to this problem, but I fear it may be quite
> inefficient and take a long time, namely:
>
> - Query the RDF graph using SPARQL for all string literals.
> - Pass each string literal to the Stanbol Entity Extractor, having
> uploaded the SKOS thesaurus to the Stanbol Entity Hub.
>
> Now this seems quite a long winded. Further, I'm not even clear from the
> documentation whether the Stanbol Entity Extractor is capable of using SKOS
> vocabularies to map string literals to entities. Is Stanbol capable of
> extracting entities using a SKOS vocabulary?
>
> This seems a fairly common thing to do (semantic search of an RDF graph
> using a thesaurus) - is there some better way of solving this problem using
> an already established strategy?
>
>
> Many thanks!
>
> Linked Data Tools
>

Re: Searching RDF graph using SKOS thesaurus

Posted by Rafa Haro <rh...@apache.org>.
Hi, 

Can you clarify this with an example please?

Cheers,
Rafa


On 10 November 2014 at 19:55:26, contact@linkeddatatools.com (contact@linkeddatatools.com) wrote:

Hi, I posted a similar message to the IKS mailing list, but understand  
from the response that this mailing list is no longer administrated.  

Stanbol is a great tool and I'm having some success with it;  
particularly the entity extractor tool.  

I have a requirement and, I am not sure the best way to approach this  
and whether a best practice for this sort of problem has already been  
established.  

I have an RDF graph - one in accordance with the FOAF ontology - and I  
have a controlled vocabulary in the form of a SKOS RDF graph, which  
contains a set of literal string terms and their semantic equivalents  
(e.g. 'President' <-> 'Managing Director' <-> 'Chief Executive' <->  
'MD' <-> etc.).  

I would like to search the literal strings in the FOAF graph for the  
occurrence of the string literals, and their equivalents as defined by  
the SKOS thesaurus.  

I can suggest one approach to this problem, but I fear it may be quite  
inefficient and take a long time, namely:  

- Query the RDF graph using SPARQL for all string literals.  
- Pass each string literal to the Stanbol Entity Extractor, having  
uploaded the SKOS thesaurus to the Stanbol Entity Hub.  

Now this seems quite a long winded. Further, I'm not even clear from  
the documentation whether the Stanbol Entity Extractor is capable of  
using SKOS vocabularies to map string literals to entities. Is Stanbol  
capable of extracting entities using a SKOS vocabulary?  

This seems a fairly common thing to do (semantic search of an RDF  
graph using a thesaurus) - is there some better way of solving this  
problem using an already established strategy?  


Many thanks!  

Linked Data Tools