Posted to dev@stanbol.apache.org by Rupert Westenthaler <ru...@gmail.com> on 2015/06/02 17:03:08 UTC

Re: Clerezza Yard setup and SPARQL

Hi Rajan,

Sorry, I do not have enough time for a detailed answer. But the
bottom line is: EntityLinking does not work with the Clerezza Yard. Even
if you did not encounter errors, both performance and results would
be much worse than with a SolrYard. This is because EntityLinking
depends on features that are Solr-exclusive (e.g. the Solr Analyzers
doing stemming ... and the ranking of query results).
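
To make the stemming point a bit more concrete, here is a toy sketch of
why label matching without an analyzer misses mentions. It is only an
illustration: crude_stem is a made-up stand-in, the label table and the
urn:example: URI are hypothetical, and Solr's analyzers are far more
sophisticated than this.

```python
from typing import Dict, Optional


def crude_stem(token: str) -> str:
    """Lowercase a token and strip a trailing 'ing' or 's' (toy normalizer)."""
    token = token.lower()
    for suffix in ("ing", "s"):
        # Only strip when a reasonably long stem remains.
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def exact_lookup(labels: Dict[str, str], mention: str) -> Optional[str]:
    """Match a mention against labels by exact string comparison."""
    return labels.get(mention)


def stemmed_lookup(labels: Dict[str, str], mention: str) -> Optional[str]:
    """Match a mention against labels after normalizing both sides."""
    index = {crude_stem(label): uri for label, uri in labels.items()}
    return index.get(crude_stem(mention))


# Hypothetical label -> entity URI table.
labels = {"Exchange": "urn:example:exchange"}

print(exact_lookup(labels, "exchanges"))    # None
print(stemmed_lookup(labels, "exchanges"))  # urn:example:exchange
```

A store that only offers the exact comparison behaves like exact_lookup
here, so inflected or differently cased mentions find nothing.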

If you find failing SPARQL queries in the log, feel free to report them
as issues in Jira. I will have a look.

best
Rupert

On Sat, May 30, 2015 at 11:14 PM, Rajan Shah <ra...@gmail.com> wrote:
> Hi,
>
> I can create a Clerezza Yard successfully and query the data using SPARQL. Now,
> when it comes to Named Entity Recognition, the same issue persists.
>
> I would appreciate it if someone could provide some insight or a potential
> resolution.
>
> Thanks in advance,
> Rajan
>
> These are the steps I followed:
>
> 1. Uploaded relevant ontology to local ontonet
>
> 2. Created Managed Site, uploaded triples
>
> 3. Verified the data exists via SPARQL query:
>
> <binding>
> <result>
> <binding name="ticker"><literal>AAPL</literal>
> </binding><binding name="issuer"><literal>Apple Inc.</literal>
> </binding><binding name="exchange"><literal>NASDAQ</literal></binding>
> <binding name="currency"><literal>USD</literal>
> </binding><binding name="instr">
> <uri>http://finance.intellimind.io/secmaster/djia/AAPL</uri>
> </binding>
> </result>
> </results></sparql>
>
> 4. Entityhub Linking
>
> Assuming prefix imind being http://finance.intellimind.io/secmaster (so
> that namespace prefix can be verified)
>
> In the entityhub linking setup, within type mapping I am trying to map
>
> a. Type Mapping Setup
> imind:ticker > rdfs:label
> imind:exchange > rdfs:label
> ...
>
> b. Select "Case Sensitivity"
>
>
> 5. Chain setup
>
> When I include it in the chain, it doesn't capture a single entity,
> even though most of the time is spent in this particular chain.
>
>
>
>    - *tika* ( optional , TikaEngine)
>    - *langdetect* ( required , LanguageDetectionEnhancementEngine)
>    - *opennlp-sentence* ( required , OpenNlpSentenceDetectionEngine)
>    - *opennlp-token* ( required , OpenNlpTokenizerEngine)
>    - *opennlp-pos* ( required , OpenNlpPosTaggingEngine)
>    - *opennlp-ner* ( required , NamedEntityExtractionEnhancementEngine)
>    - *refdata-linking* ( required , EntityLinkingEngine)
>
>
> *Sample Text:*
>
> The Apple Inc. CEO Tim Cook spoke at dev conference. The Apple Inc. has
> headquarter in US. It's ticker symbol is AAPL, which trades on NASDAQ.
>
> On Mon, May 25, 2015 at 12:04 AM, Rajan Shah <ra...@gmail.com> wrote:
>
>> Hi,
>>
>> In order to use the Clerezza Yard setup, I tried the very simple example
>> outlined at the end.
>>
>> I would really appreciate it if someone could shed some light on the following:
>>
>> a. Is there anything I am just completely missing here pertaining to
>> "Named Graph" vs "Unions of Graphs" and reference? If that's the case,
>> could you please clarify what would be relevant URI/IRI?
>>
>> b. What is the best way to debug such an issue? If a SPARQL query fails,
>> where should I see the logs indicating any issue, as it doesn't appear in
>> the stdout logs?
>>
>> c. Is there any other simple alternative compared to this to achieve
>> similar functionality? Is storing in Kiwi beneficial compared to this
>> approach, or do I have to have Apache Marmotta installed in order to use
>> Kiwi?
>>
>> Thanks in advance,
>> Rajan
>>
>>
>> *1. Apache Stanbol Entityhub Yard: Clerezza Yard Configuration*
>>
>> Set the following parameters
>>
>> ID: testYard
>> Graph URI: http://test.io/ns/friends#
>>
>> *2. Setup Clerezza - SCB Jena TDB Storage Provider*
>>
>> Jena TDB directory: /<stanbol_dir>/<tdb_store>
>> Default Graph Name: http://test.io/ns
>> Weight: 105
>>
>> *3. Save the .ttl file into /<stanbol_dir>/<tdb_store>*
>>
>> @prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
>> @prefix rdfa: <http://www.w3.org/ns/rdfa#> .
>> @prefix friends: <http://test.io/ns/friends#> .
>>
>> <http://test.io/ns/friends#AndrewSmith> a vcard:Individual;
>>                     vcard:fn "Andrew Smith";
>>                     vcard:title "Founder";
>>                     vcard:org "ABC LLC";
>>                     vcard:orgunit "Startup";
>>                     vcard:hasAddress [
>>                                         a vcard:Work;
>>                                         vcard:country-name "USA";
>>                                         vcard:locality "New York";
>>                                         vcard:region "New York"
>>                     ] .
>>
>> *4.* I do see that, upon startup, it creates the necessary index files within
>> the /<stanbol_dir>/<tdb_store> directory. In addition, within the UI, it also
>> registers the following TripleCollections in the SPARQL Endpoint
>>
>> http://test.io/ns/friends#
>>
>> *5. SPARQL Query*
>> -- query1
>> PREFIX vcard: <http://www.w3.org/2006/vcard/ns#>
>> PREFIX friends: <http://test.io/ns/friends#>
>> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
>>
>> SELECT ?fn ?title ?org
>> WHERE {
>>   ?s vcard:fn ?fn ;
>>     vcard:title ?title ;
>>     vcard:org ?org .
>> }
>>
>> OR
>>
>> -- query2
>> PREFIX hmgr: <http://test.io/ns/friends#>
>> PREFIX vcard: <http://www.w3.org/2006/vcard/ns#>
>>
>> SELECT ?Individual ?title
>> WHERE { ?title  vcard:title  "Founder" }
>>
>>
>> *Observations:*
>>
>> The above queries work perfectly fine from either the command line or Jena
>> Fuseki, as follows
>>
>> a. tdbquery --loc /<stanbol_dir>/<tdb_store> --query query1
>> b. using the Fuseki user interface
>>
>> I tried a couple of alternatives such as GRAPH, NAMED, etc.; however, nothing
>> helps. Is there any specific syntax that needs to be used for the Stanbol
>> SPARQL interface?
>>
>>
>>
>>



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                              ++43-699-11108907
| A-5500 Bischofshofen
| REDLINK.CO ..........................................................................
| http://redlink.co/

Re: Clerezza Yard setup and SPARQL

Posted by Rajan Shah <ra...@gmail.com>.

Hi Rupert,

Thanks a lot for the clarification!

It makes sense.

With best regards,
Rajan

On Mon, Jun 8, 2015 at 3:50 AM, Rupert Westenthaler <
rupert.westenthaler@gmail.com> wrote:

> Hi,
>
> The SolrYard does not support BNodes and the VCard RDF tends to use those.
>
> If you use the Entityhub Indexing Tool for importing the data you can
> try to set the "bnode-prefix" for the rdf indexing source (see
> STANBOL-765 [1] for details)
>
> best
> Rupert
>
> [1] https://issues.apache.org/jira/browse/STANBOL-765

Re: Clerezza Yard setup and SPARQL

Posted by Rupert Westenthaler <ru...@gmail.com>.

Hi,

The SolrYard does not support BNodes and the VCard RDF tends to use those.

If you use the Entityhub Indexing Tool for importing the data you can
try to set the "bnode-prefix" for the rdf indexing source (see
STANBOL-765 [1] for details)
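
As a rough illustration of the general idea (blank nodes get a URI built
from a prefix so they can be stored as ordinary entities), here is a
small sketch. It is an assumption about the concept only, not the
Indexing Tool's code: the function name name_bnodes, the prefix
"urn:bnode:" and the tuple layout are all hypothetical.

```python
def name_bnodes(triples, prefix):
    """Return triples with every '_:label' term rewritten to prefix + label."""
    def rewrite(term):
        # Blank node labels start with "_:"; URIs and quoted literals do not.
        if term.startswith("_:"):
            return prefix + term[2:]
        return term

    # Predicates are never blank nodes, so only subject and object change.
    return [(rewrite(s), p, rewrite(o)) for s, p, o in triples]


# The vCard address in the example data is a blank node ("_:b0" here).
triples = [
    ("http://test.io/ns/friends#AndrewSmith",
     "http://www.w3.org/2006/vcard/ns#hasAddress", "_:b0"),
    ("_:b0", "http://www.w3.org/2006/vcard/ns#locality", '"New York"'),
]

for triple in name_bnodes(triples, "urn:bnode:"):
    print(triple)
```

After the rewrite the address node has a stable identifier in both
triples, while the literal value is left untouched.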

best
Rupert

[1] https://issues.apache.org/jira/browse/STANBOL-765

On Tue, Jun 2, 2015 at 6:13 PM, Rajan Shah <ra...@gmail.com> wrote:
> Hi Rupert,
>
> Thanks again for the response.
>
> At present, it's just an observation that I had issues with queries
> mainly with vcard. At the same time, I could get results with either
> custom entities or even foaf.
>
> I will keep an eye on it and, if I observe it again, will submit a JIRA issue.
>
> With best regards,
> Rajan

Re: Clerezza Yard setup and SPARQL

Posted by Rajan Shah <ra...@gmail.com>.

Hi Rupert,

Thanks again for the response.

At present, it's just an observation that I had issues with queries
mainly with vcard. At the same time, I could get results with either
custom entities or even foaf.

I will keep an eye on it and, if I observe it again, will submit a JIRA issue.

With best regards,
Rajan


