Posted to dev@stanbol.apache.org by florent andré <fl...@4sengines.com> on 2011/02/14 13:13:07 UTC

Entityhub : find VS query on skos:preflabel

Hi,

First, big thanks to Rupert for implementing this!

I tried it today and noticed the following.
When I use the find endpoint:

$ curl -X POST \
    -d "name=MOELLE&field=http://www.w3.org/2004/02/skos/core#prefLabel" \
    http://localhost:8080/entityhub/sites/find
{
    "query": {
        "selected": ["http:\/\/www.w3.org\/2004\/02\/skos\/core#prefLabel"],
        "constraints": [{
            "type": "text",
            "patternType": "wildcard",
            "text": "MOELLE",
            "field": "http:\/\/www.w3.org\/2004\/02\/skos\/core#prefLabel"
        }],
        "limit": 5
    },
    "results": []
}

==> I get no results.
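
For anyone who prefers a script over curl, here is a minimal sketch of the same /find request using only the Python standard library. The helper name build_find_request is hypothetical; the endpoint URL and the "name"/"field" parameters are the ones from the curl call above. The sketch only builds the request. Note that urlencode percent-encodes the field URI (for example "#" becomes "%23"), whereas curl -d sends it as typed.

```python
from urllib.parse import urlencode
from urllib.request import Request

SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"


def build_find_request(name, field, base="http://localhost:8080/entityhub"):
    """Build (but do not send) the form-encoded POST for the /find endpoint."""
    body = urlencode({"name": name, "field": field}).encode("ascii")
    return Request(
        base + "/sites/find",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


request = build_find_request("MOELLE", SKOS_PREF_LABEL)
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```

Sending it would be a matter of passing the request to urllib.request.urlopen, which of course needs a running Entityhub on localhost:8080.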

When I use the query endpoint:

$ curl -X POST -F "query=@fieldQuery.json" \
    http://localhost:8080/entityhub/sites/query
{
    "query": {
        "selected": [
            "http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type",
            "http:\/\/www.w3.org\/2000\/01\/rdf-schema#label",
            "http:\/\/www.w3.org\/2004\/02\/skos\/core#Concept"
        ],
        "constraints": [{
            "type": "value",
            "value": "MOELLE",
            "field": "http:\/\/www.w3.org\/2004\/02\/skos\/core#prefLabel"
        }],
        "limit": 30
    },
    "results": [{
        "id": "http:\/\/gasoil.edf.fr\/thesaurus\/0.1\/entree\/31367",
        "http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type": [{
            "type": "reference",
            "value": "http:\/\/www.w3.org\/2004\/02\/skos\/core#Concept"
        }]
    }]
}

==> I get the expected result.

I imagine this comes from the "selected" field in /find, which is set to
skos:prefLabel and not to skos:Concept.

I am sharing this observation without knowing whether it is the expected
behavior or not.
In any case, we can search on any namespace with /query, and that's good! :)

++

Re: Entityhub : find VS query on skos:preflabel

Posted by Florent André <fl...@apache.org>.
Hi Rupert,


Sorry for the late answer, and thanks for this in(form|vestig)ation.

Here are the results of my tests.
I use d2rq 0.7 and the snorql endpoint (http://localhost:8082/snorql).

On 02/14/2011 02:56 PM, Rupert Westenthaler wrote:
> Hi
> 
> Andreas Gruber told me today that he would like to use the DBLP
> Bibliography Database via the Stanbol Entityhub. Because of that I was
> looking for the Linked Data endpoint and found http://dblp.l3s.de/d2r/
> hosting this data. They actually use a D2R server to host DBLP.
>
> I mention this because I had exactly the same problems with
> "/find" requests and did some digging! A first investigation
> showed that the SPARQL endpoint of the D2R server has problems
> with REGEX filters.
> 
> You can use the following two queries to test if your problem is of a
> similar nature.
> 
> The first query resembles the query used by the Entityhub for:
>>        {
>>            "type": "text",
>>            "patternType": "wildcard",
>>            "text": "MOELLE",
>>            "field": "http:\/\/www.w3.org\/2004\/02\/skos\/core#prefLabel"
>>        }
> 
> SELECT DISTINCT * WHERE {
>   ?s <http://www.w3.org/2004/02/skos/core#prefLabel> ?o
>   FILTER(regex(str(?o),"MOELLE","i"))
> }
> LIMIT 10
> 
> I assume that you will not get the expected results for this one
> (NOTE: in my tests on the DBLP dataset it seems to work for some
> search strings, but not for others). You can also try to modify the
> search string, e.g. to "MOELLE.*" or something like that, but I had no
> success with tests like that on the DBLP dataset.

==> The query
SELECT DISTINCT * WHERE {
  ?s <http://www.w3.org/2004/02/skos/core#prefLabel> ?o
  FILTER(regex(str(?o),"MOELLE","i"))
}
LIMIT 10

==> gives me good results, such as:
S				O
gasoil:entree/31367 [http]	"MOELLE"

I also tried:
==> FILTER(regex(str(?o),"M*","i"))
==> FILTER(regex(str(?o),"M.*","i"))
==> FILTER(regex(str(?o),"^M*","i"))
==> FILTER(regex(str(?o),".*M$","i"))

And they all give me good results... or at least results that match
the regex.

Do you remember which kind of regex did not give you the expected results?
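
As a side note on "results that match the regex": the sketch below checks the four patterns above against a few labels in Python. It assumes that, for patterns this simple, Python's re.search behaves like SPARQL's regex() (both look for a match anywhere in the string unless the pattern is anchored); the two regex flavours do differ in other details. The labels "OS" and "SERUM" are made up for the illustration; only "MOELLE" comes from the thesaurus.

```python
import re

# "MOELLE" is the label from the thread; the other two are hypothetical.
labels = ["MOELLE", "OS", "SERUM"]
patterns = ["M*", "M.*", "^M*", ".*M$"]


def matching(pattern, values):
    """Return the values in which the pattern is found, ignoring case."""
    return [v for v in values if re.search(pattern, v, re.IGNORECASE)]


for pattern in patterns:
    print(pattern, matching(pattern, labels))
```

"M*" and "^M*" mean "zero or more M", so they match every label, including "OS". "M.*" matches any label containing an M, and ".*M$" only labels ending in M. So these patterns do not really test a prefix search such as "^M.*".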

> 
> The next query resembles the query used by the Entityhub for:
> 
>>        {
>>            "type": "value",
>>            "value": "MOELLE",
>>            "field": "http:\/\/www.w3.org\/2004\/02\/skos\/core#prefLabel"
>>        }
> 
> SELECT DISTINCT * WHERE {
>   ?s <http://www.w3.org/2004/02/skos/core#prefLabel> "MOELLE"
> }
> LIMIT 10
> 
> This should give the expected results, most likely because it does not
> use a FILTER, but directly checks for the given value.

Yes, I get the expected result.

I'm not sure these answers will be really helpful...

Please keep me posted on your investigations; I will be happy to help if
I can.

Many thanks for your corrections.
Cheers



Re: Entityhub : find VS query on skos:preflabel

Posted by Rupert Westenthaler <rw...@apache.org>.
Hi

Andreas Gruber told me today that he would like to use the DBLP
Bibliography Database via the Stanbol Entityhub. Because of that I was
looking for the Linked Data endpoint and found http://dblp.l3s.de/d2r/
hosting this data. They actually use a D2R server to host DBLP.

I mention this because I had exactly the same problems with
"/find" requests and did some digging! A first investigation
showed that the SPARQL endpoint of the D2R server has problems
with REGEX filters.

You can use the following two queries to test if your problem is of a
similar nature.

The first query resembles the query used by the Entityhub for:
>        {
>            "type": "text",
>            "patternType": "wildcard",
>            "text": "MOELLE",
>            "field": "http:\/\/www.w3.org\/2004\/02\/skos\/core#prefLabel"
>        }

SELECT DISTINCT * WHERE {
  ?s <http://www.w3.org/2004/02/skos/core#prefLabel> ?o
  FILTER(regex(str(?o),"MOELLE","i"))
}
LIMIT 10

I assume that you will not get the expected results for this one
(NOTE: in my tests on the DBLP dataset it seems to work for some
search strings, but not for others). You can also try to modify the
search string, e.g. to "MOELLE.*" or something like that, but I had no
success with tests like that on the DBLP dataset.

The next query resembles the query used by the Entityhub for:

>        {
>            "type": "value",
>            "value": "MOELLE",
>            "field": "http:\/\/www.w3.org\/2004\/02\/skos\/core#prefLabel"
>        }

SELECT DISTINCT * WHERE {
  ?s <http://www.w3.org/2004/02/skos/core#prefLabel> "MOELLE"
}
LIMIT 10

This should give the expected results, most likely because it does not
use a FILTER, but directly checks for the given value.

Currently I have no idea why this is the case. I need to have a
detailed look at the documentation of the D2R server. Maybe they use a
special syntax for full-text searches, or there are some limitations
with REGEX filters. It may also be the case that the REGEX filter
syntax used by the Entityhub does not conform to the standard. I
will do some additional investigation in the coming days.

Regardless of that, I would strongly suggest that you use the second
variant (the one with a value constraint), because this type of query
should have much shorter response times on the D2R server.
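
To make the difference between the two variants concrete, here is a small Python sketch that renders a constraint as a SPARQL query shaped like the two test queries above. It is an illustration only, not the Entityhub's actual query generation: the function name constraint_to_sparql is hypothetical, and no escaping of quotes or regex metacharacters is attempted.

```python
def constraint_to_sparql(constraint, limit=10):
    """Render one constraint as a SPARQL query shaped like the two above."""
    field = constraint["field"]
    if constraint["type"] == "text":
        # Text constraint: bind every label to ?o, then filter it with a regex.
        text = constraint["text"]
        body = f'  ?s <{field}> ?o\n  FILTER(regex(str(?o),"{text}","i"))'
    elif constraint["type"] == "value":
        # Value constraint: the literal is part of the triple pattern itself.
        value = constraint["value"]
        body = f'  ?s <{field}> "{value}"'
    else:
        raise ValueError(f"unsupported constraint type: {constraint['type']}")
    return f"SELECT DISTINCT * WHERE {{\n{body}\n}}\nLIMIT {limit}"


PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"
text_query = constraint_to_sparql(
    {"type": "text", "patternType": "wildcard", "text": "MOELLE", "field": PREF_LABEL}
)
value_query = constraint_to_sparql(
    {"type": "value", "value": "MOELLE", "field": PREF_LABEL}
)
print(text_query)
print(value_query)
```

In the value variant the endpoint can look up the literal directly, while in the text variant it has to evaluate the regex against every candidate label. That is a plausible reason for the expected difference in response times.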

BTW: I noticed that your field query contains some errors. Here is a
corrected version:
{
    "selected": [
        "http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type",
        "http:\/\/www.w3.org\/2004\/02\/skos\/core#prefLabel"
    ],
    "constraints": [{
        "type": "value",
        "value": "MOELLE",
        "field": "http:\/\/www.w3.org\/2004\/02\/skos\/core#prefLabel"
    }],
    "limit": 30
}

You might also want to select related, broader, and narrower concepts by adding
    "http://www.w3.org/2004/02/skos/core#related"
    "http://www.w3.org/2004/02/skos/core#narrower"
    "http://www.w3.org/2004/02/skos/core#broader"
to "selected".
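
If you prefer to generate the query file instead of editing it by hand, something along these lines should work. It is a sketch: the helper name build_field_query and its parameters are hypothetical, and the structure simply mirrors the corrected JSON above. json.dumps writes "/" without the "\/" escaping, which is equally valid JSON.

```python
import json

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def build_field_query(value, extra_selected=(), limit=30):
    """Build a field query with a value constraint on skos:prefLabel."""
    selected = [RDF_TYPE, SKOS + "prefLabel"]
    # Optionally select further SKOS properties, e.g. related concepts.
    selected += [SKOS + name for name in extra_selected]
    return {
        "selected": selected,
        "constraints": [
            {"type": "value", "value": value, "field": SKOS + "prefLabel"}
        ],
        "limit": limit,
    }


query = build_field_query("MOELLE", extra_selected=("related", "narrower", "broader"))
print(json.dumps(query, indent=4))
```

The printed JSON can be saved as fieldQuery.json and posted with the curl command used for the /query endpoint.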

best
Rupert Westenthaler

-- 
| Rupert Westenthaler                            rwesten@apache.org
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen