Posted to users@jena.apache.org by Olivier Rossel <ol...@gmail.com> on 2012/12/17 12:41:42 UTC

Matching literals of unknown langs

Hello.

The SPARQL spec says:
"Florence" is not the same RDF literal as "Florence"@fr

To illustrate this, I tried these queries on the French DBpedia:

##### This one returns no result! ######
SELECT  ?Locality WHERE {
	BIND ("Florence" AS ?Label)
	SERVICE <http://fr.dbpedia.org/sparql>{
		?Locality <http://www.w3.org/2000/01/rdf-schema#label> ?Label .
	 }
 }


##### This one returns 2 results! ######
SELECT  ?Locality WHERE {
	BIND ("Florence"@fr AS ?Label)
	SERVICE <http://fr.dbpedia.org/sparql>{
		?Locality <http://www.w3.org/2000/01/rdf-schema#label> ?Label .
	 }
 }

From this we can conclude that the French DBpedia tags its
labels with the @fr language tag.
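
The same distinction can be checked without any endpoint. The following is a minimal sketch that uses only standard SPARQL 1.1 functions (sameTerm, STR, STRLANG); the variable names are illustrative and not part of the original queries:

```sparql
SELECT ?sameTerm ?sameLexical ?retagged WHERE {
  # A plain literal and a language-tagged literal are different RDF terms.
  BIND (sameTerm("Florence", "Florence"@fr) AS ?sameTerm)
  # STR() drops the language tag and keeps only the lexical form.
  BIND (STR("Florence"@fr) = "Florence" AS ?sameLexical)
  # STRLANG() attaches a language tag to a plain string.
  BIND (sameTerm(STRLANG("Florence", "fr"), "Florence"@fr) AS ?retagged)
}
```

On a SPARQL 1.1 processor this should bind ?sameTerm to false and the other two variables to true.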


Now I need to run a federated query over an Italian dataset and the French DBpedia.

As seen above, the French DBpedia contains this literal: "Florence"@fr
My Italian dataset contains this literal: "Florence" (with no lang tag).

Here is the federated query:
SELECT DISTINCT ?LocalityITA ?LocalityFR WHERE {
 SERVICE <http://91.121.14.47:6665/sparql/> {
    ?Address <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
             <http://www.w3.org/2006/vcard/ns#Address> .
    ?Address <http://www.w3.org/2006/vcard/ns#locality> ?LocalityITA .
    ?LocalityITA <http://www.w3.org/2000/01/rdf-schema#label> ?LabelITA .
 }
 BIND (strbefore(?LabelITA, "(") AS ?Label)
 SERVICE <http://fr.dbpedia.org/sparql>{
    ?LocalityFR <http://www.w3.org/2000/01/rdf-schema#label> ?Label
 }
}

How can I tune the query so the literal matching works across lang tags?
Thanks for your help.

Re: Matching literals of unknown langs

Posted by Olivier Rossel <ol...@gmail.com>.
On 18 Dec 2012, at 13:02, Andy Seaborne <an...@apache.org> wrote:

> ?locality is not passed from the first SERVICE to the second: evaluation is bottom-up and (logically) each SERVICE is evaluated and then the results are combined in the client.
> 
> So I suggest making the SERVICE calls extract the lexical form and then
> equate them (via a join) in the client by using ?locality as the output of each SERVICE call.
> 
>  Andy

I use Jena's basic federation.
My query runs reasonably fast between a 4store endpoint and dbpedia.org (about 1 minute for about 900 results).
Evaluated on its own, the second SERVICE<...> block would be insane to resolve (fetch all the labels of DBpedia, then join them with the ones from block 1), so I suppose some optimization is at work during its evaluation.

FYI, my final query has a BIND in between the two SERVICE<...> blocks:

SERVICE <http://91.121.14.47:6665/sparql/> {
 SELECT DISTINCT ?LocalityITA  ?LabelITA WHERE {
   ?Address vc:locality ?LocalityITA .
   ?LocalityITA rdfs:label ?LabelITA .
 }}

BIND (strlang(str(?LabelITA),"en") as ?LabelEN)

SERVICE <http://dbpedia.org/sparql>{
   ?Locality rdfs:label ?LabelEN
}

Could you explain how basic federation works in that case?
I was pretty sure basic federation resolved the SERVICE<...> blocks first to last.



Re: Matching literals of unknown langs

Posted by Andy Seaborne <an...@apache.org>.
On 17/12/12 22:04, Olivier Rossel wrote:
> OK for the first SERVICE<..> block: str(...) binds a "raw string"
> into ?locality.
> Now the second SERVICE<...> block:
> Is a strlang(..., "fr") required to match ?locality against the @fr strings?
> Like this:
>
> SERVICE <myItalianData> {
>      SELECT (str(?l) as ?locality)
>      {
>         ?Address vc:locality ?l .
>      }
>   }
> SERVICE <fr.dbpedia.org> {
>      SELECT (?LocalityFR)
>      {
>         ?LocalityFR rdfs:label strlang(?locality,"fr") .
>      }
>   }
>
> Or is the ?locality variable bound in a way that says "I am a raw
> string, compare me without taking lang-tag into account"?
>
> As usual, thanks for your help, Andy.
>

?locality is not passed from the first SERVICE to the second: 
evaluation is bottom-up and (logically) each SERVICE is evaluated and 
then the results are combined in the client.

So I suggest making the SERVICE calls extract the lexical form and then
equate them (via a join) in the client by using ?locality as the output 
of each SERVICE call.

	Andy
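
The suggestion above can be sketched as one complete query. This is only an illustration under stated assumptions: the PREFIX declarations are written out, ?LabelFR is a variable name introduced here, both endpoints are assumed to accept a subquery inside SERVICE, and the strbefore() trimming from the original query is left out. Each SERVICE projects the lexical form of its label as ?locality, and the shared variable lets the client join the two result sets:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX vc:   <http://www.w3.org/2006/vcard/ns#>

SELECT DISTINCT ?LocalityITA ?LocalityFR
WHERE {
  # Italian dataset: return each locality with the lexical form of its label.
  SERVICE <http://91.121.14.47:6665/sparql/> {
    SELECT ?LocalityITA (STR(?LabelITA) AS ?locality)
    WHERE {
      ?Address a vc:Address ;
               vc:locality ?LocalityITA .
      ?LocalityITA rdfs:label ?LabelITA .
    }
  }
  # French DBpedia: same projection, so "Florence"@fr also yields "Florence".
  # ?LabelFR is a hypothetical variable name used only in this sketch.
  SERVICE <http://fr.dbpedia.org/sparql> {
    SELECT ?LocalityFR (STR(?LabelFR) AS ?locality)
    WHERE {
      ?LocalityFR rdfs:label ?LabelFR .
    }
  }
  # No explicit join is written: both subqueries bind ?locality,
  # so the two result sets are joined on it in the client.
}
```

As written, the second SERVICE asks for every label in the French DBpedia, so in practice it would probably need a further restriction or would run into the endpoint's result limits.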



Re: Matching literals of unknown langs

Posted by Olivier Rossel <ol...@gmail.com>.

On Mon, Dec 17, 2012 at 10:03 PM, Andy Seaborne <an...@apache.org> wrote:
> You can get the same effect with a subquery:
>
> SERVICE <....> {
>    SELECT (str(?l) as ?locality)
>    { ...
>       ?Address vc:locality ?l .
>    }
> }
>
> Now you have ?locality without a language tag and can use it as the
> canonical term (if your app thinks that's safe enough).
>
>         Andy
>

OK for the first SERVICE<..> block: str(...) binds a "raw string"
into ?locality.
Now the second SERVICE<...> block:
Is a strlang(..., "fr") required to match ?locality against the @fr strings?
Like this:

SERVICE <myItalianData> {
    SELECT (str(?l) as ?locality)
    {
       ?Address vc:locality ?l .
    }
 }
SERVICE <fr.dbpedia.org> {
    SELECT (?LocalityFR)
    {
       ?LocalityFR rdfs:label strlang(?locality,"fr") .
    }
 }

Or is the ?locality variable bound in a way that says "I am a raw
string, compare me without taking lang-tag into account"?

As usual, thanks for your help, Andy.

Re: Matching literals of unknown langs

Posted by Andy Seaborne <an...@apache.org>.
On 17/12/12 11:41, Olivier Rossel wrote:
> Hello.
>
> The SPARQL spec says:
> "Florence" is not the same RDF literal as "Florence"@fr

and it's just repeating RDF.

...

> Now I have to federate-query an italian dataset and the french dbPedia.
>
> As seen above, my french dbPedia contains this literal: "Florence"@fr
> My italian dataset contains this literal: "Florence" (with no lang tag).
>
> Here is the federated query:
> SELECT DISTINCT ?LocalityITA ?LocalityFR WHERE {
>   SERVICE <http://91.121.14.47:6665/sparql/> {
>      ?Address <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
> <http://www.w3.org/2006/vcard/ns#Address> .
>      ?Address <http://www.w3.org/2006/vcard/ns#locality> ?LocalityITA .
>      ?LocalityITA <http://www.w3.org/2000/01/rdf-schema#label> ?LabelITA .
>   }
>   BIND (strbefore(?LabelITA, "(") AS ?Label)
>   SERVICE <http://fr.dbpedia.org/sparql>{
>      ?LocalityFR <http://www.w3.org/2000/01/rdf-schema#label> ?Label
>   }
> }

Reformatted:

SELECT DISTINCT  ?LocalityITA ?LocalityFR
WHERE
   { SERVICE <http://91.121.14.47:6665/sparql/>
       { ?Address rdf:type vc:Address .
         ?Address vc:locality ?LocalityITA .
         ?LocalityITA rdfs:label ?LabelITA
       }
     BIND(strbefore(?LabelITA, "(") AS ?Label)
     SERVICE <http://fr.dbpedia.org/sparql>
       { ?LocalityFR rdfs:label ?Label }
   }


You can use str() to get just the lexical form:

>
> How can I tune the query so the literal matching works across lang tags?
> Thanks for your help.

You could canonicalise to the simple literal in each SERVICE

SERVICE <....> {
    ....
    ?Address vc:locality ?l .
     BIND(str(?l) AS ?locality)
}

>

Except fr.dbpedia.org/Virtuoso does not support BIND.

You can get the same effect with a subquery:

SERVICE <....> {
    SELECT (str(?l) as ?locality)
    { ...
       ?Address vc:locality ?l .
    }
}

Now you have ?locality without a language tag and can use it as the 
canonical term (if your app thinks that's safe enough).

	Andy