Posted to users@jena.apache.org by Andy Seaborne <an...@apache.org> on 2011/10/01 22:48:10 UTC

Re: Suspected Bug(s): dealing with UTF8 IRIs in HTTP Sparql Queries

On 30/09/11 17:08, Alexandru Todor wrote:
> Hi,
>
> Seems to be a Virtuoso issue with the RDF/XML serializer. Both the Greek
> and German endpoints seem to have the garbled data in the XML files and
> this issue only arises when choosing RDF/XML as output. Thanks for the
> tip, I'll report the issue to the Virtuoso devs.

Could you also report that

1/ asking for N-Triples does not return N-Triples; it returns something 
Turtle-ish.

2/ the SPARQL XML results format has the same encoding problem as RDF/XML.

These have somewhat slowed down the bug hunting.

> There's still the problem with QueryExecutionFactory.sparqlService
> returning no results.

Yes, I found it: the bug is in turning queries into strings. I need to do 
some careful testing to make sure the fix does not break something 
elsewhere that incorrectly depends on the current behaviour.

	Andy



>
> Kind Regards,
> Alexandru
>
> On 09/30/2011 05:33 PM, Andy Seaborne wrote:
>> On 30/09/11 16:17, Alexandru Todor wrote:
>>> Hi,
>>>
>>> I maintain the German-language DBpedia endpoint and have received some
>>> emails from users complaining that they don't get any results from the
>>> endpoint when they query for resources like:
>>>
>>> http://de.dbpedia.org/resource/München
>>
>> This message and your message are encoded as ISO-8859-1.
>>
>> ü is 0xFC in ISO-8859-1, which is the same value as its Unicode code
>> point (U+00FC), and 0xC3 0xBC in UTF-8.
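A small Java sketch that prints the code point of ü and its bytes in both encodings, to make the comparison above concrete. The class and method names are illustrative only; the string is written as a `\u00FC` escape so the result does not depend on the source file's own encoding.

```java
import java.nio.charset.StandardCharsets;

public class UmlautBytes {
    // Render a byte array as space-separated hex, e.g. "0xC3 0xBC".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("0x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String u = "\u00FC"; // the character ü, code point U+00FC
        System.out.println("code point: " + Integer.toHexString(u.codePointAt(0)));
        // ISO-8859-1: one byte, equal to the code point
        System.out.println("ISO-8859-1: " + hex(u.getBytes(StandardCharsets.ISO_8859_1)));
        // UTF-8: a two-byte sequence
        System.out.println("UTF-8:      " + hex(u.getBytes(StandardCharsets.UTF_8)));
    }
}
```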
>>
>> I tried http://de.dbpedia.org/resource/München in my browser and was
>> redirected to http://de.dbpedia.org/data/M%C3%BCnchen.xml
>>
>> which returns RDF/XML in UTF-8, but it contains, e.g. on line 3:
>>
>> rdf:resource="http://de.dbpedia.org/resource/MÃ¼nchen"
>>
>> when viewed in Firefox. That looks corrupt to me.
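One plausible way to get output like this is for the UTF-8 bytes of an IRI to be read back as ISO-8859-1 somewhere along the way and then written out again as UTF-8 (double encoding). The thread does not establish what Virtuoso actually does internally, so treat this Java sketch as an illustration of the symptom only; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class Mojibake {
    // Simulate a component that receives UTF-8 bytes but decodes them as
    // ISO-8859-1: each byte of a multi-byte sequence becomes its own character.
    static String misdecode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String iri = "http://de.dbpedia.org/resource/M\u00FCnchen";
        // 0xC3 0xBC is read as the two characters U+00C3 and U+00BC
        System.out.println(misdecode(iri));
    }
}
```

If the misdecoded string is then serialized as UTF-8, a browser decoding the file as UTF-8 shows `MÃ¼nchen`, which matches the line quoted above.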
>>
>>> This is the code they sent me:
>>>
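>>> import com.hp.hpl.jena.query.*; // Jena 2.x / ARQ package; later Jena: org.apache.jena.query
>>>
>>> public class MunichQuery {
>>>     public static void main(String[] args) {
>>>         // A Java string literal cannot span lines, so the query is concatenated.
>>>         String queryString = "SELECT ?o WHERE "
>>>             + "{ <http://de.dbpedia.org/resource/München> "
>>>             + "<http://purl.org/dc/terms/subject> ?o }";
>>>         Query query = QueryFactory.create(queryString);
>>>         QueryExecution qexec = QueryExecutionFactory.sparqlService(
>>>             "http://de.dbpedia.org/sparql", query);
>>>         try {
>>>             ResultSet results = qexec.execSelect();
>>>             // Print one line per solution (each binds ?o).
>>>             while (results.hasNext()) {
>>>                 QuerySolution s = results.nextSolution();
>>>                 System.out.println(s.toString());
>>>             }
>>>         } finally {
>>>             qexec.close(); // release the HTTP connection
>>>         }
>>>     }
>>> }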
>>>
>>> I tried the code and it works for any IRI that contains only ASCII
>>> characters (i.e. plain URIs), but when the IRI contains non-ASCII
>>> characters it returns no results. I've tried a couple of variations:
>>> none of them returns any results, and none throws an exception; it is
>>> just as if the data weren't there.
>>>
>>> I then tried an alternative method, using QueryEngineHTTP to execute
>>> the query, and it worked. However, QueryEngineHTTP messes up the UTF-8
>>> encoding: in the returned results you get MÃ¼nchen instead of
>>> München. My guess is that QueryEngineHTTP reads the SPARQL results as
>>> ISO-8859-1 instead of UTF-8, so converting the strings back to bytes as
>>> ISO-8859-1 and decoding those bytes as UTF-8 fixed this.
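The client-side workaround described above can be sketched in Java as follows. It assumes the garbled strings came from exactly one round of "UTF-8 bytes decoded as ISO-8859-1"; the class and method names are hypothetical. ISO-8859-1 maps every byte 0x00 to 0xFF to the character with the same value, so encoding with it recovers the original bytes exactly.

```java
import java.nio.charset.StandardCharsets;

public class RepairMojibake {
    // Undo one round of UTF-8-read-as-ISO-8859-1:
    // 1. turn each character back into the byte it came from,
    // 2. decode those bytes correctly as UTF-8.
    static String repair(String garbled) {
        byte[] original = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = "M\u00C3\u00BCnchen"; // what the client received
        System.out.println(repair(garbled));   // the repaired IRI local name
    }
}
```

This is only safe on strings that really were misdecoded: applied to an already-correct `München`, the lone 0xFC byte is not valid UTF-8 and is replaced with U+FFFD, and any character outside ISO-8859-1 is turned into `?` by `getBytes`. Fixing the encoding at the source avoids both problems.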
>>
>> The code seems to do:
>>
>> URLEncoder.encode(s, "UTF-8")
>>
>> but it is still working with strings. Something at a lower level (the Sun
>> networking code) does the string-to-bytes conversion.
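For reference, a minimal Java sketch of what `URLEncoder.encode(s, "UTF-8")` produces for the IRI in question. The wrapper class and method are illustrative, not Jena code. The percent-encoded result uses the UTF-8 bytes of ü and matches the redirect URL quoted earlier; it also consists only of ASCII characters, so any later string-to-bytes step with an ASCII-compatible charset leaves it unchanged.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EncodeIri {
    // Percent-encode a string from its UTF-8 bytes (form encoding: space -> '+').
    static String encodeUtf8(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The ü in the local name becomes %C3%BC.
        System.out.println(encodeUtf8("M\u00FCnchen"));
        // A whole query string is encoded the same way before being sent.
        System.out.println(encodeUtf8(
            "SELECT ?o WHERE { <http://de.dbpedia.org/resource/M\u00FCnchen> ?p ?o }"));
    }
}
```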
>>
>> Andy
>>
>>>
>>> Kind Regards,
>>> Alexandru Todor
>>>
>>> Research Associate
>>> AG Corporate Semantic Web
>>> Freie Universität Berlin
>>>
>>>
>>>
>>>
>>>
>>
>


Re: Suspected Bug(s): dealing with UTF8 IRIs in HTTP Sparql Queries

Posted by Andy Seaborne <an...@apache.org>.

Fix committed to Apache SVN.

	Andy