Posted to users@jena.apache.org by Laura Morales <la...@mail.com> on 2020/12/31 07:50:41 UTC

Is it possible to use UTF8 IRIs in Turtle?

Is there a way to write UTF8 IRIs with Turtle without all the %-encoded characters? I mean like this <alice smith> or <alizè $mith> or ex:"alice smith"? The only way that I know to write those characters is like this <alice%20smith>, ie. by writing the encoded URI myself. Is there any syntax that I can use to write UTF8 characters instead, and have those characters automatically be parsed as IRIs? Like when I type a string in my browser, I type UTF8 but it's automatically url-encoded to a URL?

Re: Is it possible to use UTF8 IRIs in Turtle?

Posted by Jean-Marc Vanel <je...@gmail.com>.
As a practical complement to what Andy wrote,
UTF-8 is just another character encoding.
If you use a well-behaved text editor like gvim
<https://www.vim.org/download.php>, you can set the file encoding to UTF-8 by
typing
:set fileencoding=utf-8
and the Turtle or SPARQL you type will be OK.

As an example, on one Jena-based site I run, searching for "Corrençon"
generates under the hood a UTF-8 SPARQL query with text search:
http://semantic-forms.cc:1952/search?q=Corren%C3%A7on

As another example, this UTF-8 query on http://dbpedia.org/sparql/ runs
fine:
select distinct * where {
?S ?P <http://dbpedia.org/resource/Corrençon-en-Vercors>
}

Jean-Marc Vanel
<http://semantic-forms.cc:9112/display?displayuri=http://jmvanel.free.fr/jmv.rdf%23me>
+33
(0)6 89 16 29 52


On Thu, 31 Dec 2020 at 08:50, Laura Morales <la...@mail.com> wrote:

> Is there a way to write UTF8 IRIs with Turtle without all the %-encoded
> characters? I mean like this <alice smith> or <alizè $mith> or ex:"alice
> smith"? The only way that I know to write those characters is like this
> <alice%20smith>, ie. by writing the encoded URI myself. Is there any syntax
> that I can use to write UTF8 characters instead, and have those characters
> automatically be parsed as IRIs? Like when I type a string in my browser, I
> type UTF8 but it's automatically url-encoded to a URL?
>

Re: Is it possible to use UTF8 IRIs in Turtle?

Posted by Andy Seaborne <an...@apache.org>.

On 31/12/2020 07:50, Laura Morales wrote:
> Is there a way to write UTF8 IRIs with Turtle without all the %-encoded characters? I mean like this <alice smith> or <alizè $mith> or ex:"alice smith"? 

Just write the codepoint you want.

Spaces are never legal in IRIs.

See the IRIREF production in the Turtle grammar:

https://www.w3.org/TR/turtle/#grammar-production-IRIREF

After parsing, the string is then checked as a legal IRI.

\uABCD escapes are allowed.
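
As a rough illustration of the IRIREF production linked above, the following Python sketch checks a token against the character class from the Turtle grammar. The helper name is made up for this example, and it only mirrors the grammar rule; the later "is it a legal IRI" check that Jena performs goes further and is not reproduced here.

```python
import re

# Turtle grammar:
#   IRIREF ::= '<' ([^#x00-#x20<>"{}|^`\] | UCHAR)* '>'
#   UCHAR  ::= '\u' HEX HEX HEX HEX | '\U' HEX HEX HEX HEX HEX HEX HEX HEX
_IRIREF = re.compile(
    r'<(?:[^\x00-\x20<>"{}|^`\\]|\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8})*>'
)

def is_turtle_iriref(token: str) -> bool:
    """Hypothetical helper: True if token matches the IRIREF production."""
    return _IRIREF.fullmatch(token) is not None

print(is_turtle_iriref("<alice smith>"))     # False: space (#x20) is excluded
print(is_turtle_iriref("<aliz\u00e8>"))      # True: non-ASCII code points are fine
print(is_turtle_iriref("<alice%20smith>"))   # True: '%', '2', '0' are ordinary characters
```
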

> The only way that I know to write those characters is like this
> <alice%20smith>, ie. by writing the encoded URI myself.

No.
That puts 3 characters '%'-'2'-'0' into the URI. Not a space.

encode != escape.
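
To make the distinction concrete, here is a minimal Python sketch, assuming a simplified stand-in for what a Turtle parser does with \u escapes (the function name is hypothetical and this is not Jena code). An escape is resolved to the code point it denotes; a percent-encoded sequence is left exactly as written.

```python
import re

def unescape_uchar(iri_body: str) -> str:
    """Hypothetical helper: resolve Turtle \\uXXXX / \\UXXXXXXXX escapes to code points."""
    def repl(m):
        return chr(int(m.group(1) or m.group(2), 16))
    return re.sub(r'\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})', repl, iri_body)

escaped = unescape_uchar(r"aliz\u00E8")     # escape: becomes the single character U+00E8
encoded = unescape_uchar("alice%20smith")   # encoding: nothing to resolve, stays as typed

print(escaped == "aliz\u00e8")   # True
print(encoded)                   # alice%20smith
print(len(encoded))              # 13, i.e. '%', '2', '0' count as three characters
```
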

> Is there any syntax that I can use to write UTF8 characters instead,
> and have those characters automatically be parsed as IRIs? Like when I
> type a string in my browser, I type UTF8 but it's automatically
> url-encoded to a URL?

If you want to encode, there's code in IRILib.

     Andy
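
For readers who want to see what the percent-encoding step mentioned in the thread amounts to, here is a small sketch. IRILib is a Java class in Jena; the code below does not use it and instead relies on Python's standard urllib.parse.quote purely as an illustration, so the exact set of characters left unencoded may differ from Jena's. The wrapper name is made up for this example.

```python
from urllib.parse import quote, unquote

def percent_encode(text: str) -> str:
    """Hypothetical wrapper: UTF-8 encode, then %XX everything except
    letters, digits, '_', '.', '-', '~' and '/' (quote's defaults)."""
    return quote(text)

for raw in ["alice smith", "aliz\u00e8 $mith", "Corren\u00e7on"]:
    encoded = percent_encode(raw)
    print(raw, "->", encoded)
    # Decoding restores the original string.
    assert unquote(encoded) == raw
```

The last input yields Corren%C3%A7on, the same form that appears in the search URL quoted earlier in the thread.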