Posted to dev@stanbol.apache.org by Umutcan Şimşek <um...@mni.thm.de> on 2015/05/28 19:21:48 UTC
stanbol enhancer/entityhub and german characters
Hello All,
According to the N-Triples standard [1], Extended ASCII characters are
not allowed in literals (see the EBNF). Therefore, when I extract
triples from a CMS database, I cannot represent characters like ö ü ä
properly. (I replace them with a byte code.)
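
For illustration: the 2004 N-Triples grammar referenced in [1] is restricted to US-ASCII and writes any other character as a \uXXXX or \UXXXXXXXX escape. A minimal Python sketch of that escaping follows. The function name escape_ntriples_literal is hypothetical and is not part of Stanbol or of any RDF library.

```python
def escape_ntriples_literal(text: str) -> str:
    """Return the lexical form of a literal using only printable ASCII.

    Hypothetical helper: follows the escaping rules of the 2004
    N-Triples grammar (\\uXXXX / \\UXXXXXXXX for non-ASCII characters).
    """
    # Characters that have their own short escape sequence.
    simple = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in simple:
            out.append(simple[ch])
        elif 0x20 <= cp <= 0x7E:
            out.append(ch)                 # printable ASCII passes through
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)     # 4 hex digits
        else:
            out.append("\\U%08X" % cp)     # 8 hex digits
    return "".join(out)


# "ä" is U+00E4, so the escaped form contains \u00E4.
print(escape_ntriples_literal("J\u00e4ger"))
```

A conforming parser is expected to turn the escape back into the original character, so a literal written this way should still carry the word "Jäger" after loading.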
Can Stanbol process these characters? If I configure the NLP modules for
German, will it be able to recognize, for instance, the word "Jäger"?
[1] http://www.w3.org/2001/sw/RDFCore/ntriples
Best Regards
Umutcan
Re: stanbol enhancer/entityhub and german characters
Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi,
Stanbol uses the Apache Jena parsers (via Clerezza) for parsing. If
you have non-ASCII characters, I recommend storing the file as UTF-8
and telling Stanbol that it is Turtle formatted when you process it.
N-Triples is a subset of Turtle, so any N-Triples file is also a valid
Turtle file. Turtle, however, is UTF-8 encoded, so it can carry such
characters directly. At least this is the trick I use when loading RDF
into a Sesame-based triple store. With Stanbol (Apache Jena based) I
never had a problem like that.
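
As a rough sketch of preparing such a file, the Python below turns \uXXXX and \UXXXXXXXX escapes in an ASCII N-Triples line back into real characters and encodes the result as UTF-8, ready to be saved with a Turtle file extension. The helper name unescape_unicode and the example.org IRIs are made up for illustration. The sketch assumes the input contains no escaped surrogate code points.

```python
import re

# A backslash followed by another backslash, a 4-digit \u escape,
# or an 8-digit \U escape. Matching "\\\\" pairs first keeps an escaped
# backslash followed by "u00E4" from being decoded by mistake.
_ESCAPE = re.compile(r"\\(\\|u[0-9A-Fa-f]{4}|U[0-9A-Fa-f]{8})")


def _decode(match):
    body = match.group(1)
    if body == "\\":
        return match.group(0)          # leave escaped backslashes untouched
    return chr(int(body[1:], 16))      # hex code point -> character


def unescape_unicode(line: str) -> str:
    """Hypothetical helper: replace \\u / \\U escapes with the characters."""
    return _ESCAPE.sub(_decode, line)


# Illustrative triple with an ASCII-escaped German literal.
ascii_line = r'<http://example.org/a> <http://example.org/label> "J\u00E4ger"@de .'
utf8_line = unescape_unicode(ascii_line)

# These are the bytes one would write to a .ttl file.
payload = (utf8_line + "\n").encode("utf-8")
print(utf8_line)
```

Other escapes such as \" and \n are left as they are, since Turtle accepts them in the same form.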
best
Rupert
--
| Rupert Westenthaler rupert.westenthaler@gmail.com
| Bodenlehenstraße 11 ++43-699-11108907
| A-5500 Bischofshofen
| REDLINK.CO
| http://redlink.co/