Posted to dev@stanbol.apache.org by srecko joksimovic <sr...@gmail.com> on 2012/01/25 08:36:42 UTC
Stanbol and Unicode
Hi Rupert,
I have to say that Stanbol works just fine, but I have a new question. Is it
possible to annotate Unicode text? For me, the most interesting case is
Serbian Cyrillic. I know that this is usually a big problem, but I would like
to know if there is a possibility.
Best,
Srecko
Re: Stanbol and Unicode
Posted by Olivier Grisel <ol...@ensta.org>.
UTF-8 input should be properly decoded by the enhancer (be sure to
give the charset in the HTTP request headers). If that is not the case,
it is a bug. Please report it on the JIRA with an example that
reproduces the issue:
https://issues.apache.org/jira/browse/STANBOL
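As an illustration of giving the charset in the request headers, here is a minimal Python sketch that builds (but does not send) a POST request carrying Serbian Cyrillic text encoded as UTF-8. The endpoint URL `http://localhost:8080/enhancer` and the helper name `build_enhance_request` are assumptions for illustration only; check the enhancer path of your own Stanbol deployment.

```python
import urllib.request

# Assumption: a locally running Stanbol enhancer; the path may differ in your setup.
ENHANCER_URL = "http://localhost:8080/enhancer"


def build_enhance_request(text: str, url: str = ENHANCER_URL) -> urllib.request.Request:
    """Build (without sending) a POST request whose body is `text` encoded as UTF-8.

    The charset parameter of Content-Type tells the server how to decode the bytes.
    """
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "text/plain; charset=UTF-8"},
    )


# Hypothetical usage with Serbian Cyrillic input.
req = build_enhance_request("Београд је главни град Србије.")
# urllib.request.urlopen(req) would actually send it to the enhancer.
```

Note that urllib normalizes header names, so the header is read back with `req.get_header("Content-type")`.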
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Fwd: Stanbol and Unicode
Posted by Rupert Westenthaler <ru...@gmail.com>.
I only noticed after Olivier's reply that I had sent my initial reply
only to Srecko and not to the stanbol-dev list. Sorry for that.
Even though Olivier's answer also describes it correctly, let me add
my reply for completeness.
Note that I have corrected a typo in the link provided for [1].
---------- Forwarded message ----------
From: Rupert Westenthaler <ru...@gmail.com>
Date: Wed, Jan 25, 2012 at 9:22 AM
Subject: Re: Stanbol and Unicode
To: srecko joksimovic <sr...@gmail.com>
Hi
Stanbol internally uses UTF-8, and currently all returned data are also
encoded as UTF-8. As far as I know, the "Accept-Charset" header [1] is
currently ignored.
If you pass data in encodings other than UTF-8 to Stanbol's RESTful
services, make sure that you specify the correct charset in the
"Content-Type" HTTP header [2].
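For input that is not UTF-8, for example text stored in the Cyrillic code page windows-1251 (an assumed example), the sketch below shows two options: declare the real charset in "Content-Type", or transcode to UTF-8 before sending. Because the reply says responses are always UTF-8 and "Accept-Charset" is ignored, the response body is simply decoded as UTF-8. The URL and all function names here are hypothetical.

```python
import urllib.request

# Assumption: adjust to the enhancer endpoint of your Stanbol deployment.
ENHANCER_URL = "http://localhost:8080/enhancer"


def request_declaring_charset(raw: bytes, charset: str, url: str = ENHANCER_URL) -> urllib.request.Request:
    """Option A: send the bytes unchanged and declare their actual charset."""
    return urllib.request.Request(
        url,
        data=raw,
        method="POST",
        headers={"Content-Type": f"text/plain; charset={charset}"},
    )


def request_transcoded_to_utf8(raw: bytes, charset: str, url: str = ENHANCER_URL) -> urllib.request.Request:
    """Option B: decode locally, re-encode as UTF-8, and label the body accordingly."""
    return urllib.request.Request(
        url,
        data=raw.decode(charset).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "text/plain; charset=UTF-8"},
    )


def decode_response(body: bytes) -> str:
    """Decode a response body as UTF-8, since Accept-Charset is reportedly ignored."""
    return body.decode("utf-8")


# Hypothetical input: Serbian Cyrillic text stored as windows-1251 bytes.
raw = "Нови Сад".encode("windows-1251")
req_a = request_declaring_charset(raw, "windows-1251")
req_b = request_transcoded_to_utf8(raw, "windows-1251")
```

Option B sends the text in the same encoding Stanbol uses internally, so the server only has to handle UTF-8.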
best
Rupert
[1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.2
[2] http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.17
--
| Rupert Westenthaler rupert.westenthaler@gmail.com
| Bodenlehenstraße 11 ++43-699-11108907
| A-5500 Bischofshofen