Posted to dev@stanbol.apache.org by srecko joksimovic <sr...@gmail.com> on 2012/01/25 08:36:42 UTC

Stanbol and Unicode

Hi Rupert,

I have to say that Stanbol works just fine, but I have a new question. Is it
possible to annotate Unicode text? For me, the most interesting is Serbian
Cyrillic. I know that is usually a big problem, but I would like to know if
there is a possibility.

Best,
Srecko

Re: Stanbol and Unicode

Posted by Olivier Grisel <ol...@ensta.org>.
2012/1/25 srecko joksimovic <sr...@gmail.com>:
> Hi Rupert,
>
> I have to say that Stanbol works just fine, but I have a new question. Is it
> possible to annotate Unicode text? For me, the most interesting is Serbian
> Cyrillic. I know that is usually a big problem, but I would like to know if
> there is a possibility.

UTF-8 input should be properly decoded by the enhancer (be sure to
give the charset in the HTTP request headers). If it is not, that is
a bug. Please report it, with an example that reproduces the issue, in
JIRA:

https://issues.apache.org/jira/browse/STANBOL
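
As a rough illustration of giving the charset in the request headers, here is a minimal Python sketch that builds (but does not send) a POST request with Serbian Cyrillic text. The URL and the helper name build_enhancer_request are assumptions made up for this example, not something confirmed in this thread; adjust the URL to wherever your enhancer endpoint actually lives.

```python
import urllib.request

# Assumption: a local Stanbol instance; this endpoint URL is hypothetical.
ENHANCER_URL = "http://localhost:8080/enhancer"


def build_enhancer_request(text, url=ENHANCER_URL):
    """Build (but do not send) a POST request carrying UTF-8 encoded text."""
    # Encode the text explicitly so the bytes on the wire match the header.
    body = text.encode("utf-8")
    # Declare the charset of the body in the Content-Type header.
    headers = {"Content-Type": "text/plain; charset=UTF-8"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_enhancer_request("Београд је главни град Србије.")
# With a server running, urllib.request.urlopen(request) would send it.
```

The important part is only that the bytes in the body and the charset named in the header agree.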

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

Fwd: Stanbol and Unicode

Posted by Rupert Westenthaler <ru...@gmail.com>.
I only noticed after Olivier's reply that I had not also sent my initial
reply to Srecko to the stanbol-dev list. Sorry for that.

Even though Olivier's answer also describes it correctly, let me add my
reply for completeness.

Note that I have corrected a typo in the link provided for [1].

---------- Forwarded message ----------
From: Rupert Westenthaler <ru...@gmail.com>
Date: Wed, Jan 25, 2012 at 9:22 AM
Subject: Re: Stanbol and Unicode
To: srecko joksimovic <sr...@gmail.com>


Hi

Stanbol uses UTF-8 internally, and currently all returned data is also
encoded as UTF-8. As far as I know, the "Accept-Charset" header [1] is
currently ignored.

If you pass data in an encoding other than UTF-8 to Stanbol's RESTful
services, make sure that you specify the correct charset in the
"Content-Type" HTTP header [2].
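
To make the non-UTF-8 case concrete, here is a small Python sketch, assuming the text is sent as ISO-8859-5 (a legacy single-byte encoding that covers Serbian Cyrillic). The helper name encode_with_charset is hypothetical; the point is only that the same text yields different bytes per encoding, so the declared charset has to match the one actually used.

```python
def encode_with_charset(text, charset):
    """Return the encoded body and a matching Content-Type header value."""
    body = text.encode(charset)
    content_type = "text/plain; charset=" + charset
    return body, content_type


text = "Србија"
body, content_type = encode_with_charset(text, "ISO-8859-5")

# The same text as UTF-8 is a different (and longer) byte sequence.
utf8_body = text.encode("utf-8")

# A receiver that wrongly assumes UTF-8 cannot decode the ISO-8859-5 bytes.
try:
    body.decode("utf-8")
    decodable_as_utf8 = True
except UnicodeDecodeError:
    decodable_as_utf8 = False
```

If the header named UTF-8 here while the body was ISO-8859-5, the server would have no way to recover the original characters.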

best
Rupert

[1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.2
[2] http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.17


On Wed, Jan 25, 2012 at 8:36 AM, srecko joksimovic
<sr...@gmail.com> wrote:
> Hi Rupert,
>
> I have to say that Stanbol works just fine, but I have a new question. Is it
> possible to annotate Unicode text? For me, the most interesting is Serbian
> Cyrillic. I know that is usually a big problem, but I would like to know if
> there is a possibility.
>
> Best,
> Srecko



--
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

