Posted to users@opennlp.apache.org by György Chityil <gy...@gmail.com> on 2011/12/11 21:13:36 UTC

Re: Linux: UTF8 text file fed to opennlp comes back as ANSI

Hi Jörn,

Meanwhile I researched the output encoding issue and found this:
http://stackoverflow.com/questions/2415597/java-how-to-detect-and-change-encoding-of-system-console

Perhaps the output encoding could be passed in as an argument to the
opennlp console, with UTF-8 defined as the default.
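
To make the suggestion concrete, here is a minimal sketch of choosing the console output encoding from an argument, with UTF-8 as the fallback. It is not OpenNLP code: the class name, `chooseEncoding` and `withEncoding` are hypothetical helpers made up for illustration, and only standard JDK classes are used.

```java
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

public class ConsoleEncodingSketch {

    // Hypothetical helper: take the output encoding from the first
    // command-line argument, falling back to UTF-8 when none is given.
    static Charset chooseEncoding(String[] args) {
        return Charset.forName(args.length > 0 ? args[0] : "UTF-8");
    }

    // Hypothetical helper: wrap a byte stream so that text is encoded with
    // an explicit charset instead of the platform default.
    static PrintWriter withEncoding(OutputStream out, Charset charset) {
        return new PrintWriter(new OutputStreamWriter(out, charset), true);
    }

    public static void main(String[] args) {
        Charset charset = chooseEncoding(args);
        PrintWriter out = withEncoding(System.out, charset);
        out.println("Platform default: " + Charset.defaultCharset());
        // \u00f6 is the letter o with diaeresis, escaped so the source file
        // itself does not depend on any particular encoding.
        out.println("Writing as " + charset + ": J\u00f6rn, Gy\u00f6rgy");
    }
}
```

Running it with no argument writes UTF-8 bytes regardless of the platform default; passing a charset name such as `ISO-8859-1` as the first argument switches the output encoding. Whether the terminal then displays those bytes correctly still depends on the terminal's own settings.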

Re: Linux: UTF8 text file fed to opennlp comes back as ANSI

Posted by György Chityil <gy...@gmail.com>.
Sounds great, James, I will be happy to test it. While reading that
Stack Overflow question I also found this other one:
http://stackoverflow.com/questions/499010/java-how-to-determine-the-correct-charset-encoding-of-a-stream

which mentions:
http://code.google.com/p/juniversalchardet/

This could be used to detect the encoding of the source document and
write the result in the same encoding.
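
As a rough illustration of the detect-then-reuse idea, the sketch below guesses the input encoding and writes the result back in that same encoding. It does not use juniversalchardet: it swaps in a much cruder check from the standard JDK (strict UTF-8 validation, otherwise a caller-supplied fallback), and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingGuessSketch {

    // Returns UTF-8 if the bytes are strictly valid UTF-8, otherwise the
    // fallback. This is a stand-in for a real detector, not a replacement.
    static Charset guessCharset(byte[] data, Charset fallback) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            return fallback;
        }
    }

    // Decodes the input with the guessed charset and encodes the text
    // again with the same charset, so output matches the source document.
    static byte[] roundTrip(byte[] input, Charset fallback) {
        Charset charset = guessCharset(input, fallback);
        String text = new String(input, charset);
        // Any text processing would happen here, on the decoded String.
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        byte[] latin1 = "J\u00f6rn".getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = "J\u00f6rn".getBytes(StandardCharsets.UTF_8);
        System.out.println(guessCharset(latin1, StandardCharsets.ISO_8859_1));
        System.out.println(guessCharset(utf8, StandardCharsets.ISO_8859_1));
    }
}
```

Pure ASCII input also passes the UTF-8 check, which is harmless because ASCII encodes identically in both charsets. A statistical detector would be needed to tell apart the many single-byte encodings that this check lumps into the fallback.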


On Mon, Dec 12, 2011 at 3:11 AM, James Kosin <ja...@gmail.com> wrote:
> Hi,
>
> I'm currently working on patches to try and fix all this.  It really
> depends on the platform running the Java application and is not so much
> a problem with Java itself.
> I've found references to other articles that go into great detail
> on this issue.
>
> James
>
> On 12/11/2011 3:13 PM, György Chityil wrote:
>> Hi Jörn,
>>
>> Meanwhile I researched the output encoding issue, and found this
>> http://stackoverflow.com/questions/2415597/java-how-to-detect-and-change-encoding-of-system-console
>>
>> perhaps the output encoding could be passed in as an arg for the
>> opennlp console, and utf-8 could be defined as the default.
>



-- 
Gyuri

Re: Linux: UTF8 text file fed to opennlp comes back as ANSI

Posted by James Kosin <ja...@gmail.com>.
Hi,

I'm currently working on patches to try and fix all this.  It really
depends on the platform running the Java application and is not so much
a problem with Java itself.
I've found references to other articles that go into great detail
on this issue.

James

On 12/11/2011 3:13 PM, György Chityil wrote:
> Hi Jörn,
>
> Meanwhile I researched the output encoding issue, and found this
> http://stackoverflow.com/questions/2415597/java-how-to-detect-and-change-encoding-of-system-console
>
> perhaps the output encoding could be passed in as an arg for the
> opennlp console, and utf-8 could be defined as the default.