Posted to users@tomcat.apache.org by Ronald Vyhmeister <rv...@gmail.com> on 2008/11/27 01:47:07 UTC

Setting encoding for tomcat compiler

In looking through the documentation, it looks like the default encoding for
the compiler is ISO-8859-1.  I need to use Windows-1251 (Russian input). The
javac compiler takes an encoding option, but I have not figured out (maybe
it's just too late) how to make it use that encoding for all files (only one
application on the server, so no need to have multiple choices)...

The database (postgresql) is UTF8, and will auto convert from WIN1251, but
right now it's receiving the stuff as LATIN1 (8859-1)...

Any help is appreciated!

Ron



---------------------------------------------------------------------
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


Re: Setting encoding for tomcat compiler

Posted by Michael Ludwig <mi...@gmx.de>.
Ronald Vyhmeister wrote on 27.11.2008 at 08:47:07 (+0800):
> In looking through the documentation, it looks like the default
> encoding for the compiler is ISO-8859-1.

Not quite. The javac man page (1.4, 1.6 ...) has this to say:

  -encoding encoding
    Set the source file encoding name, such as EUC-JP and UTF-8. If
    -encoding is not specified, the platform default converter is used.
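To see what "platform default" means on a given machine, a minimal sketch like the following can help. It assumes that the default javac falls back to is the default charset of the JVM it runs on; the class name DefaultEncoding is made up for illustration, and the value printed will differ between systems and JDK versions.

```java
import java.nio.charset.Charset;

public class DefaultEncoding {
    // Name of the charset this JVM uses when no encoding is given explicitly.
    public static String platformDefault() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("default charset: " + platformDefault());
        // The system property is shown for comparison only; it may be
        // absent or differ in meaning on some JVMs.
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}
```

If the name printed is not the encoding the source files are saved in, passing -encoding explicitly is the safer choice.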

>  I need to use Windows-1251 (Russian input). The javac compiler takes
> an encoding option, but I have not figured out (maybe it's just too
> late) how to make it use that encoding for all files (only one
> application on the server, so no need to have multiple choices)...

Always use that option. Or define an alias, if you're on UNIX. Or write
a shell script calling javac with your options. Or if you use an IDE,
configure it accordingly.

> The database (postgresql) is UTF8, and will auto convert from WIN1251,
> but right now it's receiving the stuff as LATIN1 (8859-1)...

That doesn't have anything to do with javac, where you specify the
*source file* encoding.

An application dealing with different encodings has to be made aware of
the issue. When reading text data, always specify the correct character
encoding. If you read CP1251 and have your application believe it is
Latin-1, your results won't make much sense.
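The effect can be reproduced in memory, without any files. This is only a sketch: the class name DecodeDemo and the sample word are chosen for illustration, and it assumes the JVM ships the windows-1251 charset (common JDKs do).

```java
import java.nio.charset.Charset;

public class DecodeDemo {
    // Decode raw bytes using the named charset.
    public static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // The Russian word for "hello", written with Unicode escapes so the
        // result does not depend on the encoding javac assumes for this file.
        String original = "\u041F\u0440\u0438\u0432\u0435\u0442";
        byte[] cp1251 = original.getBytes(Charset.forName("windows-1251"));

        String right = decode(cp1251, "windows-1251");
        String wrong = decode(cp1251, "ISO-8859-1");

        // Same bytes, correct charset: the text survives.
        System.out.println(right.equals(original));   // true
        // Same bytes read as Latin-1: accented Latin letters instead.
        System.out.println(wrong.equals("\u00CF\u00F0\u00E8\u00E2\u00E5\u00F2"));   // true
    }
}
```

The bytes never change; only the charset named at decoding time decides whether they come out as Cyrillic or as garbage.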

You need code like this, which takes the encoding as a parameter:

C:\dev\Java\Encoding :: more /t1 Convert.java
/*
 * Converts a file from one character encoding to another.
 */

import java.io.*;

public class Convert {
 public static void main( String[] args) throws IOException {
  // Check explicitly: assert is disabled unless the JVM runs with -ea.
  if ( args.length < 4 ) {
   System.err.println(
    "Arguments: source-file source-encoding target-file target-encoding");
   System.exit( 1);
  }
  // try-with-resources closes (and flushes) both streams, even on error.
  try (
   Reader in = new BufferedReader(
     new InputStreamReader(
      new FileInputStream( args[0]), args[1]));
   Writer out = new BufferedWriter(
     new OutputStreamWriter(
      new FileOutputStream( args[2]), args[3]))
  ) {
   int c;
   // Copy character by character; decoding happens in the
   // InputStreamReader, encoding in the OutputStreamWriter.
   while ( (c = in.read()) != -1 )
    out.write( c);
  }
 }
}

C:\dev\Java\Encoding :: java -cp . Convert CP1251.txt latin1 Murks.txt utf-8

C:\dev\Java\Encoding :: more Murks.txt
????€???­ ???°? ?Š?€? ?­???­ ?????®?­???? ???®?????? ???® ???°???¬??
???§?°?»???
®?? ?? ?????¬??? ??
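
The run above is garbage because CP1251.txt was declared to be latin1. For comparison, here is a sketch of the same conversion with the source encoding named correctly, written against java.nio.file. The class name ConvertCp1251 and the output file name Utf8.txt are made up for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ConvertCp1251 {
    // Re-encode a text file: decode it with `from`, encode it with `to`.
    public static void convert(Path src, Charset from, Path dst, Charset to)
            throws IOException {
        try (Reader in = Files.newBufferedReader(src, from);
             Writer out = Files.newBufferedWriter(dst, to)) {
            char[] buf = new char[8192];
            int n;
            // Copy decoded characters in chunks until end of input.
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Input really is CP1251, so say so; the output is written as UTF-8.
        convert(Paths.get("CP1251.txt"), Charset.forName("windows-1251"),
                Paths.get("Utf8.txt"), StandardCharsets.UTF_8);
    }
}
```

Unlike a plain InputStreamReader, the reader from Files.newBufferedReader throws an IOException when it meets bytes it cannot decode, so a wrong source encoding is more likely to fail loudly than to produce silent garbage. With single-byte encodings such as Latin-1, though, every byte decodes to something, and no error is raised.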

Michael Ludwig
