Posted to users@tomcat.apache.org by Carsten Klein <c....@datagis.com> on 2021/05/28 07:14:03 UTC
Encoding of LocalStrings_xy.properties files
Hi there,
I'm facing character set encoding problems in quite a recent Tomcat 10
setup. I noticed this with the http://localhost:8080/manager/html
application in my browser, which is set to German.
My Tomcat runs from within Eclipse, built with the official build.xml
file. I'm using my forked cklein05/tomcat GitHub repository, which is
nearly up to date with your main branch.
In the Manager application, there are texts which contain German
umlauts, like "Lösche Sitzungen" (Expire sessions, aka
htmlManagerServlet.appsExpire).
These buttons now have captions that look like "LÃ¶sche Sitzungen".
Obviously that's a UTF-8 <-> ISO-xxxx-y conversion issue.
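The garbling described here can be reproduced in a few lines of Java. This is only a minimal sketch, not Tomcat code: the class name MojibakeDemo and the method misdecode are made up for illustration. It encodes a string as UTF-8 and then decodes the same bytes as ISO-8859-1, which is what happens when a UTF-8 file is read by a loader that assumes ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration; not part of Tomcat.
class MojibakeDemo {

    // Encode the text as UTF-8, then decode those bytes as ISO-8859-1.
    static String misdecode(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "Loesche Sitzungen" written with o-umlaut (U+00F6) as an escape.
        String original = "L\u00f6sche Sitzungen";
        String garbled = misdecode(original);

        // U+00F6 is the two UTF-8 bytes 0xC3 0xB6. ISO-8859-1 maps each byte
        // to one character, so the result contains U+00C3 followed by U+00B6
        // and is one character longer than the original.
        System.out.println(garbled);
        System.out.println(garbled.length() - original.length()); // prints 1
    }
}
```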
I'm pretty sure that my setup is not causing that problem. After
digging into GitHub, I found that recently someone converted many (or
all) messages files to UTF-8:
https://github.com/apache/tomcat/commit/90fe08bdee0494110bb8145d2f067b61f74ae429
However, since these language files are actually java.util.Properties
files, these must be encoded as ISO-8859-1:
https://docs.oracle.com/javase/8/docs/api/java/util/Properties.html#load-java.io.InputStream-
That's also true for more recent versions of Java.
The language files are actually Properties files in a (according to the
Javadoc) "simple line-oriented format". These must be loaded with the
Properties.load method(s) and must always be in ISO-8859-1. In contrast,
there are XML-based Properties files, that must be loaded with method(s)
loadFromXML(...). Only these must be encoded in UTF-8.
Although editing international language files in ISO-8859-1 requires
many \uXXXX escapes and is a hassle, to my mind, converting these
plain-text language files to UTF-8 was likely not a good idea.
But why don't others report that problem? Am I overlooking something?
According to my explanation above, that problem is neither limited to
German language nor to the Manager application. It should occur with any
language using non-ASCII characters (> 127) and with all localized text
resources Tomcat is using.
Carsten
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org
Re: Encoding of LocalStrings_xy.properties files
Posted by Carsten Klein <c....@datagis.com>.
Mark,
On 01/06/2021 09:15, Mark Thomas wrote:
</snip>
> Start Tomcat with:
> catalina jpda run
> (or start but I typically use run as I nearly always want to see what is
> logged to the console)
>
> In Eclipse go to Debug > Debug Configurations > Remote Java Application
> > New Configuration. Browse to the project and then click Debug.
> Tomcat's default jpda config matches Eclipse's so you should then have a
> remote debug session set up with your Tomcat instance.
Trying that soon. Many thanks.
Carsten
Re: Encoding of LocalStrings_xy.properties files
Posted by Mark Thomas <ma...@apache.org>.
On 28/05/2021 10:13, Carsten Klein wrote:
>
> Mark,
>
> On 28/05/2021 10:35, Mark Thomas wrote:
>
> </not quoting anything>
>
> No doubt that UTF-8 is the better encoding for messages and language
> files. And yes, my Eclipse actually does not use the version built by
> Ant. I use the start-tomcat.launch configuration file for starting
> Tomcat. Actually it only takes a startup-class name. So, it must
> obviously use the JARs built by Eclipse.
>
> The trick is, that in the build.xml file, you are actually converting
> message files:
>
> <!-- Convert the message files from UTF-8 to ASCII. This can be removed
> after upgrading to Java 9+ as the minimum JRE and specifying the
> encoding when loading the ResourceBundles -->
>
> Simple. However, you do that after having them copied. While copying,
> you use filtering-copy and specify ISO-8859-1 as the file's encoding:
>
> <!-- Copy static resource files -->
> <copy todir="${tomcat.classes}" encoding="ISO-8859-1">
> <filterset refid="version.filters"/>
> <fileset dir="java">
> <include name="**/*.properties"/>
> <exclude name="**/LocalStrings*.properties"/>
> [...]
>
> Should be UTF-8 now?
Strictly, yes. Practically, it makes no difference because the filters
that are applied do find-and-replace with ASCII strings, and those
strings are highly unlikely to ever be anything other than ASCII.
I'll get that updated.
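Why the ISO-8859-1 setting is harmless for an ASCII-only filter can be shown with a small Java sketch. It is an illustration under stated assumptions, not the Ant implementation: the class FilterCopyDemo, the method filterAsLatin1 and the token @VERSION@ are made up. Java's ISO-8859-1 charset maps every byte to exactly one character and back, so UTF-8 bytes pass through the decode, replace, encode cycle unchanged as long as the token and its replacement are ASCII.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class for illustration; not the Ant copy task.
class FilterCopyDemo {

    // Read the bytes as ISO-8859-1, replace an ASCII token, and write the
    // result back as ISO-8859-1.
    static byte[] filterAsLatin1(byte[] fileBytes, String token, String value) {
        String text = new String(fileBytes, StandardCharsets.ISO_8859_1);
        return text.replace(token, value).getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The input file is really UTF-8 and contains an o-umlaut (U+00F6).
        byte[] input = "L\u00f6sche @VERSION@".getBytes(StandardCharsets.UTF_8);
        byte[] output = filterAsLatin1(input, "@VERSION@", "10.0");

        // The non-ASCII bytes survive untouched, so the output is still
        // valid UTF-8 with only the token replaced.
        byte[] expected = "L\u00f6sche 10.0".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(output, expected)); // prints true
    }
}
```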
> Back to the Eclipse. I guess there is not much difference between
> calling Ant from the console and using Eclipse's Ant support (Run As ->
> Ant build). But, how to start that with support for debugging in Eclipse
> (may be a dumb question, I know)?
Start Tomcat with:
catalina jpda run
(or start but I typically use run as I nearly always want to see what is
logged to the console)
In Eclipse go to Debug > Debug Configurations > Remote Java Application
> New Configuration. Browse to the project and then click Debug.
Tomcat's default jpda config matches Eclipse's so you should then have a
remote debug session set up with your Tomcat instance.
Mark
>
> Carsten
>
Re: Encoding of LocalStrings_xy.properties files
Posted by Carsten Klein <c....@datagis.com>.
Mark,
On 28/05/2021 10:35, Mark Thomas wrote:
</not quoting anything>
No doubt that UTF-8 is the better encoding for messages and language
files. And yes, my Eclipse actually does not use the version built by
Ant. I use the start-tomcat.launch configuration file for starting
Tomcat. Actually it only takes a startup-class name. So, it must
obviously use the JARs built by Eclipse.
The trick is, that in the build.xml file, you are actually converting
message files:
<!-- Convert the message files from UTF-8 to ASCII. This can be removed
after upgrading to Java 9+ as the minimum JRE and specifying the
encoding when loading the ResourceBundles -->
Simple. However, you do that after having them copied. While copying,
you use filtering-copy and specify ISO-8859-1 as the file's encoding:
<!-- Copy static resource files -->
<copy todir="${tomcat.classes}" encoding="ISO-8859-1">
<filterset refid="version.filters"/>
<fileset dir="java">
<include name="**/*.properties"/>
<exclude name="**/LocalStrings*.properties"/>
[...]
Should be UTF-8 now?
Back to the Eclipse. I guess there is not much difference between
calling Ant from the console and using Eclipse's Ant support (Run As ->
Ant build). But, how to start that with support for debugging in Eclipse
(may be a dumb question, I know)?
Carsten
Re: Encoding of LocalStrings_xy.properties files
Posted by Mark Thomas <ma...@apache.org>.
On 28/05/2021 08:14, Carsten Klein wrote:
> Hi there,
>
> I'm facing character set encoding problems in quite a recent Tomcat 10
> setup. I noticed this with the http://localhost:8080/manager/html
> application in my browser, which is set to German.
>
> My Tomcat runs from within Eclipse, built with the official build.xml
> file.
I suspect that that is not actually the case and that Eclipse is running
from its own copy of the source and compiled classes.
> I'm using my forked cklein05/tomcat GitHub repository, which is
> nearly up to date with your main branch.
>
> In the Manager application, there are texts which contain German
> umlauts, like "Lösche Sitzungen" (Expire sessions, aka
> htmlManagerServlet.appsExpire).
>
> These buttons now have captions that look like "LÃ¶sche Sitzungen".
> Obviously that's a UTF-8 <-> ISO-xxxx-y conversion issue.
>
> I'm pretty sure that my setup is not causing that problem.
Yes, it is.
> After
> digging into GitHub, I found that recently someone converted many (or
> all) messages files to UTF-8:
>
> https://github.com/apache/tomcat/commit/90fe08bdee0494110bb8145d2f067b61f74ae429
>
>
> However, since these language files are actually java.util.Properties
> files,
Not quite. They are java.util.ResourceBundle files.
> these must be encoded as ISO-8859-1:
>
> https://docs.oracle.com/javase/8/docs/api/java/util/Properties.html#load-java.io.InputStream-
>
> That's also true for more recent versions of Java.
Not for ResourceBundle. As of Java 9, an encoding can be specified. As
soon as the minimum required version of Java is >=9, we'll switch to
that method of loading.
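The difference between the two loading paths can be sketched in Java as follows. This is a hedged illustration, not how Tomcat loads (or will load) its bundles: the class Utf8BundleDemo and its methods are made up, and the sample value is taken from the original post. It uses only the long-standing PropertyResourceBundle(Reader) constructor and Properties.load(InputStream); the first is given an explicit UTF-8 Reader, the second always decodes as ISO-8859-1.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import java.util.PropertyResourceBundle;

// Hypothetical demo class for illustration; not part of Tomcat.
class Utf8BundleDemo {

    // Key name taken from the thread.
    static final String KEY = "htmlManagerServlet.appsExpire";

    // An in-memory stand-in for a UTF-8 encoded LocalStrings file.
    static byte[] sampleFile() {
        return (KEY + "=L\u00f6sche Sitzungen\n").getBytes(StandardCharsets.UTF_8);
    }

    // Load through a Reader so the encoding is stated explicitly.
    static String viaBundleReader(byte[] fileBytes) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(fileBytes), StandardCharsets.UTF_8)) {
            return new PropertyResourceBundle(reader).getString(KEY);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Load through Properties.load(InputStream), which assumes ISO-8859-1.
    static String viaPropertiesStream(byte[] fileBytes) {
        try {
            Properties props = new Properties();
            props.load(new ByteArrayInputStream(fileBytes));
            return props.getProperty(KEY);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] file = sampleFile();
        System.out.println(viaBundleReader(file));     // umlaut intact
        System.out.println(viaPropertiesStream(file)); // umlaut garbled
    }
}
```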
> The language files are actually Properties files in a (according to the
> Javadoc) "simple line-oriented format". These must be loaded with the
> Properties.load method(s) and must always be in ISO-8859-1. In contrast,
> there are XML-based Properties files, that must be loaded with method(s)
> loadFromXML(...). Only these must be encoded in UTF-8.
>
> Although editing international language files in ISO-8859-1 requires
> many \uXXXX escapes and is a hassle, to my mind, converting these
> plain-text language files to UTF-8 was likely not a good idea.
The Tomcat maintainers disagree. Using UTF-8 makes maintenance
significantly simpler and allowed integration with poeditor.com that has
enabled 175 contributors (at today's count) to contribute new and
improved translations including complete translations in Chinese and Korean.
One thing you do need to be aware of is the use of MessageFormat. Any
string that contains {n} will be passed through MessageFormat so any
single quote characters in the string need to be escaped with a second
single quote. Apart from a few special cases, any instance of {n} is
surrounded by [] to give [{n}] so that replaced values are clearly
delimited. This is to help with issues around empty values and
leading/trailing spaces that are otherwise not immediately obvious in
the logs.
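A short Java sketch of the quoting rule, using java.text.MessageFormat directly. The class name QuoteEscapeDemo and the sample messages are made up for illustration and are not taken from Tomcat's message files.

```java
import java.text.MessageFormat;

// Hypothetical demo class for illustration; messages are not from Tomcat.
class QuoteEscapeDemo {

    static String format(String pattern, Object... args) {
        return MessageFormat.format(pattern, args);
    }

    public static void main(String[] args) {
        // Doubled single quote: one quote in the output, {0} is replaced.
        System.out.println(format("Can''t find session [{0}]", "abc"));
        // prints: Can't find session [abc]

        // Lone single quote: it opens a quoted section that runs to the end
        // of the pattern, so the quote vanishes and {0} is NOT replaced.
        System.out.println(format("Can't find session [{0}]", "abc"));
        // prints: Cant find session [{0}]

        // The surrounding brackets make an empty value visible.
        System.out.println(format("Context path is [{0}]", ""));
        // prints: Context path is []
    }
}
```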
> But why don't others report that problem?
A few people have. It has always been when running from the source
within an IDE.
> Am I overlooking something?
https://github.com/apache/tomcat/blob/main/build.xml#L998
> According to my explanation above, that problem is neither limited to
> German language nor to the Manager application. It should occur with any
> language using non-ASCII characters (> 127) and with all localized text
> resources Tomcat is using.
The issue is going to be some variation of Eclipse loading the
ResourceBundle instances from the original source files rather than from
the transformed versions created by the build process.
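The kind of transformation meant here (UTF-8 source text converted to ASCII with Unicode escapes, as the build.xml comment describes) can be sketched in Java. This is an assumption-laden illustration and not the code the Tomcat build actually runs: the class AsciiEscapeDemo and the method escapeNonAscii are made up. An IDE that puts the untransformed source files on the classpath skips this step, which is why the ISO-8859-1 based loader then sees raw UTF-8 bytes.

```java
// Hypothetical demo class for illustration; not the Tomcat build task.
class AsciiEscapeDemo {

    // Replace every non-ASCII UTF-16 code unit with a backslash-u escape
    // followed by four hex digits. ASCII characters are copied unchanged.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Input contains an o-umlaut (U+00F6); the output is pure ASCII and
        // can be read correctly by an ISO-8859-1 based properties loader.
        System.out.println(escapeNonAscii("L\u00f6sche Sitzungen"));
    }
}
```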
Not strictly relevant here but while Eclipse is my IDE of choice, I have
always built Tomcat from the command line and used remote debugging if I
need to step through the code. My (admittedly quite dated) experience
with the various plug-ins that can be used to run Tomcat inside Eclipse has
never been good. The problems were usually around picking up updates to
code and/or figuring out where configuration files were being read from.
Mark