Posted to community@apache.org by "J.Pietschmann" <pi...@apache.org> on 2004/09/02 22:51:39 UTC
Non-ASCII chars in Java comments
Adam R. B. Jack wrote:
> Some projects with issues (some JDK 1.5, some not) are listed here:
> http://brutus.apache.org/gump/jdk15/project_todos.html
Neat!
I did a quick check of the BCEL issues, and they are exclusively
problems with non-ASCII characters. While the BCEL problems are
easily fixed (bullet characters, probably cut and pasted from an HTML
page, and a few German umlauts), we had a similar problem with
a FOP source file some time ago, which was not as easily resolved,
because it was an email address containing the characters causing
the trouble. Ultimately, the originator allowed us to remove the
address and respell his name in a romanized form.
Related questions:
1. Javac allows Java source file encodings with a greater range
of characters, in particular UTF-8. Unfortunately, there is
no standardized auto-detection mechanism (as for XML).
Does anybody want to discuss how projects/the whole ASF should
deal with non-ASCII encodings for Java files?
2. How should situations be handled where characters which can't
be encoded are important, like in email addresses or IRLs
(internationalized URLs)?
How do Perl developers deal with these issues?
Regards
J.Pietschmann
---------------------------------------------------------------------
To unsubscribe, e-mail: community-unsubscribe@apache.org
For additional commands, e-mail: community-help@apache.org
Re: Non-ASCII chars in Java comments
Posted by Santiago Gala <sg...@apache.org>.
J.Pietschmann wrote:
> Adam R. B. Jack wrote:
>
>> Some projects with issues (some JDK 1.5, some not) are listed here:
>> http://brutus.apache.org/gump/jdk15/project_todos.html
>
>
> Neat!
>
> I did a quick check of the BCEL issues, and they are exclusively
> problems with non-ASCII characters. While the BCEL problems are
> easily fixed (bullet characters, probably cut and pasted from an HTML
> page, and a few German umlauts), we had a similar problem with
> a FOP source file some time ago, which was not as easily resolved,
> because it was an email address containing the characters causing
> the trouble. Ultimately, the originator allowed us to remove the
> address and respell his name in a romanized form.
>
> Related questions:
> 1. Javac allows Java source file encodings with a greater range
> of characters, in particular UTF-8. Unfortunately, there is
> no standardized auto-detection mechanism (as for XML).
> Does anybody want to discuss how projects/the whole ASF should
> deal with non-ASCII encodings for Java files?
IMO, the only workable way to deal with it is to adopt the convention
that all files in a project are UTF-8 (which can encode any Unicode
character). The Java books I have read recommend using \uXXXX Unicode
escapes for non-ASCII characters in source code, so that every
character in a Java source file is ASCII. I think this convention
should also work in javadocs, but I have never tested it.
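A minimal sketch of the Unicode-escape convention, assuming a
hypothetical class name (UnicodeEscapeDemo) and a made-up example name.
The source file contains only ASCII bytes, yet the compiled string
holds the umlaut. As far as I know, javac translates these escapes
before it tokenizes the source, so they are also processed inside
comments; how javadoc renders them is not verified here.

```java
class UnicodeEscapeDemo {
    // The file stays pure ASCII. The compiler turns the six-character
    // escape sequence below into the single character U+00FC
    // (Latin small letter u with diaeresis).
    static final String NAME = "M\u00FCller";

    public static void main(String[] args) {
        System.out.println(NAME.length());         // prints 6
        System.out.println((int) NAME.charAt(1));  // prints 252 (0xFC)
    }
}
```

Because the file is pure ASCII, it reads the same under ISO-8859-1,
UTF-8, and the C locale, which is why the convention sidesteps the
build problem. The cost is readability of the source itself.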
I've just found a similar bug with OpenOffice.org Java files, which
refused to compile on my es_ES.utf8 machine unless I prefixed the build
with LC_ALL=C or a similar non-UTF-8 locale.
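The failure mode can be illustrated with a small sketch. It assumes a
hypothetical helper class (EncodingCheck) and a made-up comment line in
which an umlaut is stored as the single ISO-8859-1 byte 0xFC. That byte
sequence is fine as ISO-8859-1 but malformed as UTF-8, which is
roughly what a compiler running under a UTF-8 locale trips over. This
shows the decoding step only, not javac's actual code path.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class EncodingCheck {
    /** Returns true if the bytes decode without error in the given charset. */
    static boolean decodesCleanly(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                   // Report errors instead of silently substituting characters.
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A comment line whose umlaut is the single ISO-8859-1 byte 0xFC.
        byte[] latin1Comment = {'/', '/', ' ', 'M', (byte) 0xFC, 'l', 'l', 'e', 'r'};
        System.out.println(decodesCleanly(latin1Comment, StandardCharsets.ISO_8859_1)); // true
        System.out.println(decodesCleanly(latin1Comment, StandardCharsets.UTF_8));      // false
    }
}
```

A check like this could be run over a source tree to find files that
are not valid in the project's agreed encoding before a build breaks.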
> 2. How should situations be handled where characters which can't
> be encoded are important, like in email addresses or IRLs
> (internationalized URLs)?
>
IMO, by adopting the convention that each project tarball uses a given
encoding (UTF-8 ideally, since it minimizes breakage), and (for Linux)
by setting LC_ALL=en_US.utf8 before building. This was the issue I
found: some files in OpenOffice.org are encoded in ISO-8859-1 but carry
no meta-information saying so. For Windows, I have no idea whether the
encoding can be changed for a session or something similar.
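An alternative to changing the locale for the whole build is to tell
the compiler the encoding directly through javac's -encoding option.
The sketch below does this in-process with the standard javax.tools
API. The class name (ExplicitEncodingBuild), the temporary file, and
the tiny Hello class are all made up for illustration, and it needs a
JDK rather than a plain JRE.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

class ExplicitEncodingBuild {
    /** Compiles one file with an explicit -encoding; returns javac's exit code. */
    static int compile(Path source, String encoding) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("No system Java compiler; run on a JDK");
        }
        // Null streams mean System.in, System.out and System.err.
        return javac.run(null, null, null,
                "-encoding", encoding,
                "-d", source.getParent().toString(),
                source.toString());
    }

    /** Writes an ISO-8859-1 source file and compiles it with the matching encoding. */
    static int demo() {
        try {
            Path dir = Files.createTempDirectory("encoding-demo");
            Path source = dir.resolve("Hello.java");
            // The escape keeps this file ASCII; the written file contains byte 0xFC.
            String code = "class Hello { String name = \"M\u00FCller\"; }\n";
            Files.write(source, code.getBytes(StandardCharsets.ISO_8859_1));
            return compile(source, "ISO-8859-1");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 0 means the compile succeeded
    }
}
```

Stating the encoding in the build makes the result independent of the
builder's locale, which is the property the tarball convention above is
after.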
> How do Perl developers deal with these issues?
>
> Regards
> J.Pietschmann
>
This issue is language independent. It is a problem that will exist
until a common encoding is used or meta information is available for
all files.