Posted to community@apache.org by "J.Pietschmann" <pi...@apache.org> on 2004/09/02 22:51:39 UTC

Non-ASCII chars in Java comments

Adam R. B. Jack wrote:
> Some projects with issues (some JDK 1.5, some not) are listed here:
>     http://brutus.apache.org/gump/jdk15/project_todos.html

Neat!

I did a quick check of the BCEL issues, and they are exclusively
problems with non-ASCII characters. While the BCEL problems are
easily fixed (bullet characters, probably cut and pasted from an HTML
page, and a few German umlauts), we had a similar problem with
a FOP source file some time ago, which was not as easily resolved,
because the offending characters were part of an email address.
Ultimately, the originator allowed us to pull the address and have
his name respelled in a romanized form.
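
For anyone who wants to find such characters before the compiler does, here is a minimal sketch that reports every byte above 0x7F by line and byte column. The class name NonAsciiScanner and the output format are made up for illustration; nothing like this is part of BCEL, FOP or Gump.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any Apache project.
class NonAsciiScanner {
    /** Returns 1-based "line:column" positions of every byte above 0x7F. */
    static List<String> findNonAscii(byte[] data) {
        List<String> hits = new ArrayList<>();
        int line = 1;
        int col = 1; // counts bytes, not characters
        for (byte b : data) {
            if ((b & 0xFF) > 0x7F) {
                hits.add(line + ":" + col);
            }
            if (b == '\n') {
                line++;
                col = 1;
            } else {
                col++;
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Each argument is taken to be the path of a source file to check.
        for (String name : args) {
            byte[] data = Files.readAllBytes(Paths.get(name));
            for (String pos : findNonAscii(data)) {
                System.out.println(name + ":" + pos + ": non-ASCII byte");
            }
        }
    }
}
```

Working on raw bytes means the check does not depend on guessing the file's encoding; the price is that a multi-byte UTF-8 character is reported once per byte.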

Related questions:
1. Javac allows Java source file encodings with a greater range
  of characters, in particular UTF-8. Unfortunately, there is
  no standardized auto-detection mechanism (as for XML).
  Does anybody want to discuss how projects/the whole ASF should
  deal with non-ASCII encodings for Java files?
2. How should situations be handled where characters which can't
  be encoded are important, like in email addresses or IRLs
  (internationalized URLs)?

How do Perl developers deal with these issues?

Regards
J.Pietschmann


---------------------------------------------------------------------
To unsubscribe, e-mail: community-unsubscribe@apache.org
For additional commands, e-mail: community-help@apache.org


Re: Non-ASCII chars in Java comments

Posted by Santiago Gala <sg...@apache.org>.
J.Pietschmann wrote:

> Adam R. B. Jack wrote:
>
>> Some projects with issues (some JDK 1.5, some not) are listed here:
>>     http://brutus.apache.org/gump/jdk15/project_todos.html
>
>
> Neat!
>
> I did a quick check of the BCEL issues, and they are exclusively
> problems with non-ASCII characters. While the BCEL problems are
> easily fixed (bullet characters, probably cut and pasted from an HTML
> page, and a few German umlauts), we had a similar problem with
> a FOP source file some time ago, which was not as easily resolved,
> because the offending characters were part of an email address.
> Ultimately, the originator allowed us to pull the address and have
> his name respelled in a romanized form.
>
> Related questions:
> 1. Javac allows Java source file encodings with a greater range
>  of characters, in particular UTF-8. Unfortunately, there is
>  no standardized auto-detection mechanism (as for XML).
>  Does anybody want to discuss how projects/the whole ASF should
>  deal with non-ASCII encodings for Java files?

IMO, the only reliable way to deal with this is to adopt the
convention that all files in a project are UTF-8 (which can encode any
character). The Java books I have read recommend using the \uXXXX escape
for non-ASCII characters in source code, so that a Java source file
contains only ASCII. I think this convention should also work in
javadocs, but I have never tested it.
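
To make the \uXXXX convention concrete, here is a minimal sketch that rewrites a line of source so that only ASCII remains. The class UnicodeEscaper is hypothetical; the JDK of this era ships a native2ascii tool for the same purpose, and this sketch does not claim to match its behaviour in every corner case (for example, it does not treat backslashes already present in the input specially).

```java
// Hypothetical helper illustrating the escape convention.
class UnicodeEscaper {
    /** Replaces every char above 0x7F with a six-character backslash-u escape. */
    static String toAscii(String source) {
        StringBuilder out = new StringBuilder(source.length());
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c > 0x7F) {
                // Four lowercase hex digits, zero padded.
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The escapes in this literal stand for u-umlaut and sharp s, so the
        // string held at run time contains two non-ASCII characters.
        String line = "String greeting = \"Gr\u00fc\u00dfe\";";
        System.out.println(toAscii(line));
    }
}
```

Running it prints `String greeting = "Gr\u00fc\u00dfe";`, which compiles to the same string under any source encoding. Characters outside the Basic Multilingual Plane come out as two escapes, one per UTF-16 surrogate, which is how Java source represents them anyway.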

I've just run into a similar bug with the OpenOffice.org Java files, which
refused to compile on my es_ES.utf8 machine unless I prefixed the build
with LC_ALL=C or a similar non-UTF-8 locale.


> 2. How should situations be handled where characters which can't
>  be encoded are important, like in email addresses or IRLs
>  (internationalized URLs)?
>
IMO, by adopting the convention that each project tarball uses a given
encoding (ideally UTF-8, since it minimizes breakage) and, on Linux,
by setting LC_ALL=en_US.utf8 before building. This was the issue I
found: some files in OpenOffice.org are encoded in ISO-8859-1, with no
meta-information saying so. For Windows, I have no idea whether the
encoding can be changed for a session.
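
The mismatch can be shown in a few lines. The sketch below (EncodingCheck is a made-up name) decodes bytes strictly and reports whether they are valid in a given charset; an ISO-8859-1 umlaut is a single byte that is not legal UTF-8, which is presumably the kind of input that trips a build run under a UTF-8 locale. Note also that javac has an -encoding option, so a build can state the source encoding explicitly instead of inheriting it from the locale.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
class EncodingCheck {
    /** Returns true if the bytes decode without error in the given charset. */
    static boolean decodesAs(byte[] data, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A comment containing u-umlaut, stored as ISO-8859-1 (one byte, 0xFC).
        byte[] latin1 = "// M\u00fcller".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("valid UTF-8: " + decodesAs(latin1, StandardCharsets.UTF_8));
        System.out.println("valid ISO-8859-1: " + decodesAs(latin1, StandardCharsets.ISO_8859_1));
    }
}
```

The check only works in one direction: every byte sequence is valid ISO-8859-1, so a file passing that test says nothing about how it was really encoded. That is why a project-wide convention or explicit meta-information is still needed.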


> How do Perl developers deal with these issues?
>
> Regards
> J.Pietschmann
>
This issue is language independent; the problem will exist until a
common encoding is used or meta-information is available for all
files.
