Posted to dev@commons.apache.org by Steve Cohen <sc...@javactivity.org> on 2005/04/07 03:09:42 UTC

Is there an Apache or java standard for expressing non-English String literals

Neeme Praks wrote:

> Also, I noticed that your java sources are in some strange encoding. If 
> I open those tests that use french letters in my Eclipse and save them 
> then they become corrupt and will fail.
> My configuration assumes that all source files are in UTF8 and I think 
> that should be the most reasonable assumption, no?

The files in question here are 
org.apache.commons.net.ftp.parser.FTPTimestampParserImplTest.java
and
org.apache.commons.net.ftp.FTPClientConfig.java
in the jakarta-commons-net project.

Mr. Praks is correctly pointing out that my test code (and other source 
code) depends sometimes on typing string literals in languages other 
than English.  What is the CORRECT way to handle this in source code, 
and what can I do to make editors such as Eclipse handle it correctly?
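
A minimal sketch of how this kind of corruption can arise. It assumes the files were saved in ISO-8859-1 and later read as UTF-8; the thread does not say which encoding they actually used. The class name, method name, and sample string are illustrative and not taken from commons-net.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {
    // Encodes the text with one charset and decodes the bytes with another,
    // which is what happens when an editor guesses the wrong file encoding.
    static String reinterpret(String text, Charset savedAs, Charset readAs) {
        return new String(text.getBytes(savedAs), readAs);
    }

    public static void main(String[] args) {
        // Hypothetical sample: "f", U+00E9 (e with acute accent), "vr."
        String month = "f\u00e9vr.";

        // Saved as ISO-8859-1, read as UTF-8: the lone byte 0xE9 is not valid
        // UTF-8, so the decoder substitutes the replacement character U+FFFD.
        String lossy = reinterpret(month, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(lossy.equals(month));   // false

        // Saved as UTF-8, read as ISO-8859-1: the two bytes 0xC3 0xA9 show up
        // as two separate characters, U+00C3 and U+00A9.
        String doubled = reinterpret(month, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(doubled.length());      // 6
    }
}
```

Once an editor saves the file after the first kind of misreading, the original character cannot be recovered from the file, which would explain tests that start failing after a save.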





---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Re: Is there an Apache or java standard for expressing non-English String literals

Posted by Mario Ivankovits <ma...@ops.co.at>.
Steve Cohen wrote:

> Mr. Praks is correctly pointing out that my test code (and other 
> source code) depends sometimes on typing string literals in languages 
> other than English.  What is the CORRECT way to handle this in source 
> code, and what can I do to make editors such as Eclipse handle it 
> correctly?

Why not move to UTF-8?
Otherwise you might have to use "Unicode Escapes". Take a look at: 
http://java.sun.com/docs/books/jls/second_edition/html/lexical.doc.html
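
A small sketch of what a Unicode escape looks like in practice. The class name and the sample string are illustrative assumptions, not code from commons-net.

```java
class UnicodeEscapeDemo {
    // A French month abbreviation written with a Unicode escape, so the
    // source file itself contains only ASCII bytes.
    static final String FEVRIER = "f\u00e9vr.";

    public static void main(String[] args) {
        // The compiler translates the escape into the single character U+00E9
        // before it tokenizes the string literal.
        System.out.println(FEVRIER.length());        // 5
        System.out.println((int) FEVRIER.charAt(1)); // 233
    }
}
```

Because the file is pure ASCII, it should compile to the same string under any ASCII-compatible source encoding, whatever `-encoding` value is passed to javac.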

---
Mario




Re: Is there an Apache or java standard for expressing non-English String literals

Posted by Mario Ivankovits <ma...@ops.co.at>.
Steve Cohen wrote:
> Would others agree with this?  Is the best editor setting for editing 
> code where i18n could be an issue to set the editor to UTF-8 or is it 
> better to leave it at its default local setting?  What are the pros 
> and cons here?  Had I set the editor for UTF-8 would I not have had 
> these issues?  Or is it best to consciously code with explicit unicode 
> escapes to avoid these issues on ANYONE's editor?  Or both?
I think the best approach is to encode the source in UTF-8.
I am not sure about Unicode escapes; they may be the safest way to make 
sure no misconfigured IDE can destroy sensitive data.
For Javadoc I don't want to use Unicode escapes, because they make the 
source hard to read when you browse through it.

What also needs to be addressed is the target encoding. If I recall 
correctly, you can set both the source encoding and the target(?) encoding 
of a source file, e.g. if you compile your source for target UTF-8 then 
you might have i18n issues again.

But I think we could ignore this problem and use UTF-8 for the target 
encoding. There might only be a problem if we output UTF-8 encoded 
string literals hardcoded into the source, which is not very common. 
However, if it happens and it is not acceptable for a user, it is easy to 
recompile the library with the desired encoding.
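
On the output side, one way to avoid depending on a platform default is to name the charset explicitly wherever text is turned into bytes. The following is a sketch under that assumption; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExplicitOutputEncoding {
    // Writes the text through a Writer bound to an explicit charset and
    // returns the bytes that were produced.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            writer.write(text);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String month = "f\u00e9vr.";
        // The same string literal yields different bytes per output charset.
        System.out.println(encode(month, StandardCharsets.UTF_8).length);      // 6
        System.out.println(encode(month, StandardCharsets.ISO_8859_1).length); // 5
    }
}
```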

---
Mario




Re: Is there an Apache or java standard for expressing non-English String literals

Posted by Steve Cohen <sc...@javactivity.org>.
Would others agree with this?  Is the best editor setting for editing 
code where i18n could be an issue to set the editor to UTF-8 or is it 
better to leave it at its default local setting?  What are the pros and 
cons here?  Had I set the editor for UTF-8 would I not have had these 
issues?  Or is it best to consciously code with explicit unicode escapes 
to avoid these issues on ANYONE's editor?  Or both?


Neeme Praks wrote:
> 
> Yes, now it should be ok.
> However, I would advise to change it anyway - all platform specific 
> settings are Bad(tm).
> :-)
> 
> Steve Cohen wrote:
> 
>> Okay, found it.  I assume, though, that it is not necessary to change 
>> this, now that I have replaced all the non-ASCII chars with unicode 
>> equivalents.  Or am I still missing something?
>>
>>
>> Neeme Praks wrote:
>>
>>> Window -> Preferences -> Workbench -> Editors
>>>
>>> and there is "Text file encoding", can be (platform) default or custom.
>>>
>>> Rgds,
>>> Neeme
>>>
>>> Steve Cohen wrote:
>>>
>>>> However, when you say "that depends on your file encoding", where is 
>>>> THAT defined, actually?  I looked through all the Eclipse options 
>>>> and found nothing indicating option to change encodings.  
>>>> Presumably, other editors I might use might have some other place to 
>>>> define this.




Re: Is there an Apache or java standard for expressing non-English String literals

Posted by Neeme Praks <ne...@apache.org>.
Yes, now it should be ok.
However, I would advise to change it anyway - all platform specific 
settings are Bad(tm).
:-)

Steve Cohen wrote:

> Okay, found it.  I assume, though, that it is not necessary to change 
> this, now that I have replaced all the non-ASCII chars with unicode 
> equivalents.  Or am I still missing something?
>
>
> Neeme Praks wrote:
>
>> Window -> Preferences -> Workbench -> Editors
>>
>> and there is "Text file encoding", can be (platform) default or custom.
>>
>> Rgds,
>> Neeme
>>
>> Steve Cohen wrote:
>>
>>> However, when you say "that depends on your file encoding", where is 
>>> THAT defined, actually?  I looked through all the Eclipse options 
>>> and found nothing indicating option to change encodings.  
>>> Presumably, other editors I might use might have some other place to 
>>> define this.

Re: Is there an Apache or java standard for expressing non-English String literals

Posted by Steve Cohen <sc...@javactivity.org>.
Okay, found it.  I assume, though, that it is not necessary to change 
this, now that I have replaced all the non-ASCII chars with unicode 
equivalents.  Or am I still missing something?


Neeme Praks wrote:
> Window -> Preferences -> Workbench -> Editors
> 
> and there is "Text file encoding", can be (platform) default or custom.
> 
> Rgds,
> Neeme
> 
> Steve Cohen wrote:
> 
>> However, when you say "that depends on your file encoding", where is 
>> THAT defined, actually?  I looked through all the Eclipse options and 
>> found nothing indicating option to change encodings.  Presumably, 
>> other editors I might use might have some other place to define this.




Re: Is there an Apache or java standard for expressing non-English String literals

Posted by Neeme Praks <ne...@apache.org>.
Window -> Preferences -> Workbench -> Editors

and there is "Text file encoding", can be (platform) default or custom.

Rgds,
Neeme

Steve Cohen wrote:

> However, when you say "that depends on your file encoding", where is 
> THAT defined, actually?  I looked through all the Eclipse options and 
> found nothing indicating option to change encodings.  Presumably, 
> other editors I might use might have some other place to define this.

Re: Is there an Apache or java standard for expressing non-English String literals

Posted by Steve Cohen <sc...@javactivity.org>.
robert burrell donkin wrote:
> On 7 Apr 2005, at 02:09, Steve Cohen wrote:
> 
>> Neeme Praks wrote:
>>
>>> Also, I noticed that your java sources are in some strange encoding. 
>>> If I open those tests that use french letters in my Eclipse and save 
>>> them then they become corrupt and will fail.
>>> My configuration assumes that all source files are in UTF8 and I 
>>> think that should be the most reasonable assumption, no?
>>
>>
>> The files in question here are 
>> org.apache.commons.net.ftp.parser.FTPTimestampParserImplTest.java
>> and
>> org.apache.commons.net.ftp.FTPClientConfig.java
>> in the jakarta-commons-net project.
>>
>> Mr. Praks is correctly pointing out that my test code (and other 
>> source code) depends sometimes on typing string literals in languages 
>> other than English.  What is the CORRECT way to handle this in source 
>> code, and what can I do to make editors such as Eclipse handle it 
>> correctly?
> 
> 
> that depends on your file encoding :)
> 
> if you use UTF-8 (which is typical) it's safest to use unicode escaping 
> when dealing with any non-ascii characters.
> 
> - robert
> 
> 

That's what I have done to fix this.  I converted all the non-ASCII 
chars (and also the HTML-escaped non-ASCIIs in the javadoc comments) to 
unicode escapes.  Javadoc apparently converts them back to HTML-escaped 
chars when it creates the HTML.
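
A conversion like this can be automated. Below is a minimal sketch, not the procedure actually used here: the class and method names are hypothetical, and it escapes every char above 0x7F in the text it is given.

```java
import java.util.Locale;

class NonAsciiEscaper {
    // Replaces every char outside the ASCII range with a backslash-u escape
    // of four hex digits. Supplementary characters come out as two escapes,
    // one per surrogate, which is how Java source expects them.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format(Locale.ROOT, "\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The input contains U+00E9; the output is ASCII only and can be
        // pasted into a Java string literal.
        System.out.println(escapeNonAscii("f\u00e9vr."));
    }
}
```

The sketch works on an in-memory string; reading and rewriting a source file would additionally need the file's real encoding to be known.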

However, when you say "that depends on your file encoding", where is 
THAT defined, actually?  I looked through all the Eclipse options and 
found nothing indicating an option to change encodings.  Presumably, 
other editors I might use have some other place to define this.



Re: Is there an Apache or java standard for expressing non-English String literals

Posted by robert burrell donkin <rd...@apache.org>.
On 7 Apr 2005, at 02:09, Steve Cohen wrote:

> Neeme Praks wrote:
>
>> Also, I noticed that your java sources are in some strange encoding. 
>> If I open those tests that use french letters in my Eclipse and save 
>> them then they become corrupt and will fail.
>> My configuration assumes that all source files are in UTF8 and I 
>> think that should be the most reasonable assumption, no?
>
> The files in question here are 
> org.apache.commons.net.ftp.parser.FTPTimestampParserImplTest.java
> and
> org.apache.commons.net.ftp.FTPClientConfig.java
> in the jakarta-commons-net project.
>
> Mr. Praks is correctly pointing out that my test code (and other 
> source code) depends sometimes on typing string literals in languages 
> other than English.  What is the CORRECT way to handle this in source 
> code, and what can I do to make editors such as Eclipse handle it 
> correctly?

that depends on your file encoding :)

if you use UTF-8 (which is typical) it's safest to use unicode escaping 
when dealing with any non-ascii characters.

- robert

