Posted to users@tapestry.apache.org by Denis Ponomarev <oz...@romsat.ua> on 2003/08/27 10:22:13 UTC

Re[4]: Non-latin charset in JavaScript messages

>> I am ready to add, but don't know where I can do it :)

MB> :)

MB> There is a link to the bug database on jakarta.apache.org

Probably I'll try it later...

>> Of course toString() returns strings in Unicode, but the resulting
>> HTML contains one-byte characters, doesn't it?

MB> Based on this and the rest of your message I can conclude that you are using
MB> Tapestry 2.x.

No!
I am using 3.0. It seems my previous explanation was not clear
enough :)
When I say "one-byte characters in HTML", I mean that when a page is
rendered, the resulting HTML contains no raw Unicode, only ASCII
characters! Any non-ASCII character is represented by a special form
called an "HTML entity" (a numeric character reference). It consists of
_several_ASCII_characters_ and looks like &#1043;. It is not Unicode!
There are several ASCII chars instead of one non-ASCII char. Of course
these ASCII chars describe the non-ASCII character by its Unicode value,
but this is not Unicode itself - it is a representation of Unicode. In
the case of &#1043; it takes 7 bytes (one per char), not the 2 bytes the
character takes in UTF-8 or UTF-16.
But no similar technique is applied during JavaScript rendering.
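
The byte counts above can be checked with a few lines of modern Java. This is only a sketch: HtmlEntityDemo and toNumericReferences are hypothetical names, not Tapestry classes, and the method makes no attempt to escape markup characters such as & or <.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; HtmlEntityDemo and toNumericReferences are
// hypothetical names, not Tapestry classes.
final class HtmlEntityDemo {

    // Replace every non-ASCII code point with a decimal numeric character reference.
    static String toNumericReferences(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String letter = "\u0413"; // Cyrillic capital letter GHE, code point 1043
        String entity = toNumericReferences(letter);
        System.out.println(entity);
        // Size of the reference versus size of the character itself.
        System.out.println(entity.getBytes(StandardCharsets.US_ASCII).length);
        System.out.println(letter.getBytes(StandardCharsets.UTF_8).length);
        System.out.println(letter.getBytes(StandardCharsets.UTF_16BE).length);
    }
}
```

Running main prints &#1043; followed by 7, 2 and 2: seven ASCII bytes for the reference, two bytes for the letter in either Unicode encoding.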

MB> Version 3.0 has a number of features that, I believe, resolve the
MB> encoding/internationalization problem completely.

I don't think so!

Consider a simple example:

Create a page with a simple form and put one ValidField on it.
Assign a StringValidator to it. Set the clientScriptingEnabled property
to true. (This will generate validation JavaScript.)
Now assume that the application should have multi-language support.
So you should specify localized versions of the validation messages
for the languages you want to support. For example, you can bind the
requiredMessage property of the validator to a message from the
Page.properties file. To support other languages you simply add
localized properties files, for example Page_ru.properties.
Try to do it for any language with non-Latin characters.

Now if you run the application and open the resulting HTML, you'll see
in the source that the validation script contains string literals with
the messages. They are non-ASCII! So to make the browser
understand them you must specify the charset of the page - by the HTTP
Content-Type header or by http-equiv in a meta tag.

Even if you support only ONE language and it is non-Latin, you
must add a servlet filter to specify the request encoding if you want
to receive valid user input from the forms.

But we want to support _several_ languages, so we can't just substitute
the content-type of the output, because it must support _all_ our
languages and must have a universal charset (utf-8).

So what can we do with Tapestry from this point? My answer is: nothing.
And my proposed solution is quite obvious: change the rendering of the
JavaScript. Just as it is done with HTML rendering (substituting
non-Latin characters with HTML entities), it must be done with
JavaScript rendering (substituting non-Latin characters with
[\u + code] sequences).
After doing so we don't need to specify a charset other than utf-8. Our
pages will contain ASCII characters only!
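
A minimal sketch of the escaping proposed above, assuming a hypothetical helper called JsStringEscaper (it is not part of Tapestry, and it does not cover every case a real renderer would need, such as a literal containing </script>):

```java
// Illustrative sketch only; JsStringEscaper is a hypothetical helper,
// not part of Tapestry.
final class JsStringEscaper {

    // Return a double-quoted JavaScript string literal that uses ASCII only.
    static String toAsciiLiteral(String text) {
        StringBuilder out = new StringBuilder("\"");
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                default:
                    if (c < 0x20 || c > 0x7e) {
                        // One escape per UTF-16 code unit, which is also how
                        // JavaScript strings store characters.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) {
        // The argument is the Russian word for "field", written with Java escapes.
        System.out.println("alert(" + toAsciiLiteral("\u041f\u043e\u043b\u0435") + ");");
    }
}
```

The generated script line contains only ASCII, so the browser reads the message correctly whatever charset it assumes for the page.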


Re[8]: Non-latin charset in JavaScript messages

Posted by Denis Ponomarev <oz...@romsat.ua>.
AG> What are you using as a server, Tomcat, Websphere, etc?

AG> In Tomcat, you could simply create the directory structure
AG> org/apache/tapestry/valid under common/classes in the tomcat directory and
AG> put the language specific files there.

Right now I'm using JBoss with Jetty, but I'm going to download a newer
version of JBoss with Tomcat.

Thank you.


RE: Re[6]: Non-latin charset in JavaScript messages

Posted by Adam Greene <ag...@romulin.com>.
What are you using as a server, Tomcat, Websphere, etc?

In Tomcat, you could simply create the directory structure
org/apache/tapestry/valid under common/classes in the tomcat directory and
put the language specific files there.

---------------------------------------------------------------------
To unsubscribe, e-mail: tapestry-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: tapestry-user-help@jakarta.apache.org


Re[6]: Non-latin charset in JavaScript messages

Posted by Denis Ponomarev <oz...@romsat.ua>.
>> I am using 3.0.

MB> I should have specified, I guess -- please use 3.0 beta-2. The earlier
MB> releases of 3.0 do not have the encoding facilities.

In 3.0 beta-2 everything works fine! Thanks a lot!

And one last question on this topic - how can I add my localized version
of the ValidationStrings.properties file?

I understand that I can bind messages from my Page.properties file to
the message properties of the validators, but it is inconvenient to do
that throughout my whole application.


I found this in the BaseValidator class:

ResourceBundle strings =
         ResourceBundle.getBundle("org.apache.tapestry.valid.ValidationStrings", locale);

So the default class loader is used to load the strings. As I understand
it, it can't see my application's classpath, can it?

Any suggestions?
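
For reference, the two-argument getBundle uses the class loader of the calling class, so what it can see depends on where the Tapestry jar is deployed. ResourceBundle also has a three-argument overload that takes an explicit class loader. The sketch below only illustrates that lookup rule; BundleLookupDemo and the "field-is-required" key are hypothetical, and this is not how BaseValidator is written.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.ResourceBundle;

// Illustrative sketch only; BundleLookupDemo and the "field-is-required"
// key are hypothetical names.
final class BundleLookupDemo {

    static final String BASE_NAME = "org.apache.tapestry.valid.ValidationStrings";

    // Write a localized bundle under root, mirroring the package as directories.
    static void writeRussianBundle(Path root) throws IOException {
        Path dir = root.resolve("org/apache/tapestry/valid");
        Files.createDirectories(dir);
        // The value is plain ASCII with escapes, as native2ascii would emit it.
        String line = "field-is-required=\\u041f\\u043e\\u043b\\u0435\n";
        Files.write(dir.resolve("ValidationStrings_ru.properties"),
                line.getBytes(StandardCharsets.US_ASCII));
    }

    // Look the bundle up through a class loader that can see root.
    static String lookup(Path root, String key) throws IOException {
        try (URLClassLoader loader = new URLClassLoader(new URL[] { root.toUri().toURL() })) {
            ResourceBundle strings =
                    ResourceBundle.getBundle(BASE_NAME, Locale.forLanguageTag("ru"), loader);
            return strings.getString(key);
        }
    }

    // Create a scratch directory, write the bundle, and read the message back.
    static String run() {
        try {
            Path root = Files.createTempDirectory("bundles");
            writeRussianBundle(root);
            return lookup(root, "field-is-required");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run().equals("\u041f\u043e\u043b\u0435"));
    }
}
```

The bundle is found only because the loader passed to getBundle has the directory containing org/apache/tapestry/valid on its search path; any loader that cannot see that directory would fail with a MissingResourceException.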


RE: Re[4]: Non-latin charset in JavaScript messages

Posted by Mind Bridge <mi...@yahoo.com>.
Hi Denis,

> I am using 3.0.

I should have specified, I guess -- please use 3.0 beta-2. The earlier
releases of 3.0 do not have the encoding facilities.


With that version you could try the following:

Run the Workbench.
Open a new IE window, go to Tools/Internet Options/Languages, add 'Chinese
(Taiwan)' (zh-tw) as a language and place it at the top of the language list
(with highest priority). I am choosing this language as a demo, since
Tapestry contains ValidationStrings translated into Chinese (Taiwanese) by
default.
Open the Workbench in that browser, go to the Fields page, and without
typing anything (leaving client-side validation enabled), click on Continue.
What do you see?

I personally see an error message (field is required) in Chinese on a number
of different machines. This message is in the JavaScript of the page, and
thus it has to have been encoded properly in utf-8.

This is similar to what you were suggesting, I believe, and it does seem to
work consistently.


> So to force browser
> understand them you should specify charset of the page - by http
> content-type header or by http-equiv in the metatag.

3.0 beta 2 does the following:
- always sets the content-type in the header with the encoding used
- the Shell component includes http-equiv by default as well, again including
the encoding (this is necessary for forms in IE)
- decodes the POST requests using that same encoding
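
Why the last point matters can be seen without a servlet container. The sketch below is not Tapestry's request-decoding code; FormEncodingDemo is a hypothetical name, and it assumes Java 10 or later for the Charset overloads of URLEncoder and URLDecoder.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; FormEncodingDemo is a hypothetical name and
// this is not Tapestry's request-decoding code.
final class FormEncodingDemo {

    // Encode a value the way a browser submits it from a page in the given charset.
    static String submit(String value, Charset pageCharset) {
        return URLEncoder.encode(value, pageCharset);
    }

    // Decode the submitted value on the server with the given charset.
    static String receive(String encoded, Charset serverCharset) {
        return URLDecoder.decode(encoded, serverCharset);
    }

    public static void main(String[] args) {
        String sent = submit("\u0413", StandardCharsets.UTF_8);
        System.out.println(sent);
        // Same charset on both sides: the letter survives.
        System.out.println(receive(sent, StandardCharsets.UTF_8).equals("\u0413"));
        // Mismatched charset: the two bytes come back as two wrong characters.
        System.out.println(receive(sent, StandardCharsets.ISO_8859_1).equals("\u0413"));
    }
}
```

This prints %D0%93, then true, then false.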


What you are seeing is definitely very weird. I cannot think of a specific
reason why it should occur. The 'usual suspects' are the following (but they
don't quite fit what you are saying):

  - the property file with the messages has not been run through
native2ascii -- this is a requirement of the standard Java Properties
implementation. As a result, the

  - getResponseWriter() was overridden incorrectly in the page
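
The first suspect can be reproduced in isolation. Properties.load(InputStream) reads its input as ISO 8859-1, so a file saved with raw UTF-8 letters comes back garbled, while the escaped form that native2ascii produces comes back intact. A small sketch, with PropertiesEncodingDemo and the "required" key as hypothetical names:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Illustrative sketch only; PropertiesEncodingDemo and the "required" key
// are hypothetical names.
final class PropertiesEncodingDemo {

    // Load properties from raw bytes, as Properties.load(InputStream) would from a file.
    static String loadValue(byte[] fileBytes, String key) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // File saved as UTF-8 with the raw letter in it.
        byte[] raw = "required=\u0413\n".getBytes(StandardCharsets.UTF_8);
        // File after native2ascii: pure ASCII with an escape sequence.
        byte[] escaped = "required=\\u0413\n".getBytes(StandardCharsets.US_ASCII);

        System.out.println(loadValue(raw, "required").equals("\u0413"));
        System.out.println(loadValue(escaped, "required").equals("\u0413"));
    }
}
```

This prints false for the raw file and true for the escaped one.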



I am not sure if this helps. If it does not, please contact me directly --
if this is a problem, we should get to the bottom of this.

Best regards,
-mb

