Posted to taglibs-dev@jakarta.apache.org by Tim Dawson <td...@yahoo.com> on 2002/01/22 23:05:19 UTC

RE: Programer-defined Locale->charset mapping (was: Re[2]: standard/fmt or i18n the problem is still there)

> BTW, i've just got one more idea, how to extend this and make the
> charset selection even more dynamic
>
> we define some interface, something like
>
> interface CharsetMapper{
>   String getCharset(HttpRequest rec, java.util.Locale loc);
> }

I generally like pluggable interfaces, and we could do that, but I don't see
many other taglibs using this kind of pattern. A pattern that I have seen a
lot, and would also give you more dynamic selection, is to search for
"org.apache.taglibs.i18n.CharsetMap.<locale>" in the (page, request,
session, application) hierarchy, then default to whatever is defined in the
Servlet Context if nothing else is available. (incidentally, I got this
approach from Jan; this is how the standard taglib is planning to do a
number of things)
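
A minimal sketch of the lookup described above, under stated assumptions: plain Maps stand in for the four JSP scopes and for the web.xml context parameters so that it runs without a container, and the names ScopedCharsetLookup and findCharset are hypothetical, not part of the i18n taglib.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: Maps stand in for the page, request, session and application
// scopes and for the web.xml context-params (an assumption for illustration).
class ScopedCharsetLookup {
    static final String PREFIX = "org.apache.taglibs.i18n.CharsetMap.";

    /** Returns the first charset found for the locale key, or null if none. */
    public static String findCharset(String locale,
                                     List<Map<String, Object>> scopes,
                                     Map<String, String> contextParams) {
        String key = PREFIX + locale;
        // Walk the scopes in the order given (page first, application last).
        for (Map<String, Object> scope : scopes) {
            Object value = scope.get(key);
            if (value != null) {
                return value.toString();
            }
        }
        // Nothing in any scope: fall back to the context parameter, if any.
        return contextParams.get(key);
    }

    public static void main(String[] args) {
        Map<String, Object> page = new HashMap<>();
        Map<String, Object> request = new HashMap<>();
        Map<String, Object> session = new HashMap<>();
        Map<String, Object> application = new HashMap<>();
        Map<String, String> contextParams = new HashMap<>();

        contextParams.put(PREFIX + "ru", "windows-1251"); // static default
        session.put(PREFIX + "ru", "KOI8-R");             // per-user override

        List<Map<String, Object>> scopes = new ArrayList<>();
        scopes.add(page);
        scopes.add(request);
        scopes.add(session);
        scopes.add(application);

        System.out.println(findCharset("ru", scopes, contextParams)); // KOI8-R
        System.out.println(findCharset("ja", scopes, contextParams)); // null
    }
}
```

Inside a real tag the scope walk would presumably be a single call to PageContext.findAttribute(name), with ServletContext.getInitParameter(name) as the fallback.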

This way you could use an init servlet to load the map from somewhere, say,
a relational database, into the application scope if the
ServletContext/web.xml approach is too static.
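
As a rough illustration of that init-time loading (a sketch, not the taglib's code): the rows below are hard-coded where a database query would go, a Map stands in for the application scope, and CharsetMapLoader and publish are hypothetical names.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: in a container this would live in a servlet's init() and
// write to the ServletContext; here a Map plays the application scope.
class CharsetMapLoader {
    static final String PREFIX = "org.apache.taglibs.i18n.CharsetMap.";

    /** Copies locale -> charset rows into the scope under the prefixed keys. */
    public static void publish(Map<String, String> localeToCharset,
                               Map<String, Object> applicationScope) {
        for (Map.Entry<String, String> row : localeToCharset.entrySet()) {
            applicationScope.put(PREFIX + row.getKey(), row.getValue());
        }
    }

    public static void main(String[] args) {
        // Stand-in for rows read from a database or other external source.
        Map<String, String> rows = new LinkedHashMap<>();
        rows.put("ru", "windows-1251");
        rows.put("ja", "Shift_JIS");

        Map<String, Object> applicationScope = new HashMap<>();
        publish(rows, applicationScope);

        System.out.println(applicationScope.get(PREFIX + "ru")); // windows-1251
    }
}
```

In a real init servlet each put would be a getServletContext().setAttribute(name, value) call.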

You could also provide user/session-based overrides, to allow

>   or even a more detailed switch
>
>   Choose charset for the xxx language: xxx
>                                        yyy
>                                        zzz

And even be able to have a JSP set something at the page/request level for

>   somewhere in the site there's an explicit charset switch:
>
>      enable highly-multilingual pages (use UTF-8)
>      optimize for speed               (use national encodings)
>

Would this work?

Tim


> -----Original Message-----
> From: tagunov [mailto:tagunov@motor.ru]
> Sent: Tuesday, January 22, 2002 2:56 PM
> To: Tim Dawson
> Cc: taglib-dev@jakarta.apache.org
> Subject: Programer-defined Locale->charset mapping (was: Re[2]:
> standard/fmt or i18n the problem is still there)
>
>
> Hello Tim!
>
> TD> Anton,
>
> TD> I've used WebLogic 6.1 and it seems to do the right
> conversion, at least for
> TD> japanese (SJIS), but that was only after I complained to them. :-)
> Do you have some updated version? The one that I have is a trial
> version that I downloaded for free a while ago. Mine still
> does not do it.
>
> TD> As far
> TD> as tomcat goes, you're right - it doesn't seem to ever
> change the charset.
>
> TD> I had also been thinking of putting the charset in the
> bundle, but also
> TD> didn't like it because the bundle is really about
> programmer-data, and
> TD> putting i18n-implementation data in there felt, to use
> your word, like a
> TD> kludge.
> :-)
> Yes, I agree, that's not good to mix i18n and programmer data.
>
> TD> I also agree that this is a servlet spec issue, and the
> charset mapping idea
> TD> for web.xml is a good solution. I've also sent in my
> comments to the
> TD> JSR-154.
>
> TD> The mapping suggested gave me an idea - in the interim
> with the i18n taglib
> TD> we can use a context-param in the web.xml that the locale
> bundle tags will
> TD> look at when calling setLocale() - if one is found for
> the locale (or just
> TD> the locale's language?) it will also call setContentType(). e.g.
>
> TD>   <context-param>
> TD>     <param-name>
> TD>       org.apache.taglibs.i18n.CharsetMap.ru
> TD>     </param-name>
> TD>     <param-value>
> TD>       ISO-8859-5
> TD>     </param-value>
> TD>   </context-param>
> That's just it, absolutely!
> Feels like Cheshire Cat to hear this :-)))
>
> TD> Can you resend the russian properties file?
> Here they are, with pleasure! If you want any other wording
> please let me know (in english :)
>   (The .orig file is encoded with windows-1251, the .properties
>   file was obtained from it with native2ascii -encoding
> windows-1251 xxx yyy)
> TD> I'll try that out with this solution.
> :-)))
>
> TD> The charset you'd map to is ISO-8859-5, right?
> windows-1251
> (The servlet engines that do perform Locale->charset mapping
> use ISO-8859-1, not windows-1251 or KOI8-R, for Russian; that's why the
> programmer-defined mapping is so welcome!)
>
> TD> I should be able to put this in today if this approach is
> acceptable.
> IMO what has been proposed is highly welcome.
> Still I'm mad enough to propose extra functionality,
> see the P.S. below.
> TD> Tim
>
> >> -----Original Message-----
> >> From: tagunov [mailto:tagunov@motor.ru]
> >> Sent: Tuesday, January 22, 2002 5:45 AM
> >> To: Tim Dawson
> >> Subject: standard/fmt or i18n the problem is still there
> (was: Re: _ja
> >> file is ISO2022JP, not SJIS coded)
> >>
> >>
> >> Hello Tim!
> >> Glad to hear from you again :-)
> >>
> >> TD> thanks for the note - I've checked it in and ensured that
> >> it works.
> >> You mean "works, but not in the way I expected it to work"? ;-)
> >> My English-Russian dictionary says that a "kludge" is a "piece of
> >> code or a program that works although it shouldn't". ;-))
> >> TD> not that it matters to you now that you're hot on the
> >> trail of the standard taglib.
> >> TD> :-)
> >> Well, the problems are still there, but they are no longer
> yours ;-) !
> >> (And I'm not too hot on the standard taglib after all :)
> >>
> >> And the problem is that it is highly desirable to have a
> >> programmer-defined
> >>  Locale->charset mapping
> >> both because developers often do not like the default one
> and because
> >> all Tomcats and Weblogics do not seem to perform _any_
> Locale->charset
> >> mapping themselves. (They all use iso-8859-1).
> >>
> >> Maybe I'll end up taking your (deprecated now :-) or standard/fmt
> >> taglib and replacing the calls to response.setLocale(locale) for
> >> MyUtil.setLocale(response,locale) where MyUtil will do this
> >> programmer-defined Locale->charset mapping.
> >>
> >> Jan Luehe's opinion on taglibs-dev was that such mapping is more
> >> appropriate in the core of the servlet spec. He recommended
> >> sending a proposal to jsr-154 (servlet 2.4 EG).
> >>
> >> I did.
> >>
> >> But both you and I can see that even now, when the servlet 2.3 spec
> >> has been out for half a year already, a great many people are
> >> still using servlet 2.2 and jsp 1.2 software.
> >>
> >> So, 2.4 is far away. Even when it comes it will be a long time
> >> till it is widely adopted. Even then some people will still use
> >> servlet 2.2 and 2.3 software.
> >>
> >> So my opinion is that
> >>  1) yes, Locale->charset mapping is most appropriate
> >>     in the core of servlet 2.4 spec
> >>  2) it is a good idea to implement a temporary
> >>     substitute for it
> >>
> >> But then, standard taglib is an implementation of JSTL
> >> forthcoming spec. We can not expect support for
> >> such temporary work-around in the spec.
> >>
> >> Hence, two ways remain:
> >>  a) implement it in the going to be deprecated (your :-)
> i18n taglib
> >>  b) let everyone who needs it tailor the taglib on his own
> >>
> >> (See my P.S. section for a draft of this workaround I'm speaking
> >> about)
> >> >> -----Original Message-----
> >> >> From: tagunov [mailto:tagunov@motor.ru]
> >> >> Sent: Thursday, November 29, 2001 2:35 PM
> >> >> To: Tim Dawson
> >> >> Subject: _ja file is ISO2022JP, not SJIS coded
> >> >>
> >> >>
> >> >> Hello Tim!
> >> >>
> >> >> I have discovered that the sample bundle
> >> >>
> >> >>
> i18n\examples\src\org\apache\taglibs\i18n\i18n-test_ja.properties
> >> >>
> >> >> contains ISO2022JP coded text, not SJIS coded, that is why
> >> >> the <native2ascii encoding="SJIS".. in the build.xml
> >> >> japanese.encoding task does not work properly on this file.
> >>
> >> Best regards,
> >>   Anton Tagunov     mailto:tagunov@motor.ru
> >>
> >>
> >> P.S. What this temporary workaround I'm speaking about could be:
> >> --------------------------------------------------------------
> >> ---------
> >> Jan Luehe's letter contained the following excerpt:
> >>   Kazuhiro Kazama wrote:
> >>
> >>   > ii) Some browsers uses an low-quality unicode font to
> >> display UTF-8
> >>   > encoded characters.
> >>   >
> >>   > And thus I would like to propose JSTL support multiple
> >> locale/multiple
> >>   > charset model and provide a database function to get a
> charset by
> >>   > specified locale. For example, Tomcat 4 provides
> >>   > org.apache.catalina.util.CharsetMapper internally for
> >> this purpose.
> >>   >
> >>   > But note that a locale may be combined with multiple charsets.
> >> For example,
> >>   > "ja" locale is combined with one of "Shift_JIS", "ISO-2022-JP",
> >>   > "EUC-JP", "Windows-31J" etc. Because Shift_JIS has a difference
> >>   > mapping from Windows-31J, we must select one according to a Web
> >>   > application.
> >>   >
> >>   > Therefore it is a best solution to provide a database
> function to
> >>   > search a default charset-locale mapping and its override
> >> mechanism by
> >>   > a Web application.
> >>   >
> >>   > For example, in web.xml:
> >>   >     <charset-mapping>
> >>   >         <charset>ISO-8859-1</charset>
> >>   >         <locale>en</locale>
> >>   >     </charset-mapping>
> >>   >     <charset-mapping>
> >>   >         <charset>Shift_JIS</charset>
> >>   >         <locale>ja</locale>
> >>   >     </charset-mapping>
> >>   >
> >>   > This proposal may need more discussions in JSR-52, JSR-53
> >> and JSR-154
> >>   > experts and Apache committers.
> >>
> >> The idea is not that bad, and I believe it could be implemented
> >> somewhere in the i18n taglib.
> >>
> >> My other solution was that the name of charset could be
> put into the
> >> bundle but that is not exactly the same, as in some cases
> the Locale
> >> is determined by the tags that format dates and numbers in
> the absence
> >> of a bundle (at least this is the case with standard/fmt).
> >>
>
>
>
> --
> Best regards,
>  Anton Tagunov                            mailto:tagunov@motor.ru
>
> P.S.
>
> BTW, i've just got one more idea, how to extend this and make the
> charset selection even more dynamic
>
> we define some interface, something like
>
> interface CharsetMapper{
>   String getCharset(HttpRequest rec, java.util.Locale loc);
> }
>
> (the HttpRequest is passed to enable examining the request,
> session parameters and cookies)
>
> The taglib searches the environment for some specially-named parameter
> (search is done in all the scopes: request, session and application)
> if an object is found it is cast to CharsetMapper and used.
> request.getSession(false) is passed as the first parameter.
>
> To handle reading the mapping data from the web.xml we could go one of
> the two following ways:
>
> 1) write a special servlet. In its init method it will read its own
>    parameters:
>
>     <servlet>
>         <servlet-name>...</servlet-name>
>         <servlet-class>...</servlet-class>
>         <init-param>
>
> <param-name>org.apache.taglibs.i18n.CharsetMap.ru</param-name>
>             <param-value>windows-1251</param-value>
>         </init-param>
>     </servlet>
>
>    (or similar context parameters)
>    create an object implementing CharsetMapper interface and bind
>    it to the application scope.
>
>    This servlet will do nothing else, will have default doGet and
>    doPost and won't be bound to any path in the servlet engine.
>    (Hope it won't prevent it from being initialized? Then we'll bind
>    it to some unused path :-)
>
> 2) if no object has been found matching the special name in any scope
>    then the taglib code would search for the already described context
>    parameters.
>
> A use case for such dynamic charset selection:
>
>   somewhere in the site there's an explicit charset switch:
>
>      enable highly-multilingual pages (use UTF-8)
>      optimize for speed               (use national encodings)
>
>   or even a more detailed switch
>
>   Choose charset for the xxx language: xxx
>                                        yyy
>                                        zzz
>
>   5 years ago many russian sites did this.
>
>   I believe that was due to incompatibilities between browsers
>   and their failure to support Cyrillic properly. These
>   difficulties have been overcome by now and such selectors
>   have almost disappeared. Still I can imagine them being
>   implemented for some emergency cases.
>
> Your opinions? Is this overkill?
>
> P.P.S. Maybe if enough people think this is useful
>        enough we could propose this to jsr-154 too?


_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re[2]: Programer-defined Locale->charset mapping (was: Re[2]: standard/fmt or i18n the problem is still there)

Posted by tagunov <ta...@motor.ru>.
Hello Tim!

>> BTW, i've just got one more idea, how to extend this and make the
>> charset selection even more dynamic
>>
>> we define some interface, something like
>>
>> interface CharsetMapper{
>>   String getCharset(HttpRequest rec, java.util.Locale loc);
>> }

TD> I generally like pluggable interfaces, and we could do that, but I don't see
TD> many other taglibs using this kind of pattern. A pattern that I have seen a
TD> lot, and would also give you more dynamic selection, is to search for
TD> "org.apache.taglibs.i18n.CharsetMap.<locale>" in the (page, request,
TD> session, application) hierarchy, then default to whatever is defined in the
TD> Servlet Context if nothing else is available. (incidentally, I got this
TD> approach from Jan; this is how the standard taglib is planning to do a
TD> number of things)

BTW: should it be .CharsetMap or .charsetMap?
In the standard taglib they have "javax.servlet.jsp.jstl.i18n.request.charset"

TD> This way you could use an init servlet to load the map from somewhere, say,
TD> a relational database, into the application scope if the
TD> ServletContext/web.xml approach is too static.

Well, loading values from a database... It's for some
_very_ configurable application! I did not mean that! :-)

TD> You could also provide user/session-based overrides, to allow

>>   or even a more detailed switch
>>
>>   Choose charset for the xxx language: xxx
>>                                        yyy
>>                                        zzz

TD> And even be able to have a JSP set something at the page/request level for

>>   somewhere in the site there's an explicit charset switch:
>>
>>      enable highly-multilingual pages (use UTF-8)
>>      optimize for speed               (use national encodings)
>>

TD> Would this work?

This should work.

"org.apache.taglibs.i18n.CharsetMap.ru" -> "windows-1251"
"org.apache.taglibs.i18n.CharsetMap.ja" -> "SJIS"

How do we globally switch to UTF-8? Maybe a separate parameter

"org.apache.taglibs.i18n.CharsetMap" -> "UTF-8" ?

And the search order could be:
1) search page scope for "org.apache.taglibs.i18n.CharsetMap"
   if one is found use it
2) search page scope for "org.apache.taglibs.i18n.CharsetMap.<language>"
   if one is found use it
3) search session scope for "org.apache.taglibs.i18n.CharsetMap"
   if one is found use it
4) search session scope for "org.apache.taglibs.i18n.CharsetMap.<language>"
   if one is found use it
5) search application scope for "org.apache.taglibs.i18n.CharsetMap"
   if one is found use it
6) search application scope for "org.apache.taglibs.i18n.CharsetMap.<language>"
   if one is found use it
7) search for a context param with name "org.apache.taglibs.i18n.CharsetMap"
   if one is found use it
8) search for a context param with name "org.apache.taglibs.i18n.CharsetMap.<language>"
   if one is found use it
9) rely on the servlet container to perform the mapping.
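
The search order above could be sketched roughly as follows. This is only an illustration under assumptions: Maps stand in for the page, session and application scopes and for the context params, the names CharsetSearchOrder, resolve and firstOf are hypothetical, and an empty result means "leave it to the container".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Sketch only: Maps replace the real scopes and context-params (assumption).
class CharsetSearchOrder {
    static final String GLOBAL_KEY = "org.apache.taglibs.i18n.CharsetMap";

    /** Empty result means: rely on the servlet container's own mapping. */
    public static Optional<String> resolve(Locale locale,
                                           List<Map<String, String>> scopes,
                                           Map<String, String> contextParams) {
        String languageKey = GLOBAL_KEY + "." + locale.getLanguage();
        // In each scope the global key wins over the per-language key.
        for (Map<String, String> scope : scopes) {
            String found = firstOf(scope, GLOBAL_KEY, languageKey);
            if (found != null) {
                return Optional.of(found);
            }
        }
        // Last resort before the container: the web.xml context params.
        return Optional.ofNullable(firstOf(contextParams, GLOBAL_KEY, languageKey));
    }

    private static String firstOf(Map<String, String> source, String... keys) {
        for (String key : keys) {
            String value = source.get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> page = new HashMap<>();
        Map<String, String> session = new HashMap<>();
        Map<String, String> application = new HashMap<>();
        Map<String, String> contextParams = new HashMap<>();
        contextParams.put(GLOBAL_KEY + ".ru", "windows-1251");
        contextParams.put(GLOBAL_KEY + ".ja", "SJIS");

        List<Map<String, String>> scopes = new ArrayList<>();
        scopes.add(page);
        scopes.add(session);
        scopes.add(application);

        Locale russian = Locale.forLanguageTag("ru-RU");
        System.out.println(resolve(russian, scopes, contextParams)
                .orElse("(container default)")); // windows-1251

        // A session-wide switch to UTF-8 overrides the per-language default.
        session.put(GLOBAL_KEY, "UTF-8");
        System.out.println(resolve(russian, scopes, contextParams)
                .orElse("(container default)")); // UTF-8
    }
}
```

The caller decides which scopes to pass in; the list above leaves out the request scope, so the sketch does too, but adding it is just one more Map in the list.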

Two more issues:

a) I believe that such functionality should be exposed via some public
   static method on the tag support class
   so that it would be possible to invoke it from a servlet too.

b) Should we bother about other ContentTypes, for example about
   text/xml? If yes, oh <sigh>...

   b1) extra optional parameter to the <bundle> and <locale> tags?
       like <bundle .. contentType="text/xml"/>

       Keeping in mind the standard/fmt taglib design and trying
       to imagine it adopting this approach (just imagining :-)
       the <message>, <numberFormat> and <dataFormat> tags
       are also allowed to select a locale, if there is no
       enclosing <locale> or <bundle> tag. This means that either

       b1.1) <message>,<numberFormat>,<dataFormat> tags also get
             a charset parameter
       b1.2) we say: if anybody wants to explicitly set the
             contentType he has to use the <bundle> tag.
             But this does not cover the <numberFormat> functionality

   b2) "org.apache.taglibs.i18n.ContentType" parameter in some scope?
       Maybe just the request scope will be enough?
       (Meaning that someone sets it with
       some other tag from i18n or another taglib before the
       <i18n:bundle> or similar tag.)

   Frankly I like neither b1, nor b2  :-(
TD> Tim

Best regards, Anton Tagunov





RE: Re[2]: Programer-defined Locale->charset mapping (was: Re[2]: standard/fmt or i18n the problem is still there)

Posted by Tim Dawson <td...@yahoo.com>.
>        <i18n:contentType contentType="text/xml"/>

I like this better -- it provides more fine-grained control, and would be
easier to remove when the servlet spec is updated to fix the underlying
problem.

By default it would use the current response locale to do the lookup
(meaning you first declare <i18n:bundle> or <i18n:locale> and THEN use
<i18n:contentType>). I'll also add an attribute for locale like the bundle
tag has.
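
Roughly, the tag would combine its contentType attribute with the charset looked up for the locale. The sketch below is an assumption about how that might look, not the actual tag code: a Map stands in for wherever the CharsetMap values come from, and ContentTypeBuilder and build are hypothetical names.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Sketch only: the Map replaces the real scope/context-param lookup.
class ContentTypeBuilder {
    static final String PREFIX = "org.apache.taglibs.i18n.CharsetMap.";

    /** Appends the charset mapped for the locale's language, if there is one. */
    public static String build(String mimeType, Locale locale,
                               Map<String, String> charsetParams) {
        String charset = charsetParams.get(PREFIX + locale.getLanguage());
        if (charset == null) {
            // No mapping: leave the charset choice to the container.
            return mimeType;
        }
        return mimeType + "; charset=" + charset.trim();
    }

    public static void main(String[] args) {
        Map<String, String> charsetParams = new HashMap<>();
        charsetParams.put(PREFIX + "ru", "windows-1251");

        Locale russian = Locale.forLanguageTag("ru-RU");
        System.out.println(build("text/xml", russian, charsetParams));
        // text/xml; charset=windows-1251
        System.out.println(build("text/xml", Locale.FRENCH, charsetParams));
        // text/xml
    }
}
```

The resulting string is what would be handed to response.setContentType(...).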

I had it working under my original proposal (actually saw some cyrillic
chars!); I'll update it to use this.

Tim

> -----Original Message-----
> From: Anthony Tagunov [mailto:tagunov@newmail.ru]
> Sent: Thursday, January 24, 2002 7:21 AM
> To: Tim Dawson
> Cc: 'Tag Libraries Developers List'
> Subject: Re[2]: Programer-defined Locale->charset mapping (was: Re[2]:
> standard/fmt or i18n the problem is still there)
>
>
> Hello Tim,
>
> Wednesday, January 23, 2002, 9:42:52 AM, you wrote:
>
> TD> One hitch with this:
> >> > TD>   <context-param>
> >> > TD>     <param-name>
> >> > TD>       org.apache.taglibs.i18n.CharsetMap.ru
> >> > TD>     </param-name>
> >> > TD>     <param-value>
> >> > TD>       ISO-8859-5
> >> > TD>     </param-value>
> >> > TD>   </context-param>
>
> TD> I can't simply update the charset, because there's also
> no way to set the
> TD> charset without overriding the content type. Further,
> there's no way through
> TD> the Servlet API to determine what the current content
> type is on a response
> TD> (of course it's not safe to assume "text/html").  So I
> can't even grab the
> TD> current content type & replace the charset.
> Very much true.
>
> TD> So I've had to make a few minor changes; it will have to
> look more like
> TD> this:
> >> > TD>   <context-param>
> >> > TD>     <param-name>
> >> > TD>       org.apache.taglibs.i18n.ContentTypeMap.ru
> >> > TD>     </param-name>
> >> > TD>     <param-value>
> >> > TD>       text/html; charset=ISO-8859-5
> >> > TD>     </param-value>
> >> > TD>   </context-param>
>
> TD> Unless someone has a better idea... which I'd be very
> glad to hear.
>
> A couple more ideas:
>
> 2)     extra parameter to the <bundle> and <locale> tags:
>        <bundle .. contentType="text/xml"/>
>
>
>        maybe <message>,<numberFormat>,<dataFormat> tags also
>        are allowed to have the contentType parameter, if they
>        are allowed to set the Locale/charset?
>
>
> 3)     a special parameter in the request scope.
>
>        to avoid adding the contentType parameter to all the tags
>        that may potentially need it we put it into a request
>        parameter.
>
>        to set it we may introduce an extra tag:
>
>        <%@ page %>
>        <i18n:contentType contentType="text/xml"/>
>        <i18n:bundle>
>        ...
>        </i18n:bundle>
>
> --
> Best regards,
>  Anton Tagunov                            mailto:tagunov@newmail.ru
>




Re[2]: Programer-defined Locale->charset mapping (was: Re[2]: standard/fmt or i18n the problem is still there)

Posted by Anthony Tagunov <ta...@newmail.ru>.
Hello Tim,

Wednesday, January 23, 2002, 9:42:52 AM, you wrote:

TD> One hitch with this:
>> > TD>   <context-param>
>> > TD>     <param-name>
>> > TD>       org.apache.taglibs.i18n.CharsetMap.ru
>> > TD>     </param-name>
>> > TD>     <param-value>
>> > TD>       ISO-8859-5
>> > TD>     </param-value>
>> > TD>   </context-param>

TD> I can't simply update the charset, because there's also no way to set the
TD> charset without overriding the content type. Further, there's no way through
TD> the Servlet API to determine what the current content type is on a response
TD> (of course it's not safe to assume "text/html").  So I can't even grab the
TD> current content type & replace the charset.
Very much true.

TD> So I've had to make a few minor changes; it will have to look more like
TD> this:
>> > TD>   <context-param>
>> > TD>     <param-name>
>> > TD>       org.apache.taglibs.i18n.ContentTypeMap.ru
>> > TD>     </param-name>
>> > TD>     <param-value>
>> > TD>       text/html; charset=ISO-8859-5
>> > TD>     </param-value>
>> > TD>   </context-param>

TD> Unless someone has a better idea... which I'd be very glad to hear.

A couple more ideas:

2)     extra parameter to the <bundle> and <locale> tags:
       <bundle .. contentType="text/xml"/>


       maybe <message>,<numberFormat>,<dataFormat> tags also
       are allowed to have the contentType parameter, if they
       are allowed to set the Locale/charset?


3)     a special parameter in the request scope.

       to avoid adding the contentType parameter to all the tags
       that may potentially need it we put it into a request
       parameter.
       
       to set it we may introduce an extra tag:

       <%@ page %>
       <i18n:contentType contentType="text/xml"/>
       <i18n:bundle>
       ...
       </i18n:bundle>

-- 
Best regards,
 Anton Tagunov                            mailto:tagunov@newmail.ru





RE: Programer-defined Locale->charset mapping (was: Re[2]: standard/fmt or i18n the problem is still there)

Posted by Tim Dawson <td...@yahoo.com>.
One hitch with this:
> > TD>   <context-param>
> > TD>     <param-name>
> > TD>       org.apache.taglibs.i18n.CharsetMap.ru
> > TD>     </param-name>
> > TD>     <param-value>
> > TD>       ISO-8859-5
> > TD>     </param-value>
> > TD>   </context-param>

I can't simply update the charset, because there's also no way to set the
charset without overriding the content type. Further, there's no way through
the Servlet API to determine what the current content type is on a response
(of course it's not safe to assume "text/html").  So I can't even grab the
current content type & replace the charset.

So I've had to make a few minor changes; it will have to look more like
this:
> > TD>   <context-param>
> > TD>     <param-name>
> > TD>       org.apache.taglibs.i18n.ContentTypeMap.ru
> > TD>     </param-name>
> > TD>     <param-value>
> > TD>       text/html; charset=ISO-8859-5
> > TD>     </param-value>
> > TD>   </context-param>

Unless someone has a better idea... which I'd be very glad to hear.
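
For illustration, a minimal sketch of the ContentTypeMap lookup, with assumptions stated: a Map stands in for the context params, the full locale (e.g. ru_RU) is tried before the bare language, and the value is trimmed because the param-value above is written across several lines. ContentTypeMapLookup and contentTypeFor are hypothetical names.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Sketch only: the Map replaces ServletContext init parameters (assumption).
class ContentTypeMapLookup {
    static final String PREFIX = "org.apache.taglibs.i18n.ContentTypeMap.";

    /** Returns the full content type for the locale, or null if unmapped. */
    public static String contentTypeFor(Locale locale,
                                        Map<String, String> contextParams) {
        // Try the most specific key first, e.g. "...ContentTypeMap.ru_RU".
        String value = contextParams.get(PREFIX + locale.toString());
        if (value == null) {
            // Then fall back to the language alone, e.g. "...ContentTypeMap.ru".
            value = contextParams.get(PREFIX + locale.getLanguage());
        }
        return value == null ? null : value.trim();
    }

    public static void main(String[] args) {
        Map<String, String> contextParams = new HashMap<>();
        contextParams.put(PREFIX + "ru", "\n      text/html; charset=ISO-8859-5\n    ");

        Locale russian = Locale.forLanguageTag("ru-RU");
        System.out.println(contentTypeFor(russian, contextParams));
        // text/html; charset=ISO-8859-5
    }
}
```

Because the value already carries both the MIME type and the charset, it can go straight into response.setContentType(...) without needing to read the current content type back.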

Tim

