Posted to user@velocity.apache.org by Guillaume Mathe <gm...@smartinnov.com> on 2003/01/20 15:58:20 UTC

Foreign characters in VTL references ?

Hi,

We're using Velocity in our server app for a wide variety of purposes, 
one of which is i18n.
Every string or sentence displayed in our web app is actually a VTL 
variable whose name is the French index key for that string/sentence.
For example, "${Bienvenue}" renders as "Welcome" if the selected Locale 
is en_US, or "Bienvenido" if the locale is Spanish.
The problem is that the VTL parser stops at any character in a 
reference that is outside the US-ASCII set, for example one that exists 
only in ISO-8859-1. That includes accented characters.
For example, ${Accentué} (=Accentuated) is read by the parser as 
${Accentu}é, which is obviously wrong: the "Accentu" reference 
does not exist.
Currently the workaround is to strip accents from the references put in 
the context, but there's a performance hit, not to mention the lack of 
intuitiveness for the web designer who writes the templates.
So I've tried to force ISO-8859-1 everywhere, but the parser still 
behaves the same way.
Here are some tests I've run. I dump the context to check whether I can 
put accented references in it, and that works (see the dump). The 
problem lies in the parsing of the .vm file, which treats foreign 
characters like blank space or token separators, even though the 
encoding is specified in both the .vm file and the Velocity engine.

// Example start --
public class TestServlet extends VelocityServlet
{

protected Properties loadConfiguration (...) throws ...
{
   Properties p = new Properties();
[...]
   p.setProperty("input.encoding","ISO-8859-1");
   p.setProperty("output.encoding","ISO-8859-1");
   return p;
}

public Template handleRequest(...,Context ctx)
{
   response.setContentType("text/html; charset=ISO-8859-1");
[...]
   // Localizer is an object of mine...
   ctx.put("Accentué",localizer.getString("Accentué"));
[...]
   // Dump every key/value pair currently in the context
   Object[] keys=ctx.getKeys();
   for(int i=0;i<keys.length;i++)
   {
      System.out.println("->key["+
         keys[i]+"]=["+
         ctx.get(keys[i].toString())+"]");
   }
   return parseTemplate("test.vm");
}
}
// Example end --

// output start --
  ->key[Accentué]=[Accentuated]
  ->key[browser]=[mozilla]
  ->key[req]=[org.apache.coyote.tomcat4.CoyoteRequestFacade@4acfcd]
  ->key[httpuseragent]=[Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2b) Gecko/20021016]
  ->key[res]=[org.apache.coyote.tomcat4.CoyoteResponseFacade@90d8ea]
[...]
// output end --

Side note: the result is the same when using an instance of 
VelocityEngine, even though I can specify the encoding there:
// Example start --
// out is my response output stream...
BufferedWriter print=
   new BufferedWriter(new OutputStreamWriter(out,"ISO-8859-1"));
ve.mergeTemplate(template_name,"ISO-8859-1",ctx,print);
// Example end --


Well, this is not *top priority*, but one day I hope to be able to 
remove that ugly filter that stands between my servlet output and the 
VTL macros...
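
For readers who want to see what an accent-stripping workaround of this 
kind involves, here is a minimal sketch. It is an assumption, not the 
filter described in this message: the class name AccentStripper and the 
method stripAccents are hypothetical, and it relies on 
java.text.Normalizer, which only exists in Java 6 and later.

```java
import java.text.Normalizer;

/** Hypothetical helper: turns a context key into an ASCII-safe reference name. */
final class AccentStripper {

    /** Decomposes accented letters (NFD), then drops the combining marks. */
    static String stripAccents(String key) {
        String decomposed = Normalizer.normalize(key, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // \u00e9 is "é"; the escape keeps this file independent of its encoding.
        System.out.println(stripAccents("Accentu\u00e9")); // prints Accentue
    }
}
```

Note that the stripping is lossy: "Accentue" and "Accentué" both end up 
as the reference name "Accentue", so two distinct keys can collide.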

Regards,
Guillaume Mathe
Smartinnov


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: Foreign characters in VTL references ?

Posted by Daniel Dekany <dd...@freemail.hu>.
Wednesday, January 22, 2003, 3:06:23 PM, Guillaume Mathe wrote:

> Daniel Dekany wrote:
>
>> The problem is, AFAIK, that Velocity can't accept accented (and other
>> non-Latin) letters in identifiers because that would break backward
>> compatibility (say, $béka was $b + éka, and then it would be
>> $béka...).
>
> Hi Daniel,
>
> I agree. But in my opinion, people who use accented characters as 
> syntax separators deserve some trouble, don't they... ;)

Good point... :)

> And as a French developer, I know that *someday* it will be implemented 
> (as it's logical and follows Java's identifier syntax), so I enforce the 
> use of ${...} in all cases where references are 'glued' to other 
> characters. I've seen other developers working with VTL doing the same, 
> instinctively...

BTW, because you brought up this topic on the Vel. list, I brought it
up on the FreeMarker list (since FM suffers from the same problem).
And it was basically zero work to fix it! Funny. Well, we were lucky
because in the FM template language the { and } are required... so the
next release will support accented, Arabic, and whatever other
characters. It is good to watch the Vel. list, really... :)

-- 
Best regards,
 Daniel Dekany





Re: Foreign characters in VTL references ?

Posted by Guillaume Mathe <gm...@smartinnov.com>.
Daniel Dekany wrote:

> The problem is, AFAIK, that Velocity can't accept accented (and other
> non-Latin) letters in identifiers because that would break backward
> compatibility (say, $béka was $b + éka, and then it would be
> $béka...).

Hi Daniel,

I agree. But in my opinion, people who use accented characters as 
syntax separators deserve some trouble, don't they... ;)
And as a French developer, I know that *someday* it will be implemented 
(as it's logical and follows Java's identifier syntax), so I enforce the 
use of ${...} in all cases where references are 'glued' to other 
characters. I've seen other developers working with VTL doing the same, 
instinctively...

Regards,
Guillaume Mathe




Re: Foreign characters in VTL references ?

Posted by Daniel Dekany <dd...@freemail.hu>.
Wednesday, January 22, 2003, 9:43:34 AM, Guillaume Mathe wrote:

> Peter Romianowski wrote:
>
>> What you can always do is something like this:
>>
>> --- Java App filling the context --
>>
>> Map us_en = new HashMap();
>> us_en.put ("Accentué", "Accentuated");
>> ...
>>
>> context.put ("i18n", us_en);
>>
> Hi Peter,
>
> This is nice for a 'one-shot' servlet, but unfortunately we are 
> building a full-size server app with a (homemade) framework.
> The i18n resources are taken from a database. In each servlet I'd have 
> to build the hashmap etc., which has the same impact on performance.
> And even if the solution is 'nice' (with the macro), it's still a 
> workaround... Thanks anyway.
> Definitely not a show-stopper, but i18n VTL references would be an 
> appreciated feature ;)

The problem is, AFAIK, that Velocity can't accept accented (and other
non-ASCII) letters in identifiers because that would break backward
compatibility (say, $béka used to be $b + éka, and then it would be
$béka...). The funny thing is that Vel. is 100% Java, so it uses Unicode
everywhere, and thus it is perhaps quite easy to implement what you need.
Maybe it is just a minor modification in Parser.jj... but because
of the backward compatibility issue... well, could it be accepted at
least in the case when somebody uses the ${...} syntax?
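
The ${...}-only idea can be sketched in plain Java. This is a toy
scanner written for illustration, not Velocity's parser and not a patch
to Parser.jj; the class BracedReferenceDemo and its methods are
hypothetical. It assumes the shorthand form keeps the ASCII-only rule,
while the braced form additionally accepts any Unicode letter or digit.

```java
/** Toy reference scanner: ASCII names for $name, Unicode names for ${name}. */
final class BracedReferenceDemo {

    /** Extracts the reference name from text that starts with "$" or "${". */
    static String referenceName(String ref) {
        if (!ref.startsWith("$")) {
            throw new IllegalArgumentException("not a reference: " + ref);
        }
        boolean braced = ref.startsWith("${");
        int start = braced ? 2 : 1;
        int end = start;
        while (end < ref.length() && isNameChar(ref.charAt(end), braced)) {
            end++;
        }
        // A braced reference must be closed right after the name.
        if (braced && (end == ref.length() || ref.charAt(end) != '}')) {
            throw new IllegalArgumentException("malformed reference: " + ref);
        }
        return ref.substring(start, end);
    }

    /** ASCII letters, digits, "-" and "_" always count; other letters only if allowed. */
    static boolean isNameChar(char c, boolean allowUnicode) {
        boolean ascii = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9') || c == '-' || c == '_';
        return ascii || (allowUnicode && Character.isLetterOrDigit(c));
    }

    public static void main(String[] args) {
        System.out.println(referenceName("$b\u00e9ka"));   // prints b (old behaviour kept)
        System.out.println(referenceName("${b\u00e9ka}")); // prints béka
    }
}
```

With this split, an existing template that relies on $béka meaning
$b + "éka" keeps working, and only the braced form changes meaning.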

-- 
Best regards,
 Daniel Dekany





Re: Foreign characters in VTL references ?

Posted by Guillaume Mathe <gm...@smartinnov.com>.
Peter Romianowski wrote:

> What you can always do is something like this:
>
> --- Java App filling the context --
>
> Map us_en = new HashMap();
> us_en.put ("Accentué", "Accentuated");
> ...
>
> context.put ("i18n", us_en);
>
Hi Peter,

This is nice for a 'one-shot' servlet, but unfortunately we are 
building a full-size server app with a (homemade) framework.
The i18n resources are taken from a database. In each servlet I'd have 
to build the hashmap etc., which has the same impact on performance.
And even if the solution is 'nice' (with the macro), it's still a 
workaround... Thanks anyway.
Definitely not a show-stopper, but i18n VTL references would be an 
appreciated feature ;)

Regards,
Guillaume Mathe




RE: Foreign characters in VTL references ?

Posted by Peter Romianowski <me...@gmx.de>.
What you can always do is something like this:

--- Java App filling the context --

Map us_en = new HashMap();
us_en.put ("Accentué", "Accentuated");
...

context.put ("i18n", us_en);

--- VTL ---

Instead of trying ${Accentué}, use $i18n.get("Accentué")

You could even wrap that call in a macro:

#macro (text $key)$i18n.get($key)#end

And then use: #text ("Accentué") (which looks better to me)

---

I think this would be a good solution for your problem.
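
If building the map on every request is a concern, one option is to
build each locale's map once and share it. The sketch below is only an
illustration under that assumption: the class I18nCache and its loader
function are hypothetical, and the loader stands in for whatever code
actually reads the strings.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical cache: loads the key-to-text map for a locale at most once. */
final class I18nCache {

    private final Map<Locale, Map<String, String>> byLocale = new ConcurrentHashMap<>();
    private final Function<Locale, Map<String, String>> loader;

    I18nCache(Function<Locale, Map<String, String>> loader) {
        this.loader = loader;
    }

    /** Returns the cached map, calling the loader only on the first request. */
    Map<String, String> forLocale(Locale locale) {
        return byLocale.computeIfAbsent(locale, loader);
    }

    public static void main(String[] args) {
        // Stand-in loader; a real one would query the i18n store.
        I18nCache cache = new I18nCache(
                locale -> Map.of("Accentu\u00e9", "Accentuated"));
        Map<String, String> us_en = cache.forLocale(Locale.US);
        System.out.println(us_en.get("Accentu\u00e9")); // prints Accentuated
    }
}
```

Each request would then only do context.put ("i18n", cache.forLocale(locale))
and the template side stays exactly as shown above.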

Peter





Re: Foreign characters in VTL references ?

Posted by Guillaume Mathe <gm...@smartinnov.com>.
Peter Romianowski wrote:

> Looking into the Velocity parser definition, it only accepts
> ["a"-"z", "A"-"Z", "0"-"9", "-", "_"] in identifiers.
> So you cannot use other characters out of the box. You
> would have to change the Parser.jj/jjt files and recompile
> Velocity to do that (not recommended). I only know that for
> German umlauts there is a safe conversion
> (ä -> ae, ß -> ss and so on). Could you use something like that
> for French too?
>
> Peter

Hi Peter,

Unfortunately not. For example, "Accentue" is the verb (=Accentuating) 
and "Accentué" is the past participle (=Accentuated). There's no 
replacement like the one for umlauts, as French accents (and those of 
some other languages) are an integral part of the spelling and of the 
grammar...

Oh, and this 'lacking feature' would seriously hamper any (complex, I 
admit) automatic dictionary-like Velocity-based app... (instant Velocity 
parsing from a GUI, anyone? ;) )

Regards,
Guillaume Mathe
Smartinnov




RE: Foreign characters in VTL references ?

Posted by Peter Romianowski <me...@gmx.de>.
Looking into the Velocity parser definition, it only accepts 
["a"-"z", "A"-"Z", "0"-"9", "-", "_"] in identifiers.
So you cannot use other characters out of the box. You
would have to change the Parser.jj/jjt files and recompile
Velocity to do that (not recommended). I only know that for 
German umlauts there is a safe conversion 
(ä -> ae, ß -> ss and so on). Could you use something like that 
for French too?
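
To make the effect of that character class concrete, here is a small
stand-alone sketch. It only mimics the rule quoted above with a regular
expression; it is not Velocity's lexer, and the class
AsciiIdentifierDemo and its method are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustration only: an ASCII-only identifier rule applied to a reference name. */
final class AsciiIdentifierDemo {

    // The character class quoted above: letters, digits, "-" and "_".
    private static final Pattern IDENTIFIER = Pattern.compile("[a-zA-Z0-9_-]+");

    /** Returns the longest leading run of identifier characters, or "" if none. */
    static String identifierPrefix(String name) {
        Matcher m = IDENTIFIER.matcher(name);
        return m.lookingAt() ? m.group() : "";
    }

    public static void main(String[] args) {
        System.out.println(identifierPrefix("Bienvenue"));      // prints Bienvenue
        // \u00e9 is "é": the scan stops just before it.
        System.out.println(identifierPrefix("Accentu\u00e9"));  // prints Accentu
    }
}
```

The second call shows the behaviour reported in this thread: the name
ends at "Accentu" and the "é" is left over as ordinary text.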

Peter


