Posted to dev@cocoon.apache.org by Mike Engelhart <me...@earthtrip.com> on 2000/05/18 01:22:34 UTC

External Entities & bundles

OK, I've run into yet another problem with Cocoon and i18n...somehow I feel
that I'm the only person making a localized application. :-)

Anyway, because I want to separate presentation from data, I have resorted
to using ResourceBundles from within my stylesheets to access localized
strings based on the user's browser setting (or session setup).  This works
fine except that most non-English languages use non-ASCII characters like
&auml;

When I get the string from my PropertyResourceBundle by using this call:

<xsl:value-of select="java:getString($bundle, 'SOME_TEXT')"/>

Assume that the key/value looks like:
SOME_TEXT=S&ouml;me Text Label


I just get the actual entity in my HTML output.  The same goes if I put this
in the ResourceBundle
SOME_TEXT=S&#246;me Text Label

One suggestion on the users list was to do the ResourceBundle.getString()
call from the XSP which may work but this totally ruins the separation of
presentation from data.  I don't want to have things like this in my XML
<textlabel>Some Text Label</textlabel> which has nothing to do with my data.

I tried putting the entity's characters directly into the
PropertyResourceBundle but that just comes out as "?" in my HTML.

Anyone who has any ideas about this please let me know.

thanks,
Mike




Re: External Entities & bundles

Posted by Mike Engelhart <me...@earthtrip.com>.
on 5/18/00 6:39 AM, Paul Russell at paul@luminas.co.uk wrote:

> The only way 'around' this (although it is the correct
> behaviour) is to *somehow* get the character you're after
> into unicode. Two ways of doing that - firstly by putting
> the character into whatever encoding the ResourceBundle is
> using, 
How do you do this?  Isn't Java already converting the source code into
Unicode?  Does it not understand the ascii characters that I type in using
special key commands on my Mac?  For example, if I have a
PropertyResourceBundle file that has a special character like this - ö -,
what happens to that character when the PropertyResourceBundle gets loaded
by Java?   It appears that it's just converting it to a "?" because it
doesn't understand the encoding.   Do I need to do something special in my
text files to tell Java what is going on?

This is mind-boggling that this doesn't work....

Mike


Re: External Entities & bundles

Posted by Mike Engelhart <me...@earthtrip.com>.
on 5/18/00 6:39 AM, Paul Russell at paul@luminas.co.uk wrote:

> All XML event streams in java are unicode (as they should be),
> this means that as soon as the XML file is parsed into a SAX
> stream, the entities are replaced by their unicode character
> alter-egos. This means that when you bring the ResourceBundle
> in (which is already encoded in some way - *not* using character
> entities), you are bringing in the '&','o','u','m','l' and ';'
> unicode characters, *not* an entity itself. Therefore, when
> the XML stream is serialized to HTML, it becomes "&amp;ouml;".
Very true, but as I mentioned in a previous post, my situation doesn't
change when I put the actual special character (not the entities) into my
ResourceBundle.  This is the part that I don't understand.

thanks,

Mike


Re: External Entities & bundles

Posted by Paul Russell <pa...@luminas.co.uk>.
On Thu, May 18, 2000 at 06:19:33AM -0500, Mike Engelhart wrote:
> on 5/18/00 3:13 AM, Paul Russell at paul@luminas.co.uk wrote:
> No, I haven't gone that far yet.  Is that what you're doing? It seems like a
> lot of extra overhead for something that should be working to begin with.
> I mean, parsing every single word on a page to look for ampersands seems
> like a lot of extra work for a busy site??

What I've been doing is at the other end (the output side) and
is basically doing this backwards; taking unicode characters and
escaping them into entities. In my case it isn't much of an overhead
as we already have to think down to the character level.
(and I ignore anything with a code of <128, because they don't
need escaping anyway). If you were doing it the other way
around, again it's not too bad, because you can do an indexOf("&")
(which is fast) on the incoming resources to find the entities.
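The output-side escaping described above can be sketched roughly as
follows. This is only an illustration, not the code actually used at
Luminas: the class and method names (EntityEscaper, escapeNonAscii) are
hypothetical, and it emits numeric character references rather than
named entities, which avoids needing a lookup table. It assumes a modern
JDK (StringBuilder, codePointAt).

```java
public class EntityEscaper {

    /**
     * Copies characters below 128 unchanged and replaces everything else
     * with a decimal numeric character reference such as &#246;.
     */
    public static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (codePoint < 128) {
                out.append((char) codePoint);      // plain ASCII needs no escaping
            } else {
                out.append("&#").append(codePoint).append(';');
            }
            i += Character.charCount(codePoint);   // step over surrogate pairs correctly
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The o-umlaut is written as a Java escape so this source file stays ASCII.
        System.out.println(escapeNonAscii("S\u00F6me Text Label"));
    }
}
```

Because the result is pure ASCII, it survives whatever byte encoding
the response is eventually written in.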

The thing is that this *shouldn't* be 'working' as you put it.
If you put something into a resource bundle, it is text, *not*
XML. The fact that you happen to bring it into an XML event
stream using XSP is academic.

All XML event streams in java are unicode (as they should be),
this means that as soon as the XML file is parsed into a SAX
stream, the entities are replaced by their unicode character
alter-egos. This means that when you bring the ResourceBundle
in (which is already encoded in some way - *not* using character
entities), you are bringing in the '&','o','u','m','l' and ';'
unicode characters, *not* an entity itself. Therefore, when
the XML stream is serialized to HTML, it becomes "&amp;ouml;".
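The effect described in the paragraph above can be reproduced outside
Cocoon with the standard JAXP classes. The sketch below is an assumption
about how to demonstrate it, not part of the original setup: it puts the
six characters of "&ouml;" into a text node, as a value fetched from a
bundle would be, and serializes the result. The class name AmpersandDemo
is hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class AmpersandDemo {

    /** Wraps the given text in a <label> element and serializes it as XML. */
    public static String serialize(String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element label = doc.createElement("label");
        label.setTextContent(text);   // plain characters, no entity processing happens here
        doc.appendChild(label);

        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // The ampersand is an ordinary character, so the serializer must escape it.
        System.out.println(serialize("S&ouml;me Text Label"));
    }
}
```

The serializer has no way of knowing that the ampersand was meant to
start an entity, so escaping it is the only output that round-trips
the text correctly.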

The only way 'around' this (although it is the correct
behaviour) is to *somehow* get the character you're after
into unicode. Two ways of doing that - firstly by putting
the character into whatever encoding the ResourceBundle is
using, or secondly by adding another layer of encoding (such
as using HTML character entities) over the top and then using
a decoding algorithm on the incoming resources.
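For the first of these two options, one detail worth knowing is that
java.util.Properties.load(InputStream) is documented to read ISO-8859-1,
with any other character written as a \uXXXX escape. A character saved
in a different platform encoding is therefore misread on load. A minimal
sketch of the escape form follows; the class name BundleEscapeDemo and
the in-memory "file" are hypothetical stand-ins for a real .properties
file on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class BundleEscapeDemo {

    /** Loads the given properties text as a bundle and returns one value. */
    public static String load(String propertiesText, String key) throws IOException {
        // Only ASCII bytes are used, so the result does not depend on the platform encoding.
        byte[] bytes = propertiesText.getBytes(StandardCharsets.ISO_8859_1);
        ResourceBundle bundle =
                new PropertyResourceBundle(new ByteArrayInputStream(bytes));
        return bundle.getString(key);
    }

    public static void main(String[] args) throws IOException {
        // The doubled backslash puts a literal backslash-u escape into the
        // properties text, exactly as it would appear in the file on disk.
        String value = load("SOME_TEXT=S\\u00F6me Text Label\n", "SOME_TEXT");

        // The loader has turned the escape into the single o-umlaut character.
        System.out.println(value.length());
        System.out.println((int) value.charAt(1));
    }
}
```

The JDK's native2ascii tool produces this escaped form from a file
saved in another encoding.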


-- 
Paul Russell                               <pa...@luminas.co.uk>
Technical Director,                   http://www.luminas.co.uk
Luminas Ltd.

Re: External Entities & bundles

Posted by Mike Engelhart <me...@earthtrip.com>.
on 5/18/00 3:13 AM, Paul Russell at paul@luminas.co.uk wrote:

> Did you try my other suggestion of decoding the character
> entities after you read them in? Basically, you'd just need
> to write some code that searched through the incoming data
> and looked for anything beginning with an ampersand. When
> it finds one, it should look up the name of an entity in
> a hash to find the unicode character number for it.
> 
> I used a simple perl script to take the entity declarations
> from the w3c .ent files and build a properties file from
> them. You can then read these into a Properties object and
> treat it like a hash. I'd suggest you implement this as
> a class outside your XSP page and <xsp:include/> it in.
No, I haven't gone that far yet.  Is that what you're doing? It seems like a
lot of extra overhead for something that should be working to begin with.
I mean, parsing every single word on a page to look for ampersands seems
like a lot of extra work for a busy site??

Also, this SHOULD just work.  What is Xalan doing that is mangling the
output of the stylesheet transforms?  Java classes are Unicode and so are
XML documents so I'm not sure what's going on.  Is it because of the
character set of the OS that I'm using to type these documents?  I tried
saving the documents as Unicode before adding them to my classpath but that
just made things worse.

Mike


Re: External Entities & bundles

Posted by Paul Russell <pa...@luminas.co.uk>.
Mike,

On Wed, May 17, 2000 at 06:22:34PM -0500, Mike Engelhart wrote:
> I tried putting the entity's characters directly into the
> PropertyResourceBundle but that just comes out as "?" in my HTML.

Did you try my other suggestion of decoding the character
entities after you read them in? Basically, you'd just need
to write some code that searched through the incoming data
and looked for anything beginning with an ampersand. When
it finds one, it should look up the name of an entity in
a hash to find the unicode character number for it.

I used a simple perl script to take the entity declarations
from the w3c .ent files and build a properties file from
them. You can then read these into a Properties object and
treat it like a hash. I'd suggest you implement this as
a class outside your XSP page and <xsp:include/> it in.
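A decoder along these lines might look like the sketch below. It is
an illustration under stated assumptions, not the code referred to
above: the class name EntityDecoder is hypothetical, and the lookup
table is assumed to map an entity name to its decimal code point
(for example ouml=246), which is one plausible shape for the generated
properties file. Numeric references such as &#246; are handled as well.

```java
import java.util.Properties;

public class EntityDecoder {

    private final Properties entities;   // assumed format: name=decimal code point

    public EntityDecoder(Properties entities) {
        this.entities = entities;
    }

    /** Replaces known character entities in the text with the characters they name. */
    public String decode(String text) {
        int amp = text.indexOf('&');
        if (amp < 0) {
            return text;                  // fast path: nothing to decode
        }
        StringBuilder out = new StringBuilder(text.length());
        int pos = 0;
        while (amp >= 0) {
            int semi = text.indexOf(';', amp + 1);
            if (semi < 0) {
                break;                    // no terminator left, copy the rest as-is
            }
            String replacement = resolve(text.substring(amp + 1, semi));
            if (replacement == null) {
                out.append(text, pos, amp + 1);   // unknown entity: keep the ampersand
                pos = amp + 1;
            } else {
                out.append(text, pos, amp).append(replacement);
                pos = semi + 1;
            }
            amp = text.indexOf('&', pos);
        }
        return out.append(text, pos, text.length()).toString();
    }

    /** Returns the character for an entity name or numeric reference, or null if unknown. */
    private String resolve(String name) {
        try {
            if (name.startsWith("#x") || name.startsWith("#X")) {
                return new String(Character.toChars(Integer.parseInt(name.substring(2), 16)));
            }
            if (name.startsWith("#")) {
                return new String(Character.toChars(Integer.parseInt(name.substring(1))));
            }
            String code = entities.getProperty(name);
            return code == null
                    ? null
                    : new String(Character.toChars(Integer.parseInt(code.trim())));
        } catch (IllegalArgumentException e) {
            return null;                  // malformed number or invalid code point
        }
    }

    public static void main(String[] args) {
        Properties table = new Properties();
        table.setProperty("ouml", "246");         // hypothetical table entry
        EntityDecoder decoder = new EntityDecoder(table);
        String decoded = decoder.decode("S&ouml;me Text Label");
        System.out.println((int) decoded.charAt(1));
    }
}
```

Strings without an ampersand are returned untouched after a single
indexOf call, so the cost on a busy site is mostly limited to the
resources that actually contain entities.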

-- 
Paul Russell                               <pa...@luminas.co.uk>
Technical Director,                   http://www.luminas.co.uk
Luminas Ltd.