Posted to users@cocoon.apache.org by Mike Engelhart <me...@earthtrip.com> on 2000/05/17 22:27:32 UTC

external entities

OK, this is probably an FAQ but I can't locate the answer anywhere and it's
driving me nuts.  I've tried adding the entities in a DTD and nothing seems
to work.

How do I prevent Cocoon (Xalan/Xerces) from changing the following during a
transformation to HTML:

&auml;
  to 
&amp;auml;


thanks,

Mike


Re: external entities

Posted by Paul Russell <pa...@luminas.co.uk>.
On Wed, May 17, 2000 at 11:29:59PM +0200, Giacomo Pati wrote:
> <!DOCTYPE page [
>  <!ENTITY % characters SYSTEM "characters.ent">
>  %characters;
> ]>

> I've found the characters.ent file in the cvs of stylebook in the
> directory styles/apachexml/dtd.

Indeed, and if you want to go *totally* to town:

  http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
  http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent
  http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent

... are the full xhtml entity sets.

-- 
Paul Russell                               <pa...@luminas.co.uk>
Technical Director,                   http://www.luminas.co.uk
Luminas Ltd.

Re: external entities

Posted by Giacomo Pati <Gi...@pwr.ch>.
Mike Engelhart wrote:
> 
> OK, this is probably an FAQ but I can't locate the answer anywhere and it's
> driving me nuts.  I've tried adding the entities in a DTD and nothing seems
> to work.
> 
> How do I prevent Cocoon (Xalan/Xerces) from changing the following during a
> transformation to HTML:
> 
> &auml;
>   to
> &amp;auml;
> 
> thanks,

I use the following code and have had no problems since:

<?xml version="1.0"?>

<!DOCTYPE page [
 <!ENTITY % characters SYSTEM "characters.ent">
 %characters;
]>

<page>
  Sch&ouml;, gell!
</page>
         
I've found the characters.ent file in the cvs of stylebook in the
directory styles/apachexml/dtd.

Giacomo

-- 
PWR GmbH, Organisation & Entwicklung      Tel:   +41 (0)1 856 2202
Giacomo Pati, CTO/CEO                     Fax:   +41 (0)1 856 2201
Hintereichenstrasse 7                     Mailto:Giacomo.Pati@pwr.ch
CH-8166 Niederweningen                    Web:   http://www.pwr.ch

Re: Natural Languages [was Re: external entities]

Posted by Mike Engelhart <me...@earthtrip.com>.
on 5/18/00 7:08 AM, Stefano Mazzocchi at stefano@apache.org wrote:

> Question: is it _necessary_ to use Java ResourceBundles?
> 
> I mean, the use of xml files as resource bundles could simplify your
> job...
> 
> <resources>
> <resource xml:lang=".." name="..." value="..."/>
> ...
> </resources>
> 
> then we could create some xsp taglib that does
> 
> <xsp:page>
> <i18n:resources href="..."/>
> <document>
> <title><i18n:resource name="title"/></title>
> </document>
> ...
> </xsp:page>
> 
> which could generate some java code like
> 
> String lang = request.getLocale().getLanguage();
> ...
> if ("en".equals(lang))
> out("Hello World!");
> else if ("it".equals(lang))
> out("Ciao a tutti!");
> else if ("fr".equals(lang))
> out("Salut tout le monde!");
> 
> having parsed and interpreted the resources file at compilation time.
> 
> Not a trivial taglib, but I think it's possible even today.
> 
> Ricardo, am I right?

That would be very, very nice, but as the taglib documentation is still in
its infancy, trying to make a taglib like that would be a laborious, slow
process for me and I need to finish this project first.
But... having a big if/else chain to handle the language determination seems a
little verbose.
Maybe it would be better to parse the XML file at application
start, perhaps with a configuration setting telling Cocoon where to look for a
languages.xml document, and then read that into a Map of HashMaps. Then in
the taglib do something like:
<xsp:page>
    <title><i18n:resource name="title"/></title>
</xsp:page>

which would generate:
    String lang = request.getLocale().getLanguage();
    out((String) ((Map) LanguageMap.get(lang)).get(name));

or something like that.  One thing to mention, though, is that
ResourceBundles automatically fall back to their parent ResourceBundle,
so they're easy to use. This would probably be non-trivial to recreate in an XML
version, but then again, I may just not have any idea how to do it.
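
A minimal sketch of the map-of-maps lookup described above, with a simple
fallback to a default language that is loosely similar to what
ResourceBundle does. The class name LanguageTable, its methods, and the
choice of a single default language are hypothetical; nothing like this is
known to ship with Cocoon.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: language code -> (resource name -> text).
class LanguageTable {
    private final Map<String, Map<String, String>> byLang = new HashMap<>();
    private final String defaultLang;

    LanguageTable(String defaultLang) {
        this.defaultLang = defaultLang;
    }

    void put(String lang, String name, String value) {
        byLang.computeIfAbsent(lang, k -> new HashMap<>()).put(name, value);
    }

    // Looks up the text for the requested language; if that language or
    // that name is missing, falls back to the default language.
    // Returns null when neither has the name.
    String get(String lang, String name) {
        Map<String, String> wanted = byLang.get(lang);
        if (wanted != null && wanted.containsKey(name)) {
            return wanted.get(name);
        }
        Map<String, String> fallback = byLang.get(defaultLang);
        return fallback == null ? null : fallback.get(name);
    }

    public static void main(String[] args) {
        LanguageTable table = new LanguageTable("en");
        table.put("en", "title", "Hello World!");
        table.put("it", "title", "Ciao a tutti!");
        System.out.println(table.get("it", "title"));
        System.out.println(table.get("fr", "title")); // no "fr" entry, uses "en"
    }
}
```

A real ResourceBundle chain also falls back through country and variant
levels; this sketch only has one level of fallback.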

Mike


Natural Languages [was Re: external entities]

Posted by Stefano Mazzocchi <st...@apache.org>.
Mike Engelhart wrote:
> 
> Mike Engelhart wrote:
> 
> > I get the data out of the ResourceBundle by calling getString() on the
> > correct resourceBundle which returns say "Zus&auml;tzliche Details". Do you
> > think that has something to do with it?
> 
> on 5/17/00 5:28 PM, Paul Russell at paul@luminas.co.uk wrote:
> 
> > Ahh. Yeah, that won't work - the &auml; gets converted to unicode
> > during the parsing phase, however if you use getString from within
> > an XSP, the text will never be parsed. Two options - you could
> > either write a chunk of code that takes the above and turns it into
> > unicode by looking up the character code in a hash, or you could
> > put the actual character into the resourcebundle itself (using
> > some strange key combination or other). Either should work.
> Crap... this isn't going right.  I can't put them into my XSPs directly or
> else I'm going to have to put a bunch of non-data information like text
> labels into my XML documents and I completely lose separation of
> presentation/data, which is why I'm using Cocoon in the first place.
> 
> any other ideas - anyone :-) I'm going to try posting to dev to see if
> anyone else has any ideas. thanks

Question: is it _necessary_ to use Java ResourceBundles?

I mean, the use of xml files as resource bundles could simplify your
job...

<resources>
 <resource xml:lang=".." name="..." value="..."/>
 ...
</resources>

then we could create some xsp taglib that does

<xsp:page>
 <i18n:resources href="..."/>
 <document>
  <title><i18n:resource name="title"/></title>
 </document>
 ...
</xsp:page>

which could generate some java code like

 String lang = request.getLocale().getLanguage();
 ...
 if ("en".equals(lang))
   out("Hello World!");
 else if ("it".equals(lang))
   out("Ciao a tutti!");
 else if ("fr".equals(lang))
   out("Salut tout le monde!");

having parsed and interpreted the resources file at compilation time.

Not a trivial taglib, but I think it's possible even today.

Ricardo, am I right?
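
As a rough illustration of the XML resource file idea, the sketch below
reads <resource> elements into a map at run time using the standard JAXP
DOM parser. This is not the compile-time XSP taglib proposed above, which
would generate code instead; the class name XmlResources and the sample
values are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical loader for the <resources> format shown above.
class XmlResources {
    // Reads every <resource xml:lang name value/> element into a map
    // keyed first by language and then by resource name.
    static Map<String, Map<String, String>> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<String, Map<String, String>> byLang = new HashMap<>();
            NodeList nodes = doc.getElementsByTagName("resource");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                byLang.computeIfAbsent(e.getAttribute("xml:lang"), k -> new HashMap<>())
                      .put(e.getAttribute("name"), e.getAttribute("value"));
            }
            return byLang;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse resources", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<resources>"
                + "<resource xml:lang=\"en\" name=\"title\" value=\"Hello World!\"/>"
                + "<resource xml:lang=\"it\" name=\"title\" value=\"Ciao a tutti!\"/>"
                + "</resources>";
        System.out.println(parse(xml).get("it").get("title"));
    }
}
```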

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<st...@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------
 Missed us in Orlando? Make it up with ApacheCON Europe in London!
------------------------- http://ApacheCon.Com ---------------------



Re: external entities

Posted by Mike Engelhart <me...@earthtrip.com>.
Mike Engelhart wrote:

> I get the data out of the ResourceBundle by calling getString() on the
> correct resourceBundle which returns say "Zus&auml;tzliche Details". Do you
> think that has something to do with it?

on 5/17/00 5:28 PM, Paul Russell at paul@luminas.co.uk wrote:

> Ahh. Yeah, that won't work - the &auml; gets converted to unicode
> during the parsing phase, however if you use getString from within
> an XSP, the text will never be parsed. Two options - you could
> either write a chunk of code that takes the above and turns it into
> unicode by looking up the character code in a hash, or you could
> put the actual character into the resourcebundle itself (using
> some strange key combination or other). Either should work.
Crap... this isn't going right.  I can't put them into my XSPs directly or
else I'm going to have to put a bunch of non-data information like text
labels into my XML documents and I completely lose separation of
presentation/data, which is why I'm using Cocoon in the first place.

any other ideas - anyone :-) I'm going to try posting to dev to see if
anyone else has any ideas. thanks

Mike


Re: external entities

Posted by Mike Engelhart <me...@earthtrip.com>.
on 5/17/00 5:28 PM, Paul Russell at paul@luminas.co.uk wrote:

> Ahh. Yeah, that won't work - the &auml; gets converted to unicode
> during the parsing phase, however if you use getString from within
> an XSP, the text will never be parsed. Two options - you could
> either write a chunk of code that takes the above and turns it into
> unicode by looking up the character code in a hash, or you could
> put the actual character into the resourcebundle itself (using
> some strange key combination or other). Either should work.
Yeah, that's what I just figured out by doing some tests.  I'm going to
start messing with the strange characters in my resource files and see what
comes up.  I'll post back soon.

thanks again.

Mike


Re: external entities

Posted by Ulrich Mayring <ul...@denic.de>.
Mike Engelhart wrote:
> 
> One down.  Putting the actual character into the property ResourceBundle
> does not work.

Remember to set the encoding of any XSP pages that deal with these
ResourceBundles. I have all my XML pages set to ISO-8859-1 and this
works very well with German Umlaut characters and many more.
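
To illustrate why the declared encoding matters, here is a small sketch
using the standard JAXP DOM parser (not Cocoon itself). It parses
ISO-8859-1 bytes whose XML declaration names that encoding, so the parser
decodes the umlaut correctly. The class name EncodingDeclarationDemo is
made up for the example; the sample text is the one used elsewhere in this
thread.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class EncodingDeclarationDemo {
    // Parses raw bytes. With a byte stream as input, the parser takes the
    // character encoding from the XML declaration.
    static String rootText(byte[] xmlBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<page>Sch\u00f6, gell!</page>";
        // Simulate a file saved on disk as ISO-8859-1.
        byte[] latin1 = xml.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(rootText(latin1));
    }
}
```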

Ulrich

-- 
Ulrich Mayring
DENIC eG, Systementwicklung

Re: external entities

Posted by Mike Engelhart <me...@earthtrip.com>.
on 5/17/00 5:28 PM, Paul Russell at paul@luminas.co.uk wrote:

> Ahh. Yeah, that won't work - the &auml; gets converted to unicode
> during the parsing phase, however if you use getString from within
> an XSP, the text will never be parsed. Two options - you could
> either write a chunk of code that takes the above and turns it into
> unicode by looking up the character code in a hash, or you could
> put the actual character into the resourcebundle itself (using
> some strange key combination or other). Either should work.
One down.  Putting the actual character into the property ResourceBundle
does not work.  
I may just try putting them into the XSP page itself instead. Unfortunately,
XSPs and ResourceBundles don't play together very well, and I have to use
a wrapper to call the getString() method or it throws an Exception, which is
why I went with the XSL approach in the first place.

Mike


Re: external entities

Posted by Paul Russell <pa...@luminas.co.uk>.
On Wed, May 17, 2000 at 04:34:47PM -0500, Mike Engelhart wrote:
> _ADDITIONAL_DETAILS=Zus&auml;tzliche Details

> I get the data out of the ResourceBundle by calling getString() on the
> correct resourceBundle which returns say "Zus&auml;tzliche Details". Do you
> think that has something to do with it?

Ahh. Yeah, that won't work - the &auml; gets converted to unicode
during the parsing phase, however if you use getString from within
an XSP, the text will never be parsed. Two options - you could
either write a chunk of code that takes the above and turns it into
unicode by looking up the character code in a hash, or you could
put the actual character into the resourcebundle itself (using
some strange key combination or other). Either should work.
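
The first option could look roughly like the sketch below: a small hash of
entity names to characters, applied to the string returned by getString().
The class name EntityDecoder and the handful of entities in the table are
illustrative assumptions; a complete table would need the full XHTML entity
sets.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that replaces named entities with real characters.
class EntityDecoder {
    // Deliberately tiny table; extend it with whatever entities are needed.
    private static final Map<String, Character> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("auml", '\u00e4');
        ENTITIES.put("ouml", '\u00f6');
        ENTITIES.put("uuml", '\u00fc');
        ENTITIES.put("szlig", '\u00df');
    }

    private static final Pattern NAMED = Pattern.compile("&([A-Za-z]+);");

    // Replaces every known named entity; unknown ones are left untouched.
    static String decode(String s) {
        Matcher m = NAMED.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Character c = ENTITIES.get(m.group(1));
            String replacement = (c != null) ? String.valueOf(c) : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("Zus&auml;tzliche Details"));
    }
}
```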

-- 
Paul Russell                               <pa...@luminas.co.uk>
Technical Director,                   http://www.luminas.co.uk
Luminas Ltd.

Re: external entities

Posted by Paul Russell <pa...@luminas.co.uk>.
On Wed, May 17, 2000 at 04:26:37PM -0500, Mike Engelhart wrote:
> &#228;
> turns into
> &amp;#228;
> 
> in my HTML output.  Is that the normal behavior??

Hmm. Must say, I wouldn't have thought so, but I could be
missing something.

-- 
Paul Russell                               <pa...@luminas.co.uk>
Technical Director,                   http://www.luminas.co.uk
Luminas Ltd.

[SOLVED] Re: external entities

Posted by Mike Engelhart <me...@earthtrip.com>.
on 5/18/00 7:51 AM, Thomas Steinborn at thomas.steinborn@exceloncorp.com
wrote:

>> I guess what I'm wondering is that if I have an ASCII text file with a
>> "ä" in it, why doesn't Java create the correct unicode representation
>> of that character in my ResourceBundle so that when I call getString()
>> it just shows up.  this is why I started screwing around with sticking
>> entities in my properties files.
>> 
> Assuming that you are running Windows, the encoding in which that ASCII file is
> stored on the hard disk is windows-1252 (XML name) or Cp1252 (Java
> name).  But it is not ISO-8859-1.  Maybe that is the problem?
> 
> Thomas
Muchas gracias!!! 

Solved... The text encoding of my files was not correct.  I don't know why I
assumed that my IDE would automatically create the correct version, but it
didn't.  Anyway, I ran my PropertyResourceBundle files through a text
encoding converter to make them all ISO-8859-1 and now everything works
groovy.  Thanks for the heads up.
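
For anyone hitting the same thing, here is a small sketch of the effect,
using java.util.Properties, whose load(InputStream) method reads bytes as
ISO-8859-1. The "wrongly saved" file is simulated with UTF-8 bytes purely
as a stand-in, because that charset is always available; the encoding my
files were actually in was different. The class and method names are made
up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class PropertiesEncodingDemo {
    // Re-encodes file bytes from the charset they were saved in to ISO-8859-1.
    static byte[] toLatin1(byte[] raw, Charset savedAs) {
        return new String(raw, savedAs).getBytes(StandardCharsets.ISO_8859_1);
    }

    // Loads properties from bytes; Properties.load(InputStream) assumes
    // the bytes are ISO-8859-1.
    static Properties load(byte[] bytes) {
        try {
            Properties p = new Properties();
            p.load(new ByteArrayInputStream(bytes));
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String line = "ADDITIONAL_DETAILS=Zus\u00e4tzliche Details\n";
        // Stand-in for a file saved in the wrong encoding.
        byte[] saved = line.getBytes(StandardCharsets.UTF_8);
        // Read as-is: the umlaut comes out mangled.
        System.out.println(load(saved).getProperty("ADDITIONAL_DETAILS"));
        // Converted to ISO-8859-1 first: the umlaut survives.
        byte[] fixed = toLatin1(saved, StandardCharsets.UTF_8);
        System.out.println(load(fixed).getProperty("ADDITIONAL_DETAILS"));
    }
}
```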

Mike


Re: external entities

Posted by Thomas Steinborn <th...@exceloncorp.com>.
> I guess what I'm wondering is that if I have an ASCII text file with a
> "ä" in it, why doesn't Java create the correct unicode representation
> of that character in my ResourceBundle so that when I call getString()
> it just shows up.  this is why I started screwing around with sticking
> entities in my properties files.
> 

Assuming that you are running Windows, the encoding in which that ASCII file is
stored on the hard disk is windows-1252 (XML name) or Cp1252 (Java
name).  But it is not ISO-8859-1.  Maybe that is the problem?

Thomas

Re: external entities

Posted by Mike Engelhart <me...@earthtrip.com>.
on 5/18/00 8:40 AM, Ulrich Mayring at ulim@denic.de wrote:

> This shouldn't have anything to do with your browser setting, because
> entities are rendered independent of any browser language setting. So
> you say you are constructing a German (or whatever) locale in your Java
> code and pass it to the ResourceBundle? I have been doing this and it
> worked for me, that's why I wonder.
Turns out that on MacOS, I had to convert the files to ISO-8859-1 before
importing them into my jar file, where all my PropertyResourceBundles are
stored.

> getString() is a locale-independent method, however, the ResourceBundle
> static method isn't. So getString() will return values appropriate for
> the locale the ResourceBundle was constructed with.
I know. Before calling getString() I called
ResourceBundle bundle = ResourceBundle.getBundle("com.earthtrip.myBundle",
locale);
and then
bundle.getString("ADDITIONAL_DETAILS");

mike


Re: external entities

Posted by Ulrich Mayring <ul...@denic.de>.
Mike Engelhart wrote:
> 
> on 5/18/00 6:36 AM, Ulrich Mayring at ulim@denic.de wrote:
> 
> > The error happens in your Java code, not in cocoon, which already gets a
> > '?' from Java. Look in the docs for ResourceBundle, there are
> > locale-aware methods for getting a ResourceBundle.
> >
> > Ulrich
> Not quite.  I AM using Locale-aware methods or else I wouldn't be having a
> problem because everything would be in English.  When I set my browser to
> have a language setting of "de" and call my XSP page, the output is mangled

This shouldn't have anything to do with your browser setting, because
entities are rendered independent of any browser language setting. So
you say you are constructing a German (or whatever) locale in your Java
code and pass it to the ResourceBundle? I have been doing this and it
worked for me, that's why I wonder.

> with a "?" in place of the special characters that are in my Properties
> files.   What Paul was saying is that I need to somehow get the correct
> characters into my PropertyResourceBundles but how the heck do you do that?
> I'm using a CodeWarrior as my development tool and there isn't any setting
> for character encoding.  I guess what I'm wondering is that if I have an
> ASCII text file with a "ä" in it, why doesn't Java create the correct
> unicode representation of that character in my ResourceBundle so that when
> I call getString() it just shows up.  this is why I started screwing around
> with sticking entities in my properties files.

getString() is a locale-independent method, however, the ResourceBundle
static method isn't. So getString() will return values appropriate for
the locale the ResourceBundle was constructed with.

Ulrich

-- 
Ulrich Mayring
DENIC eG, Systementwicklung

Re: external entities

Posted by Mike Engelhart <me...@earthtrip.com>.
on 5/18/00 6:36 AM, Ulrich Mayring at ulim@denic.de wrote:

> The error happens in your Java code, not in cocoon, which already gets a
> '?' from Java. Look in the docs for ResourceBundle, there are
> locale-aware methods for getting a ResourceBundle.
> 
> Ulrich
Not quite.  I AM using Locale-aware methods or else I wouldn't be having a
problem because everything would be in English.  When I set my browser to
have a language setting of "de" and call my XSP page, the output is mangled
with a "?" in place of the special characters that are in my Properties
files.   What Paul was saying is that I need to somehow get the correct
characters into my PropertyResourceBundles but how the heck do you do that?
I'm using a CodeWarrior as my development tool and there isn't any setting
for character encoding.  I guess what I'm wondering is that if I have an
ASCII text file with a "ä" in it, why doesn't Java create the correct
unicode representation of that character in my ResourceBundle so that when I
call getString() it just shows up.  this is why I started screwing around
with sticking entities in my properties files.

Mike


Re: external entities

Posted by Ulrich Mayring <ul...@denic.de>.
Mike Engelhart wrote:
> 
> because this is what I have in my PropertyResourceBundle:
> ADDITIONAL_DETAILS=Zusätzliche Details
> and this is what is displayed by my XSL sheet when I call getString() using
> Xalan extension (where $bundle is the correct bundle) <xsl:value-of
> select="java:getString($bundle, 'ADDITIONAL_DETAILS')"/>
> 
> Zus?tzliche Details
> 
> The question mark in the first word is actually what is output to the
> browser, not the special character for &auml;
> What must be stressed is that the ResourceBundle is not being read in by the
> XSP, it's being read in by a Xalan Extension in the XSL transformation.
> This has something to do with why it's not working but I can't figure it
> out.  Does anyone have any understanding of how Xalan processing works under
> the hood?  I would have thought that my getString() calls would run before
> the final transformation happened...

The error happens in your Java code, not in cocoon, which already gets a
'?' from Java. Look in the docs for ResourceBundle, there are
locale-aware methods for getting a ResourceBundle.

Ulrich

-- 
Ulrich Mayring
DENIC eG, Systementwicklung

Re: external entities

Posted by Mike Engelhart <me...@earthtrip.com>.
on 5/18/00 3:16 AM, Ulrich Mayring at ulim@denic.de wrote:

> It seems very strange to me to store HTML entities in a Java
> ResourceBundle - why is that so? I think if you had "Zusätzliche
> Details" in the ResourceBundle, then you could let cocoon worry about
> translating the Umlaut to an HTML entity. My files contain German Umlaut
> and other special characters as they are written by the author and I do
> the transformation in cocoon.
> 
> Ulrich
I agree. I'm just trying to get this to work, so I tried it with HTML
entities in the ResourceBundle itself because having the special characters
in the ResourceBundle didn't work.   Something is not working correctly
because this is what I have in my PropertyResourceBundle:
ADDITIONAL_DETAILS=Zusätzliche Details
and this is what is displayed by my XSL sheet when I call getString() using
Xalan extension (where $bundle is the correct bundle) <xsl:value-of
select="java:getString($bundle, 'ADDITIONAL_DETAILS')"/>

Zus?tzliche Details

The question mark in the first word is actually what is output to the
browser, not the special character for &auml;
What must be stressed is that the ResourceBundle is not being read in by the
XSP, it's being read in by a Xalan Extension in the XSL transformation.
This has something to do with why it's not working but I can't figure it
out.  Does anyone have any understanding of how Xalan processing works under
the hood?  I would have thought that my getString() calls would run before
the final transformation happened...

Thanks for your help
Mike


Re: external entities

Posted by Ulrich Mayring <ul...@denic.de>.
Mike Engelhart wrote:
> 
> The properties document has things like this:
> 
> _ADDITIONAL_DETAILS=Zus&auml;tzliche Details
> 
> I get the data out of the ResourceBundle by calling getString() on the
> correct resourceBundle which returns say "Zus&auml;tzliche Details". Do you
> think that has something to do with it?

It seems very strange to me to store HTML entities in a Java
ResourceBundle - why is that so? I think if you had "Zusätzliche
Details" in the ResourceBundle, then you could let cocoon worry about
translating the Umlaut to an HTML entity. My files contain German Umlaut
and other special characters as they are written by the author and I do
the transformation in cocoon.

Ulrich

-- 
Ulrich Mayring
DENIC eG, Systementwicklung

Re: external entities

Posted by Mike Engelhart <me...@earthtrip.com>.
> Do you want the bad news, or the bad news?
> 
> You need to declare them as entities in your XML DTD, eg:
> 
> <?xml version="1.0" standalone="yes"?>
> <!DOCTYPE rootelement [
> <!ENTITY auml "&#228;">
> ]>
> 
> Then when you do &auml; in your document...
> 
> <rootelement>
> &auml;
> </rootelement>
> 
> ... it'll get parsed and end up as a unicode character for
> the rest of its journey through cocoon. *Hopefully*, at the
> other end, the serializer will convert it into an HTML entity.
> I'm pretty sure Cocoon1.x does this already, and I've just
> posted a patch to make Cocoon2 do it.
> 

Well, it's certainly bad news. I just tried what you suggested and no go.
But, my situation may be more complex than the average page.
I have XSP pages which pass information to a stylesheet that uses Xalan
extensions to grab values out of PropertyResourceBundles for localization.
The properties document has things like this:

_ADDITIONAL_DETAILS=Zus&auml;tzliche Details

I get the data out of the ResourceBundle by calling getString() on the
correct resourceBundle which returns say "Zus&auml;tzliche Details". Do you
think that has something to do with it?

Mike


Re: external entities

Posted by Mike Engelhart <me...@earthtrip.com>.
on 5/17/00 3:44 PM, Paul Russell at paul@luminas.co.uk wrote:

> Do you want the bad news, or the bad news?
> 
> You need to declare them as entities in your XML DTD, eg:
> 
> <?xml version="1.0" standalone="yes"?>
> <!DOCTYPE rootelement [
> <!ENTITY auml "&#228;">
> ]>
> 
> Then when you do &auml; in your document...
> 
> <rootelement>
> &auml;
> </rootelement>
> 
> ... it'll get parsed and end up as a unicode character for
> the rest of its journey through cocoon. *Hopefully*, at the
> other end, the serializer will convert it into an HTML entity.
> I'm pretty sure Cocoon1.x does this already, and I've just
> posted a patch to make Cocoon2 do it.

Cool - thanks.  The one thing that perplexes me is that if I use the entity
codes directly they get mangled.

for example:
&#228;
turns into
&amp;#228;

in my HTML output.  Is that the normal behavior??

Thanks,

Mike


Re: external entities

Posted by Paul Russell <pa...@luminas.co.uk>.
On Wed, May 17, 2000 at 03:27:32PM -0500, Mike Engelhart wrote:
> How do I prevent Cocoon (Xalan/Xerces) from changing the following during a
> transformation to HTML:
> 
> &auml;
>   to 
> &amp;auml;

Do you want the bad news, or the bad news?

You need to declare them as entities in your XML DTD, eg:

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE rootelement [
	<!ENTITY auml "&#228;">
]>

Then when you do &auml; in your document...

<rootelement>
	&auml;
</rootelement>

... it'll get parsed and end up as a unicode character for
the rest of its journey through cocoon. *Hopefully*, at the
other end, the serializer will convert it into an HTML entity.
I'm pretty sure Cocoon1.x does this already, and I've just
posted a patch to make Cocoon2 do it.
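
The parsing half of this can be checked outside Cocoon with a small sketch
using the standard JAXP DOM parser. It only shows that a declared entity
arrives as a plain unicode character after parsing; it says nothing about
what Cocoon's serializer does at the other end. The class name
EntityParseDemo is made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EntityParseDemo {
    // Parses an XML string and returns the text content of its root element.
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml =
              "<?xml version=\"1.0\" standalone=\"yes\"?>\n"
            + "<!DOCTYPE rootelement [\n"
            + "  <!ENTITY auml \"&#228;\">\n"
            + "]>\n"
            + "<rootelement>&auml;</rootelement>";
        String text = rootText(xml);
        // The entity reference has been replaced by a single character
        // whose code point is 228 (a with diaeresis).
        System.out.println(text.length() + " char, code point " + (int) text.charAt(0));
    }
}
```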


-- 
Paul Russell                               <pa...@luminas.co.uk>
Technical Director,                   http://www.luminas.co.uk
Luminas Ltd.