Posted to dev@cocoon.apache.org by Marco Dubbeld <ma...@davtec.nl> on 2004/03/01 12:08:07 UTC

patch to Enhance CIncludeTransformer to handle encoding of parameters

Hi cocoon developers,

1st - Thank you for making Cocoon. Cocoon is good stuff, but I guess you
know that already. ;)
2nd - I've been using Cocoon in a small project for a search page
(it returns multilingual results), but I ran into a limitation while
using the CInclude transformer. I posted a patch for it at

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=26924

on 2004-02-13. It concerns the encoding of request parameters. The
related patch to the Excalibur sourceresolver has already been applied.

Without wanting to be pushy, I just wanted to know whether this can be
applied.

Thanks,
Marco


RE: patch to Enhance CIncludeTransformer to handle encoding of parameters

Posted by Marco Dubbeld <ma...@davtec.nl>.
On Wed, 2004-03-03 at 09:45, Carsten Ziegeler wrote:
> Bruno Dumon wrote: 
> 
> > > I do not know precisely how best to determine the encoding of the
> > > events or the input source.
> > > java.sun.com is down from my location, so one half of my brain is
> > > blocked. Once it is back I will try to search.
> > > 
> > > Maybe someone on the list knows?
> > 
> > You just need to choose something. I don't think SAX provides 
> > information about the encoding of the original document. Always using
> > UTF-8 should be a safe choice.
> > 
> Thanks Bruno.
> 
> Marco, what do you think? Would UTF-8 work for you?
Yes, UTF-8 would work. And you are absolutely right to keep the
startSerializedXMLRecording!

I will come back to the use of XMLUtils and the serialized XML recording
in the AbstractSAXTransformer, because having to set the encoding in
the transformer in order to serialize the XML does not make sense to me,
since everything comes in as a string.

Thanks

> 
> 
> Carsten


RE: patch to Enhance CIncludeTransformer to handle encoding of parameters

Posted by Carsten Ziegeler <cz...@s-und-n.de>.
Bruno Dumon wrote: 

> > I do not know precisely how best to determine the encoding of the
> > events or the input source.
> > java.sun.com is down from my location, so one half of my brain is
> > blocked. Once it is back I will try to search.
> > 
> > Maybe someone on the list knows?
> 
> You just need to choose something. I don't think SAX provides 
> information about the encoding of the original document. Always using
> UTF-8 should be a safe choice.
> 
Thanks Bruno.

Marco, what do you think? Would UTF-8 work for you?


Carsten


RE: patch to Enhance CIncludeTransformer to handle encoding of parameters

Posted by Bruno Dumon <br...@outerthought.org>.
On Tue, 2004-03-02 at 14:57, Marco Dubbeld wrote:
> On Tue, 2004-03-02 at 13:29, Carsten Ziegeler wrote:
> > Marco Dubbeld wrote:
> > > 
> > > Yep! The value element may contain UTF-8 text, with Chinese
> > > characters or other characters outside ISO-8859. While
> > > testing, I found that
> > > 
> > > this.startSerializedXMLRecording(XMLUtils.defaultSerializeToXMLFormat(true));
> > > 
> > > uses ISO-8859 encoding (see the properties returned
> > > by XMLUtils). However, we should use a property set with the
> > > encoding of the document we are transforming. Otherwise
> > > this causes a UTFDataFormatException, for example with Chinese UTF-8 text.
> > 
> > So the best way would be to pass the encoding of the document
> > to the XMLUtils used for serializing. Is this possible?
> Properties props = XMLUtils.defaultSerializeToXMLFormat(true);
> String encoding = ??????????
> props.setProperty(OutputKeys.ENCODING, encoding);
> this.startSerializedXMLRecording(props);
> 
> I do not know precisely how best to determine the encoding of the
> events or the input source.
> java.sun.com is down from my location, so one half of my brain is
> blocked. Once it is back I will try to search.
> 
> Maybe someone on the list knows?

You just need to choose something. I don't think SAX provides
information about the encoding of the original document. Always using
UTF-8 should be a safe choice.
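
As a rough illustration of the UTF-8 choice, here is a minimal sketch that
uses only the JDK's JAXP identity Transformer, not Cocoon's XMLUtils or
startSerializedXMLRecording. The class name Utf8SerializeDemo and the helper
serialize() are hypothetical names made up for this example. It only shows
that a Properties object carrying OutputKeys.ENCODING set to UTF-8 lets
Chinese characters pass through the serializer unchanged.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class Utf8SerializeDemo {

    /** Serializes an XML string to bytes using the given output encoding. */
    public static byte[] serialize(String xml, String encoding) {
        // Output properties, analogous in spirit to what the thread passes
        // to startSerializedXMLRecording (assumption: same JAXP keys).
        Properties props = new Properties();
        props.setProperty(OutputKeys.METHOD, "xml");
        props.setProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        props.setProperty(OutputKeys.ENCODING, encoding);

        try {
            // A Transformer created without a stylesheet copies input to output.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperties(props);

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            identity.transform(new StreamSource(new StringReader(xml)),
                               new StreamResult(out));
            return out.toByteArray();
        } catch (TransformerException e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<value>\u4e2d\u6587</value>"; // two Chinese characters
        byte[] bytes = serialize(xml, "UTF-8");
        // Decode with the same charset that was used for serializing.
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

The bytes have to be decoded with the same charset that was set in the
properties; a mismatch between the two is the kind of situation the thread
is trying to avoid.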

-- 
Bruno Dumon                             http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno@outerthought.org                          bruno@apache.org


RE: patch to Enhance CIncludeTransformer to handle encoding of parameters

Posted by Marco Dubbeld <ma...@davtec.nl>.
On Tue, 2004-03-02 at 13:29, Carsten Ziegeler wrote:
> Marco Dubbeld wrote:
> > 
> > Yep! The value element may contain UTF-8 text, with Chinese
> > characters or other characters outside ISO-8859. While
> > testing, I found that
> > 
> > this.startSerializedXMLRecording(XMLUtils.defaultSerializeToXMLFormat(true));
> > 
> > uses ISO-8859 encoding (see the properties returned
> > by XMLUtils). However, we should use a property set with the
> > encoding of the document we are transforming. Otherwise
> > this causes a UTFDataFormatException, for example with Chinese UTF-8 text.
> 
> So the best way would be to pass the encoding of the document
> to the XMLUtils used for serializing. Is this possible?
Properties props = XMLUtils.defaultSerializeToXMLFormat(true);
String encoding = ??????????
props.setProperty(OutputKeys.ENCODING, encoding);
this.startSerializedXMLRecording(props);

I do not know precisely how best to determine the encoding of the
events or the input source.
java.sun.com is down from my location, so one half of my brain is
blocked. Once it is back I will try to search.

Maybe someone on the list knows?


> 
> > 
> > Using XML recording for a cinclude value element did not
> > make sense to me anyway, so I chose text recording
> > instead to avoid the problem.
> > 
> Yes, that's true, I'm wondering about that as well. But unfortunately
> your change is incompatible, although I don't think it affects
> anyone.
> 
> Carsten 


RE: patch to Enhance CIncludeTransformer to handle encoding of parameters

Posted by Carsten Ziegeler <cz...@s-und-n.de>.
Marco Dubbeld wrote:
> 
> Yep! The value element may contain UTF-8 text, with Chinese
> characters or other characters outside ISO-8859. While
> testing, I found that
> 
> this.startSerializedXMLRecording(XMLUtils.defaultSerializeToXMLFormat(true));
> 
> uses ISO-8859 encoding (see the properties returned
> by XMLUtils). However, we should use a property set with the
> encoding of the document we are transforming. Otherwise
> this causes a UTFDataFormatException, for example with Chinese UTF-8 text.

So the best way would be to pass the encoding of the document
to the XMLUtils used for serializing. Is this possible?

> 
> Using XML recording for a cinclude value element did not
> make sense to me anyway, so I chose text recording
> instead to avoid the problem.
> 
Yes, that's true, I'm wondering about that as well. But unfortunately
your change is incompatible, although I don't think it affects
anyone.

Carsten 


RE: patch to Enhance CIncludeTransformer to handle encoding of parameters

Posted by Marco Dubbeld <ma...@davtec.nl>.
Yep! The value element may contain UTF-8 text, with Chinese characters or
other characters outside ISO-8859. While testing, I found that

this.startSerializedXMLRecording(XMLUtils.defaultSerializeToXMLFormat(true));

uses ISO-8859 encoding (see the properties returned by XMLUtils).
However, we should use a property set with the encoding of the document
we are transforming. Otherwise this causes a UTFDataFormatException,
for example with Chinese UTF-8 text.
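
To make the underlying limitation concrete, here is a small JDK-only
sketch. It does not reproduce the UTFDataFormatException or touch any
Cocoon class; it only shows that an ISO-8859-1 encoder cannot represent
Chinese characters, while UTF-8 round-trips them. The class name
EncodingLossDemo, its two helpers and the sample value are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingLossDemo {

    /** Encodes the text to bytes in the given charset and decodes it back. */
    public static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset); // unmappable chars become '?'
        return new String(bytes, charset);
    }

    /** Returns true if the charset can represent every character of the text. */
    public static boolean survives(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String value = "search=\u4e2d\u6587"; // parameter with two Chinese characters

        System.out.println(survives(value, StandardCharsets.ISO_8859_1)); // false
        System.out.println(survives(value, StandardCharsets.UTF_8));      // true

        // ISO-8859-1 silently replaces the two characters with question marks.
        System.out.println(roundTrip(value, StandardCharsets.ISO_8859_1)); // search=??
        System.out.println(roundTrip(value, StandardCharsets.UTF_8).equals(value)); // true
    }
}
```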

Using XML recording for a cinclude value element did not make sense to
me anyway, so I chose text recording instead to avoid the
problem.

Thanks,
Marco




On Tue, 2004-03-02 at 10:51, Carsten Ziegeler wrote:
> Hi Marco,
> 
> I applied the first part (SourceUtil) of your patch. I'm not
> sure about the other part (for the CIncludeTransformer).
> Can you explain a little bit, why/when this might be necessary?
> 
> Thanks
> Carsten
> 


RE: patch to Enhance CIncludeTransformer to handle encoding of parameters

Posted by Carsten Ziegeler <cz...@s-und-n.de>.
Hi Marco,

I applied the first part (SourceUtil) of your patch. I'm not
sure about the other part (for the CIncludeTransformer).
Can you explain a little bit, why/when this might be necessary?

Thanks
Carsten
