Posted to dev@cocoon.apache.org by Carsten Ziegeler <cz...@s-und-n.de> on 2004/03/02 13:29:31 UTC

RE: patch to Enhance CIncludeTransformer to handle encoding of parameters

Marco Dubbeld wrote:
> 
> Yep! The value element may contain UTF-8 text, with Chinese 
> characters or other non ISO-8859 characters. While 
> testing, I found that
> 
> this.startSerializedXMLRecording(XMLUtils.defaultSerializeToXMLFormat(true));
> 
> uses ISO-8859 encoding (see the properties returned by 
> XMLUtils). However, we should use a property set with the 
> encoding of the document we are transforming. Otherwise 
> this causes a UTFDataFormatException for Chinese UTF-8 text, for example.

So the best way would be to pass the encoding of the document
to the XMLUtils used for serializing. Is this possible?
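
To make the underlying problem concrete, here is a minimal sketch in plain
JDK code. It is not the Cocoon code path and does not reproduce the
UTFDataFormatException itself; it only shows that Chinese text does not
survive a trip through a single-byte encoding. The class and method names
are hypothetical, and ISO-8859-1 is assumed as the concrete ISO-8859 variant.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Cocoon.
class EncodingRoundTrip {

    // Encode the text to bytes and decode it again with the same charset.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    // True if every character of the text can be represented in the charset.
    static boolean canEncode(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String chinese = "\u4e2d\u6587"; // two CJK characters

        // ISO-8859-1 cannot represent them; getBytes() substitutes a
        // replacement byte, so the round trip does not return the input.
        System.out.println(canEncode(chinese, StandardCharsets.ISO_8859_1));
        System.out.println(chinese.equals(roundTrip(chinese, StandardCharsets.ISO_8859_1)));

        // UTF-8 can represent every Unicode character.
        System.out.println(canEncode(chinese, StandardCharsets.UTF_8));
        System.out.println(chinese.equals(roundTrip(chinese, StandardCharsets.UTF_8)));
    }
}
```

Whatever encoding the serializer is given has to cover the characters that
actually occur in the recorded value.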

> 
> Using XML recording for a cinclude value element did not 
> make sense to me anyway, so I chose to use text recording 
> instead to avoid the problem. 
> 
Yes, that's true, I'm wondering about that as well. But unfortunately
your change is incompatible, although I don't think it affects
anyone.

Carsten 


RE: patch to Enhance CIncludeTransformer to handle encoding of parameters

Posted by Marco Dubbeld <ma...@davtec.nl>.
On Wed, 2004-03-03 at 09:45, Carsten Ziegeler wrote:
> Bruno Dumon wrote: 
> 
> > > How to properly determine the encoding of the events or the input
> > > source I do not precisely know.
> > > java.sun.com is down from my location so one half of my brain is 
> > > blocked. If it's back I'll try to search.
> > > 
> > > Maybe someone on the list knows?
> > 
> > You just need to choose something. I don't think SAX provides 
> > information about the encoding of the original document. Always using
> > UTF-8 should be a safe choice.
> > 
> Thanks Bruno.
> 
> Marco, what do you think? Would UTF-8 work for you?
Yes, UTF-8 would work. And you are absolutely right to keep the
startSerializedXMLRecording!

I will come back to the use of XMLUtils and the serialized XML recording
in the AbstractSAXTransformer, because the need to set the encoding in
the transformer for serializing the XML does not make sense, since
everything comes in as strings.

Thanks

> 
> 
> Carsten


RE: patch to Enhance CIncludeTransformer to handle encoding of parameters

Posted by Carsten Ziegeler <cz...@s-und-n.de>.
Bruno Dumon wrote: 

> > How to properly determine the encoding of the events or the input
> > source I do not precisely know.
> > java.sun.com is down from my location so one half of my brain is 
> > blocked. If it's back I'll try to search.
> > 
> > Maybe someone on the list knows?
> 
> You just need to choose something. I don't think SAX provides 
> information about the encoding of the original document. Always using
> UTF-8 should be a safe choice.
> 
Thanks Bruno.

Marco, what do you think? Would UTF-8 work for you?


Carsten


RE: patch to Enhance CIncludeTransformer to handle encoding of parameters

Posted by Bruno Dumon <br...@outerthought.org>.
On Tue, 2004-03-02 at 14:57, Marco Dubbeld wrote:
> On Tue, 2004-03-02 at 13:29, Carsten Ziegeler wrote:
> > Marco Dubbeld wrote:
> > > 
> > > Yep! The value element may contain UTF-8 text, with Chinese 
> > > characters or other non ISO-8859 characters. While 
> > > testing, I found that
> > > 
> > > this.startSerializedXMLRecording(XMLUtils.defaultSerializeToXMLFormat(true));
> > > 
> > > uses ISO-8859 encoding (see the properties returned by 
> > > XMLUtils). However, we should use a property set with the 
> > > encoding of the document we are transforming. Otherwise 
> > > this causes a UTFDataFormatException for Chinese UTF-8 text, for example.
> > 
> > So the best way would be to pass the encoding of the document
> > to the XMLUtils used for serializing. Is this possible?
> Properties props = XMLUtils.defaultSerializeToXMLFormat(true);
> String encoding = ??????????
> props.setProperty(OutputKeys.ENCODING, encoding);
> this.startSerializedXMLRecording(props);
> 
> How to properly determine the encoding of the events or the input
> source I do not precisely know.
> java.sun.com is down from my location so one half of my brain is
> blocked. If it's back I'll try to search. 
> 
> Maybe someone on the list knows?

You just need to choose something. I don't think SAX provides
information about the encoding of the original document. Always using
UTF-8 should be a safe choice.
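
A minimal sketch of that suggestion using plain JAXP only. It does not call
Cocoon's XMLUtils or startSerializedXMLRecording; utf8Format() is a
hypothetical stand-in for the property set that would be handed to the
recorder, and the class name and sample input are assumptions for
illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.util.Properties;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class, not part of Cocoon.
class Utf8SerializeSketch {

    // Output properties with the encoding fixed to UTF-8.
    static Properties utf8Format() {
        Properties props = new Properties();
        props.setProperty(OutputKeys.METHOD, "xml");
        props.setProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        props.setProperty(OutputKeys.ENCODING, "UTF-8");
        return props;
    }

    // Serialize an XML string to bytes with an identity transformer
    // configured from the given output properties.
    static byte[] serialize(String xml, Properties format) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperties(format);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        identity.transform(new StreamSource(new StringReader(xml)),
                           new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<value>\u4e2d\u6587</value>"; // value with Chinese text
        byte[] bytes = serialize(xml, utf8Format());
        // Decode with the same encoding that was used for serializing.
        System.out.println(new String(bytes, "UTF-8"));
    }
}
```

Whoever reads the recorded bytes back has to decode them as UTF-8 as well.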

-- 
Bruno Dumon                             http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno@outerthought.org                          bruno@apache.org


RE: patch to Enhance CIncludeTransformer to handle encoding of parameters

Posted by Marco Dubbeld <ma...@davtec.nl>.
On Tue, 2004-03-02 at 13:29, Carsten Ziegeler wrote:
> Marco Dubbeld wrote:
> > 
> > Yep! The value element may contain UTF-8 text, with Chinese 
> > characters or other non ISO-8859 characters. While 
> > testing, I found that
> > 
> > this.startSerializedXMLRecording(XMLUtils.defaultSerializeToXMLFormat(true));
> > 
> > uses ISO-8859 encoding (see the properties returned by 
> > XMLUtils). However, we should use a property set with the 
> > encoding of the document we are transforming. Otherwise 
> > this causes a UTFDataFormatException for Chinese UTF-8 text, for example.
> 
> So the best way would be to pass the encoding of the document
> to the XMLUtils used for serializing. Is this possible?
Properties props = XMLUtils.defaultSerializeToXMLFormat(true);
String encoding = ??????????
props.setProperty(OutputKeys.ENCODING, encoding);
this.startSerializedXMLRecording(props);

How to properly determine the encoding of the events or the input
source I do not precisely know.
java.sun.com is down from my location so one half of my brain is
blocked. If it's back I'll try to search. 

Maybe someone on the list knows?


> 
> > 
> > Using XML recording for a cinclude value element did not 
> > make sense to me anyway, so I chose to use text recording 
> > instead to avoid the problem. 
> > 
> Yes, that's true, I'm wondering about that as well. But unfortunately
> your change is incompatible, although I don't think it affects
> anyone.
> 
> Carsten