Posted to dev@cocoon.apache.org by Jeremy Quinn <je...@media.demon.co.uk> on 2004/12/03 14:44:00 UTC
encoding problem in textarea
Hi All
I have an editor for XHTML snippets built in CForms using 2.1.7-dev.
It is very basic; it just uses a textarea.
I am having encoding issues that appeared only in the last week or so,
and I cannot work out the solution.
Symptoms:
I have an accented character in my source document.
The document is displayed in the textarea, with that character
corrupted.
If you save, that character is saved corrupted to disk.
I have the same accented character output by i18n outside of the
textarea, it displays correctly.
Scenario:
My source document is UTF-8.
My serializer (o.a.c.serialization.HTMLSerializer) outputs UTF-8.
My web.xml's 'form-encoding' parameter is set to UTF-8.
My browser recognises the document as UTF-8.
Behaviour:
The character "é" (e acute) when outside of the textarea is serialised
as é.
The same character is serialised as é when it is within the
textarea.
The brackets of the XHTML tags in the textarea are output as entities.
I output to log the string being edited. The accented character is
correct before being added to the widget, correct after being added to
the widget but before display.
If I edit the character to correct it in the form, the correct
character is written to the file.
If I do not edit the character, the incorrect characters ("é" ie. the
characters represented by é) are written to the file.
However, regardless of whether I edit it or not, my log message shows
the correct character after the form has been submitted, before it has
been written back.
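One mechanism that would produce this kind of corruption on save, sketched in plain Node.js rather than Cocoon flowscript. It assumes the UTF-8 bytes were decoded with a single-byte charset at some point; ISO-8859-1 ("latin1") is only a stand-in here, since nothing above says which charset was actually involved.

```javascript
// "é" (U+00E9) stored as UTF-8 is the two bytes c3 a9.
const onDisk = Buffer.from("\u00e9", "utf8");

// Assumption: somewhere the bytes are decoded with a single-byte charset.
// Each byte then becomes its own character, so one "é" turns into two.
const misdecoded = onDisk.toString("latin1");

// Saving that string as UTF-8 encodes both characters separately,
// so the file now holds four bytes instead of the original two.
const savedBack = Buffer.from(misdecoded, "utf8");

console.log(onDisk.toString("hex"), "->", savedBack.toString("hex"));
```

If the text is retyped in the form, a fresh single character replaces the two wrong ones, which would match the observation that editing the character makes the saved file correct.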
Technique:
I read the XML Source to a String (to add to the textarea widget) like
this:
var string = org.apache.avalon.excalibur.io.IOUtil.toString(
    new java.io.BufferedInputStream(
        org.apache.cocoon.components.source.SourceUtil.getInputSource(
            resolver.resolveURI(uri)
        ).getByteStream()
    )
);
form.lookupWidget("xhtml").setValue(string);
I write the String from the widget back to XML File like this:
var source = resolver.resolveURI(uri);
var dom = parser.parseDocument(
    new org.xml.sax.InputSource(
        new java.io.StringReader(form.lookupWidget("xhtml").getValue())
    )
);
// basically copied from the samples
var outputStream = null;
try {
    var tf = Packages.javax.xml.transform.TransformerFactory.newInstance();
    if (source instanceof Packages.org.apache.excalibur.source.ModifiableSource
            && tf.getFeature(Packages.javax.xml.transform.sax.SAXTransformerFactory.FEATURE)) {
        outputStream = source.getOutputStream();
        var transformerHandler = tf.newTransformerHandler();
        var transformer = transformerHandler.getTransformer();
        // OutputKeys.INDENT takes "yes" or "no", not "true"
        transformer.setOutputProperty(Packages.javax.xml.transform.OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(Packages.javax.xml.transform.OutputKeys.METHOD, "xml");
        transformer.setOutputProperty(Packages.javax.xml.transform.OutputKeys.ENCODING, "UTF-8");
        transformerHandler.setResult(
            new Packages.javax.xml.transform.stream.StreamResult(outputStream));
        var streamer = new Packages.org.apache.cocoon.xml.dom.DOMStreamer(transformerHandler);
        // stream the DOM parsed above (the variable is "dom", not "document")
        streamer.stream(dom);
    } else {
        throw ("error.source.not-writeable");
    }
} catch (e) {
    throw(e);
} finally {
    if (outputStream != null) {
        try {
            outputStream.flush();
            outputStream.close();
        } catch (error) {
            cocoon.log.error("Could not flush/close outputstream: " + error);
        }
    }
}
Can anyone see what I am doing wrong?
I have tried using
org.apache.cocoon.components.serializers.HTMLSerializer, but CForms
does not work with it.
I have tried different doctypes.
I tried pre-encoding the accented character as an entity in the source
document, and the textarea showed the raw entity.
I really have made this work before, but now I am completely stumped !!!
Thanks for any suggestions.
regards Jeremy
--------------------------------------------------------
If email from this address is not signed
IT IS NOT FROM ME
Always check the label, folks !!!!!
--------------------------------------------------------
Re: encoding problem in textarea
Posted by Torsten Curdt <tc...@apache.org>.
Jeremy Quinn wrote:
> Torsten
>
> You are my hero !!!!!
Hehe :-)
> It works again ;)
Glad you found it
> Many thanks mate !!
No worries, mate
cheers
--
Torsten
Re: encoding problem in textarea
Posted by Jeremy Quinn <je...@media.demon.co.uk>.
Torsten
You are my hero !!!!!
function getSourceAsString(uri) {
    // pass the charset explicitly instead of relying on the platform default
    return org.apache.avalon.excalibur.io.IOUtil.toString(
        new java.io.BufferedInputStream(
            org.apache.cocoon.components.source.SourceUtil.getInputSource(
                resolve(uri)
            ).getByteStream()
        ), "UTF-8"
    );
}
Yes, I should have added the encoding to
org.apache.avalon.excalibur.io.IOUtil.toString(inputStream, encoding)
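The same idea as a stand-alone sketch in plain Node.js, for anyone who wants to try it outside Cocoon. The helper name bytesToString is hypothetical and only loosely mirrors the two-argument toString call; it refuses to decode unless the caller names the charset, so a forgotten encoding fails loudly instead of silently using a default.

```javascript
// Hypothetical helper: decode bytes only when the charset is stated explicitly.
function bytesToString(bytes, encoding) {
  if (!encoding) {
    // No silent fallback to a default charset.
    throw new TypeError("an explicit encoding is required");
  }
  if (!Buffer.isEncoding(encoding)) {
    throw new RangeError("unsupported encoding: " + encoding);
  }
  return Buffer.from(bytes).toString(encoding);
}

// Bytes as they would sit in a UTF-8 source file containing "café".
const sourceBytes = Buffer.from("caf\u00e9", "utf8");
const decodedText = bytesToString(sourceBytes, "utf8");

console.log(decodedText.length); // four characters, accent intact
```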
It works again ;)
Many thanks mate !!
regards Jeremy
On 3 Dec 2004, at 14:47, Torsten Curdt wrote:
> Mate,
>
> I am not totally sure but...
>
>> Technique:
>> I read the XML Source to a String (to add to the textarea widget)
>> like this:
>> var string = org.apache.avalon.excalibur.io.IOUtil.toString(
>> new java.io.BufferedInputStream(
>> org.apache.cocoon.components.source.SourceUtil.getInputSource(
>> resolver.resolveURI(uri)
>> ).getByteStream()
>> )
>> );
>
> if you create a String from a byte stream, array or
> whatever is byte-based don't you have to specify the
> charset? AFAIK creating a String from a byte array
> uses the platform default charset.
>
>> form.lookupWidget("xhtml").setValue(string);
>
> did you check whether the string looks fine here?
>
> ...just what came up to my mind right away
>
> HTH
>
> cheers
> --
> Torsten
>
>
Re: encoding problem in textarea
Posted by Jeremy Quinn <je...@media.demon.co.uk>.
On 3 Dec 2004, at 14:47, Torsten Curdt wrote:
> Mate,
Thanks for your reply.
> I am not totally sure but...
>
>> Technique:
>> I read the XML Source to a String (to add to the textarea widget)
>> like this:
>> var string = org.apache.avalon.excalibur.io.IOUtil.toString(
>> new java.io.BufferedInputStream(
>> org.apache.cocoon.components.source.SourceUtil.getInputSource(
>> resolver.resolveURI(uri)
>> ).getByteStream()
>> )
>> );
>
> if you create a String from a byte stream, array or
> whatever is byte-based don't you have to specify the
> charset?
I will look into that.
> AFAIK creating a String from a byte array
> uses the platform default charset.
Which AFAIU is UTF-8 on MacOSX.
>> form.lookupWidget("xhtml").setValue(string);
>
> did you check whether the string looks fine here?
>
> ...just what came up to my mind right away
var htmlWidget = form.lookupWidget("xhtml");
htmlWidget.setValue(getSourceAsString(id));
cocoon.log.info("String before edit: " + htmlWidget.getValue());
And it is correct at this stage.
regards Jeremy
Re: encoding problem in textarea
Posted by Torsten Curdt <tc...@apache.org>.
Mate,
I am not totally sure but...
> Technique:
> I read the XML Source to a String (to add to the textarea widget) like
> this:
>
> var string = org.apache.avalon.excalibur.io.IOUtil.toString(
> new java.io.BufferedInputStream(
> org.apache.cocoon.components.source.SourceUtil.getInputSource(
> resolver.resolveURI(uri)
> ).getByteStream()
> )
> );
if you create a String from a byte stream, array or
whatever is byte-based don't you have to specify the
charset? AFAIK creating a String from a byte array
uses the platform default charset.
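A minimal sketch of that point, in plain Node.js rather than Cocoon flowscript: the same bytes give different strings depending on the charset used to decode them. ISO-8859-1 ("latin1") stands in for "a default charset that is not UTF-8"; that choice is an assumption, not something known about the machine in question.

```javascript
// The UTF-8 encoding of "é" (U+00E9) is the two bytes 0xC3 0xA9.
const utf8Bytes = Buffer.from([0xc3, 0xa9]);

// Decoded with the charset the bytes were written in: one character.
const asUtf8 = utf8Bytes.toString("utf8");

// Decoded with a single-byte charset (stand-in for some other default):
// every byte becomes a separate character, so the result has two.
const asLatin1 = utf8Bytes.toString("latin1");

console.log(asUtf8.length, asLatin1.length);
```

Nothing in the bytes themselves records which charset they were written in, so the decoding step has to be told.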
> form.lookupWidget("xhtml").setValue(string);
did you check whether the string looks fine here?
...just what came up to my mind right away
HTH
cheers
--
Torsten