Posted to dev@cocoon.apache.org by Jeremy Quinn <je...@media.demon.co.uk> on 2004/12/03 14:44:00 UTC

encoding problem in textarea

Hi All

I have an editor for XHTML snippets built in CForms using 2.1.7-dev.
It is very basic; it just uses a textarea.
I am having encoding issues that appeared only in the last week or so,
and I cannot work out the solution.

Symptoms:
I have an accented character in my source document.
The document is displayed in the textarea, with that character  
corrupted.
If you save, that character is saved corrupted to disk.
I have the same accented character output by i18n outside of the  
textarea, it displays correctly.

Scenario:
My source document is UTF-8.
My serializer (o.a.c.serialization.HTMLSerializer) outputs UTF-8.
My web.xml's 'form-encoding' parameter is set to UTF-8.
My browser recognises the document as UTF-8.

Behaviour:
The character "é" (e acute) when outside of the textarea is serialised  
as &eacute;.
The same character is serialised as &radic;&copy; when it is within the  
textarea.
The brackets of the XHTML tags in the textarea are output as entities.

I output to log the string being edited. The accented character is  
correct before being added to the widget, correct after being added to  
the widget but before display.

If I edit the character to correct it in the form, the correct  
character is written to the file.
If I do not edit the character, the incorrect characters ("√©", i.e. the
characters represented by &radic;&copy;) are written to the file.
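
The pair &radic;&copy; is consistent with the two UTF-8 bytes of "é" (0xC3 0xA9) each being decoded as a separate character by a single-byte Mac charset. The sketch below checks that reading in plain JavaScript (Node.js), not Flowscript. The two-entry lookup table is a hand-written, hypothetical excerpt of Mac OS Roman, and the idea that Mac OS Roman was the charset involved is an assumption that the thread does not confirm.

```javascript
// Hypothetical excerpt of the Mac OS Roman table: only the two bytes needed here.
const MAC_ROMAN_EXCERPT = { 0xc3: "\u221a", 0xa9: "\u00a9" }; // √ and ©

// Decode bytes one at a time, the way a single-byte charset would.
// Bytes missing from the excerpt fall through unchanged, which is only
// meaningful for plain ASCII.
function decodeSingleByte(bytes, table) {
  return Array.from(bytes, (b) => table[b] ?? String.fromCharCode(b)).join("");
}

const utf8Bytes = new TextEncoder().encode("\u00e9"); // "é" encoded as UTF-8
const hex = Array.from(utf8Bytes, (b) => b.toString(16)).join(" ");
const misread = decodeSingleByte(utf8Bytes, MAC_ROMAN_EXCERPT);

console.log(hex);     // c3 a9
console.log(misread); // √©
```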

However, regardless of whether I edit it or not, my log message shows  
the correct character after the form has been submitted, before it has  
been written back.

Technique:
I read the XML Source to a String (to add to the textarea widget) like  
this:

var string = org.apache.avalon.excalibur.io.IOUtil.toString(
   new java.io.BufferedInputStream(
     org.apache.cocoon.components.source.SourceUtil.getInputSource(
       resolver.resolveURI(uri)
     ).getByteStream()
   )
);
form.lookupWidget("xhtml").setValue(string);

I write the String from the widget back to XML File like this:

var source = resolver.resolveURI(uri);
var dom = parser.parseDocument(
   new org.xml.sax.InputSource(
     new java.io.StringReader(form.lookupWidget("xhtml").getValue())
   )
);

// basically copied from the samples
var outputStream = null;
try {
   var tf = Packages.javax.xml.transform.TransformerFactory.newInstance();
   if (source instanceof Packages.org.apache.excalibur.source.ModifiableSource
       && tf.getFeature(Packages.javax.xml.transform.sax.SAXTransformerFactory.FEATURE))
   {
     outputStream = source.getOutputStream();
     var transformerHandler = tf.newTransformerHandler();
     var transformer = transformerHandler.getTransformer();
     transformer.setOutputProperty(Packages.javax.xml.transform.OutputKeys.INDENT, "true");
     transformer.setOutputProperty(Packages.javax.xml.transform.OutputKeys.METHOD, "xml");
     transformer.setOutputProperty(Packages.javax.xml.transform.OutputKeys.ENCODING, "UTF-8");
     transformerHandler.setResult(new Packages.javax.xml.transform.stream.StreamResult(outputStream));
     var streamer = new Packages.org.apache.cocoon.xml.dom.DOMStreamer(transformerHandler);
     streamer.stream(dom);
   } else {
     throw ("error.source.not-writeable");
   }	
} catch (e) {
   throw(e);
} finally {
   if (outputStream != null) {
     try {
       outputStream.flush();
       outputStream.close();
     } catch (error) {
       cocoon.log.error("Could not flush/close outputstream: " + error);
     }
   }	
}


Can anyone see what I am doing wrong?

I have tried using  
org.apache.cocoon.components.serializers.HTMLSerializer, but CForms  
does not work with it.

I have tried different doctypes.

I tried pre entity encoding the accented character in the source  
document, and the textarea showed the raw entity.

I really have made this work before, but now I am completely stumped !!!

Thanks for any suggestions.

regards Jeremy


--------------------------------------------------------

                   If email from this address is not signed
                                 IT IS NOT FROM ME

                         Always check the label, folks !!!!!
--------------------------------------------------------

Re: encoding problem in textarea

Posted by Torsten Curdt <tc...@apache.org>.
Jeremy Quinn wrote:
> Torsten
> 
> You are my hero !!!!!

Hehe :-)

> It works again ;)

Glad you found it

> Many thanks mate !!

No worries, mate

cheers
--
Torsten

Re: encoding problem in textarea

Posted by Jeremy Quinn <je...@media.demon.co.uk>.
Torsten

You are my hero !!!!!

function getSourceAsString(uri) {
	return org.apache.avalon.excalibur.io.IOUtil.toString(
		new java.io.BufferedInputStream(
			org.apache.cocoon.components.source.SourceUtil.getInputSource(
				resolve(uri)
			).getByteStream()
		), "UTF-8"
	);
}

Yes, I should have added the encoding to
	org.apache.avalon.excalibur.io.IOUtil.toString(inputStream, encoding)
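
The effect of leaving out the encoding can be reproduced outside Cocoon. This is a minimal sketch in Node.js JavaScript, not Flowscript and not the Excalibur IOUtil API: the same bytes are decoded once with an explicit UTF-8 charset and once with a single-byte charset standing in for a non-UTF-8 platform default. Latin-1 is used as the stand-in only because Node ships it; the actual default on the original machine is not stated in the thread.

```javascript
// Bytes as they would sit on disk in a UTF-8 source document.
const onDisk = Buffer.from("caf\u00e9", "utf8");

// Explicit charset: the result does not depend on any platform default.
const explicit = onDisk.toString("utf8");

// Stand-in for a non-UTF-8 default (assumption: Latin-1, for illustration only).
const wrongDefault = onDisk.toString("latin1");

// Writing the misread string back out as UTF-8 bakes the corruption into the file.
const savedAgain = Buffer.from(wrongDefault, "utf8");

console.log(explicit === "caf\u00e9");          // true
console.log(wrongDefault.length);               // 5, one character per byte
console.log(savedAgain.length > onDisk.length); // true
```

Decoding with a named charset at the point where bytes become a String keeps the result the same on every platform, which is what passing "UTF-8" to IOUtil.toString achieves in the function above.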

It works again ;)

Many thanks mate !!

regards Jeremy

On 3 Dec 2004, at 14:47, Torsten Curdt wrote:

> Mate,
>
> I am not totally sure but...
>
>> Technique:
>> I read the XML Source to a String (to add to the textarea widget) 
>> like  this:
>> var string = org.apache.avalon.excalibur.io.IOUtil.toString(
>>   new java.io.BufferedInputStream(
>>     org.apache.cocoon.components.source.SourceUtil.getInputSource(
>>       resolver.resolveURI(uri)
>>     ).getByteStream()
>>   )
>> );
>
> if you create a String from a byte stream, array or
> whatever is byte-based don't you have to specify the
> charset? AFAIK creating a String from a byte array
> uses the platform default charset.
>
>> form.lookupWidget("xhtml").setValue(string);
>
> did you check whether the string looks fine here?
>
> ...just what came up to my mind right away
>
> HTH
>
> cheers
> --
> Torsten
>
>


Re: encoding problem in textarea

Posted by Jeremy Quinn <je...@media.demon.co.uk>.
On 3 Dec 2004, at 14:47, Torsten Curdt wrote:

> Mate,

Thanks for your reply.

> I am not totally sure but...
>
>> Technique:
>> I read the XML Source to a String (to add to the textarea widget) 
>> like  this:
>> var string = org.apache.avalon.excalibur.io.IOUtil.toString(
>>   new java.io.BufferedInputStream(
>>     org.apache.cocoon.components.source.SourceUtil.getInputSource(
>>       resolver.resolveURI(uri)
>>     ).getByteStream()
>>   )
>> );
>
> if you create a String from a byte stream, array or
> whatever is byte-based don't you have to specify the
> charset?

I will look into that.

> AFAIK creating a String from a byte array
> uses the platform default charset.

Which AFAIU is UTF-8 on MacOSX.

>> form.lookupWidget("xhtml").setValue(string);
>
> did you check whether the string looks fine here?
>
> ...just what came up to my mind right away

	var htmlWidget = form.lookupWidget("xhtml");
	htmlWidget.setValue(getSourceAsString(id));
	cocoon.log.info("String before edit: " + htmlWidget.getValue());

And it is correct at this stage.

regards Jeremy




Re: encoding problem in textarea

Posted by Torsten Curdt <tc...@apache.org>.
Mate,

I am not totally sure but...

> Technique:
> I read the XML Source to a String (to add to the textarea widget) like  
> this:
> 
> var string = org.apache.avalon.excalibur.io.IOUtil.toString(
>   new java.io.BufferedInputStream(
>     org.apache.cocoon.components.source.SourceUtil.getInputSource(
>       resolver.resolveURI(uri)
>     ).getByteStream()
>   )
> );

if you create a String from a byte stream, array or
whatever is byte-based don't you have to specify the
charset? AFAIK creating a String from a byte array
uses the platform default charset.

> form.lookupWidget("xhtml").setValue(string);

did you check whether the string looks fine here?

...just what came up to my mind right away

HTH

cheers
--
Torsten