Posted to users@cocoon.apache.org by stefan pickschneider <st...@s1998.tu-chemnitz.de> on 2005/03/05 15:17:04 UTC

Encoding Problem: File Upload and UTF-8

Hello,

we have a problem with the encoding, cforms and file uploads.

We have a form that lets the user upload a file, with an input field on
the same form for the corresponding title. The upload works, but
if you enter special characters (like German umlauts) in the title, they
arrive as garbage ('?'). The form uses enctype="multipart/form-data".

If I set container-encoding to UTF-8 in web.xml, it works: both the upload
and the title are OK. The problem is that all other forms (those without
uploads) then stop handling UTF-8 correctly.

Does anybody have a clue how I can get forms with and without uploads to
work at the same time?


Thanks a lot,
stefan.


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@cocoon.apache.org
For additional commands, e-mail: users-help@cocoon.apache.org


Re: Encoding Problem: File Upload and UTF-8

Posted by stefan pickschneider <st...@s1998.tu-chemnitz.de>.
> It should work the same for both normal and multipart forms, without use
> of the SetCharacterEncodingAction. You do need the fix though I pointed
> to in a previous mail, since that one is not yet in Cocoon 2.1.6 IIRC.

Got it ;-) Should've tried that first. It just works.

Thanks a lot!
Stefan.



Re: Encoding Problem: File Upload and UTF-8

Posted by Bruno Dumon <br...@outerthought.org>.
On Mon, 2005-03-07 at 12:33 +0100, stefan pickschneider wrote:
> Hey Bruno,
> 
> >>I am not sure if the above patch fixes the problem I mentioned.
> >>
> >>The patch replaces
> >>   this.defaultFormEncoding);
> >>by
> >>   this.containerEncoding);
> >>
> >>In our configuration the
> >>defaultFormEncoding == containerEncoding == "UTF-8" - so the patch
> >>doesn't seem to have an influence on our problem :-(
> > 
> > 
> > then your configuration is wrong :-) The containerEncoding should be
> > ISO-8859-1, always.
> > 
> OK. So this option is for future use, or for when you really know what
> your web container is doing.

I assume it is for broken web containers that don't follow the spec.

> 
> > 
> >>Somewhere the encoding for the Parameters is set to ISO-8859-1 probably 
> >>because no encoding is set until the first call of getParameter*().
> > 
> > 
> > > Cocoon doesn't rely on the setCharacterEncoding support of the
> > > web container, basically for historical reasons: that method
> > > didn't always exist. Additionally, Cocoon's approach has the advantage
> > > that it allows changing the encoding from the sitemap, so that
> > > different parts of the sitemap can use different encodings.
> > 
> I tried it this way. Normal forms (not multipart/form-data) are 
> submitted correctly with ISO-8859-1 (containerencoding) decoded to UTF-8 
> (formencoding).
> 
> Multipart forms instead seem to send the form fields as UTF-8. Cocoon 
> tries to decode these from ISO to UTF as well, so they break and it 
> doesn't work.
> 
> OK, so a solution for our problem would look like:
> ContainerEncoding ISO-8859-1
> FormEncoding UTF-8
> 
> For all forms that use "multipart/form-data" & utf-8: use the 
> SetCharacterEncodingAction in the sitemap to change the encoding to utf-8.
> 
> Unfortunately, SetCharacterEncodingAction is only able to change
> the FormEncoding, not the ContainerEncoding, so this approach wouldn't
> work without enhancing the action.
> 
> 
> Are there any other ways?

It should work the same for both normal and multipart forms, without use
of the SetCharacterEncodingAction. You do need the fix though I pointed
to in a previous mail, since that one is not yet in Cocoon 2.1.6 IIRC.

-- 
Bruno Dumon                             http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno@outerthought.org                          bruno@apache.org




Re: Encoding Problem: File Upload and UTF-8

Posted by stefan pickschneider <st...@s1998.tu-chemnitz.de>.
Hey Bruno,

>>I am not sure if the above patch fixes the problem I mentioned.
>>
>>The patch replaces
>>   this.defaultFormEncoding);
>>by
>>   this.containerEncoding);
>>
>>In our configuration the
>>defaultFormEncoding == containerEncoding == "UTF-8" - so the patch
>>doesn't seem to have an influence on our problem :-(
> 
> 
> then your configuration is wrong :-) The containerEncoding should be
> ISO-8859-1, always.
> 
OK. So this option is for future use, or for when you really know what
your web container is doing.

> 
>>Somewhere the encoding for the Parameters is set to ISO-8859-1 probably 
>>because no encoding is set until the first call of getParameter*().
> 
> 
> Cocoon doesn't rely on the setCharacterEncoding support of the
> web container, basically for historical reasons: that method
> didn't always exist. Additionally, Cocoon's approach has the advantage
> that it allows changing the encoding from the sitemap, so that
> different parts of the sitemap can use different encodings.
> 
I tried it this way. Normal forms (not multipart/form-data) are 
submitted correctly with ISO-8859-1 (containerencoding) decoded to UTF-8 
(formencoding).

Multipart forms instead seem to send the form fields as UTF-8. Cocoon 
tries to decode these from ISO to UTF as well, so they break and it 
doesn't work.
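
The failure mode described above can be sketched in plain Java. This is
only an illustration, assuming the multipart field value has already been
decoded correctly before the ISO-to-UTF step runs; the class and method
names (DoubleDecodingDemo, isoToUtf8) are hypothetical and are not taken
from the Cocoon source.

```java
import java.nio.charset.StandardCharsets;

// Illustration only, with hypothetical names: shows why the ISO-to-UTF
// re-decoding step must be applied exactly once.
public class DoubleDecodingDemo {

    // Re-interpret a string that was decoded as ISO-8859-1 as UTF-8.
    public static String isoToUtf8(String value) {
        return new String(value.getBytes(StandardCharsets.ISO_8859_1),
                          StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "ü" (U+00FC), assumed here to be decoded correctly already.
        String alreadyCorrect = "\u00FC";
        String broken = isoToUtf8(alreadyCorrect);
        // The single byte FC is not valid UTF-8, so the umlaut is lost.
        System.out.println(broken.equals(alreadyCorrect)); // prints "false"
    }
}
```

Plain ASCII values survive this step unchanged, which would explain why
only special characters such as umlauts are affected.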

OK, so a solution for our problem would look like:
ContainerEncoding ISO-8859-1
FormEncoding UTF-8

For all forms that use "multipart/form-data" & utf-8: use the 
SetCharacterEncodingAction in the sitemap to change the encoding to utf-8.

Unfortunately, SetCharacterEncodingAction is only able to change
the FormEncoding, not the ContainerEncoding, so this approach wouldn't
work without enhancing the action.


Are there any other ways?

Thanks,
Stefan.




Re: Encoding Problem: File Upload and UTF-8

Posted by Bruno Dumon <br...@outerthought.org>.
On Mon, 2005-03-07 at 11:02 +0100, stefan pickschneider wrote:
> > This problem is already fixed, though in a different way:
> > 
> > http://svn.apache.org/viewcvs.cgi/cocoon/branches/BRANCH_2_1_X/src/java/org/apache/cocoon/servlet/CocoonServlet.java?rev=106200&r1=55730&r2=106200
> 
> Hey Bruno,
> 
> thanks for your reply.
> 
> I am not sure if the above patch fixes the problem I mentioned.
> 
> The patch replaces
>    this.defaultFormEncoding);
> by
>    this.containerEncoding);
> 
> In our configuration the
> defaultFormEncoding == containerEncoding == "UTF-8" - so the patch
> doesn't seem to have an influence on our problem :-(

then your configuration is wrong :-) The containerEncoding should be
ISO-8859-1, always.

> 
> Somewhere the encoding for the Parameters is set to ISO-8859-1 probably 
> because no encoding is set until the first call of getParameter*().

Cocoon doesn't rely on the setCharacterEncoding support of the
web container, basically for historical reasons: that method
didn't always exist. Additionally, Cocoon's approach has the advantage
that it allows changing the encoding from the sitemap, so that
different parts of the sitemap can use different encodings.
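
A minimal sketch of this kind of re-decoding, assuming the container has
decoded the raw request bytes as ISO-8859-1 while the form was actually
submitted as UTF-8. The names ParameterRecoding and recode are hypothetical
and do not come from the Cocoon source; only the standard String API is
used.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not Cocoon code: re-decodes a parameter value from
// the container encoding to the form encoding.
public class ParameterRecoding {

    // Turn the string back into the bytes the browser sent (using the
    // encoding the container applied), then decode those bytes with the
    // encoding the form was really submitted in.
    public static String recode(String value, Charset containerEncoding,
                                Charset formEncoding) {
        if (value == null) {
            return null;
        }
        return new String(value.getBytes(containerEncoding), formEncoding);
    }

    public static void main(String[] args) {
        // "ü" sent as UTF-8 is the bytes C3 BC; a container decoding them
        // as ISO-8859-1 hands over the two characters U+00C3 U+00BC.
        String fromContainer = "\u00C3\u00BC";
        String fixed = recode(fromContainer,
                              StandardCharsets.ISO_8859_1,
                              StandardCharsets.UTF_8);
        System.out.println(fixed.equals("\u00FC")); // prints "true"
    }
}
```

The round trip recovers the original bytes only if the container encoding
maps every byte to a character and back without loss, which ISO-8859-1
does. That may be one reason for the advice that containerEncoding should
stay at ISO-8859-1.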

-- 
Bruno Dumon                             http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno@outerthought.org                          bruno@apache.org




Re: Encoding Problem: File Upload and UTF-8

Posted by stefan pickschneider <st...@s1998.tu-chemnitz.de>.
> This problem is already fixed, though in a different way:
> 
> http://svn.apache.org/viewcvs.cgi/cocoon/branches/BRANCH_2_1_X/src/java/org/apache/cocoon/servlet/CocoonServlet.java?rev=106200&r1=55730&r2=106200

Hey Bruno,

thanks for your reply.

I am not sure if the above patch fixes the problem I mentioned.

The patch replaces
   this.defaultFormEncoding);
by
   this.containerEncoding);

In our configuration the
defaultFormEncoding == containerEncoding == "UTF-8" - so the patch
doesn't seem to have an influence on our problem :-(

Somewhere the encoding for the Parameters is set to ISO-8859-1 probably 
because no encoding is set until the first call of getParameter*().


Stefan.


Re: Encoding Problem: File Upload and UTF-8

Posted by Bruno Dumon <br...@outerthought.org>.
On Sun, 2005-03-06 at 20:46 +0100, stefan pickschneider wrote:
> Bertrand Delacretaz wrote:
> > Le 5 mars 05, à 15:17, stefan pickschneider a écrit :
> > 
> >> ..we have a problem with the encoding, cforms and file uploads.
> > 
> > 
> > Did you look at http://wiki.apache.org/cocoon/RequestParameterEncoding ?
> > It might give some answers.
> > 
> > -Bertrand
> 
> Hello,
> 
> thanks for your suggestion - it helped me to find a "solution".
> 
> It seems that the cocoon servlet initializes the container-encoding for 
> the request too late, so that the default encoding iso-8859-1 is used 
> for reading the parameters. These are incorrectly decoded, since 
> container-encoding and form-encoding are set to UTF-8 in web.xml.
> All forms etc. are UTF-8, too.
> This problem only happens if you're using
> "application/x-www-form-urlencoded" as enctype. For "multipart/form-data"
> it works, because MultipartHttpServletRequest doesn't try to decode
> parameters.
> 
> Solution:
> In CocoonServlet.java around line 1040 insert
>     // set encoding before parameters are accessed
>     request.setCharacterEncoding(this.containerEncoding);
> before
>     // Get the cocoon engine instance
>     getCocoon(request.getPathInfo(),
>               request.getParameter(Constants.RELOAD_PARAM));
> 
> 
> Hope someone needs it and it is not too dirty ;-)
> Stefan.
> 

This problem is already fixed, though in a different way:

http://svn.apache.org/viewcvs.cgi/cocoon/branches/BRANCH_2_1_X/src/java/org/apache/cocoon/servlet/CocoonServlet.java?rev=106200&r1=55730&r2=106200

-- 
Bruno Dumon                             http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno@outerthought.org                          bruno@apache.org




Re: Encoding Problem: File Upload and UTF-8

Posted by Bertrand Delacretaz <bd...@apache.org>.
Le 6 mars 05, à 20:46, stefan pickschneider a écrit :
> ...It seems that the cocoon servlet initializes the container-encoding 
> for the request too late..

Could you create a bugzilla entry (http://issues.apache.org/bugzilla/) 
for this, so that we don't forget to have a look at it?

-Bertrand

Re: Encoding Problem: File Upload and UTF-8

Posted by stefan pickschneider <st...@s1998.tu-chemnitz.de>.
Bertrand Delacretaz wrote:
> Le 5 mars 05, à 15:17, stefan pickschneider a écrit :
> 
>> ..we have a problem with the encoding, cforms and file uploads.
> 
> 
> Did you look at http://wiki.apache.org/cocoon/RequestParameterEncoding ?
> It might give some answers.
> 
> -Bertrand

Hello,

thanks for your suggestion - it helped me to find a "solution".

It seems that the cocoon servlet initializes the container-encoding for 
the request too late, so that the default encoding iso-8859-1 is used 
for reading the parameters. These are incorrectly decoded, since 
container-encoding and form-encoding are set to UTF-8 in web.xml.
All forms etc. are UTF-8, too.
This problem only happens if you're using
"application/x-www-form-urlencoded" as enctype. For "multipart/form-data"
it works, because MultipartHttpServletRequest doesn't try to decode
parameters.
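
The effect of reading urlencoded parameters with the wrong default
encoding can be reproduced outside the servlet container. The sketch below
is only an illustration using java.net.URLDecoder from the standard
library; the class name UrlencodedCharsetDemo and the sample value are
made up for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Illustration only: decodes the same urlencoded value with two charsets.
public class UrlencodedCharsetDemo {

    // Decode a percent-encoded form value with the given charset name.
    public static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // "Tür" submitted from a UTF-8 page: ü becomes the bytes C3 BC.
        String submitted = "T%C3%BCr";
        // Decoded as UTF-8, the umlaut comes back intact.
        System.out.println(decode(submitted, "UTF-8").equals("T\u00FCr"));            // prints "true"
        // Decoded as ISO-8859-1, each byte becomes its own character.
        System.out.println(decode(submitted, "ISO-8859-1").equals("T\u00C3\u00BCr")); // prints "true"
    }
}
```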

Solution:
In CocoonServlet.java around line 1040 insert
    // set encoding before parameters are accessed
    request.setCharacterEncoding(this.containerEncoding);
before
    // Get the cocoon engine instance
    getCocoon(request.getPathInfo(),
              request.getParameter(Constants.RELOAD_PARAM));


Hope someone needs it and it is not too dirty ;-)
Stefan.


Re: Encoding Problem: File Upload and UTF-8

Posted by Bertrand Delacretaz <bd...@apache.org>.
Le 5 mars 05, à 15:17, stefan pickschneider a écrit :

> ..we have a problem with the encoding, cforms and file uploads.

Did you look at http://wiki.apache.org/cocoon/RequestParameterEncoding ?
It might give some answers.

-Bertrand