Posted to dev@commons.apache.org by Jaime Hablutzel Egoavil <ha...@gmail.com> on 2013/03/08 16:38:50 UTC

Wrong processing for encoding of Content-disposition 'filename' parameter

I see that Commons FileUpload uses a 'headerEncoding' variable, whose
Javadoc explains:

> Specifies the character encoding to be used when reading the headers of
> individual part. When not specified, or null, the request encoding is used.
> If that is also not specified, or null, the platform default encoding is
> used.
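
The fallback order that the Javadoc describes can be sketched roughly as
follows. This is only an illustration of the documented order, not the
library's actual code; the class and method names (HeaderCharsetSketch,
resolve) are hypothetical.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of Commons FileUpload.
class HeaderCharsetSketch {

    // Picks the charset for part headers in the order the Javadoc describes:
    // explicit headerEncoding, then the request encoding, then the platform default.
    static Charset resolve(String headerEncoding, String requestEncoding) {
        if (headerEncoding != null) {
            return Charset.forName(headerEncoding);
        }
        if (requestEncoding != null) {
            return Charset.forName(requestEncoding);
        }
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        System.out.println(resolve("UTF-8", "ISO-8859-1"));
        System.out.println(resolve(null, "ISO-8859-1"));
        System.out.println(resolve(null, null));
    }
}
```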


Well, this headerEncoding is also used to decode the 'filename'
parameter value, but RFC 1867 (the RFC you implement)
says:

> The original local file name may be supplied as well, either as a
>    'filename' parameter either of the 'content-disposition: form-data'
>    header or in the case of multiple files in a 'content-disposition:
>    file' header of the subpart. The client application should make best
>    effort to supply the file name; if the file name of the client's
>    operating system is not in US-ASCII, the file name might be
>    approximated or encoded using the method of RFC 1522.  This is a
>    convenience for those cases where, for example, the uploaded files
>    might contain references to each other, e.g., a TeX file and its .sty
>    auxiliary style description.


So the filename parameter value should only be decoded as US-ASCII or with
the method described in RFC 1522 (encoded words), but you simply decode it
with a custom 'headerEncoding'. Any clarification would be useful to me: why
are you processing headers like that?
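
For reference, decoding an encoded word of the form
=?charset?encoding?encoded-text?= (RFC 1522, later obsoleted by RFC 2047)
could look roughly like the sketch below. It is a simplified illustration
under stated assumptions, not FileUpload code: the names EncodedWordSketch
and decodeFilename are hypothetical, only a value consisting of a single
encoded word is handled, and malformed input is not treated gracefully.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Commons FileUpload.
class EncodedWordSketch {

    // Matches =?charset?encoding?encoded-text?= where encoding is B or Q.
    private static final Pattern ENCODED_WORD =
            Pattern.compile("=\\?([^?]+)\\?([BbQq])\\?([^?]*)\\?=");

    // Returns the decoded file name, or the input unchanged when it is
    // not an encoded word (i.e. a plain US-ASCII value).
    static String decodeFilename(String raw) {
        Matcher m = ENCODED_WORD.matcher(raw);
        if (!m.matches()) {
            return raw;
        }
        Charset charset = Charset.forName(m.group(1));
        byte[] bytes = m.group(2).equalsIgnoreCase("B")
                ? Base64.getDecoder().decode(m.group(3))
                : decodeQ(m.group(3));
        return new String(bytes, charset);
    }

    // "Q" encoding: '_' stands for a space and "=XX" for the byte with hex value XX.
    private static byte[] decodeQ(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '_') {
                out.write(' ');
            } else if (c == '=' && i + 2 < text.length()) {
                out.write(Integer.parseInt(text.substring(i + 1, i + 3), 16));
                i += 2; // skip the two hex digits just consumed
            } else {
                out.write(c); // remaining characters are plain ASCII
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(decodeFilename("=?UTF-8?B?w6FyYm9sLnR4dA==?="));
        System.out.println(decodeFilename("=?ISO-8859-1?Q?r=E9sum=E9_final.txt?="));
        System.out.println(decodeFilename("plain.txt"));
    }
}
```

With this approach the charset travels inside the header value itself, so
no externally configured 'headerEncoding' is needed for such values.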

Take a look at this issue too and see if you can reopen it:

https://issues.apache.org/jira/browse/FILEUPLOAD-56#comment-13597224


PS: Chrome and Firefox don't seem to follow this spec either, as they
encode the 'filename' parameter (of a multipart/form-data request) with the
page encoding (or the form's accept-charset), thus producing headers with raw
UTF-8, or whatever encoding was chosen for the page, in some cases.
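
The effect of such raw bytes can be shown with a small sketch. It assumes
a browser sent the header from a UTF-8 page and compares decoding the same
bytes with the matching charset and with a mismatched one. The class and
method names (RawFilenameSketch, decodeHeader) and the sample header are
hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not part of Commons FileUpload.
class RawFilenameSketch {

    // Interprets raw header bytes with the given charset, which is the
    // role a configured 'headerEncoding' would play.
    static String decodeHeader(byte[] rawHeader, Charset headerEncoding) {
        return new String(rawHeader, headerEncoding);
    }

    public static void main(String[] args) {
        // Header bytes as a browser might send them from a UTF-8 page
        // (\u00e1 is the letter a with an acute accent).
        byte[] raw = ("Content-Disposition: form-data; name=\"file\"; "
                + "filename=\"\u00e1rbol.txt\"").getBytes(StandardCharsets.UTF_8);

        // Matching charset: the file name survives intact.
        System.out.println(decodeHeader(raw, StandardCharsets.UTF_8));
        // Mismatched charset: each UTF-8 byte becomes its own character.
        System.out.println(decodeHeader(raw, StandardCharsets.ISO_8859_1));
    }
}
```

Because the header itself does not say which charset was used, the server
can only decode it correctly if the configured encoding happens to match
the one the page was served with.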



-- 
Jaime Hablutzel -  RPC 987608463

Re: Wrong processing for encoding of Content-disposition 'filename' parameter

Posted by Simone Tripodi <si...@apache.org>.
Hi Jaime,

thanks a lot for contributing to FileUpload!

Please have a look at FILEUPLOAD-199 - there is an enhancement request
for non-ASCII encoded file names. You may be able to provide
feedback there! :)

Many thanks in advance, all the best,
-Simo

PS: for future emails to the list, please always put the
component prefix in the subject, i.e. [fileupload]; otherwise it will
be hard for interested people to notice the discussion - I found it
by accident!

TIA,
-Simo

http://people.apache.org/~simonetripodi/
http://simonetripodi.livejournal.com/
http://twitter.com/simonetripodi
http://www.99soft.org/


On Tue, Mar 12, 2013 at 2:22 AM, Jaime Hablutzel Egoavil
<ha...@gmail.com> wrote:
> nothing on this?


Re: Wrong processing for encoding of Content-disposition 'filename' parameter

Posted by Jaime Hablutzel Egoavil <ha...@gmail.com>.
nothing on this?





-- 
Jaime Hablutzel -  RPC 987608463