Posted to dev@geronimo.apache.org by "Sangjin Lee (JIRA)" <ji...@apache.org> on 2007/11/28 03:11:43 UTC

[jira] Created: (GERONIMO-3638) should allow URL encoding with custom encoding charset other than the default

should allow URL encoding with custom encoding charset other than the default
-----------------------------------------------------------------------------

                 Key: GERONIMO-3638
                 URL: https://issues.apache.org/jira/browse/GERONIMO-3638
             Project: Geronimo
          Issue Type: New Feature
      Security Level: public (Regular issues)
          Components: AsyncHttpClient
    Affects Versions: 1.x
            Reporter: Sangjin Lee


Currently AsyncHttpClient uses Charset.defaultCharset() when it encodes the query string. However, applications may want to use an encoding other than the machine's default charset, e.g. UTF-8. AHC needs to provide a way to specify the encoding it should use to encode the query string.
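
To make the problem concrete, here is a minimal Java sketch. It uses the JDK's java.net.URLEncoder as a stand-in for AHC's own query-encoding code (an assumption; that code is not shown in this thread), and the class name QueryCharsetDemo is made up for illustration. The same value comes out differently depending on the charset, so anything derived from Charset.defaultCharset() varies from machine to machine.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.Charset;

// Illustrative only: URLEncoder stands in for AHC's (unshown) query encoder.
final class QueryCharsetDemo {

    // URL-encodes one query value with the named charset.
    static String encode(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String value = "caf\u00e9";
        // Machine-dependent: this is the behavior the issue complains about.
        System.out.println(encode(value, Charset.defaultCharset().name()));
        // Deterministic: the caller picks the charset explicitly.
        System.out.println(encode(value, "UTF-8"));      // caf%C3%A9
        System.out.println(encode(value, "ISO-8859-1")); // caf%E9
    }
}
```

Note that URLEncoder implements the HTML form encoding (application/x-www-form-urlencoded), so it turns a space into '+' rather than '%20'.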

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (GERONIMO-3638) should allow URL encoding with custom encoding charset other than the default

Posted by "Sangjin Lee (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GERONIMO-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sangjin Lee updated GERONIMO-3638:
----------------------------------

    Attachment: patch.zip

A suggested fix. It appears the charset variable in HttpMessage is unused; it seems to have been intended for the content charset. The only place a charset is used at all is when we URL-encode the form/query.


[jira] Updated: (GERONIMO-3638) should allow URL encoding with custom encoding charset other than the default

Posted by "Sangjin Lee (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GERONIMO-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sangjin Lee updated GERONIMO-3638:
----------------------------------

    Attachment: 3638.patch

a suggested fix


[jira] Commented: (GERONIMO-3638) should allow URL encoding with custom encoding charset other than the default

Posted by "Rick McGuire (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GERONIMO-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12551464 ] 

Rick McGuire commented on GERONIMO-3638:
----------------------------------------

I'm not sure I understand the rationale for using US-ASCII as the default. If I'm interpreting the snippet from RFC 3986 correctly, UTF-8 should be the only encoding used for transforming textual data into a URL encoding. This is essentially a 2-stage process: 1) encode the characters into bytes using UTF-8 as the target encoding; 2) interpret those bytes as if they were an 8-bit ASCII encoding and perform the URL encoding on that. Since every character in the US-ASCII character set encodes to exactly the same byte when UTF-8 is the conversion target, that encoding is contained within the new standard.
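
The two-stage process can be sketched from scratch as follows. This is an illustration, not AHC code; the class name Rfc3986Encoder is hypothetical, and the expected outputs in the comments are the three examples from the RFC 3986 passage quoted elsewhere in this thread. Unlike java.net.URLEncoder, it follows the RFC 3986 unreserved set (ALPHA, DIGIT, '-', '.', '_', '~'), so it leaves '~' alone and encodes a space as %20.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical from-scratch percent-encoder following the two stages described above.
final class Rfc3986Encoder {

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // RFC 3986 unreserved set: ALPHA / DIGIT / "-" / "." / "_" / "~"
    private static boolean isUnreserved(int b) {
        return (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z') || (b >= '0' && b <= '9')
                || b == '-' || b == '.' || b == '_' || b == '~';
    }

    static String encode(String text) {
        // Stage 1: characters -> octets, always using UTF-8.
        byte[] octets = text.getBytes(StandardCharsets.UTF_8);
        StringBuilder out = new StringBuilder(octets.length * 3);
        // Stage 2: percent-encode every octet outside the unreserved set.
        for (byte b : octets) {
            int v = b & 0xFF; // treat the byte as an unsigned 8-bit value
            if (isUnreserved(v)) {
                out.append((char) v);
            } else {
                out.append('%').append(HEX[v >> 4]).append(HEX[v & 0x0F]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("A"));      // A
        System.out.println(encode("\u00C0")); // %C3%80 (LATIN CAPITAL LETTER A WITH GRAVE)
        System.out.println(encode("\u30A2")); // %E3%82%A2 (KATAKANA LETTER A)
    }
}
```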

I guess the only reason for allowing the charset to be specified would be if the target of the message is known not to support RFC 3986. But in that case, it wouldn't make sense to try to send those characters in the first place, since they wouldn't encode correctly. So changing the encoding at best provides a safety measure to ensure incorrect encodings are not sent.

The provided patch does a very nice job of implementing the proposed behavior.  I'm not yet convinced the proposed behavior is the correct one. 
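
Rick's point about characters that "wouldn't encode correctly" can be seen directly: when a charset cannot represent a character, current JDKs substitute '?', so the server silently receives the wrong data. One form of the "safety measure" he mentions is to refuse such values up front. StrictQueryEncoder and its methods are hypothetical names, and URLEncoder again stands in for AHC's encoder.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.Charset;

// Hypothetical helper, not AHC's API: refuse characters the chosen charset cannot represent.
final class StrictQueryEncoder {

    // Plain URLEncoder call; unmappable characters are silently replaced.
    static String encodeLenient(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    // Same encoding, but fails fast if any character is not representable.
    static String encodeStrict(String value, String charsetName) {
        Charset charset = Charset.forName(charsetName);
        if (!charset.newEncoder().canEncode(value)) {
            throw new IllegalArgumentException(
                    "value contains characters not representable in " + charset.name());
        }
        return encodeLenient(value, charsetName);
    }

    public static void main(String[] args) {
        String katakanaA = "\u30A2"; // KATAKANA LETTER A
        System.out.println(encodeLenient(katakanaA, "ISO-8859-1")); // %3F (a literal '?')
        System.out.println(encodeStrict(katakanaA, "UTF-8"));       // %E3%82%A2
        try {
            encodeStrict(katakanaA, "ISO-8859-1");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```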


[jira] Commented: (GERONIMO-3638) should allow URL encoding with custom encoding charset other than the default

Posted by "Sangjin Lee (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GERONIMO-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12551592 ] 

Sangjin Lee commented on GERONIMO-3638:
---------------------------------------

Yes, I agree the default for URL encoding should be UTF-8.  I also agree the only case where anything other than UTF-8 could be used is to interact with a server that may not be handling it properly and thus the client *has* to use a different custom encoding.

I am changing the default URL encoding charset to UTF-8, but I propose keeping an option to set the charset. I added an ample warning in the javadoc not to ... try this at home. :)

I'll upload a revised patch shortly...
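
A hypothetical sketch of the approach described above: UTF-8 by default, an overridable charset, and a javadoc warning. It is not the code from 3638.patch or revision 604018; EncodableRequest, setUrlEncodingCharset and encodeParameters are invented names, and java.net.URLEncoder stands in for AHC's encoder.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not AHC's real API.
final class EncodableRequest {

    static final String DEFAULT_URL_ENCODING = "UTF-8";

    private final Map<String, String> parameters = new LinkedHashMap<>();
    private String urlEncodingCharset = DEFAULT_URL_ENCODING;

    /**
     * Overrides the charset used to URL-encode the query string or form body.
     * <p>
     * Warning: RFC 3986 calls for UTF-8. Only change this to talk to a server
     * that is known not to decode UTF-8 correctly.
     */
    void setUrlEncodingCharset(String charsetName) {
        if (!Charset.isSupported(charsetName)) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName);
        }
        this.urlEncodingCharset = charsetName;
    }

    void addParameter(String name, String value) {
        parameters.put(name, value);
    }

    // Builds name1=value1&name2=value2, URL-encoding each name and value.
    String encodeParameters() {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : parameters.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(e.getKey(), urlEncodingCharset))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), urlEncodingCharset));
            }
        } catch (UnsupportedEncodingException ex) {
            // Not expected: the setter already checked the charset.
            throw new IllegalStateException(ex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        EncodableRequest req = new EncodableRequest();
        req.addParameter("q", "caf\u00e9 au lait");
        System.out.println(req.encodeParameters()); // q=caf%C3%A9+au+lait (UTF-8 default)
        req.setUrlEncodingCharset("ISO-8859-1");    // legacy-server escape hatch
        System.out.println(req.encodeParameters()); // q=caf%E9+au+lait
    }
}
```

Validating the charset name in the setter means a typo fails when the option is configured, not later when a request is sent.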


[jira] Updated: (GERONIMO-3638) should allow URL encoding with custom encoding charset other than the default

Posted by "Sangjin Lee (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GERONIMO-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sangjin Lee updated GERONIMO-3638:
----------------------------------

    Attachment:     (was: 3638.patch)


[jira] Commented: (GERONIMO-3638) should allow URL encoding with custom encoding charset other than the default

Posted by "Jay D. McHugh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GERONIMO-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546372 ] 

Jay D. McHugh commented on GERONIMO-3638:
-----------------------------------------

Content can be in any character set, but I am pretty sure that you are only allowed to use a portion of the 8859 character set in URLs.

<snippet from RFC1738>
Thus, only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved  purposes may be used unencoded within a URL.
</snippet>

Since this is an HTTP client, shouldn't it be limited to the characters allowed by the URL spec?
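
One way to connect this question with the later replies: percent-encoding is what keeps a URL inside the RFC 1738 character set, whatever charset produced the underlying bytes. The checker below is a hypothetical illustration of the quoted rule (alphanumerics, the specials "$-_.+!*'(),", plus well-formed %HH escapes). As a simplification it rejects reserved characters such as / ? = & outright, even though RFC 1738 permits them in their reserved roles.

```java
// Hypothetical checker for the RFC 1738 rule quoted above.
final class Rfc1738Check {

    // The special characters RFC 1738 allows unencoded.
    private static final String SPECIALS = "$-_.+!*'(),";

    private static boolean isAlnum(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f');
    }

    // True if s uses only alphanumerics, the RFC 1738 specials, and well-formed %HH escapes.
    // Simplification: reserved characters are rejected rather than checked by position.
    static boolean isValidEncoded(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%') {
                // An escape must be followed by exactly two hex digits.
                if (i + 2 >= s.length() || !isHex(s.charAt(i + 1)) || !isHex(s.charAt(i + 2))) {
                    return false;
                }
                i += 2; // skip the two hex digits
            } else if (!isAlnum(c) && SPECIALS.indexOf(c) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidEncoded("caf%C3%A9")); // true: UTF-8 bytes, percent-encoded
        System.out.println(isValidEncoded("caf%E9"));    // true: ISO-8859-1 byte, percent-encoded
        System.out.println(isValidEncoded("caf\u00e9")); // false: raw non-ASCII character
        System.out.println(isValidEncoded("100%"));      // false: dangling escape
    }
}
```

Both encodings pass, which suggests the real question is which bytes the escapes carry, not which characters may appear in the URL.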


[jira] Commented: (GERONIMO-3638) should allow URL encoding with custom encoding charset other than the default

Posted by "Sangjin Lee (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GERONIMO-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546418 ] 

Sangjin Lee commented on GERONIMO-3638:
---------------------------------------

Hmm, I swear I remember seeing the RFCs (2616, and 2396, which supplants 1738) mention a *URL* encoding other than us-ascii (e.g. utf-8), but I cannot find it at the moment...

At minimum, shouldn't we use "ISO-8859-1" or "US-ASCII" explicitly instead of using Charset.defaultCharset()?  If the JVM is running on a non-US box, Charset.defaultCharset() would return a different charset than ascii, no?  Thanks.
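
The concern is easy to demonstrate: the bytes for a non-ASCII character depend on the charset, and Charset.defaultCharset() depends on the operating system and locale (since JDK 18, per JEP 400, it defaults to UTF-8 unless configured otherwise). A minimal sketch; DefaultCharsetDemo and hexBytes are illustrative names. US-ASCII cannot represent the character at all, so String.getBytes substitutes the replacement byte '?' (0x3F).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DefaultCharsetDemo {

    // Hex dump of s encoded with the given charset, e.g. "C3 A9".
    static String hexBytes(String s, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u00e9"; // LATIN SMALL LETTER E WITH ACUTE
        Charset platform = Charset.defaultCharset();
        // Varies by OS, locale and JDK version.
        System.out.println(platform.name() + ": " + hexBytes(s, platform));
        System.out.println("UTF-8: " + hexBytes(s, StandardCharsets.UTF_8));           // C3 A9
        System.out.println("ISO-8859-1: " + hexBytes(s, StandardCharsets.ISO_8859_1)); // E9
        System.out.println("US-ASCII: " + hexBytes(s, StandardCharsets.US_ASCII));     // 3F ('?')
    }
}
```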


[jira] Resolved: (GERONIMO-3638) should allow URL encoding with custom encoding charset other than the default

Posted by "Rick McGuire (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GERONIMO-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rick McGuire resolved GERONIMO-3638.
------------------------------------

    Resolution: Fixed
      Assignee: Rick McGuire

Committed revision 604018.

This last patch looks like it is the correct way to manage these encodings.  Thanks for sticking with this one. 


[jira] Commented: (GERONIMO-3638) should allow URL encoding with custom encoding charset other than the default

Posted by "Jay D. McHugh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GERONIMO-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546454 ] 

Jay D. McHugh commented on GERONIMO-3638:
-----------------------------------------

RFC 2396 seems to specify the same character set (a subset of Latin letters, plus digits and some special characters).

As for whether we should be using Charset.defaultCharset(), you may be right.

Maybe we should be either using US-ASCII or ISO-8859-1 explicitly rather than letting the JVM pick which one we use.

Anyone else have a comment or suggestion?


[jira] Issue Comment Edited: (GERONIMO-3638) should allow URL encoding with custom encoding charset other than the default

Posted by "Sangjin Lee (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GERONIMO-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12551083 ] 

sjlee0 edited comment on GERONIMO-3638 at 12/12/07 3:04 PM:
-----------------------------------------------------------------

OK, I think I understand what the right thing here is... For URL encoding, the current RFC is 3986, which obsoletes 2396. It states:

<snippet>
   When a new URI scheme defines a component that represents textual
   data consisting of characters from the Universal Character Set [UCS],
   the data should first be encoded as octets according to the UTF-8
   character encoding [STD63]; then only those octets that do not
   correspond to characters in the unreserved set should be percent-
   encoded.  For example, the character A would be represented as "A",
   the character LATIN CAPITAL LETTER A WITH GRAVE would be represented
   as "%C3%80", and the character KATAKANA LETTER A would be represented
   as "%E3%82%A2".
</snippet>

The default URL encoding charset used to be US-ASCII, but now it is UTF-8.  The distinction becomes clear when you need to encode Unicode characters that do not exist in the ascii code page.
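
The distinction can be checked with the three characters from the quoted RFC passage. This sketch (CodePageCoverage is an illustrative name) asks each charset's encoder whether it can represent the character: only UTF-8 covers all three, and KATAKANA LETTER A has no representation in either US-ASCII or ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CodePageCoverage {

    static boolean representable(String s, Charset charset) {
        // A fresh encoder per call: CharsetEncoder instances are not thread-safe.
        return charset.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        String[] samples = {"A", "\u00C0", "\u30A2"}; // A, A WITH GRAVE, KATAKANA LETTER A
        Charset[] charsets = {
            StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8
        };
        for (String s : samples) {
            StringBuilder row = new StringBuilder(String.format("U+%04X", s.codePointAt(0)));
            for (Charset cs : charsets) {
                row.append("  ").append(cs.name()).append('=').append(representable(s, cs));
            }
            System.out.println(row);
        }
        // U+0041  US-ASCII=true  ISO-8859-1=true  UTF-8=true
        // U+00C0  US-ASCII=false  ISO-8859-1=true  UTF-8=true
        // U+30A2  US-ASCII=false  ISO-8859-1=false  UTF-8=true
    }
}
```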

In any case, I think the right thing to happen here is:
- The default charset for URL encoding queries and forms shall be "US-ASCII"
- One should allow the charset to be overridden specifically with "UTF-8" in mind

Independent of the URL encoding charset, one needs a charset with which to encode/decode HTTP elements.  Currently it is using Charset.defaultCharset().  This is incorrect.  It should use "US-ASCII".

Thoughts?
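
The proposal above involves two separate charset decisions: one for URL-encoding the query, and one for the HTTP elements themselves. A hypothetical sketch of both: RequestHeadSerializer is an invented name, host and path are assumed to be ASCII already, and the query is encoded as UTF-8 per RFC 3986. This is not AHC's actual serialization code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of serializing a GET request head.
final class RequestHeadSerializer {

    static byte[] serialize(String host, String path, String queryName, String queryValue) {
        String query;
        try {
            // 1) URL-encoding charset: applies only to the query's bytes (UTF-8 here).
            query = URLEncoder.encode(queryName, "UTF-8") + "=" + URLEncoder.encode(queryValue, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 is always supported", e);
        }
        String head = "GET " + path + "?" + query + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "\r\n";
        // 2) HTTP element charset: US-ASCII, never Charset.defaultCharset().
        //    Refuse rather than let getBytes silently substitute '?'.
        if (!StandardCharsets.US_ASCII.newEncoder().canEncode(head)) {
            throw new IllegalArgumentException("host and path must already be ASCII");
        }
        return head.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] wire = serialize("example.com", "/search", "q", "\u30A2");
        System.out.print(new String(wire, StandardCharsets.US_ASCII));
        // GET /search?q=%E3%82%A2 HTTP/1.1
        // Host: example.com
    }
}
```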


[jira] Closed: (GERONIMO-3638) should allow URL encoding with custom encoding charset other than the default

Posted by "Rick McGuire (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GERONIMO-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rick McGuire closed GERONIMO-3638.
----------------------------------


[jira] Updated: (GERONIMO-3638) should allow URL encoding with custom encoding charset other than the default

Posted by "Sangjin Lee (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GERONIMO-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sangjin Lee updated GERONIMO-3638:
----------------------------------

    Attachment:     (was: patch.zip)


[jira] Commented: (GERONIMO-3638) should allow URL encoding with custom encoding charset other than the default

Posted by "Sangjin Lee (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GERONIMO-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12551083 ] 

Sangjin Lee commented on GERONIMO-3638:
---------------------------------------

OK, I think I understand what the right thing here is... For URL encoding, the current RFC is 3986, which obsoletes 2396. It states:

<snippet>
   When a new URI scheme defines a component that represents textual
   data consisting of characters from the Universal Character Set [UCS],
   the data should first be encoded as octets according to the UTF-8
   character encoding [STD63]; then only those octets that do not
   correspond to characters in the unreserved set should be percent-
   encoded.  For example, the character A would be represented as "A",
   the character LATIN CAPITAL LETTER A WITH GRAVE would be represented
   as "%C3%80", and the character KATAKANA LETTER A would be represented
   as "%E3%82%A2".
</snippet>

The default URL encoding charset used to be ISO-8859-1, but now it is UTF-8.  The distinction becomes clear when you need to encode Unicode characters that do not exist in the ascii code page.

In any case, I think the right thing to happen here is:
- The default charset for URL encoding queries and forms shall be "ISO-8859-1"
- One should allow the charset to be overridden specifically with "UTF-8" in mind

Thoughts?



[jira] Updated: (GERONIMO-3638) should allow URL encoding with custom encoding charset other than the default

Posted by "Sangjin Lee (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GERONIMO-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sangjin Lee updated GERONIMO-3638:
----------------------------------

    Attachment: 3638.patch

a revised patch
