Posted to httpclient-users@hc.apache.org by Eugeny N Dzhurinsky <bo...@redwerk.com> on 2009/03/11 22:09:09 UTC

Weird issue with '+' symbols in path?

Hello there!

I recently stumbled over a weird issue with '+' symbols when using Commons
HttpClient 3.1. The unit test below illustrates the problem:

========================================================================================
import junit.framework.TestCase;

import org.apache.commons.httpclient.URI;

/**
 * Tests the escaping issue in URI class
 */
public class TestURIEscaping extends TestCase {

        private static final String SAMPLE_URI = "http://www.fulltiltpoker.com/hu/pro-chat-transcript/Chau+Giang/1233971932";

        public void testURIEscaping() throws Exception {
                // escaped = false: the input is treated as unescaped and gets encoded
                URI uri = new URI(SAMPLE_URI, false, "latin1");
                assertEquals(SAMPLE_URI, uri.toString());
        }

}
========================================================================================

Surprisingly, the test fails!

After digging into the code of the URI class, I noticed the following
strange code:

    /**
     * Those characters that are allowed for the abs_path.
     */
    public static final BitSet allowed_abs_path = new BitSet(256);
    // Static initializer for allowed_abs_path
    static {
        allowed_abs_path.or(abs_path);
        // allowed_abs_path.set('/');  // aleady included
        allowed_abs_path.andNot(percent);
        allowed_abs_path.clear('+');
    }

It looks like the '+' character is always replaced with its hex code, %2B.
But as far as I remember, RFC 2396

http://www.ietf.org/rfc/rfc2396.txt

says the following in section

3.3. Path Component

   The path component contains data, specific to the authority (or the
   scheme if there is no authority component), identifying the resource
   within the scope of that scheme and authority.

      path          = [ abs_path | opaque_part ]

      path_segments = segment *( "/" segment )
      segment       = *pchar *( ";" param )
      param         = *pchar

      pchar         = unreserved | escaped |
                      ":" | "@" | "&" | "=" | "+" | "$" | ","

   The path may consist of a sequence of path segments separated by a
   single slash "/" character.  Within a path segment, the characters
   "/", ";", "=", and "?" are reserved.  Each path segment may include a
   sequence of parameters, indicated by the semicolon ";" character.
   The parameters are not significant to the parsing of relative
   references.

So according to this section, the '+' character is allowed in a path segment
and should not be escaped! Why does HttpClient violate this RFC? Or have I
misunderstood something?
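
To make the difference concrete, here is a minimal, self-contained sketch of
BitSet-driven path escaping. It is not the Commons HttpClient implementation:
the class PlusEscapingSketch and its methods rfc2396PathChars and escape are
hypothetical names invented for this illustration. It only assumes that an
escaper percent-encodes every byte missing from the allowed set, and it
compares the pchar set quoted above with the same set after clear('+').

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical illustration only; not the Commons HttpClient code. */
final class PlusEscapingSketch {

    /** Characters RFC 2396 section 3.3 permits unescaped in a path, plus "/" and ";". */
    static BitSet rfc2396PathChars() {
        BitSet allowed = new BitSet(256);
        for (char c = 'a'; c <= 'z'; c++) allowed.set(c);
        for (char c = 'A'; c <= 'Z'; c++) allowed.set(c);
        for (char c = '0'; c <= '9'; c++) allowed.set(c);
        // "mark" characters, the extra pchar characters, then the separators "/" and ";"
        for (char c : "-_.!~*'():@&=+$,/;".toCharArray()) allowed.set(c);
        return allowed;
    }

    /** Percent-encodes every Latin-1 byte of the path that is not in the allowed set. */
    static String escape(String path, BitSet allowed) {
        StringBuilder out = new StringBuilder();
        for (byte b : path.getBytes(StandardCharsets.ISO_8859_1)) {
            int c = b & 0xFF;
            if (allowed.get(c)) {
                out.append((char) c);
            } else {
                out.append('%').append(String.format("%02X", c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String path = "/hu/pro-chat-transcript/Chau+Giang/1233971932";

        BitSet rfc = rfc2396PathChars();
        BitSet withoutPlus = (BitSet) rfc.clone();
        withoutPlus.clear('+'); // mirrors allowed_abs_path.clear('+')

        System.out.println(escape(path, rfc));         // path is left unchanged
        System.out.println(escape(path, withoutPlus)); // '+' becomes %2B
    }
}
```

With the RFC set the path comes back unchanged; with '+' cleared it becomes
/hu/pro-chat-transcript/Chau%2BGiang/1233971932, which matches the kind of
mismatch the failing test reports.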

Thank you in advance!

-- 
Eugene N Dzhurinsky

Re: Weird issue with '+' symbols in path?

Posted by Oleg Kalnichevski <ol...@apache.org>.

There will be no fixes in HttpClient 3.x except for critical security
bugs. Consider migrating to HttpClient 4.0.

Oleg
