Posted to dev@hc.apache.org by Tim Julien <tj...@limewire.com> on 2008/01/18 15:47:49 UTC

unable to encode reserved characters using java.net.URI multi-arg constructors

All,

I've spent a few days looking into some strange URL encoding issues on
http client 4.0 alpha 2.  I'll describe what I've found; hopefully I am
thinking about this correctly.

I think there is a regression from 3.0 -> 4.0 due to the use of java.net.URI.

On the old commons http client stack, we encoded URLs using
java.net.URLEncoder, and passed them to the
org.apache.commons.httpclient.URI() constructors.  Those constructors
had a boolean parameter that indicated whether the url was encoded.

On the new 4.0 stack, java.net.URI is used instead - and apparently it
has some strange encoding behavior.  For starters, you cannot specify
whether the URL is encoded.  Instead, URIs constructed with the
single-arg constructor are treated as encoded, while URIs constructed
with the multi-arg constructors are treated as un-encoded.  When using
the multi-arg constructors, java.net.URI will perform encoding for you.

example:
uri = new URI("http", null, "foo.com", -1, "/bar", "a=b&c=jon doe", null);

uri.toASCIIString() -> http://foo.com/bar?a=b&c=jon%20doe

This is correct (the space is encoded to %20).

The trouble comes with certain characters that RFC 2396 (the URI spec)
designates as "reserved".  "Reserved" characters are those that help
give URIs their structure:

reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
           "$" | ","

Those characters are also allowed to be used in a non-reserved fashion - 
for example as values within a query string.  In such cases, you are 
required to URL encode them, effectively "escaping" them.

And it seems that the multi-arg constructors, which do URL encoding for 
you, do NOT provide a way for you to encode these characters - which 
means you can only ever use them for their reserved (unescaped) purpose.

For example, suppose I want to produce this URL:

http://foo.com/bar?a=b&c=jon%26doe

// %26 is the encoded value of &
// %25 is the encoded value of %

uri = new URI("http", null, "foo.com", -1, "/bar", "a=b&c=jon%26doe", null);
uri.toASCIIString() -> http://foo.com/bar?a=b&c=jon%2526doe

// java.net.URI encodes the incoming "%" as %25

uri = new URI("http", null, "foo.com", -1, "/bar", "a=b&c=jon&doe", null);
uri.toASCIIString() -> http://foo.com/bar?a=b&c=jon&doe

// java.net.URI has no way of knowing that the un-escaped "&" is 
//actually a value in the URI
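
For anyone who wants to reproduce the three cases above, here is a minimal, self-contained sketch. The class name MultiArgQuoting and the helper build() are made up for illustration; only java.net.URI and its seven-argument constructor come from the JDK. As far as I can tell from the java.net.URI Javadoc, the multi-arg constructors always quote '%' and leave characters that are legal in the query component (such as '&' and '=') untouched, which matches what I am seeing.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class; not part of HttpClient.
class MultiArgQuoting {

    // Builds a URI with the seven-argument constructor (which treats its
    // arguments as un-encoded) and returns the resulting ASCII string.
    static String build(String query) {
        try {
            return new URI("http", null, "foo.com", -1, "/bar", query, null)
                    .toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // The space is not legal in a query, so it is quoted.
        System.out.println(build("a=b&c=jon doe"));    // http://foo.com/bar?a=b&c=jon%20doe
        // The '%' of an already-escaped "%26" is quoted again.
        System.out.println(build("a=b&c=jon%26doe"));  // http://foo.com/bar?a=b&c=jon%2526doe
        // A literal '&' is legal in a query, so it is left alone.
        System.out.println(build("a=b&c=jon&doe"));    // http://foo.com/bar?a=b&c=jon&doe
    }
}
```

None of the three inputs yields a query whose value contains an escaped ampersand.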

The upshot of all of this is that I claim the multi-arg constructors are 
unusable, unless you restrict your URLs to never use reserved 
characters as values.  In our use case, we can't do that because we 
don't control what URLs are incoming / outgoing.

(Note that I can produce the desired URIs if I use the single-arg 
constructor and do all of the encoding myself beforehand.)
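
The single-arg approach from the note above might look roughly like this. It is only a sketch: the class PreEncodedUri and its helpers are hypothetical names, and it assumes java.net.URLEncoder is an acceptable encoder for the value (it produces form-style encoding, so a space would come out as '+').

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;

// Hypothetical demo class; not part of HttpClient.
class PreEncodedUri {

    // Encodes a single query value before it is placed into the URI string.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Assembles the full string ourselves and hands it to the single-arg
    // constructor (via URI.create), which treats it as already encoded.
    static URI build(String value) {
        return URI.create("http://foo.com/bar?a=b&c=" + encode(value));
    }

    public static void main(String[] args) {
        URI uri = build("jon&doe");
        System.out.println(uri.toASCIIString()); // http://foo.com/bar?a=b&c=jon%26doe
        System.out.println(uri.getRawQuery());   // a=b&c=jon%26doe
        System.out.println(uri.getQuery());      // a=b&c=jon&doe (decoded view)
    }
}
```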

This ends up being a problem on http client 4.0, because the URI passed 
in is reconstructed a few times under the covers by http client - using 
the multi-arg constructors.  I believe that the multi-arg constructors 
have to be replaced with single-arg constructors.

-Tim Julien





---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@hc.apache.org
For additional commands, e-mail: dev-help@hc.apache.org


Re: unable to encode reserved characters using java.net.URI multi-arg constructors

Posted by Ortwin Glück <od...@odi.ch>.

Tim Julien wrote:
> 
> For example, suppose I want to produce this URL:
> 
> http://foo.com/bar?a=b&c=jon%26doe
> 
> // %26 is the encoded value of &
> // %25 is the encoded value of %
> 
> uri = new URI("http", null, "foo.com", -1, "/bar", "a=b&c=jon%26doe",
> null);
> uri.toASCIIString() -> http://foo.com/bar?a=b&c=jon%2526doe
> 
> // java.net.URI encodes the incoming "%" as %25

which is by the way incorrect. It should be + according to W3C standards. That's
because this name/value encoding is an HTML thing, that is not directly
specified by the URI spec, but by the W3C (being the HTML authority):

http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.1

That W3C spec just references the URI spec for all characters other than space.

NB: URI encoding and HTML (!= HTTP) query string encoding are two DISTINCT
algorithms that are not to be confused. They are almost equal, but not
completely. Note that a URI encoded string can be decoded with an HTML query
decoder, but not vice versa (a URI decoder does [should] not know how to decode a +).

Please also note that the W3C defines that & and ; are BOTH to be treated as
separators in HTML query strings:

http://www.w3.org/TR/html4/appendix/notes.html#ampersands-in-uris

> uri = new URI("http", null, "foo.com", -1, "/bar", "a=b&c=jon&doe", null);
> uri.toASCIIString() -> http://foo.com/bar?a=b&c=jon&doe
> 
> // java.net.URI has no way of knowing that the un-escaped "&" is
> //actually a value in the URI

Tim, the API Doc of the multi-arg constructor says: "Any character that is not a
legal URI character is quoted." Obviously the implementation does not do that
correctly: it does not escape the ampersand and equals characters. Instead it
tries to be "smart" - which leads to this nonsense behaviour that you're observing.

Just accept that the multi-arg constructor is broken and don't use it.

NB: Whenever you have an HTML query string, that is a list of names and values
separated by = and &, the names AND values MUST already be encoded. In general,
you CANNOT apply the encoding afterwards in an unambiguous way. You
can only apply the encoding as long as you have the names and values separately
(in a HashMap for instance).
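
To illustrate that last point, a rough sketch of encoding while the names and values are still separate. FormQueryBuilder and toQuery() are hypothetical names for this example, and it assumes java.net.URLEncoder (form encoding, space becomes +) is the encoder you want.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class; not part of HttpClient.
class FormQueryBuilder {

    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Encodes each name and each value on its own, then joins them with the
    // structural '=' and '&', so the separators are never ambiguous.
    static String toQuery(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("a", "b");
        params.put("c", "jon&doe");
        System.out.println(toQuery(params)); // a=b&c=jon%26doe
    }
}
```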

Odi



Re: unable to encode reserved characters using java.net.URI multi-arg constructors

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Fri, 2008-01-18 at 21:35 +0100, Ortwin Glück wrote:
> Oleg Kalnichevski wrote:
> > needs a better parser that can deal with escaped and unescaped queries,
> 
> Sorry, Oleg, nobody can deal with unescaped queries. It's NOT POSSIBLE (tm).
> 

Escape, maybe? ;-)

Oleg




Re: unable to encode reserved characters using java.net.URI multi-arg constructors

Posted by Ortwin Glück <od...@odi.ch>.
Oleg Kalnichevski wrote:
> needs a better parser that can deal with escaped and unescaped queries,

Sorry, Oleg, nobody can deal with unescaped queries. It's NOT POSSIBLE (tm).



Re: unable to encode reserved characters using java.net.URI multi-arg constructors

Posted by Sam Berlin <sb...@gmail.com>.
FWIW, replacing the else branch in
DefaultClientRequestDirector.rewriteRequestURI with
-- 
                String path = uri.getRawPath();
                String query = uri.getRawQuery();
                String fragment = uri.getRawFragment();
                String newUri =
                        (path == null ? "" : path)
                        + (query == null ? "" : ("?" + query))
                        + (fragment == null ? "" : ("#" + fragment));
                request.setURI(new URI(newUri));
-- 

fixes the test.  I don't know how stable that is, though, and it
certainly doesn't fix the other three uses.  (I tried using relativize
[not resolve], but failed to find any combination of parameters that
worked.)

Sam




Re: unable to encode reserved characters using java.net.URI multi-arg constructors

Posted by Sam Berlin <sb...@gmail.com>.
Oops, sorry about that.  GMail doesn't do a great job letting you know
what your outgoing messages look like to everyone else.  The JIRA is
now at: https://issues.apache.org/jira/browse/HTTPCLIENT-730 .

Sam

On Jan 22, 2008 4:55 AM, Oleg Kalnichevski <ol...@apache.org> wrote:
>
> On Mon, 2008-01-21 at 21:58 -0500, Sam Berlin wrote:
> > So... I got bored and had a little time.  Here's a testcase which
> > highlights the URI-rewrite changing the URI for requests.  It only
> > tests one example of the URI failures right now:
> > DefaultClientRequestDirector.rewriteRequestURI's else branch.  I'm not
> > positive how to setup the environment to test the if branch.  The
> > other tests can be added to TestRedirects.
> >
> > Only two testcases fail: testEscapedAmpersandInQueryAbsolute, and
> > testEscapedAmpersandInPathAbsolute.  The relative requests pass,
> > because no URI-rewriting is done there.
> >
> > There's probably many more (and other) ways to test it, but assuming
> > no special-casing is done in the core to look for ampersands and do
> > things specially, whatever fixes this should fix other problems.
> >
> > Sam
> >
>
> Hi Sam
>
> The attachments apparently got stripped away. Can you open a JIRA for
> this problem and attach the test cases to the report?
>
> Oleg



Re: unable to encode reserved characters using java.net.URI multi-arg constructors

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Mon, 2008-01-21 at 21:58 -0500, Sam Berlin wrote:
> So... I got bored and had a little time.  Here's a testcase which
> highlights the URI-rewrite changing the URI for requests.  It only
> tests one example of the URI failures right now:
> DefaultClientRequestDirector.rewriteRequestURI's else branch.  I'm not
> positive how to setup the environment to test the if branch.  The
> other tests can be added to TestRedirects.
> 
> Only two testcases fail: testEscapedAmpersandInQueryAbsolute, and
> testEscapedAmpersandInPathAbsolute.  The relative requests pass,
> because no URI-rewriting is done there.
> 
> There's probably many more (and other) ways to test it, but assuming
> no special-casing is done in the core to look for ampersands and do
> things specially, whatever fixes this should fix other problems.
> 
> Sam
> 

Hi Sam

The attachments apparently got stripped away. Can you open a JIRA for
this problem and attach the test cases to the report?

Oleg






Re: unable to encode reserved characters using java.net.URI multi-arg constructors

Posted by Sam Berlin <sb...@gmail.com>.
So... I got bored and had a little time.  Here's a testcase which
highlights the URI-rewrite changing the URI for requests.  It only
tests one example of the URI failures right now:
DefaultClientRequestDirector.rewriteRequestURI's else branch.  I'm not
positive how to set up the environment to test the if branch.  The
other tests can be added to TestRedirects.

Only two testcases fail: testEscapedAmpersandInQueryAbsolute, and
testEscapedAmpersandInPathAbsolute.  The relative requests pass,
because no URI-rewriting is done there.

There are probably many more (and other) ways to test it, but assuming
no special-casing is done in the core to look for ampersands and do
things specially, whatever fixes this should fix other problems.

Sam



Re: unable to encode reserved characters using java.net.URI multi-arg constructors

Posted by Sam Berlin <sb...@gmail.com>.
I'm also not certain why the URIs are being recreated, but it's only
done in four places in the code.  Two are within
DefaultRedirectHandler.getLocationURI, and the other two are within
DefaultClientRequestDirector.rewriteRequestURI.  One of them within
DefaultRedirectHandler seems to be harmless, since it's just for
ensuring there are no circular redirects.  The other method in
DefaultRedirectHandler already calls resolve on the newly-built URI.

To be honest I have no clue what the methods are trying to do (nor can
I understand what's going on in the resolve or relativize methods), so
I'm not sure what'd be required to fix them.  I do think it should be
possible to basically replace whatever calls the multi-arg
constructors with a quick method toURI(multi-args) that returns a new
URI(a+b+c+d+e), essentially concatenating the non-null parts together
and returning a URI from the single-arg constructor.  Of course, that
could also fail miserably...
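
Something along these lines is what I have in mind. This is an untested-in-HttpClient sketch: RawUriAssembler and toUri() are hypothetical names, the parts are assumed to be already escaped (e.g. taken from getRawPath() / getRawQuery()), and it ignores user-info and IPv6 literal hosts.

```java
import java.net.URI;

// Hypothetical helper; not part of HttpClient.
class RawUriAssembler {

    // Concatenates the non-null, already-escaped parts and parses the result
    // with the single-arg constructor (via URI.create), so nothing is quoted
    // a second time.
    static URI toUri(String scheme, String host, int port,
                     String rawPath, String rawQuery, String rawFragment) {
        StringBuilder sb = new StringBuilder();
        if (scheme != null) {
            sb.append(scheme).append(':');
        }
        if (host != null) {
            sb.append("//").append(host);
            if (port >= 0) {
                sb.append(':').append(port);
            }
        }
        if (rawPath != null) {
            sb.append(rawPath);
        }
        if (rawQuery != null) {
            sb.append('?').append(rawQuery);
        }
        if (rawFragment != null) {
            sb.append('#').append(rawFragment);
        }
        return URI.create(sb.toString());
    }

    public static void main(String[] args) {
        URI uri = toUri("http", "foo.com", -1, "/bar", "a=b&c=jon%26doe", null);
        System.out.println(uri); // http://foo.com/bar?a=b&c=jon%26doe
    }
}
```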

Tim has looked into this in much greater detail than I, so he likely
has more suggestions and/or insight than I can provide.

It should be possible to write tests that want to send a request for
something like http://localhost/file%20name?a=b&c=%26d, and see if the
server gets the correct request.  I did a quick glance through the
httpclient tests, but couldn't find a class that was testing similar
things.  Where would a test like this go? (And is there any test that
does something similar?)

Sam




Re: unable to encode reserved characters using java.net.URI multi-arg constructors

Posted by Roland Weber <os...@dubioso.net>.
Oleg Kalnichevski wrote:
> On Sat, 2008-01-19 at 09:33 -0800, Sam Berlin wrote:
>> It almost certainly would work, however HttpClient would then be
>> broken (as far as URI parsing goes) for everyone else.  As others have
>> pointed out (and as Tim explained to me in sad detail), URI is just
>> basically broken when it comes to using it with the multi-arg
>> constructors.  It's flat-out impossible to recreate a URI with the
>> multi-arg constructors and have it point to the correct resource.
> 
> What would be your suggestion on dealing with the issue? Is there anyway
> we could avoid rewriting the whole URI class and leverage functionality
> already available in the JRE?  

I don't know by heart where we are creating URIs. If path escaping
is the problem, maybe we can use some workaround like:

URI base = new URI(scheme, hostport, null);
URI full = base.resolve(pathonly); // maps to single-arg constructor
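
Spelled out as something runnable, the idea could look like the sketch below. ResolveWorkaround and its resolve() helper are hypothetical names. One assumption worth flagging: the three-argument constructor takes a scheme-specific part, and as far as I can tell it has to start with "//" for the base to be hierarchical; without it the base is opaque and resolve() just returns the argument unchanged.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class; not part of HttpClient.
class ResolveWorkaround {

    // Builds a base URI from scheme and host:port only, then resolves the
    // already-escaped path and query against it. URI.resolve(String) parses
    // its argument as encoded text, so "%26" is not quoted again.
    static URI resolve(String scheme, String hostPort, String rawPathAndQuery) {
        try {
            URI base = new URI(scheme, "//" + hostPort, null);
            return base.resolve(rawPathAndQuery);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI full = resolve("http", "localhost:8080", "/file%20name?a=b&c=%26d");
        System.out.println(full); // http://localhost:8080/file%20name?a=b&c=%26d
    }
}
```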

cheers,
  Roland



Re: unable to encode reserved characters using java.net.URI multi-arg constructors

Posted by Ortwin Glück <od...@odi.ch>.
Tim Julien wrote:
> Also I think a bug should be filed against the JDK; I think this is a 
> design bug.

They will never fix design bugs in existing code, as that may break 
existing applications. They fix design bugs by deprecation and new 
APIs. But Sun is a completely different story anyway.




Re: unable to encode reserved characters using java.net.URI multi-arg constructors

Posted by Tim Julien <tj...@limewire.com>.
Oleg Kalnichevski wrote:
> On Sat, 2008-01-19 at 09:33 -0800, Sam Berlin wrote:
>>> Here's my take. There is nothing wrong with j.n.URI as such. It just
>>> needs a better parser that can deal with escaped and unescaped queries,
>>> as well as be more lenient about common non-compliant behaviors, and
>>> then construct the URI instance using a multi-arg constructor. It was
>>> long on my virtual to-do list to open a feature request for pluggable
>>> URI parsers in JIRA. Probably it is about time.
>>>
>>> Would that work for LimeWire?
>>>
>>> Oleg
>>>
>> It almost certainly would work, however HttpClient would then be
>> broken (as far as URI parsing goes) for everyone else.  As others have
>> pointed out (and as Tim explained to me in sad detail), URI is just
>> basically broken when it comes to using it with the multi-arg
>> constructors.  It's flat-out impossible to recreate a URI with the
>> multi-arg constructors and have it point to the correct resource.
>>
>> Sam
>>
> 
> Sam
> 
> What would be your suggestion on dealing with the issue? Is there anyway
> we could avoid rewriting the whole URI class and leverage functionality
> already available in the JRE?  

In the short term, I think all multi-arg constructors have to be 
replaced with single-arg ones (like Sam's patch in 
https://issues.apache.org/jira/browse/HTTPCLIENT-730).

For correctness - this may or may not require re-implementing much of 
the j.n.URI class. I think we could probably get away with just stealing 
j.n.URI.defineString() (private method).

Also I think a bug should be filed against the JDK; I think this is a 
design bug.

And we need to document that users of httpclient should NOT use the 
multi-arg constructors.
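
A minimal sketch of the short-term workaround described above: encode each query value yourself, then hand the finished string to the single-arg constructor, which treats its input as already encoded. The class and method names (PreEncodedUriSketch, withValue) are made up for illustration and are not part of HttpClient or of the HTTPCLIENT-730 patch. Note that java.net.URLEncoder does form encoding, so it writes a space as "+" rather than "%20".

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

final class PreEncodedUriSketch {

    // Hypothetical helper: appends one percent-encoded query value to an
    // already-encoded prefix and parses the result with the single-arg
    // constructor, so existing escapes are left exactly as written.
    static URI withValue(String encodedPrefix, String rawValue) {
        try {
            String encodedValue = URLEncoder.encode(rawValue, "UTF-8");
            return new URI(encodedPrefix + encodedValue);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI uri = withValue("http://foo.com/bar?a=b&c=", "jon&doe");
        // prints http://foo.com/bar?a=b&c=jon%26doe
        System.out.println(uri.toASCIIString());
    }
}
```

With this approach the "&" inside the value survives as %26, and getQuery() on the resulting URI decodes it back to "a=b&c=jon&doe".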

> 
> Oleg
> 
> 



Re: unable to encode reserved characters using java.net.URI multi-arg constructors

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Sat, 2008-01-19 at 09:33 -0800, Sam Berlin wrote:
> > Here's my take. There is nothing wrong with j.n.URI as such. It just
> > needs a better parser that can deal with escaped and unescaped queries,
> > as well as be more lenient about common non-compliant behaviors, and
> > then construct the URI instance using a multi-arg constructor. It has
> > long been on my virtual to-do list to open a feature request for pluggable
> > URI parsers in JIRA. Probably it is about time.
> >
> > Would that work for LimeWire?
> >
> > Oleg
> >
> 
> It almost certainly would work, however HttpClient would then be
> broken (as far as URI parsing goes) for everyone else.  As others have
> pointed out (and as Tim explained to me in sad detail), URI is just
> basically broken when it comes to using it with the multi-arg
> constructors.  It's flat-out impossible to recreate a URI with the
> multi-arg constructors and have it point to the correct resource.
> 
> Sam
> 

Sam

What would be your suggestion on dealing with the issue? Is there any way
we could avoid rewriting the whole URI class and leverage functionality
already available in the JRE?

Oleg






Re: unable to encode reserved characters using java.net.URI multi-arg constructors

Posted by Sam Berlin <sb...@gmail.com>.
> Here's my take. There is nothing wrong with j.n.URI as such. It just
> needs a better parser that can deal with escaped and unescaped queries,
> as well as be more lenient about common non-compliant behaviors, and
> then construct the URI instance using a multi-arg constructor. It has
> long been on my virtual to-do list to open a feature request for pluggable
> URI parsers in JIRA. Probably it is about time.
>
> Would that work for LimeWire?
>
> Oleg
>

It almost certainly would work, however HttpClient would then be
broken (as far as URI parsing goes) for everyone else.  As others have
pointed out (and as Tim explained to me in sad detail), URI is just
basically broken when it comes to using it with the multi-arg
constructors.  It's flat-out impossible to recreate a URI with the
multi-arg constructors and have it point to the correct resource.

Sam
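
The round-trip problem can be reproduced with a few lines. This is only an illustrative sketch (RoundTripSketch and rebuild are invented names): it parses a URI with the single-arg form, then rebuilds it with the seven-arg constructor, once from the decoded query and once from the raw query. Neither attempt gives back the original.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class RoundTripSketch {

    // Rebuilds a URI from its components using the seven-arg constructor,
    // substituting the given query string.
    static String rebuild(URI u, String query) {
        try {
            return new URI(u.getScheme(), u.getUserInfo(), u.getHost(),
                    u.getPort(), u.getPath(), query, u.getFragment())
                    .toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        URI original = URI.create("http://foo.com/bar?a=b&c=jon%26doe");

        // Decoded query: the escaped "&" comes back as a bare delimiter.
        // prints http://foo.com/bar?a=b&c=jon&doe
        System.out.println(rebuild(original, original.getQuery()));

        // Raw query: the "%" of the existing escape is escaped again.
        // prints http://foo.com/bar?a=b&c=jon%2526doe
        System.out.println(rebuild(original, original.getRawQuery()));
    }
}
```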



Re: unable to encode reserved characters using java.net.URI multi-arg constructors

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Fri, 2008-01-18 at 09:47 -0500, Tim Julien wrote:
> All,
> 
> I've spent a few days looking into some strange URL encoding issues on
> http client 4.0 alpha 2.  I'll describe some things I've found, 
> hopefully I am thinking about this correctly.
> 
> I think there is a regression from 3.0 -> 4.0 due to the use of java.net.URI
> 

That was to be expected. The sole reason for not porting the URI class from
HttpClient 3.x and using j.n.URI instead is that the 3.x URI code
is a horrible mess no one wants to maintain, even though it arguably has
a more flexible API.


> On the old commons http client stack, we encoded URLs using
> java.net.URLEncoder, and passed them to the
> org.apache.commons.httpclient.URI() constructors.  Those constructors
> had a boolean parameter that indicated whether the url was encoded.
> 
> On the new 4.0 stack, java.net.URI is used instead - and apparently it
> has some strange encoding behavior.  For starters, you cannot specify
> whether the URL is encoded.  Instead - URIs constructed with the
> single-arg constructor are treated as encoded - while URIs constructed
> with the multi-arg constructors are treated as un-encoded.  When using
> the multi-arg constructors, java.net.URI will perform encoding for you.
> 
> example:
> uri = new URI("http", null, "foo.com", -1, "/bar", "a=b&c=jon doe", null);
> 
> uri.toASCIIString() -> http://foo.com/bar?a=b&c=jon%20doe
> 
> This is correct (the space is encoded to %20).
> 
> The trouble comes with certain characters that RFC 2396 (the URI spec)
> designates as "reserved".  "Reserved" characters are those that help
> give URIs their structure:
> 
> reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
>                      "$" | ","
> 
> Those characters are also allowed to be used in a non-reserved fashion - 
> for example as values within a query string.  In such cases, you are 
> required to URL encode them, effectively "escaping" them.
> 
> And it seems that the multi-arg constructors, which do URL encoding for 
> you, do NOT provide a way for you to encode these characters - which 
> means you can only ever use them for their reserved (unescaped) purpose.
> 
> For example, suppose I want to produce this URL:
> 
> http://foo.com/bar?a=b&c=jon%26doe
> 
> // %26 is the encoded value of &
> // %25 is the encoded value of %
> 
> uri = new URI("http", null, "foo.com", -1, "/bar", "a=b&c=jon%26doe", null);
> uri.toASCIIString() -> http://foo.com/bar?a=b&c=jon%2526doe
> 
> // java.net.URI encodes the incoming "%" as %25
> 
> uri = new URI("http", null, "foo.com", -1, "/bar", "a=b&c=jon&doe", null);
> uri.toASCIIString() -> http://foo.com/bar?a=b&c=jon&doe
> 
> // java.net.URI has no way of knowing that the un-escaped "&" is 
> //actually a value in the URI
> 
> The upshot of all of this is that I claim the multi-arg constructors are
> unusable, unless you restrict your URLs to never use reserved
> characters as values.  In our use case, we can't do that because we
> don't control what URLs are incoming / outgoing.
>
> (Note that I can produce the desired URIs, if I use the single-arg
> constructor and do all of the encoding myself beforehand)
> 

Here's my take. There is nothing wrong with j.n.URI as such. It just
needs a better parser that can deal with escaped and unescaped queries,
as well as be more lenient about common non-compliant behaviors, and
then construct the URI instance using a multi-arg constructor. It has
long been on my virtual to-do list to open a feature request for pluggable
URI parsers in JIRA. Probably it is about time.

Would that work for LimeWire?

Oleg
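
One possible reading of the "lenient parser" idea, sketched under stated assumptions: keep any %XX escape that is already present, percent-encode only characters RFC 2396 does not allow in a query, and leave reserved characters alone. LenientQuerySketch, escapeIllegal and toUri are hypothetical names, not an HttpClient API. The sketch hands the result to URI.create (the single-arg path), because the multi-arg constructors would escape the "%" of existing escapes again. It also cannot guess intent: an unescaped "&" is always kept as a delimiter.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;

final class LenientQuerySketch {

    // RFC 2396 reserved characters and marks that may appear unescaped;
    // ASCII letters and digits are checked separately.
    private static final String LEGAL = ";/?:@&=+$,-_.!~*'()";

    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')
                || (c >= 'A' && c <= 'F');
    }

    private static boolean isAsciiAlnum(int cp) {
        return (cp >= '0' && cp <= '9') || (cp >= 'a' && cp <= 'z')
                || (cp >= 'A' && cp <= 'Z');
    }

    // Escapes only what is illegal in a query; existing %XX escapes pass through.
    static String escapeIllegal(String query) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < query.length()) {
            int cp = query.codePointAt(i);
            boolean existingEscape = cp == '%' && i + 2 < query.length()
                    && isHex(query.charAt(i + 1)) && isHex(query.charAt(i + 2));
            if (existingEscape) {
                out.append(query, i, i + 3);
                i += 3;
                continue;
            }
            if (isAsciiAlnum(cp) || LEGAL.indexOf(cp) >= 0) {
                out.appendCodePoint(cp);
            } else {
                // Percent-encode the UTF-8 bytes of anything else,
                // including a stray "%" that does not start an escape.
                String s = new String(Character.toChars(cp));
                for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
                    out.append(String.format("%%%02X", b & 0xFF));
                }
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    static URI toUri(String base, String query) {
        return URI.create(base + "?" + escapeIllegal(query));
    }

    public static void main(String[] args) {
        // prints http://foo.com/bar?a=b&c=jon%20doe
        System.out.println(toUri("http://foo.com/bar", "a=b&c=jon doe").toASCIIString());
        // prints http://foo.com/bar?a=b&c=jon%26doe
        System.out.println(toUri("http://foo.com/bar", "a=b&c=jon%26doe").toASCIIString());
    }
}
```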

> This ends up being a problem on http client 4.0, because the URI passed 
> in is reconstructed a few times under the covers by http client - using 
> the multi-arg constructors.  I believe that the multi-arg constructors 
> have to be replaced with single-arg constructors.
> 
> -Tim Julien
> 
> 
> 
> 
> 

