Posted to httpclient-users@hc.apache.org by Christopher BROWN <br...@reflexe.fr> on 2014/03/25 00:01:03 UTC

Correct encoding and decoding of HTTP path segments

Hello,

This article:
http://blog.lunatech.com/2009/02/03/what-every-web-developer-must-know-about-url-encoding

...refers to the pitfalls of using Java's standard URLEncoder / URLDecoder
classes for anything other than *form* encoding.  In particular, they are
unsuitable for path segments, and have no concept of matrix parameters (such
as ;jsessionid=xxx, which can appear in any segment).  The standard URI class
in the JDK doesn't appear to provide much in the way of solutions either.
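
To make the pitfall concrete, here is a minimal sketch contrasting form
encoding with path encoding. The class and method names are hypothetical, and
using the multi-argument java.net.URI constructor for the path is just one
possible approach, not a recommendation from the article.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

class FormVersusPathEncoding {

    // Form encoding: a space becomes "+" and a literal "+" becomes %2B.
    static String formEncode(String value) throws UnsupportedEncodingException {
        return URLEncoder.encode(value, "UTF-8");
    }

    // Path encoding via the four-argument URI constructor (scheme, host, path,
    // fragment): a space becomes %20, while "/" and "+" are left untouched
    // because both are legal in a path.
    static String pathEncode(String decodedPath) throws URISyntaxException {
        return new URI(null, null, decodedPath, null).getRawPath();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(formEncode("my file+1"));        // my+file%2B1
        System.out.println(pathEncode("/docs/my file+1"));  // /docs/my%20file+1
    }
}
```

A server that decodes the path as a path will read "my+file" as a name
containing a literal plus sign, which is why the form-encoded output is wrong
in that position.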

Given that when encoding and decoding URLs/URIs, I can first split into
segments (easy enough splitting on "/"), does anyone on this HTTP client
list have any recommendations for correctly encoding and decoding each
individual path segment (and extracting matrix parameters)?  After
searching, I've found no standalone class (read: without being bloated or
loaded with dependencies) that can both encode and decode.  Given how many
implementors seem to have got it more or less wrong, I'm humble enough not
to try it myself unless there really is no other choice.
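
As an illustration of the split-then-decode idea described above, the
following is a rough sketch only. The PathSegments class, its Segment type and
the parse method are hypothetical names, and it assumes matrix parameters take
the simple ";key=value" form. It splits on "/" and ";" before decoding, so an
encoded %2F or %3B inside a segment is not mistaken for a separator.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PathSegments {

    // One decoded segment plus its decoded matrix parameters (hypothetical type).
    static final class Segment {
        final String name;
        final Map<String, String> matrix = new LinkedHashMap<>();
        Segment(String name) { this.name = name; }
    }

    // Percent-decodes one raw piece. The "/" prefix stops java.net.URI from
    // reading something like "a:b" as a scheme.
    static String decode(String raw) throws URISyntaxException {
        return new URI("/" + raw).getPath().substring(1);
    }

    static List<Segment> parse(String rawPath) throws URISyntaxException {
        List<Segment> result = new ArrayList<>();
        for (String rawSegment : rawPath.split("/")) {
            if (rawSegment.isEmpty()) continue;           // leading or doubled slash
            String[] parts = rawSegment.split(";", -1);   // parts[0] is the segment name
            Segment segment = new Segment(decode(parts[0]));
            for (int i = 1; i < parts.length; i++) {
                if (parts[i].isEmpty()) continue;         // stray ";"
                int eq = parts[i].indexOf('=');
                String key = eq < 0 ? parts[i] : parts[i].substring(0, eq);
                String value = eq < 0 ? "" : parts[i].substring(eq + 1);
                segment.matrix.put(decode(key), decode(value));
            }
            result.add(segment);
        }
        return result;
    }
}
```

For example, parse("/docs;v=2/my%20file;jsessionid=abc%3D1") yields a segment
"docs" with v=2 and a segment "my file" with jsessionid set to "abc=1".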

Thanks,
Christopher

Re: Correct encoding and decoding of HTTP path segments

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Tue, 2014-03-25 at 00:03 +0000, sebb wrote:
> The java.net.URI class is worth a careful review.
> 
> It has constructors that encode parameters and a constructor that
> decodes its parameter.
> 

In addition you may want to take a look at the URIBuilder, which also
can do query parsing / formatting for you.

Oleg



---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-users-unsubscribe@hc.apache.org
For additional commands, e-mail: httpclient-users-help@hc.apache.org


Re: Correct encoding and decoding of HTTP path segments

Posted by Christopher BROWN <br...@reflexe.fr>.
It seems that using the different constructors (encoding, decoding) of
java.net.URI helps in most cases I have to deal with (for example, I can
pass one "directory" of a longer path containing encoded characters to the
single-String constructor to decode it, or use the multi-parameter
constructors for encoding).  It doesn't help me with URL-encoding query
parameters because it won't encode "?", "=", or "&" (it looks like it's
designed to operate on the full query including -- without transforming --
the different separators).
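
A small sketch of the limitation described above, assuming the five-argument
java.net.URI constructor (scheme, authority, path, query, fragment); the class
name, host and path are made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

class QueryViaUri {

    // Passes a whole query string through the multi-argument constructor and
    // returns what ends up on the wire.
    static String rawQuery(String query) throws URISyntaxException {
        return new URI("http", "example.com", "/search", query, null).getRawQuery();
    }

    public static void main(String[] args) throws Exception {
        // The spaces are escaped, but the "&" inside the value is kept as-is,
        // so a receiver would see it as a parameter separator.
        System.out.println(rawQuery("name=Tom & Jerry")); // name=Tom%20&%20Jerry
    }
}
```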

For encoding query parts, I resorted to using URLEncoder, followed by a
quick pass over the result to replace the remaining "+" characters with %20
for (hopefully) better compliance and interoperability (no ambiguity).
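
The workaround might look roughly like the sketch below. The helper name is
hypothetical, and it is meant for a single parameter name or value, not for a
whole query string.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class QueryComponentEncoder {

    // Form-encodes one name or value, then rewrites "+" (an encoded space) as
    // %20. A literal "+" in the input is already %2B at that point, so the
    // replacement cannot touch it.
    static String encodeQueryComponent(String value) throws UnsupportedEncodingException {
        return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(encodeQueryComponent("Tom & Jerry")); // Tom%20%26%20Jerry
        System.out.println(encodeQueryComponent("1+1=2"));       // 1%2B1%3D2
    }
}
```

The separators "=" and "&" are then added between the encoded pieces by the
caller.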

I did consider using Oleg's suggestion of URIBuilder, but it doesn't appear
to cover matrix parameters on path segments (even if it does have helpers
for query parameters).

Thanks,
Christopher




Re: Correct encoding and decoding of HTTP path segments

Posted by sebb <se...@gmail.com>.
The java.net.URI class is worth a careful review.

It has constructors that encode parameters and a constructor that
decodes its parameter.
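
As a rough illustration of that split, the sketch below round-trips a path
through both kinds of constructor. The class name, method names and host are
hypothetical. Strictly speaking, the single-argument constructor parses an
already-encoded string and it is the getPath() accessor that returns the
decoded form.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriRoundTrip {

    // Multi-argument constructor: takes decoded text and quotes illegal characters.
    static URI fromDecodedPath(String decodedPath) throws URISyntaxException {
        return new URI("http", "example.com", decodedPath, null);
    }

    // Single-argument constructor: expects an encoded string; getPath() decodes it.
    static String decodedPathOf(String encodedUri) throws URISyntaxException {
        return new URI(encodedUri).getPath();
    }

    public static void main(String[] args) throws Exception {
        URI uri = fromDecodedPath("/reports/q1 summary.txt");
        System.out.println(uri);                           // http://example.com/reports/q1%20summary.txt
        System.out.println(decodedPathOf(uri.toString())); // /reports/q1 summary.txt
    }
}
```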
