Posted to httpclient-users@hc.apache.org by khiem nguyen <kh...@googlemail.com> on 2011/06/29 12:21:59 UTC

double-slash in url causes circular redirect

Hi, I tried to retrieve the content of this link:

http://de.tommy.com//Sale/600000,de_DE,sc.html

and got a circular redirect. Logging tells me that HttpClient sends:

GET /Sale/600000,de_DE,sc.html

and the server responds with a redirect back to
http://de.tommy.com//Sale/600000,de_DE,sc.html

wget behaves like a browser and returns the content.


With telnet:


telnet de.tommy.com 80
Trying 89.202.105.72...
Connected to de.tommy.com.
Escape character is '^]'.
GET /Sale/600000,de_DE,sc.html HTTP/1.1
Host:de.tommy.com

HTTP/1.1 301 Moved Permanently
Date: Wed, 29 Jun 2011 10:11:15 GMT
Server: Apache
Content-Length: 0
Set-Cookie: dwsid=CvVvWMuShdGfstjxicXY9lJb8Fk8gkMT8xV8zGEU_X1Y81Rt4F-469BS_cTJZ4hHcE7f5NVeacb1VKcXHFEKGg==; path=/; HttpOnly
Cache-Control: no-cache,no-store,must-revalidate
Pragma: no-cache
Expires: Thu, 01 Dec 1994 16:00:00 GMT
Location: http://de.tommy.com//Sale/600000,de_DE,sc.html
Vary: Accept-Encoding
Accept-Ranges: bytes
Content-Type: text/plain

Connection closed by foreign host.
-----


telnet de.tommy.com 80
Trying 89.202.105.72...
Connected to de.tommy.com.
Escape character is '^]'.
GET //Sale/600000,de_DE,sc.html HTTP/1.1
Host: de.tommy.com

HTTP/1.1 200 OK
Date: Wed, 29 Jun 2011 10:07:11 GMT
Server: Apache
Set-Cookie: ....
....content


...

It seems HttpClient strips out one of the two slashes.
Is it a bug, or is the server misconfigured? (I guess they use a rewrite
rule or something, but that's not rare.)

How can I fix this?
Thanks

Re: double-slash in url causes circular redirect

Posted by khiem nguyen <kh...@googlemail.com>.
I'll check it.
Thanks a lot.

On Thu, Jun 30, 2011 at 3:44 PM, Oleg Kalnichevski <ol...@apache.org> wrote:

> RedirectStrategy is your friend.

Re: double-slash in url causes circular redirect

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Thu, 2011-06-30 at 14:52 +0200, khiem nguyen wrote:
> Well, until they correct this in the server-side configuration, I still need
> to handle this as tolerantly as possible, which means getting the content of
> the site just as telnet, wget, or a browser does. After HttpClient has been
> redirected, say, twice with an invalid URI, I want it to just use
> //path-whatever instead of trying to cut it down to /path-whatever.
>
> Thanks

RedirectStrategy is your friend.

Oleg 



---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-users-unsubscribe@hc.apache.org
For additional commands, e-mail: httpclient-users-help@hc.apache.org


Re: double-slash in url causes circular redirect

Posted by khiem nguyen <kh...@googlemail.com>.
Well, until they correct this in the server-side configuration, I still need
to handle this as tolerantly as possible, which means getting the content of
the site just as telnet, wget, or a browser does. After HttpClient has been
redirected, say, twice with an invalid URI, I want it to just use
//path-whatever instead of trying to cut it down to /path-whatever.

Thanks



On Thu, Jun 30, 2011 at 2:05 PM, Oleg Kalnichevski <ol...@apache.org> wrote:

> On Thu, 2011-06-30 at 11:45 +0200, khiem nguyen wrote:
> > I don't think the redirect is the problem here.
>
> Well, it is. The redirect location is invalid and leads to the following
> request having an ambiguous request-URI.
>
> > I'm using HttpClient to proxy requests from a browser and just hand the
> > redirect URL back to the browser, and the result is always the same:
> > HttpClient sends /Sale... instead of //Sale..., and the server redirects
> > to ...//Sale/... again.
>
> //Sale is not a valid URI.
>
> > Where can I override this behavior?
>
> See my previous message.
>
> Oleg

Re: double-slash in url causes circular redirect

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Thu, 2011-06-30 at 11:45 +0200, khiem nguyen wrote:
> I don't think the redirect is the problem here.

Well, it is. The redirect location is invalid and leads to the following
request having an ambiguous request-URI.

> I'm using HttpClient to proxy requests from a browser and just hand the
> redirect URL back to the browser, and the result is always the same:
> HttpClient sends /Sale... instead of //Sale..., and the server redirects
> to ...//Sale/... again.

//Sale is not a valid URI.

> Where can I override this behavior?

See my previous message.

Oleg
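
A side note on why the double slash is ambiguous as a request-URI: a reference that begins with "//" is read by generic URI parsers as a network-path reference, so the first segment is taken as the authority rather than as part of the path. The following is a minimal sketch using only java.net.URI from the JDK (not HttpClient's own parsing code, which may differ); the class and method names are made up for illustration.

```java
import java.net.URI;

// Hypothetical demo class; only java.net.URI from the JDK is used.
final class DoubleSlashParseDemo {

    /** Returns the authority java.net.URI extracts from the reference, or null if none. */
    static String authorityOf(String reference) {
        return URI.create(reference).getAuthority();
    }

    /** Returns the path exactly as it appears in the reference, without decoding. */
    static String rawPathOf(String reference) {
        return URI.create(reference).getRawPath();
    }

    public static void main(String[] args) {
        String doubleSlash = "//Sale/600000,de_DE,sc.html";
        String singleSlash = "/Sale/600000,de_DE,sc.html";
        String absolute = "http://de.tommy.com//Sale/600000,de_DE,sc.html";

        // Leading "//": the first segment is parsed as an authority, not as a path segment.
        System.out.println(authorityOf(doubleSlash) + " | " + rawPathOf(doubleSlash));
        // Single leading "/": no authority, the whole string is the path.
        System.out.println(authorityOf(singleSlash) + " | " + rawPathOf(singleSlash));
        // Absolute form: the host is the authority and the path keeps both slashes.
        System.out.println(authorityOf(absolute) + " | " + rawPathOf(absolute));
    }
}
```

With the inputs from this thread, the parser reports "Sale" as the authority of //Sale/600000,de_DE,sc.html, while the absolute form http://de.tommy.com//Sale/600000,de_DE,sc.html keeps both slashes in its raw path.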

Re: double-slash in url causes circular redirect

Posted by khiem nguyen <kh...@googlemail.com>.
I don't think the redirect is the problem here. I'm using HttpClient to proxy
requests from a browser and just hand the redirect URL back to the browser,
and the result is always the same: HttpClient sends /Sale... instead of
//Sale..., and the server redirects to ...//Sale/... again.

Where can I override this behavior?
Thanks a lot


On Wed, Jun 29, 2011 at 9:10 PM, Oleg Kalnichevski <ol...@apache.org> wrote:

> The path element of the URI is not supposed to have multiple consecutive
> slashes. Such URIs are ambiguous and whichever way HttpClient tries to
> normalize them it cannot get it right all the time. You have two options
> here: turning off automatic redirect and handling redirects manually or
> building a custom RedirectStrategy.

Re: double-slash in url causes circular redirect

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Wed, 2011-06-29 at 12:21 +0200, khiem nguyen wrote:
> It seems HttpClient strips out one of the two slashes.
> Is it a bug, or is the server misconfigured? (I guess they use a rewrite
> rule or something, but that's not rare.)
>
> How can I fix this?
> Thanks

The redirect returned by the server is malformed.

http://www.ietf.org/rfc/rfc2396.txt

---
3.3. Path Component

   The path component contains data, specific to the authority (or the
   scheme if there is no authority component), identifying the resource
   within the scope of that scheme and authority.

      path          = [ abs_path | opaque_part ]

      path_segments = segment *( "/" segment )
      segment       = *pchar *( ";" param )
      param         = *pchar

      pchar         = unreserved | escaped |
                      ":" | "@" | "&" | "=" | "+" | "$" | ","

   The path may consist of a sequence of path segments separated by a
   single slash "/" character.  Within a path segment, the characters
   "/", ";", "=", and "?" are reserved.  Each path segment may include a
   sequence of parameters, indicated by the semicolon ";" character.
   The parameters are not significant to the parsing of relative
   references.

---
The path element of the URI is not supposed to have multiple consecutive
slashes. Such URIs are ambiguous and whichever way HttpClient tries to
normalize them it cannot get it right all the time. You have two options
here: turning off automatic redirect and handling redirects manually or
building a custom RedirectStrategy.

Hope this helps

Oleg   
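
To illustrate the first option (turning off automatic redirects and handling them manually), here is a minimal sketch of a redirect loop that reuses the path from the Location header verbatim instead of normalizing it, and that stops when it sees the same target twice. It deliberately avoids the HttpClient API: the HTTP exchange is simulated by a plain function, and every name in it (ManualRedirectSketch, Response, rawTarget, follow, fakeServer) is hypothetical. In a real client, the server function would be replaced by an actual request made with automatic redirects disabled.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch; the server is simulated, no real HTTP traffic is sent.
final class ManualRedirectSketch {

    /** Minimal stand-in for an HTTP response: status code plus optional Location header. */
    static final class Response {
        final int status;
        final String location; // null when the response carries no Location header

        Response(int status, String location) {
            this.status = status;
            this.location = location;
        }
    }

    /** Returns everything after "scheme://host" unchanged, so "//Sale/..." keeps both slashes. */
    static String rawTarget(String location) {
        int schemeEnd = location.indexOf("://");
        if (schemeEnd < 0) {
            return location; // already a relative target, use it as-is
        }
        int pathStart = location.indexOf('/', schemeEnd + 3);
        return pathStart < 0 ? "/" : location.substring(pathStart);
    }

    /** Follows at most maxRedirects redirects and returns the target that was finally served. */
    static String follow(String startTarget, Function<String, Response> server, int maxRedirects) {
        Set<String> seen = new HashSet<>();
        String target = startTarget;
        for (int hops = 0; hops <= maxRedirects; hops++) {
            if (!seen.add(target)) {
                throw new IllegalStateException("Circular redirect to " + target);
            }
            Response response = server.apply(target);
            boolean isRedirect = response.status >= 300 && response.status < 400;
            if (!isRedirect || response.location == null) {
                return target;
            }
            target = rawTarget(response.location); // no slash collapsing here
        }
        throw new IllegalStateException("Too many redirects");
    }

    /** Simulates the behaviour reported in this thread: only the double-slash path is served. */
    static Response fakeServer(String target) {
        return target.startsWith("//")
                ? new Response(200, null)
                : new Response(301, "http://de.tommy.com/" + target);
    }

    public static void main(String[] args) {
        String served = follow("/Sale/600000,de_DE,sc.html", ManualRedirectSketch::fakeServer, 5);
        System.out.println("Served from: " + served);
    }
}
```

The point of the sketch is only that the second request goes out as //Sale/... because the Location value is never normalized; whether the same effect can be achieved inside a custom RedirectStrategy depends on how the HttpClient version in use rewrites the request-URI afterwards.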



