Posted to httpclient-users@hc.apache.org by Subashini S <s_...@yahoo.com> on 2008/09/16 21:45:15 UTC

blank location in redirected response (properly formatted)

Hi, 

I've come across a scenario where the URL
http://www.nwsource.com/travel/scr/tf_destination.cfm?
when requested in IE, comes back with the following response:
GET http://www.nwsource.com/travel/scr/tf_destination.cfm? 302 Moved Temporarily to ''
Studying the successive requests, IE seems to proceed as follows:
 
GET    http://www.nwsource.com/travel/scr/tf_destination.cfm?    302 Moved Temporarily to '' 
GET    http://www.nwsource.com/travel/scr/                       302 Moved Temporarily to /travel/ 
GET    http://www.nwsource.com/travel/                           301 Moved Permanently to http://www.nwsource.com/travel 
So the browser finally renders the page http://www.nwsource.com/travel. With httpclient, however, the first request comes back with a blank value in the Location response header:
GET    http://www.nwsource.com/travel/scr/tf_destination.cfm?    302 Moved Temporarily to ''
Whenever a redirect points to a relative URL, httpclient makes the URL absolute and fetches it. In our case the redirect location is a blank string, so resolving it against the current URL returns the original URL again, and httpclient goes into an infinite redirect loop.
  
if (redirectUri.isRelativeURI()) {
    if (this.params.isParameterTrue(HttpClientParams.REJECT_RELATIVE_REDIRECT)) {
        LOG.warn("Relative redirect location '" + location + "' not allowed");
        return false;
    } else {
        //location is incomplete, use current values for defaults
        LOG.debug("Redirect URI is not absolute - parsing as relative");
        redirectUri = new URI(currentUri, redirectUri);
    }
}


Control reaches the else branch of this snippet in the org.apache.commons.httpclient.HttpMethodDirector class,
and we get the following output:
Narrowly avoided an infinite loop in execute 
caught org.apache.commons.httpclient.RedirectException: Maximum redirects (100) exceeded 
  
Has anyone come across a similar situation where the redirect location is blank? If so, is it possible to emulate the browser behaviour without a code change in httpclient?

Thanks in advance, 
Subashini



      

---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-users-unsubscribe@hc.apache.org
For additional commands, e-mail: httpclient-users-help@hc.apache.org


Re: unreadable response body

Posted by sebb <se...@gmail.com>.
On 03/12/2008, Mike Clark <mc...@apache.org> wrote:
> > -----Original Message-----
>  > From: Subashini S [mailto:s_subashini@yahoo.com]
>  > Sent: Tuesday, December 02, 2008 1:26 PM
>  > To: HttpClient User Discussion
>  > Subject: unreadable response body
>  >
>  > Hi,
>  > I'm trying to fetch a url similar to the one below:
>  > http://www.metacafe.com/watch/yt-6gDlmA5SF9E/cow_slaughtering_on_eid_ul_zoha/
>  >
>
> > However with httpclient the fetched content seems to be unreadable
>  > and looks like a binary content.
>
>
> The server is indeed returning binary content; it is gzip
>  compressing the response body.  If you examine the response
>  headers from the server, you will see a header which announces
>  this fact:
>
>  Content-Encoding: gzip
>
>  I do not think HttpClient has built-in support for gzip.
>  You will, I believe, have to handle the decompression yourself.
>

There's an example here:

http://svn.apache.org/repos/asf/httpcomponents/httpclient/trunk/module-client/src/examples/org/apache/http/examples/client/ClientGZipContentCompression.java

>  I hope this helps,
>
>  regards,
>
>
>  Mike


RE: unreadable response body

Posted by Mike Clark <mc...@apache.org>.
> -----Original Message-----
> From: Subashini S [mailto:s_subashini@yahoo.com]
> Sent: Tuesday, December 02, 2008 1:26 PM
> To: HttpClient User Discussion
> Subject: unreadable response body
> 
> Hi,
> I'm trying to fetch a url similar to the one below:
> http://www.metacafe.com/watch/yt-6gDlmA5SF9E/cow_slaughtering_on_eid_ul_zoha/
> 
> However with httpclient the fetched content seems to be unreadable
> and looks like a binary content.

The server is indeed returning binary content; it is gzip 
compressing the response body.  If you examine the response
headers from the server, you will see a header which announces
this fact:

Content-Encoding: gzip

I do not think HttpClient has built-in support for gzip.
You will, I believe, have to handle the decompression yourself.
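
As a rough illustration of handling the decompression yourself, here is a minimal sketch using only the JDK's java.util.zip classes. The class and method names (GzipBodyDecoder, decodedStream, readBody, gzip) are made up for this example and are not part of HttpClient. It assumes you can obtain the raw body stream and the value of the Content-Encoding header from the response.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not an HttpClient class.
class GzipBodyDecoder {

    // Wraps the raw stream in a GZIPInputStream when the header announces gzip;
    // otherwise returns the raw stream unchanged.
    static InputStream decodedStream(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding != null) {
            String enc = contentEncoding.trim();
            if (enc.equalsIgnoreCase("gzip") || enc.equalsIgnoreCase("x-gzip")) {
                return new GZIPInputStream(raw);
            }
        }
        return raw;
    }

    // Reads the (possibly compressed) body fully and decodes it with the given charset.
    static String readBody(InputStream raw, String contentEncoding, Charset charset) throws IOException {
        try (InputStream in = decodedStream(raw, contentEncoding)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), charset);
        }
    }

    // Demo helper: gzip-compresses a string so the sketch can run without a server.
    static byte[] gzip(String text, Charset charset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(charset));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("<html>hello</html>", StandardCharsets.UTF_8);
        System.out.println(readBody(new ByteArrayInputStream(compressed), "gzip", StandardCharsets.UTF_8));
    }
}
```

With HttpClient 3.x the two inputs would presumably come from method.getResponseBodyAsStream() and method.getResponseHeader("Content-Encoding") (which may be null), instead of getResponseBodyAsString(). The charset is also an assumption here; it should be taken from the response's Content-Type where available.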

I hope this helps,

regards,

Mike





unreadable response body

Posted by Subashini S <s_...@yahoo.com>.
Hi,
 
I'm trying to fetch a url similar to the one below:

http://www.metacafe.com/watch/yt-6gDlmA5SF9E/cow_slaughtering_on_eid_ul_zoha/
 
When I study the response, I see that the status code is 404. However, there is some content in the page that I would like to retrieve. Internet Explorer and Firefox report the content type as text/html, and when I view the source I see what looks like HTML:
===========================================================================
 	<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
			<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
			<head>				<script type="text/javascript">
				var startTime = new Date();
				var reportURL = 'http://winter.metacafe.com';
				var isBeta = 0;
=========================================================================

However, with httpclient the fetched content is unreadable and looks like binary content.

This is my sample file TestHttpFetch.java:

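import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpMethod;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.methods.GetMethod;

public class TestHttpFetch {
    public static void main(String args[]) {
        HttpClient client = new HttpClient();
        HttpMethod method = new GetMethod(
                "http://www.metacafe.com/watch/yt-6gDlmA5SF9E/cow_slaughtering_on_eid_ul_zoha/");
        try {
            int statusCode = client.executeMethod(method);
            if (statusCode != HttpStatus.SC_OK) {
                System.err.println("Method failed: " + method.getStatusLine());
            }
            // Read the body even on a non-200 status; the 404 page still has content.
            String response = method.getResponseBodyAsString();
            System.out.println("RESPONSE BODY: \n" + response);
        } catch (Exception e) {
            System.out.println("Exception occurred:");
            e.printStackTrace();
        } finally {
            // Return the connection to the connection manager.
            method.releaseConnection();
        }
    }
}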

Could you please let me know why I am getting this binary content?

Thanks in advance,
Subashini



      



Re: blank location in redirected response (properly formatted)

Posted by sebb <se...@gmail.com>.
On 16/09/2008, Oleg Kalnichevski <ol...@apache.org> wrote:
> On Tue, 2008-09-16 at 12:45 -0700, Subashini S wrote:
>  > Hi,
>  >
>  > I've come across a scenario where the location:
>  > http://www.nwsource.com/travel/scr/tf_destination.cfm? When requested in IE browser comes back with the following response code:
>  > GET http://www.nwsource.com/travel/scr/tf_destination.cfm? 302 Moved Temporarily to ''
>  > When the successive requests were studied, IE browser seems to make requests as follows:
>  >
>  > GET    http://www.nwsource.com/travel/scr/tf_destination.cfm?    302 Moved Temporarily to ''
>  > GET    http://www.nwsource.com/travel/scr/                       302 Moved Temporarily to /travel/
>  > GET    http://www.nwsource.com/travel/                           301 Moved Permanently to http://www.nwsource.com/travel
>  > So finally the browser renders the page: http://www.nwsource.com/travel. However with httpclient, when the first request is made, the response  header comes back with blank value for location.
>  > GET    http://www.nwsource.com/travel/scr/tf_destination.cfm?    302 Moved Temporarily to ''
>  > Whenever there is a redirection to a relative url, the url is absolutized and fetched. In our case however, the redirected url happens to be a blank string and hence when absolutizing, the original url is returned. And hence, goes into infinite redirection.
>  >
>  > if (redirectUri.isRelativeURI()) {
>  >     if (this.params.isParameterTrue(HttpClientParams.REJECT_RELATIVE_REDIRECT)) {
>  >         LOG.warn("Relative redirect location '" + location + "' not allowed");
>  >         return false;
>  >     } else {
>  >         //location is incomplete, use current values for defaults
>  >         LOG.debug("Redirect URI is not absolute - parsing as relative");
>  >         redirectUri = new URI(currentUri, redirectUri);
>  >     }
>  >
>  >
>  > The control comes to the else part of this snippet in org.apache.commons.httpclient.HttpMethodDirector class
>  > and we are getting the following output:
>  > Narrowly avoided an infinite loop in execute
>  > caught org.apache.commons.httpclient.RedirectException: Maximum redirects (100) exceeded
>  >
>  > Has anyone come across a similar situation where the redirect location is blank. If so is it possible to emulate browser behaviour without a code change in httpclient?
>  >
>
>
> Subashini,
>
>  If upgrading to HttpClient 4.0 is an option, you could plug in a custom
>  RedirectHandler impl in order to emulate IE compatible behavior

Note that neither Firefox nor Opera (nor Java http) handle the URL
correctly, so if any such URLs are expected to be used much, perhaps
the website owners should be informed...

>  http://hc.apache.org/httpcomponents-client/httpclient/apidocs/org/apache/http/client/RedirectHandler.html
>
>  With HttpClient 3.x you do not have that many options. The only
>  possibility would be disabling automatic redirects and handling all
>  redirects manually.
>
>
>  Oleg
>
>
>  > Thanks in advance,
>  > Subashini


Re: blank location in redirected response (properly formatted)

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Tue, 2008-09-16 at 12:45 -0700, Subashini S wrote:
> Hi, 
> 
> I’ve come across a scenario where the location: 
> http://www.nwsource.com/travel/scr/tf_destination.cfm? When requested in IE browser comes back with the following response code: 
> GET http://www.nwsource.com/travel/scr/tf_destination.cfm? 302 Moved Temporarily to '' 
> When the successive requests were studied, IE browser seems to make requests as follows:  
>  
> GET    http://www.nwsource.com/travel/scr/tf_destination.cfm?    302 Moved Temporarily to '' 
> GET    http://www.nwsource.com/travel/scr/                       302 Moved Temporarily to /travel/ 
> GET    http://www.nwsource.com/travel/                           301 Moved Permanently to http://www.nwsource.com/travel 
> So finally the browser renders the page: http://www.nwsource.com/travel. However with httpclient, when the first request is made, the response  header comes back with blank value for location. 
> GET    http://www.nwsource.com/travel/scr/tf_destination.cfm?    302 Moved Temporarily to '' 
> Whenever there is a redirection to a relative url, the url is absolutized and fetched. In our case however, the redirected url happens to be a blank string and hence when absolutizing, the original url is returned. And hence, goes into infinite redirection.
>   
> if (redirectUri.isRelativeURI()) { 
>     if (this.params.isParameterTrue(HttpClientParams.REJECT_RELATIVE_REDIRECT)) { 
>         LOG.warn("Relative redirect location '" + location + "' not allowed"); 
>         return false; 
>     } else { 
>         //location is incomplete, use current values for defaults 
>         LOG.debug("Redirect URI is not absolute - parsing as relative"); 
>         redirectUri = new URI(currentUri, redirectUri); 
>     }
> 
> 
> The control comes to the else part of this snippet in org.apache.commons.httpclient.HttpMethodDirector class 
> and we are getting the following output: 
> Narrowly avoided an infinite loop in execute 
> caught org.apache.commons.httpclient.RedirectException: Maximum redirects (100) exceeded 
>   
> Has anyone come across a similar situation where the redirect location is blank. If so is it possible to emulate browser behaviour without a code change in httpclient? 
> 

Subashini,

If upgrading to HttpClient 4.0 is an option, you could plug in a custom
RedirectHandler implementation to emulate IE-compatible behavior:

http://hc.apache.org/httpcomponents-client/httpclient/apidocs/org/apache/http/client/RedirectHandler.html

With HttpClient 3.x you do not have that many options. The only
possibility would be disabling automatic redirects and handling all
redirects manually.
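
To give an idea of what handling redirects manually could look like, here is a minimal sketch of the resolution step using only java.net.URI. Everything in it is an assumption made for illustration: the class name BlankLocationRedirect, the methods resolveLocation and follow, and the map that stands in for the server's responses are all hypothetical. Treating a blank Location as "the parent directory of the current URL" is inferred from the IE request sequence reported above, not from any specification.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not an HttpClient class.
class BlankLocationRedirect {

    // Resolves a Location header value against the current URL. A blank value is
    // mapped to the current URL's parent directory (query dropped), mimicking the
    // IE behaviour observed in this thread.
    static URI resolveLocation(URI current, String location) {
        if (location == null || location.trim().isEmpty()) {
            String path = current.getPath() == null ? "" : current.getPath();
            String parent = path.substring(0, path.lastIndexOf('/') + 1);
            if (parent.isEmpty()) {
                parent = "/";
            }
            try {
                return new URI(current.getScheme(), current.getAuthority(), parent, null, null);
            } catch (URISyntaxException e) {
                throw new IllegalArgumentException("Cannot build parent of " + current, e);
            }
        }
        return current.resolve(location.trim());
    }

    // Follows redirects using a map of URL -> Location value that simulates the server.
    // Stops at the first URL with no entry; fails if a URL repeats or the limit is hit.
    static URI follow(URI start, Map<String, String> locations, int maxRedirects) {
        Set<URI> seen = new HashSet<>();
        URI current = start;
        while (locations.containsKey(current.toString())) {
            if (!seen.add(current) || seen.size() > maxRedirects) {
                throw new IllegalStateException("Redirect loop or limit reached at " + current);
            }
            current = resolveLocation(current, locations.get(current.toString()));
        }
        return current;
    }

    public static void main(String[] args) {
        // Simulated responses, taken from the request sequence quoted in this thread.
        Map<String, String> locations = new HashMap<>();
        locations.put("http://www.nwsource.com/travel/scr/tf_destination.cfm?", "");
        locations.put("http://www.nwsource.com/travel/scr/", "/travel/");
        locations.put("http://www.nwsource.com/travel/", "http://www.nwsource.com/travel");
        URI start = URI.create("http://www.nwsource.com/travel/scr/tf_destination.cfm?");
        System.out.println(follow(start, locations, 10));
    }
}
```

In a real 3.x client the map lookup would be replaced by an actual request with method.setFollowRedirects(false), reading the status code and the Location header after each executeMethod call. The visited-URL set is there so that a server which keeps redirecting to the same place fails fast rather than running into the maximum-redirects limit.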

Oleg

> Thanks in advance, 
> Subashini