Posted to httpclient-users@hc.apache.org by Albretch Mueller <lb...@gmail.com> on 2012/06/22 21:19:59 UTC

differences between HttpResponse.getAllHeaders() and wget --server-response --no-verbose ...

 The output of some code I tested, based on:

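   import org.apache.http.Header;
   import org.apache.http.HttpResponse;
   import org.apache.http.client.methods.HttpGet;
   import org.apache.http.impl.client.DefaultHttpClient;
   import org.apache.http.impl.client.DefaultRedirectStrategy;

   DefaultHttpClient httpClient = new DefaultHttpClient();
   httpClient.setRedirectStrategy(new DefaultRedirectStrategy());
   HttpGet httpGet = new HttpGet(aDomURL);  // aDomURL: the URL string under test
   HttpResponse response = httpClient.execute(httpGet);
   // Print every header of the response that execute() returned, one per line.
   for (Header header : response.getAllHeaders()) { System.err.println(header); }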

 (to me) looks very different from what you get using $ wget
--server-response --no-verbose

 Why is that? I could imagine some servers may "get smart" about
clients hitting them, but shouldn't you get pretty much the same
response?

 Also, I would like to get the most exhaustive information I possibly
can from the server. Am I using the API in a smart way? Why is it
not, for example, reporting an ETag?

 thank you
 lbrtchx
// __ java ...

Server: Apache
Content-Location: pg11.txt.utf8
Vary: negotiate,accept-encoding
TCN: choice
X-Rate-Limiter: php
Cache-Control: max-age=86400
X-Frame-Options: sameorigin
Content-Type: text/plain; charset=utf-8
Content-Length: 167517
X-Powered-By: 3
Date: Fri, 22 Jun 2012 19:15:55 GMT
X-Varnish: 1675817004
Age: 0
Via: 1.1 varnish
Connection: keep-alive

$ wget --server-response --no-verbose
http://www.gutenberg.org/ebooks/11.txt.utf-8
Server: Apache
Content-Location: pg11.txt.utf8
Vary: negotiate,accept-encoding
TCN: choice
X-Rate-Limiter: php
Cache-Control: max-age=86400
X-Frame-Options: sameorigin
Content-Type: text/plain; charset=utf-8
Content-Length: 167517
X-Powered-By: 2
X-Hits: 2
Date: Fri, 22 Jun 2012 19:02:31 GMT
X-Varnish: 35841995 35837303
Age: 245
Via: 1.1 varnish
Connection: keep-alive
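
 One way to see exactly which headers differ between two listings like
the ones above is to parse both and compare them name by name. The
following is only a minimal sketch using plain JDK classes; the class
HeaderDiff and its methods parse and diff are hypothetical names made
up for this illustration, and the sample input is abbreviated to a few
headers taken from the two dumps.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical helper: compares two raw header listings. */
class HeaderDiff {

    /** Parses "Name: value" lines into a map keyed by lower-cased header name. */
    static Map<String, String> parse(String rawHeaders) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (String line : rawHeaders.split("\\R")) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // skip blank lines and lines without a "name:" prefix
            }
            String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            // A header that appears twice is folded into one comma-separated value.
            headers.merge(name, value, (a, b) -> a + ", " + b);
        }
        return headers;
    }

    /** Returns one line per header that is missing on one side or has a different value. */
    static List<String> diff(Map<String, String> left, Map<String, String> right) {
        TreeSet<String> names = new TreeSet<>(left.keySet());
        names.addAll(right.keySet());
        List<String> report = new ArrayList<>();
        for (String name : names) {
            String l = left.get(name);
            String r = right.get(name);
            if (l == null || !l.equals(r)) {
                report.add(name + ": " + l + " | " + r);
            }
        }
        return report;
    }

    public static void main(String[] args) {
        // Abbreviated samples copied from the two listings in this message.
        String fromJava = "X-Powered-By: 3\nX-Varnish: 1675817004\nAge: 0\nVia: 1.1 varnish\n";
        String fromWget = "X-Powered-By: 2\nX-Hits: 2\nX-Varnish: 35841995 35837303\n"
                + "Age: 245\nVia: 1.1 varnish\n";
        diff(parse(fromJava), parse(fromWget)).forEach(System.out::println);
    }
}
```

 In the two listings above, the headers that differ (Age, X-Varnish,
X-Hits, X-Powered-By, Date) all look tied to the cache or the backend
that answered, so the difference may have more to do with when each
request was served than with the client library. That is a guess from
the header names, not something the listings prove.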

---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-users-unsubscribe@hc.apache.org
For additional commands, e-mail: httpclient-users-help@hc.apache.org


Re: differences between HttpResponse.getAllHeaders() and wget --server-response --no-verbose ...

Posted by David Motes <da...@gmail.com>.
// __ while using wget
~
GET /files/11/11.zip HTTP/1.0

// __ while using HttpClient
~
> GET /files/11/11.zip HTTP/1.1
> Host: www.gutenberg.org


HTTP/1.0 vs. HTTP/1.1, maybe.
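
To test that guess in isolation, the two requests would have to differ
in the protocol version only, with every header held constant. Below
is a minimal sketch that builds such raw request text with plain JDK
classes. RawRequest and build are hypothetical names for this
illustration, and the sketch only produces the text; it does not send
anything.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: builds raw HTTP GET request text. */
class RawRequest {

    /** Returns the request line, the given headers in order, and the closing blank line. */
    static String build(String path, String version, Map<String, String> headers) {
        StringBuilder sb = new StringBuilder();
        sb.append("GET ").append(path).append(" HTTP/").append(version).append("\r\n");
        for (Map.Entry<String, String> h : headers.entrySet()) {
            sb.append(h.getKey()).append(": ").append(h.getValue()).append("\r\n");
        }
        sb.append("\r\n"); // a blank line ends the header section
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Host", "www.gutenberg.org");
        headers.put("Connection", "close");
        // Same path and headers, only the version differs.
        System.out.print(build("/files/11/11.zip", "1.0", headers));
        System.out.print(build("/files/11/11.zip", "1.1", headers));
    }
}
```

Each string could then be written to a java.net.Socket connected to
the server and the status lines compared. Since the 403 page quoted in
this thread echoes the User-Agent it saw, that header is probably
another variable worth holding constant.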



On Fri, Jun 22, 2012 at 3:52 PM, Albretch Mueller <lb...@gmail.com> wrote:
> [...]



Re: differences between HttpResponse.getAllHeaders() and wget --server-response --no-verbose ...

Posted by Albretch Mueller <lb...@gmail.com>.
 If you run Wireshark to see the actual client-server exchange, this
is what you see:
~
// __ while using wget
~
GET /files/11/11.zip HTTP/1.0
User-Agent: Wget/1.12 (linux-gnu)
Accept: */*
Host: www.gutenberg.org
Connection: Keep-Alive

HTTP/1.1 403 Forbidden
Date: Fri, 22 Jun 2012 19:42:10 GMT
Server: Apache
Connection: close
Expires: Sun, 03 Oct 2004 12:00:00 GMT
Cache-Control: no-cache
X-Frame-Options: sameorigin
Content-Type: text/html


<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <title>403 Access Forbidden</title>
  </head>

  <body>
    <h1>403 Access Forbidden</h1>

    <div style="margin: 0 10%; border: 1px solid red; padding: 1em">

    <p>The Project Gutenberg Web Site is for human (non-automated) users only.
    Any perceived use of automated tools to access our web site
    will result in a temporary or permanent block of your IP address
or subnet.</p>

    <p>To protect our human users we have now
    <strong>blocked all access from hosting services.</strong></p>

    <p>If you think you need to download all our books,
    then use one of our mirrors nearest you:
    See: <a href="/MIRRORS.ALL">list of PG mirrors</a> and
    <a href="/terms_of_use/">PG terms of use</a>.</p>

    </div>


    <h2>Requested URI</h2>

    <p>/files/11/11.zip</p>
    <h2>Local time</h2>

    <p>Fri, 22 Jun 2012 15:42:10 -0400</p>
    <h2>IP Address</h2>

    <p>74.125.226.225</p>
    <h2>Browser</h2>

    <p>Wget/1.12 (linux-gnu)</p>
    <h2>Referrer</h2>

    <p></p>
    <h2>Server Protocol</h2>

    <p>HTTP/1.0</p>
    <h2>Accept Headers</h2>

    <h3>Accept</h3>

    <p>*/*</p>
    <h3>Accept Charset</h3>

    <p></p>
    <h3>Accept Encoding</h3>

    <p></p>
    <h3>Accept Language</h3>

    <p></p>

  </body>
</html>

~
// __ while using HttpClient
~
GET /files/11/11.zip HTTP/1.1
Host: www.gutenberg.org
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.2 (java 1.5)

HTTP/1.1 200 OK
Date: Fri, 22 Jun 2012 19:39:45 GMT
Server: Apache
Last-Modified: Tue, 20 Dec 2011 16:01:58 GMT
ETag: "4a66bf-ed3b-4b4882fd6c980"
Accept-Ranges: bytes
Content-Length: 60731
Cache-Control: max-age=604800
Expires: Fri, 29 Jun 2012 19:39:45 GMT
X-Frame-Options: sameorigin
Keep-Alive: timeout=5, max=190
Connection: Keep-Alive
Content-Type: application/zip

...
< even though you only asked for the response headers, the server
sends the payload along >
~
 Also, it seems that HttpClient, as requested, seamlessly follows the
redirection instead of getting a 403, but why isn't that negotiation
reported?
~
 lbrtchx
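
 On reporting the negotiation: a client that follows redirects
automatically hands back only the final response, so the headers of
any intermediate hop are not in what you print. (The HttpClient
capture above shows a direct 200 OK, so no redirect appears in that
particular exchange.) The sketch below shows one way to record every
hop by following redirects by hand. It deliberately uses the JDK's
HttpURLConnection rather than HttpClient, and the JDK's built-in
com.sun.net.httpserver package for a throwaway local server, so that
it runs without touching any real site. RedirectTrace, trace and
startDemoServer are hypothetical names, and the sketch assumes the
Location header carries an absolute URL.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: follows redirects manually and records each response. */
class RedirectTrace {

    /** Requests startUrl, follows at most maxHops redirects, returns one entry per response. */
    static List<String> trace(String startUrl, int maxHops) throws IOException {
        List<String> hops = new ArrayList<>();
        String url = startUrl;
        for (int i = 0; i <= maxHops && url != null; i++) {
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setInstanceFollowRedirects(false); // keep the 3xx response visible
            int status = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            String etag = conn.getHeaderField("ETag");
            hops.add(status + " Location=" + location + " ETag=" + etag);
            conn.disconnect();
            // Continue only on a redirect; assumes Location is an absolute URL.
            url = (status >= 300 && status < 400) ? location : null;
        }
        return hops;
    }

    /** Starts a local demo server: /old answers 302 to /new, /new answers 200 with an ETag. */
    static HttpServer startDemoServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/old", exchange -> {
            String target = "http://127.0.0.1:" + server.getAddress().getPort() + "/new";
            exchange.getResponseHeaders().add("Location", target);
            exchange.sendResponseHeaders(302, -1); // -1: no response body
            exchange.close();
        });
        server.createContext("/new", exchange -> {
            exchange.getResponseHeaders().add("ETag", "\"demo\"");
            exchange.sendResponseHeaders(200, -1);
            exchange.close();
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startDemoServer();
        try {
            String start = "http://127.0.0.1:" + server.getAddress().getPort() + "/old";
            trace(start, 5).forEach(System.out::println);
        } finally {
            server.stop(0);
        }
    }
}
```

 Run against the demo server, the trace should contain one entry for
the 302 and one for the final 200, which is the per-hop view that a
single getAllHeaders() call on the final response cannot give.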
