Posted to httpclient-users@hc.apache.org by Hannes Carl Meyer <ha...@googlemail.com> on 2011/01/06 12:11:17 UTC

Get Server Response Header without downloading

Hi,

I have a big list of URLs and the only data I need to fetch are the server
response headers for each URL.
What is the fastest way to get this data while NOT downloading the actual
content?

My current code:

client.executeMethod(method);
Header[] headers = method.getResponseHeaders();
// (do something with the header data)
method.releaseConnection();

The code works fine; I just wonder whether there are other, more
efficient ways to retrieve the server response headers for a particular URL.

Thanks

Hannes

Re: Get Server Response Header without downloading

Posted by Ken Krugler <kk...@transpac.com>.
On Jan 6, 2011, at 3:21am, Sam Crawford wrote:

> This is what the HEAD request method is for - it gets the headers
> without the body. For example:

[snip]

One caveat: there are servers out there that fail when you make a
HEAD request but work with a GET.

We ran into this while trying to efficiently handle link shorteners  
during web crawls.

I would suggest retrying with a GET if a HEAD request fails.
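
A minimal sketch of that retry, written against the JDK's
java.net.HttpURLConnection rather than Commons HttpClient so that it is
self-contained. The class and method names are made up for illustration,
the timeouts are arbitrary, and treating an IOException or any status of
400 and above as a "failed" HEAD is an assumption, not something the
servers in question are known to do uniformly.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: try HEAD first, retry once with GET if HEAD fails.
final class HeadWithGetFallback {

    /** Sends one request with the given method and returns the response headers. */
    static Map<String, List<String>> request(String method, String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.setRequestMethod(method);
            conn.setConnectTimeout(5000); // illustrative values
            conn.setReadTimeout(5000);
            int status = conn.getResponseCode(); // sends the request, reads status and headers
            if (status >= 400) {
                // Assumption: any 4xx/5xx counts as a failure worth retrying.
                throw new IOException(method + " " + url + " returned " + status);
            }
            // Copy the headers so the result does not depend on the connection object.
            return new LinkedHashMap<>(conn.getHeaderFields());
        } finally {
            // This code never reads the body; disconnect() closes the connection.
            conn.disconnect();
        }
    }

    /** Returns the headers from HEAD, or from GET when HEAD fails. */
    static Map<String, List<String>> fetchHeaders(String url) throws IOException {
        try {
            return request("HEAD", url);
        } catch (IOException headFailure) {
            return request("GET", url);
        }
    }
}
```

With HttpClient 3.x the same idea would be to execute a HeadMethod first
and, if that fails, execute a GetMethod for the same URL, releasing each
connection after use.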

-- Ken


--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g






---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-users-unsubscribe@hc.apache.org
For additional commands, e-mail: httpclient-users-help@hc.apache.org


Re: Get Server Response Header without downloading

Posted by Sam Crawford <sa...@gmail.com>.
This is what the HEAD request method is for - it gets the headers
without the body. For example:

# curl -I -v http://www.google.com
* About to connect() to www.google.com port 80
*   Trying 74.125.45.105... connected
* Connected to www.google.com (74.125.45.105) port 80
> HEAD / HTTP/1.1
> User-Agent: curl/7.15.5 (x86_64-redhat-linux-gnu) libcurl/7.15.5 OpenSSL/0.9.8b zlib/1.2.3 libidn/0.6.5
> Host: www.google.com
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Thu, 06 Jan 2011 11:20:57 GMT
< Expires: -1
< Cache-Control: private, max-age=0
< Content-Type: text/html; charset=ISO-8859-1
< Set-Cookie: PREF=ID=6fa2fec0f809889b:FF=0:TM=1294312857:LM=1294312857:S=7WeuO7OI73SOSQdT; expires=Sat, 05-Jan-2013 11:20:57 GMT; path=/; domain=.google.com
< Server: gws
< X-XSS-Protection: 1; mode=block
< Transfer-Encoding: chunked

So rather than using the GET method (probably defined earlier in your
code where you instantiate "method"), you should be able to just
change it to HEAD.
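
For illustration, here is a sketch of a HEAD request in Java that uses
only the JDK's java.net.HttpURLConnection (not HttpClient, so that it
runs without extra jars). HeadExample and fetchHeaders are hypothetical
names and the timeouts are arbitrary. In HttpClient 3.x the
corresponding change is to construct a HeadMethod where the code
currently constructs a GetMethod.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example class: fetch response headers with a HEAD request.
final class HeadExample {

    /** Sends a HEAD request and returns the response headers. */
    static Map<String, List<String>> fetchHeaders(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.setRequestMethod("HEAD"); // headers only, the server sends no body
            conn.setConnectTimeout(5000);  // illustrative values
            conn.setReadTimeout(5000);
            conn.getResponseCode();        // sends the request and reads the headers
            // The status line is stored under the null key.
            return new LinkedHashMap<>(conn.getHeaderFields());
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java HeadExample <url>");
            return;
        }
        for (Map.Entry<String, List<String>> e : fetchHeaders(args[0]).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```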

Thanks,

Sam


