Posted to httpclient-users@hc.apache.org by Albretch Mueller <lb...@gmail.com> on 2011/01/28 04:19:52 UTC

any open source crawlers using hc?

~
 I searched the mailing list archives, but I could only find a
few code snippets from people who were writing multithreaded
crawlers using hc. However, I could not find any full-fledged crawlers
based on hc. I know hc only provides baseline functionality for the
HTTP protocol, but full-fledged crawlers and proxies could be derived
from it.
~
 Does anyone know of such projects?
~
 thanks
 lbrtchx

---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-users-unsubscribe@hc.apache.org
For additional commands, e-mail: httpclient-users-help@hc.apache.org
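The post above mentions multithreaded crawlers built on hc. The concurrent crawl loop such a crawler needs can be sketched independently of any HTTP library. This is a minimal, hypothetical sketch, not code from Bixo, Droids, or Nutch: the `MiniCrawler` class and its `Fetcher` interface are invented for illustration. A real implementation would put an HttpClient 4.x request (sharing one client with a thread-safe, pooling connection manager across all workers) plus link extraction behind `fetchLinks`. A thread-safe set deduplicates URLs, and a `java.util.concurrent.Phaser` tracks outstanding tasks so the caller knows when the crawl has finished.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Phaser;

// Hypothetical sketch: not taken from Bixo, Droids, or Nutch.
final class MiniCrawler {

    // Hypothetical hook: a real crawler would issue an HttpClient GET here
    // and parse the links out of the response body.
    interface Fetcher {
        List<String> fetchLinks(String url) throws Exception;
    }

    private MiniCrawler() {}

    // Crawls outward from seed using a fixed pool of worker threads, fetching each URL at most once.
    // Returns every URL that was claimed for fetching, including ones whose fetch failed.
    static Set<String> crawl(String seed, Fetcher fetcher, int threads, int maxPages) {
        Set<String> claimed = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        // One party for the calling thread; each queued task registers its own party.
        // A Phaser allows at most 65535 registered parties, so keep maxPages below that.
        Phaser pending = new Phaser(1);
        submit(seed, fetcher, pool, claimed, pending, maxPages);
        pending.arriveAndAwaitAdvance(); // returns once every task has arrived and deregistered
        pool.shutdown();
        return claimed;
    }

    private static void submit(String url, Fetcher fetcher, ExecutorService pool,
                               Set<String> claimed, Phaser pending, int maxPages) {
        // Soft cap: concurrent workers may overshoot maxPages slightly.
        // add() is atomic, so two workers can never both claim the same URL.
        if (claimed.size() >= maxPages || !claimed.add(url)) {
            return;
        }
        // Registered before the task is queued, so the phase cannot advance early.
        pending.register();
        pool.execute(() -> {
            try {
                for (String link : fetcher.fetchLinks(url)) {
                    submit(link, fetcher, pool, claimed, pending, maxPages);
                }
            } catch (Exception e) {
                // A real crawler would log and perhaps retry; this sketch just skips the page.
            } finally {
                pending.arriveAndDeregister();
            }
        });
    }
}
```

With an in-memory map standing in for the web, `MiniCrawler.crawl("a", url -> web.getOrDefault(url, List.of()), 4, 100)` returns every page reachable from `a`, each fetched once. Keeping the HTTP call behind an interface lets the concurrency logic be tested without a network.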


Re: any open source crawlers using hc?

Posted by Ken Krugler <kk...@transpac.com>.
On Jan 27, 2011, at 7:19pm, Albretch Mueller wrote:

> ~
> I searched the mailing list archives, but I could only find a
> few code snippets from people who were writing multithreaded
> crawlers using hc. However, I could not find any full-fledged crawlers
> based on hc. I know hc only provides baseline functionality for the
> HTTP protocol, but full-fledged crawlers and proxies could be derived
> from it.
> ~
> Does anyone know of such projects?

Bixo uses HttpClient 4.0 - see http://openbixo.org and https://github.com/bixo/bixo/blob/master/src/main/java/bixo/fetcher/SimpleHttpFetcher.java

Apache Droids also uses HttpClient 4.x.

And Nutch uses HttpClient 3.1.

-- Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g