Posted to user@nutch.apache.org by adu <du...@hzduozhun.com> on 2014/08/01 09:12:58 UTC

Re: How to use a proxy list while nutch is crawling?

Any suggestions?
On 2014-07-31 15:01, [Email Address Not Verified]-dujinhang@hzduozhun.com wrote:
> Hi all,
>
> I have a proxy list and want to apply these proxies to my Nutch crawl.
> How can I do that?
>
> Thanks.




Re: How to use a proxy list while nutch is crawling?

Posted by adu <du...@hzduozhun.com>.
Thanks for your advice.
In reply to Bin Wang, 2014-08-01 22:47.




Re: How to use a proxy list while nutch is crawling?

Posted by Bin Wang <bi...@gmail.com>.
Hi adu,

Does "proxy list" here mean a list of different proxy services? That is, do
you want to switch between different proxy services in Nutch?

There is an article on the Nutch wiki
<https://wiki.apache.org/nutch/SetupProxyForNutch> that shows how to use a
proxy in Nutch in general.
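For reference, the general setup comes down to two properties from Nutch's nutch-default.xml, http.proxy.host and http.proxy.port, which you override in conf/nutch-site.xml. The Java sketch below is not Nutch code: it parses a nutch-site.xml-style snippet with the JDK's DOM parser and maps those two values to a single java.net.Proxy, to show that this configuration describes exactly one proxy. The class name SingleProxyConfig, the host proxy.example.com, and the 8080 fallback port are assumptions made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class SingleProxyConfig {
    // Illustrative nutch-site.xml override; proxy.example.com and 8080 are placeholders.
    static final String NUTCH_SITE_XML =
        "<configuration>"
        + "<property><name>http.proxy.host</name><value>proxy.example.com</value></property>"
        + "<property><name>http.proxy.port</name><value>8080</value></property>"
        + "</configuration>";

    // Collects the <name>/<value> pairs of a Hadoop-style configuration file.
    static Map<String, String> parseProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new HashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String name = property.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = property.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse configuration XML", e);
        }
    }

    // Maps http.proxy.host / http.proxy.port to one java.net.Proxy,
    // or to Proxy.NO_PROXY when no host is set.
    static Proxy toProxy(Map<String, String> props) {
        String host = props.get("http.proxy.host");
        if (host == null || host.isEmpty()) {
            return Proxy.NO_PROXY;
        }
        // The 8080 fallback is an assumption made for this sketch only.
        int port = Integer.parseInt(props.getOrDefault("http.proxy.port", "8080"));
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }

    public static void main(String[] args) {
        Proxy proxy = toProxy(parseProperties(NUTCH_SITE_XML));
        InetSocketAddress address = (InetSocketAddress) proxy.address();
        System.out.println(proxy.type() + " proxy at " + address.getHostString() + ":" + address.getPort());
    }
}
```

Run as-is, main prints "HTTP proxy at proxy.example.com:8080". Only one host/port pair fits in this configuration, which is why a list of proxies needs something beyond it.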
If you want to distribute your crawl across many different IPs and switch
between services, there are some paid services on the market that offer
this capability. One example is proxyrain, which switches the IP after every
HTTP request. There are probably cheaper and better solutions; a bit of
googling should turn them up.
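As far as I know, Nutch's stock HTTP protocol plugins use only the single proxy set through http.proxy.host and http.proxy.port, so rotating through your own proxy list (the way such services switch IPs per request) would need custom code, for example in your own protocol plugin. A minimal round-robin rotator in plain Java might look like the sketch below; ProxyRotator and its "host:port" input format are hypothetical, not a Nutch API. Each returned java.net.Proxy can be handed to the JDK's URL.openConnection(Proxy).

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: hands out proxies from a fixed list in round-robin order.
public class ProxyRotator {
    private final List<Proxy> proxies;
    private final AtomicInteger next = new AtomicInteger();

    // Each entry is assumed to be "host:port", e.g. "10.0.0.1:3128".
    ProxyRotator(List<String> hostPorts) {
        if (hostPorts.isEmpty()) {
            throw new IllegalArgumentException("proxy list is empty");
        }
        List<Proxy> list = new ArrayList<>();
        for (String entry : hostPorts) {
            int colon = entry.lastIndexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("expected host:port, got " + entry);
            }
            String host = entry.substring(0, colon).trim();
            int port = Integer.parseInt(entry.substring(colon + 1).trim());
            list.add(new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port)));
        }
        this.proxies = List.copyOf(list);
    }

    // Thread-safe: the atomic counter lets several fetcher threads share one rotator;
    // floorMod keeps the index non-negative even after the counter overflows.
    Proxy nextProxy() {
        int i = Math.floorMod(next.getAndIncrement(), proxies.size());
        return proxies.get(i);
    }

    public static void main(String[] args) {
        ProxyRotator rotator = new ProxyRotator(List.of("10.0.0.1:3128", "10.0.0.2:3128", "10.0.0.3:8080"));
        for (int request = 0; request < 5; request++) {
            InetSocketAddress addr = (InetSocketAddress) rotator.nextProxy().address();
            System.out.println("request " + request + " -> " + addr.getHostString() + ":" + addr.getPort());
        }
    }
}
```

Run as-is, main prints request 0 -> 10.0.0.1:3128, request 1 -> 10.0.0.2:3128, request 2 -> 10.0.0.3:8080, then wraps around to 10.0.0.1:3128 and 10.0.0.2:3128. Round-robin is just the simplest policy; a real crawler would likely also want to skip proxies that fail.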

Best,

Bin

In reply to adu <du...@hzduozhun.com>, Fri, Aug 1, 2014 at 1:12 AM.