Posted to user@nutch.apache.org by Eyeris Rodriguez Rueda <er...@uci.cu> on 2016/01/31 14:26:53 UTC

Re: [MASSMAIL] How to set up Nutch to only crawl links on designated web pages repeatedly?

Hello Jun.
Maybe you can use Nutch's URL filter plugins. These plugins are used to filter or restrict which links are visited.
Please, I need more details about your situation.

1. How are the links to visit on your pages (A, B, C) selected? Do they follow some pattern, subdomain, or keyword in their URLs?
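
As a rough illustration of what such a filter does, here is a small Python sketch. It is not Nutch code: it only imitates the rule style of Nutch's regex-urlfilter.txt, where a line starting with '+' accepts, a line starting with '-' rejects, and the first pattern that matches a URL decides. The host www.example.com, the /news/ path, and the helper names parse_rules and accepts are assumptions made up for the example.

```python
import re

# Rules written in the style of regex-urlfilter.txt.
# The host and path below are hypothetical placeholders.
RULES = r"""
# skip non-HTTP schemes
-^(file|ftp|mailto):
# accept only links under one section of a hypothetical site
+^https?://www\.example\.com/news/
# reject everything else
-.
"""

def parse_rules(text):
    """Turn rule lines into (accept, compiled_pattern) pairs, in file order."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no rule
        if line[0] not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((line[0] == "+", re.compile(line[1:])))
    return rules

def accepts(url, rules):
    """First matching rule wins; a URL that matches no rule is rejected."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False

rules = parse_rules(RULES)
print(accepts("http://www.example.com/news/item1.html", rules))
print(accepts("http://www.example.com/about.html", rules))
```

In a real crawl the three rule lines would go into conf/regex-urlfilter.txt with your own patterns. Note that a filter like this selects links by their URL, not by the page they were found on.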

Re: [MASSMAIL] How to set up Nutch to only crawl links on designated web pages repeatedly?

Posted by Junqiang Zhang <ju...@gmail.com>.
Hello Eyeris,

Thank you very much for your suggestion. Sorry for my late reply.

Using the URL filter plugins is a good option, and I am doing this for my
current crawling task. However, URL filters are not exactly what I want.
I feel there should be a better way to restrict Nutch to crawling only
the links on designated web pages. Perhaps Nutch does not currently
provide such a feature.

Best,
Junqiang

On Sun, Jan 31, 2016 at 9:26 PM, Eyeris Rodriguez Rueda <er...@uci.cu> wrote:
> Hello Jun.
> Maybe you can use Nutch's URL filter plugins. These plugins are used to filter or restrict which links are visited.
> Please, I need more details about your situation.
>
> 1. How are the links to visit on your pages (A, B, C) selected? Do they follow some pattern, subdomain, or keyword in their URLs?