Posted to dev@nutch.apache.org by xiao yang <ya...@gmail.com> on 2010/10/27 15:12:14 UTC

How to modify the crawler's schedule

It seems better to start a new thread for this.

I want to modify the crawler's schedule to make it closer to real-time.
Some web pages are updated frequently, while others seldom change. My
idea is to classify URLs into two categories, which would affect each
URL's score, so I want to add a field that stores which category a URL
belongs to.
The idea is simple, but I have found it is not easy to implement in Nutch.
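
As a rough sketch of the idea in plain Java (this is not Nutch code): every
name below (UrlRecord, CATEGORY_KEY, classify, tag), the 0.5 change-ratio
threshold, and the 2x score boost are assumptions made up for illustration.
UrlRecord only stands in for whatever per-URL record the crawl database
keeps.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these types are Nutch classes.
final class UrlCategorySketch {

    // The two categories described above.
    enum Category { FREQUENT, SELDOM }

    // Assumed key under which the category is stored with the URL.
    static final String CATEGORY_KEY = "update.category";

    // Stand-in for a per-URL crawl record: a score plus free-form metadata.
    static final class UrlRecord {
        final String url;
        float score;
        final Map<String, String> meta = new HashMap<>();

        UrlRecord(String url, float score) {
            this.url = url;
            this.score = score;
        }
    }

    // Classify by the fraction of past fetches in which the page had changed.
    // The 0.5 threshold is an arbitrary assumption.
    static Category classify(int changesSeen, int fetches) {
        if (fetches <= 0) {
            return Category.SELDOM; // no history yet: assume it seldom changes
        }
        double changeRatio = (double) changesSeen / fetches;
        return changeRatio >= 0.5 ? Category.FREQUENT : Category.SELDOM;
    }

    // Store the category in the extra field and let it affect the score.
    // The 2x boost for frequently updated pages is an arbitrary assumption.
    static void tag(UrlRecord record, Category category) {
        record.meta.put(CATEGORY_KEY, category.name());
        if (category == Category.FREQUENT) {
            record.score *= 2.0f;
        }
    }

    public static void main(String[] args) {
        UrlRecord news = new UrlRecord("http://example.com/news", 1.0f);
        tag(news, classify(8, 10));
        System.out.println(news.url + " " + news.meta.get(CATEGORY_KEY)
                + " score=" + news.score);
    }
}
```

The open question is where the equivalent of the meta field and the score
adjustment would live in Nutch.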

Thanks!
Xiao