Posted to dev@nutch.apache.org by Ken Krugler <kk...@transpac.com> on 2010/10/27 15:03:15 UTC

More real-time crawling

Hi Xiao,

FWIR, Nutch currently supports an adaptive refetch interval. Or are
you looking for something different?
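
For context, the general idea behind an adaptive refetch interval can be sketched roughly as below. This is only a minimal illustration under stated assumptions, not Nutch's own code: the class name, the method, and the rate and bound constants are all hypothetical. The interval shrinks when a page was found changed at the last fetch and grows when it was unchanged, clamped to a fixed range.

```java
// Hypothetical sketch of an adaptive refetch interval.
// All names and constants here are assumptions for illustration only;
// this is not taken from the Nutch source.
final class AdaptiveIntervalSketch {
    // Fraction by which the interval grows when the page was unchanged.
    static final double INC_RATE = 0.4;
    // Fraction by which the interval shrinks when the page had changed.
    static final double DEC_RATE = 0.2;
    // Bounds that keep the interval within a sane range (seconds).
    static final long MIN_INTERVAL_SEC = 60L;
    static final long MAX_INTERVAL_SEC = 30L * 24 * 3600;

    // Returns the next refetch interval in seconds, given the current
    // interval and whether the page changed since the previous fetch.
    static long nextInterval(long currentSec, boolean changed) {
        double next = changed
                ? currentSec * (1.0 - DEC_RATE)   // changed: revisit sooner
                : currentSec * (1.0 + INC_RATE);  // unchanged: back off
        long rounded = Math.round(next);
        return Math.max(MIN_INTERVAL_SEC, Math.min(MAX_INTERVAL_SEC, rounded));
    }
}
```

With a scheme like this, frequently updated pages drift toward the minimum interval and rarely updated pages toward the maximum, without a separate per-URL category field.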

Regards,

-- Ken

On Oct 27, 2010, at 1:42am, xiao yang wrote:

> I want to modify the crawler's schedule to make it more real-time.
> Some web pages are updated frequently, while others seldom change. My
> idea is to classify URLs into 2 categories that affect each URL's
> score, so I want to add a field that stores which category a URL
> belongs to.
> The idea is simple, but I found it's not so easy to implement in
> Nutch.
>
> Thanks!
> Xiao

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g