Posted to user@nutch.apache.org by kevin chen <ke...@bdsing.com> on 2008/01/10 05:34:49 UTC

Add new segments to existing

Hi,

I maintain a crawl of several sites and keep discovering new relevant
URLs to add to it.

Here is what I did:

Whenever I find new URLs, I crawl them separately for a few rounds until
I am satisfied. I then move the new segments into my existing segments
directory and run "updatedb" on each new segment. Finally, I delete the
existing indexes and re-index all the segments.

Is this the right way to do it? How does everyone else handle this
scenario?

Thanks
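The merge-and-update part of the workflow above might look like this in a Nutch 0.9-era setup. This is a minimal Python sketch, not code from the thread: the `plan_merge` helper, the `bin/nutch` location, and the directory layout (`crawl/crawldb`, `crawl/segments`, plus the side crawl's own segments directory) are hypothetical, and only the `bin/nutch updatedb <crawldb> <segment>` form of the command is assumed. It builds the moves and commands without running them.

```python
import posixpath

# Hypothetical helper, not part of Nutch: it only plans the work.
NUTCH = "bin/nutch"  # assumed location of the Nutch launcher script


def plan_merge(new_segment_names, side_segments_dir, crawl_dir="crawl"):
    """Plan moving side-crawl segments into the main segments directory,
    plus one updatedb run per moved segment."""
    crawldb = posixpath.join(crawl_dir, "crawldb")
    segments_dir = posixpath.join(crawl_dir, "segments")
    moves, commands = [], []
    # Segment directories are timestamp-named, so sorting keeps fetch order.
    for name in sorted(new_segment_names):
        src = posixpath.join(side_segments_dir, name)
        dst = posixpath.join(segments_dir, name)
        moves.append((src, dst))
        # Fold each new segment's fetch results into the main crawldb.
        commands.append([NUTCH, "updatedb", crawldb, dst])
    return moves, commands
```

Each move could then be carried out with `shutil.move(src, dst)` and each command run with `subprocess.run(cmd, check=True)` from the Nutch install directory. Nutch names segment directories with a yyyyMMddHHmmss timestamp, which is why a plain sort is enough to apply them in fetch order.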


Re: Add new segments to existing

Posted by Dennis Kubes <ku...@apache.org>.
Yes, that sounds like the correct way to do it. I am assuming you are
re-running invertlinks as well as indexing. The crawldb and linkdb are
the global databases that need to be kept up to date.

Dennis Kubes
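Dennis's point, refreshing the linkdb before re-indexing, could be planned as a short command sequence. This is a hypothetical Python sketch, not code from the thread: the `plan_rebuild` helper and the `crawl/linkdb` and `crawl/indexes` paths are assumptions, the command forms follow the Nutch 0.8/0.9 tutorial (`invertlinks <linkdb> -dir <segments>` and `index <indexes> <crawldb> <linkdb> <segment> ...`), and the `rm -r` step assumes a local filesystem rather than HDFS. Check the `bin/nutch <command>` usage output for your version before relying on it.

```python
import posixpath

# Hypothetical helper, not part of Nutch: it only builds argv lists.
NUTCH = "bin/nutch"  # assumed location of the Nutch launcher script


def plan_rebuild(segment_names, crawl_dir="crawl"):
    """Return the commands that refresh the linkdb and rebuild the index
    from every segment in the (assumed) main crawl directory."""
    crawldb = posixpath.join(crawl_dir, "crawldb")
    linkdb = posixpath.join(crawl_dir, "linkdb")
    segments_dir = posixpath.join(crawl_dir, "segments")
    indexes = posixpath.join(crawl_dir, "indexes")
    # Timestamp-named segments sort into fetch order.
    segments = [posixpath.join(segments_dir, s) for s in sorted(segment_names)]
    return [
        # Invert links from all segments into the global linkdb.
        [NUTCH, "invertlinks", linkdb, "-dir", segments_dir],
        # Drop the old indexes (assumes a local filesystem, not HDFS).
        ["rm", "-r", indexes],
        # Re-index every segment against the updated crawldb and linkdb.
        [NUTCH, "index", indexes, crawldb, linkdb, *segments],
    ]
```

Running the list in order with `subprocess.run(cmd, check=True)` stops at the first failing step, so a failed invertlinks never leaves you with the old indexes already deleted.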
