Posted to user@nutch.apache.org by kevin chen <ke...@bdsing.com> on 2008/01/10 05:34:49 UTC
Add new segments to existing
Hi,
I maintain a crawl of several sites and keep discovering new relevant
URLs to add to it.
Here is what I did:
Once I find new URLs, I crawl them separately for a few rounds until I
am satisfied. I then move the new segments into my existing segments
directory and run "updatedb" for each new segment. Finally, I remove the
existing indexes and re-index all the segments.
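The "move the new segments" step can be sketched in Python as follows.
This is a minimal sketch, not a Nutch tool: it assumes the separate
crawl's segments sit in their own directory, and the function name
move_new_segments is hypothetical. Nutch names segment directories by
timestamp, so clashes are unlikely, but the sketch checks every target
first so a clash cannot leave the crawl half-moved.

```python
import shutil
from pathlib import Path


def move_new_segments(new_segments_dir, segments_dir):
    """Move every segment directory from a separate crawl into the main
    segments directory and return the new paths, in name order.

    Hypothetical helper. If any segment name already exists in the target,
    nothing is moved and FileExistsError is raised.
    """
    src_root = Path(new_segments_dir)
    dst_root = Path(segments_dir)
    segs = sorted((p for p in src_root.iterdir() if p.is_dir()),
                  key=lambda p: p.name)
    # Check all targets up front so a clash cannot leave a partial move.
    clashes = [s.name for s in segs if (dst_root / s.name).exists()]
    if clashes:
        raise FileExistsError(f"segments already present: {clashes}")
    dst_root.mkdir(parents=True, exist_ok=True)
    moved = []
    for seg in segs:
        target = dst_root / seg.name
        shutil.move(str(seg), str(target))
        moved.append(target)
    return moved
```

After the move, "updatedb" still has to be run for each returned segment
path, as described above.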
Is this the right way to do it? How does everyone else handle this
scenario?
Thanks
Re: Add new segments to existing
Posted by Dennis Kubes <ku...@apache.org>.
Yes, that sounds like the correct way to do it. I am assuming you are
re-running invertlinks as well as indexing. The crawldb and linkdb are
the global databases that need to be kept up to date.
Dennis Kubes
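The full refresh sequence (updatedb per new segment, invertlinks, then
removing the old indexes and re-indexing every segment) can be sketched
as a small Python helper that only builds the command lines, so they can
be reviewed before running. Assumptions: the crawl uses a
crawl/{crawldb,linkdb,segments,indexes} layout, the command syntax is
modeled on the Nutch 0.9 tutorial (check `bin/nutch <command>` usage for
your version), and invertlinks is assumed to merge the new segments into
an existing linkdb. The helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def refresh_commands(crawl_dir, new_segments, all_segments, nutch="bin/nutch"):
    """Return argv lists for folding new segments into an existing crawl.

    new_segments and all_segments are segment directory names (e.g.
    timestamps) under <crawl_dir>/segments. Hypothetical helper; syntax
    follows the Nutch 0.9 tutorial.
    """
    crawl = Path(crawl_dir)
    segments = crawl / "segments"
    crawldb = str(crawl / "crawldb")
    linkdb = str(crawl / "linkdb")
    indexes = str(crawl / "indexes")
    new_paths = [str(segments / s) for s in new_segments]
    all_paths = [str(segments / s) for s in all_segments]

    cmds = []
    # 1. Fold each new segment into the global crawldb.
    for seg in new_paths:
        cmds.append([nutch, "updatedb", crawldb, seg])
    # 2. Add the new segments' links to the global linkdb (assumes an
    #    existing linkdb is merged; otherwise use "-dir" on all segments).
    cmds.append([nutch, "invertlinks", linkdb, *new_paths])
    # 3. Remove the old indexes, then re-index every segment.
    cmds.append(["rm", "-rf", indexes])
    cmds.append([nutch, "index", indexes, crawldb, linkdb, *all_paths])
    return cmds


def run(cmds):
    """Execute the commands in order, stopping at the first failure."""
    for argv in cmds:
        subprocess.run(argv, check=True)
```

Printing the list returned by refresh_commands before calling run() is a
cheap way to confirm the segment paths are right before the old indexes
are deleted.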