Posted to user@nutch.apache.org by Douglas Brunner <he...@gmail.com> on 2006/03/13 15:30:17 UTC

Intranet Crawling: Increasing Index Size, Updating the Index

I'm planning to launch a vortal, and using the intranet crawl seems
like the best choice for it.

To test things at first, I'd like to create a relatively small index
and then increase it progressively.

I'm not sure of the best way to do this. (Please note that I don't hold a
degree in computer science, so please keep answers simple and explain any
technical terms.)

I have two problems I'm not sure how to overcome: updating the
database, and increasing the number of sites in the database.

First problem, transferring the database:

What is the most efficient way to get the database from the machine
doing the crawling to the machine doing the searching? Should I upload a
new folder and rename it to the name of the crawl folder currently in use
(not space-efficient)? Or slowly replace the folders and files one by one
(which might cause it to crash)?

Second problem, updating the database:

After the initial crawl, and the first database has been created, what
is the best way to update the database?

What confuses me is how to add new URLs to be crawled and added to
the database, and how to actually update the sites that are already
indexed.

Thanks for any help, and sorry if this has been asked before; I couldn't
find an answer after two weeks of searching.