Posted to user@nutch.apache.org by Bryan Woliner <br...@gmail.com> on 2006/03/09 23:25:27 UTC

What are valid names and location(s) for segments

I am using Nutch 0.7.1 and have a couple of questions about valid segment
names and locations:

Nutch works fine when I store my segments, under their original
Nutch-assigned names, in the folder "/usr/local/nutch-0.7.1/live/segments/"
and then start Tomcat from the "/usr/local/nutch-0.7.1/live/" directory.

However, if I change the names of any of the segments then I get either zero
search results or a blank screen when I try to search.

Additionally, if I do not change the names, but move the segments to
sub-directories of the /live/segments/ folder (e.g. /live/segments/site1/),
then I always get zero search results.

Question: What is the easiest way to get Nutch to recognize segments with
modified names, or segments that are stored in a sub-directory of the
segments folder?

In general: The larger problem that I am trying to solve is that my Nutch
search engine currently crawls and indexes a couple dozen sites, and I want
to update (i.e., re-crawl) these sites independently and at different time
intervals. My current plan is to have a ../live/segments/ folder and store
an updated (and indexed) segment for each site in that folder. With this in
mind, I'm sure you can understand why it would be difficult to keep this
folder organized without being able to rename segments and/or store them in
sub-directories. If anyone has any ideas about how to organize these
segments without renaming them or storing them in sub-directories, I'm all
ears.
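
To make the bookkeeping problem concrete: one possible workaround is to leave
every segment under its original name directly in the segments folder and
keep the site-to-segment mapping in a separate file. The sketch below is only
an illustration of that idea, not a Nutch feature. The manifest file, the
function names and the segment names in the usage note are all hypothetical,
and the code only reads directory names; it never touches segment contents.

```python
import json
from pathlib import Path


def record_segment(manifest_path, site, segment_name):
    """Note which (unrenamed) segment currently holds the given site.

    Returns the segment name previously recorded for that site, or None.
    The manifest is a plain JSON file; it is an assumption of this sketch.
    """
    path = Path(manifest_path)
    manifest = json.loads(path.read_text()) if path.exists() else {}
    previous = manifest.get(site)
    manifest[site] = segment_name
    path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return previous


def stale_segments(manifest_path, segments_dir):
    """List segment directories that no site in the manifest points to."""
    manifest = json.loads(Path(manifest_path).read_text())
    live = set(manifest.values())
    return sorted(
        entry.name
        for entry in Path(segments_dir).iterdir()
        if entry.is_dir() and entry.name not in live
    )
```

After re-crawling a site, one would call something like
record_segment("manifest.json", "site1", "20060309101500") with the name of
the new segment, and stale_segments() would then report the superseded
segment for that site as a candidate for removal.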

Thanks ahead of time for any suggestions,

Bryan