Posted to user@nutch.apache.org by "Håvard W. Kongsgård" <nu...@niap.org> on 2006/09/28 19:02:28 UTC

Indexing in Nutch 0.8 / Hadoop

What is the best way to create a master index on a Nutch 0.8 / Hadoop system?

Should I first merge all of the segments together and then create an index?

Or should I do what Roberto Navoni does in his tutorial: first index each
segment separately, then merge the indexes into one master index?

-.-.-.-.-.-.-
# Create a new index, indexe0, for the first segment
bin/nutch index /user/root/crawld/indexe0 /user/root/crawld/ \
  /user/root/crawld/linkdb /user/root/crawld/segments/20060722153133
# Create a new index, indexe1, for the second segment
bin/nutch index /user/root/crawld/indexe1 /user/root/crawld/ \
  /user/root/crawld/linkdb /user/root/crawld/segments/20060722182213
# Dedup the new indexe0
bin/nutch dedup /user/root/crawld/indexe0
# Dedup the new indexe1
bin/nutch dedup /user/root/crawld/indexe1
# Delete the old master index
# Merge the new indexes into the master index directory
bin/nutch merge /user/root/crawld/index /user/root/crawld/indexe0 \
  /user/root/crawld/indexe1 ...
# (and any other indexes created for the fetched segments)
-.-.-.-.-.-.-
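The per-segment workflow above repeats the same index and dedup steps for every segment before the final merge. As a minimal sketch, assuming the directory layout used in the script (everything under /user/root/crawld, one indexeN directory per segment), the hypothetical shell function `plan_commands` below generates those commands for any list of segment names. It only prints the commands (a dry run), so they can be reviewed before being piped to `sh`. It does not include the "delete the old index" step, and it passes the `/user/root/crawld/` argument to `bin/nutch index` exactly as written in the script.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): prints, but does not run, the
# per-segment index/dedup/merge sequence from the script above.
# Assumption: same paths as the script; segment names are passed as arguments.
CRAWL=/user/root/crawld

plan_commands() {
  i=0
  indexes=""
  for seg in "$@"; do
    # One separate index per segment: indexe0, indexe1, ...
    echo "bin/nutch index $CRAWL/indexe$i $CRAWL/ $CRAWL/linkdb $CRAWL/segments/$seg"
    indexes="$indexes $CRAWL/indexe$i"
    i=$((i + 1))
  done
  # Dedup each per-segment index, as the script does.
  for idx in $indexes; do
    echo "bin/nutch dedup $idx"
  done
  # Merge all per-segment indexes into the master index directory.
  echo "bin/nutch merge $CRAWL/index$indexes"
}

plan_commands 20060722153133 20060722182213
```

Called with the two segment names from the script, it prints the same two index commands, two dedup commands and one merge command, each on a single line; adding more segment names extends the merge list accordingly.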