Posted to user@nutch.apache.org by "Fankhauser, Alain" <Al...@ipi.ch> on 2006/04/10 16:53:50 UTC
Invalid index (can't re-index)
Hello
I'm trying to re-index my filesystem.
First I create an index with the normal crawl command. This works, and in
the end I can search my index with Luke.
But if I start my re-index script to re-index the same filesystem, I end
up with an index that Luke reports as invalid.
I searched for a while to find the error, but I didn't find one.
Maybe someone could help me.
Here is my little script:
------------------------------------------------------------------------
cd C:/eclipse_projects/nutchTrunk/
webdb_dir=c:/nutchIndexFile/crawldb
segments_dir=c:/nutchIndexFile/segments
index_dir=c:/nutchIndexFile/index
link_dir=c:/nutchIndexFile/linkdb
indexes_dir=c:/nutchIndexFile/indexes/
# The generate/fetch/update cycle with depth 2
for ((i=1; i <= 2 ; i++))
do
bin/nutch generate $webdb_dir $segments_dir
segment=`ls -d $segments_dir/* | tail -1`
bin/nutch fetch $segment
bin/nutch updatedb $webdb_dir $segment
done
# Index the 2 newest segments (the 2 matches the depth above)
for segment in `ls -d $segments_dir/* | tail -2`
do
bin/nutch index $indexes_dir $webdb_dir $link_dir $segment
done
# De-duplicate indexes
bin/nutch dedup $indexes_dir
mkdir c:/tmpNutch
bin/nutch merge -workingdir c:/tmpNutch/ $index_dir $indexes_dir
------------------------------------------------------------------------
Thanks a lot
Alain