Posted to user@nutch.apache.org by "Fankhauser, Alain" <Al...@ipi.ch> on 2006/04/10 16:53:50 UTC

Invalid index (can't re-index)

Hello

I'm trying to re-index my filesystem.
First I create an index with the normal crawl command. This works, and in
the end I can search my index with Luke.
But if I then run my re-index script on the same filesystem, I end up
with an invalid index when I open it in Luke.
I searched for a while to find the error, but I didn't find one.

Maybe someone could help me.

Here is my little script:

---------------------------------------------------------------------------
cd C:/eclipse_projects/nutchTrunk/

webdb_dir=c:/nutchIndexFile/crawldb
segments_dir=c:/nutchIndexFile/segments
index_dir=c:/nutchIndexFile/index
link_dir=c:/nutchIndexFile/linkdb
indexes_dir=c:/nutchIndexFile/indexes/

# The generate/fetch/update cycle with depth 2
for ((i=1; i <= 2 ; i++))
do
  bin/nutch generate $webdb_dir $segments_dir
  segment=`ls -d $segments_dir/* | tail -1`
  bin/nutch fetch $segment
  bin/nutch updatedb $webdb_dir $segment
done

# Index the last 2 segments (the 2 matches the depth used above)
for segment in `ls -d $segments_dir/* | tail -2`
do
  bin/nutch index $indexes_dir $webdb_dir $link_dir $segment
done

# De-duplicate indexes
bin/nutch dedup $indexes_dir

mkdir c:/tmpNutch

bin/nutch merge -workingdir c:/tmpNutch/ $index_dir $indexes_dir
---------------------------------------------------------------------------
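
For reference, the segment selection used in both loops (ls -d piped into tail) can be tried on its own, without Nutch. This is only a minimal sketch: the helper name latest_segments and the throwaway directory are made up for illustration, and it assumes the segment directories are named by timestamp, so that lexical order matches creation order.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of Nutch): print the newest N entries
# of a segments directory, one path per line.
# Assumption: entries are named by timestamp, so sorting by name
# is the same as sorting by creation time.
latest_segments() {
  local dir=$1 count=$2
  ls -d "$dir"/* | tail -n "$count"
}

# Dry run against a throwaway directory with invented segment names.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/20060410100000" \
      "$demo_dir/20060410110000" \
      "$demo_dir/20060410120000"

# Same selection as `ls -d $segments_dir/* | tail -2` in the script.
latest_segments "$demo_dir" 2
```

With three directories present and a count of 2, this lists only the two newest paths, which is the set the indexing loop would iterate over.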

Thanks a lot
Alain