Posted to dev@nutch.apache.org by Luca Rondanini <lu...@translated.net> on 2007/06/26 17:37:08 UTC
Re-crawling Problem
Hi all,
I'm having some trouble trying to crawl and recrawl my local
filesystem. I'm using the script posted at
http://wiki.apache.org/nutch/IntranetRecrawl
My filesystem is structured like this:
../
../first/
../first/file1.pdf
../first/second/
../first/second/file2.pdf
../first/second/third/
../first/second/third/file2.pdf
../first/second/third/fourth/
../first/second/third/fourth/file4.pdf
../first/second/third/fourth/fifth/
../first/second/third/fourth/fifth/file5.pdf
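To reason about how far each crawl round should reach, the layout above can be turned into link-hop distances from the root. This is a minimal sketch under an assumption I haven't confirmed against Nutch's file protocol: each directory listing page links only to its immediate children, so an entry's distance is simply its number of path components below "../". The names LISTING, hop_distance and by_hops are hypothetical helpers, not part of Nutch.

```python
from pathlib import PurePosixPath

# The directory layout from this post (entries copied as listed above).
LISTING = [
    "../first/",
    "../first/file1.pdf",
    "../first/second/",
    "../first/second/file2.pdf",
    "../first/second/third/",
    "../first/second/third/file2.pdf",
    "../first/second/third/fourth/",
    "../first/second/third/fourth/file4.pdf",
    "../first/second/third/fourth/fifth/",
    "../first/second/third/fourth/fifth/file5.pdf",
]


def hop_distance(entry: str, root: str = "..") -> int:
    """Hops from the root listing, ASSUMING each directory page links
    only to its direct children (hypothetical model, not Nutch code)."""
    rel = PurePosixPath(entry.rstrip("/")).relative_to(PurePosixPath(root))
    return len(rel.parts)


def by_hops(entries):
    """Group entries by hop distance, keeping the listing order within a level."""
    levels = {}
    for entry in entries:
        levels.setdefault(hop_distance(entry), []).append(entry)
    return dict(sorted(levels.items()))


if __name__ == "__main__":
    for hops, entries in by_hops(LISTING).items():
        print(hops, entries)
```

Under this model, "first" sits one hop from the root and file5.pdf six hops away, which is the kind of number to compare against the cumulative depth of all crawl and recrawl rounds.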
On the first crawl round everything seems fine: it stops at the
"first" directory (depth 1).
On the first recrawl (depth 3) it stops at the "third" directory, and all
the files seem to be indexed correctly.
On the second recrawl (again depth 3) it reaches the "fifth" directory,
but none of the files are indexed.
Any ideas?
Thanks,
Luca