Posted to user@nutch.apache.org by Bruno Thiel <br...@objectconsulting.com.au> on 2006/09/15 03:02:15 UTC

Fetcher File Error 404 when crawling through file system

Hi,

I am trying to configure a recent Nutch (0.8+) to fetch directly from the
file system instead of over HTTP, which is fairly slow. The fetcher hits a
404 - File not found (see below). When I copy the file:/// URL into lynx,
the file is found without any problems.

2006-09-15 10:29:57,739 INFO  fetcher.Fetcher - fetching file:///mnt/smbfs/hollywood/projects/Telstra/Keystone\ -\ Leapfrog/Keystone/Architecture/Archives/info.txt
2006-09-15 10:29:57,746 INFO  fetcher.Fetcher - fetch of file:///mnt/smbfs/hollywood/projects/Telstra/Keystone\ -\ Leapfrog/Keystone/Architecture/Archives/info.txt failed with: org.apache.nutch.protocol.file.FileError: File Error: 404

Has anybody run into a similar problem or, better, found a resolution?

Cheers, Bruno