Posted to user@nutch.apache.org by Casey McTaggart <ca...@colorado.edu> on 2013/02/02 01:06:20 UTC

Crawl of local file system that puts results on HDFS

Hey everyone,

I'm using Nutch 1.5. I'm trying to crawl a local directory, store the
results on HDFS, and then index them into Solr. I can successfully run a
local crawl that writes its output to a local directory, but I inevitably
run out of space and/or get out-of-memory errors. What I really want is for
the input paths to be on my local fs and the output paths to be on HDFS.

So, it seems like this shouldn't be so complicated, but I can't figure out
what to modify in the Nutch source code. Does anyone have any pointers?

thanks,
casey