
Posted to user@nutch.apache.org by Jason Calabrese <ma...@jasoncalabrese.com> on 2005/10/23 22:05:15 UTC

Using individual parts of Nutch (NDFS and MapFiles/ArrayFiles)

All,

I'm thinking about using some parts of Nutch in my projects. We have been
using Lucene for a while with great success, but I now want to change the way
we store our content (mostly XML documents).

In the past we stored the content in MySQL; now we use simple gzipped XML
files. Removing the dependency on MySQL has been nice, but dealing with
millions of small files has created obvious problems.

It looks like NDFS and MapFiles/ArrayFiles could be part of a good
solution for us.
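
To make the idea concrete, here is a rough sketch of what packing the small
files into one keyed file might look like. It uses only the JDK, not Nutch's
classes, and the class name PackedDocStore is hypothetical. It only
approximates the MapFile idea (a sorted data file plus a key index): each
document is stored gzipped behind its key, and for simplicity every key's
offset is held in memory rather than in a separate sparse index file.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;
import java.util.zip.*;

/**
 * Hypothetical sketch: many small documents packed into one file as
 * (key, length, gzipped bytes) records in key order, plus an in-memory
 * key -> offset index for random reads. Not Nutch's API or file format.
 */
public class PackedDocStore implements AutoCloseable {
    // Compress one document's text with gzip.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray();
    }

    // Decompress one gzipped document back to text.
    static String gunzip(byte[] data) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    /** Writes all documents in key order (SortedMap iteration order). */
    public static void write(Path file, SortedMap<String, String> docs) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            for (Map.Entry<String, String> e : docs.entrySet()) {
                byte[] body = gzip(e.getValue());
                out.writeUTF(e.getKey());   // key
                out.writeInt(body.length);  // length of compressed body
                out.write(body);            // compressed body
            }
        }
    }

    private final RandomAccessFile raf;
    // key -> file offset of that record's length field
    private final TreeMap<String, Long> index = new TreeMap<>();

    /** Opens the file and scans it once to build the key index. */
    public PackedDocStore(Path file) throws IOException {
        raf = new RandomAccessFile(file.toFile(), "r");
        while (raf.getFilePointer() < raf.length()) {
            String key = raf.readUTF();
            long offset = raf.getFilePointer();
            int len = raf.readInt();
            index.put(key, offset);
            raf.seek(offset + 4 + len); // skip over the compressed body
        }
    }

    /** Returns the document stored under key, or null if it is absent. */
    public String get(String key) throws IOException {
        Long offset = index.get(key);
        if (offset == null) return null;
        raf.seek(offset);
        byte[] body = new byte[raf.readInt()];
        raf.readFully(body);
        return gunzip(body);
    }

    @Override
    public void close() throws IOException {
        raf.close();
    }
}
```

One file with millions of records replaces millions of small files, and a
lookup costs one seek plus one small read.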

I'm also interested in using the MapReduce framework and possibly the Fetcher 
in our applications.

Is anyone else using just parts of Nutch? Is the API expected to stay
fairly stable? Have there been any thoughts or discussions about
breaking parts of Nutch out into more general toolkits?

--Jason