Posted to user@nutch.apache.org by Kelvin Chappell <ke...@hotmail.co.uk> on 2005/10/03 11:39:04 UTC

Indexing multiple data sources

Does anyone have experience of keeping an index of content that comes both
from a web crawl and from a series of database queries?  By this I mean
using Nutch to crawl the web and Lucene to index the database content, then
merging the resulting indices so that a single query can return results from
either source.  It would be useful to know whether this has been tried
before.  I notice that the Nutch index doesn't hold the content itself but
rather references to segment indices, which makes it difficult to see how it
could be merged with content indexed from other sources.

Thanks,
Kelvin
