Posted to user@nutch.apache.org by Kelvin Chappell <ke...@hotmail.co.uk> on 2005/10/03 11:39:04 UTC
Indexing multiple data sources
Does anyone have experience of keeping a single index of content that comes
both from a web crawl and from a series of database queries? By this I mean
using Nutch to crawl the web and Lucene to index the database content, then
merging the resulting indices so that one query can return results from
either source. It would be useful to know whether this has been tried
before. I notice that the Nutch index doesn't hold the content itself but
rather references to segment indices, which makes it difficult to see how it
could be merged with content indexed from other sources.
Thanks,
Kelvin