Posted to user@nutch.apache.org by Michael Böckling <Mi...@dmc.de> on 2007/04/10 18:11:31 UTC

Combining standard Lucene and Nutch

Hi!

I have a website with both static and dynamic pages. There already is a
Lucene search implemented for the database content, now I need something for
the static pages. As far as I can see, my options are Nutch and NekoHTML for
Lucene.

I know there is a MultiSearcher class, but it seems that Nutch is using a
very different index layout than Lucene, or am I wrong here? My end goal is
a list of results with the most relevant hits from both indexes at the top
positions.

How would you go about this?
Thanks a lot for your input!

Regards,

Michael


--------------------------------------------- 
Michael Böckling
Java Engineer
dmc digital media center GmbH 
Rommelstraße 11 
70376 Stuttgart (Germany) 
Phone: +49 711 601747-0
Fax: +49 711 601747-141
E-Mail: Michael.Boeckling@dmc.de 
Internet: www.dmc.de 

Commercial register: AG Stuttgart HRB 18974
Managing directors: Andreas Magg, Daniel Rebhorn, Andreas Schwend

---------------------------------------------
Better eBusiness.
dmc is the creative integration of an agency, a system vendor and services.
We have been developing and implementing innovative and successful eBusiness
solutions for more than 10 years. Among our longtime customers are
neckermann.de, Kodak and Deutsche Telekom Training.

Re: Combining standard Lucene and Nutch

Posted by Enis Soztutar <en...@gmail.com>.
Michael Böckling wrote:
> Hi!
>
>   
Hi,
> I know there is a MultiSearcher class, but it seems that Nutch is using a
> very different index layout than Lucene, or am I wrong here? 
Nutch uses Lucene as its inverted index. Lucene itself does not impose
a fixed index structure; you define the structure (that is, the fields)
yourself when you build the index. Nutch stores some default fields in
the index, plus extra fields added by indexing plugins. You can check
the structure of the index on the wiki:
http://wiki.apache.org/nutch/IndexStructure

What you should do is compare the structure Nutch uses with the
structure you use, and somehow combine the two. For most fields, you
should converge on the Nutch version. Other than that, once the index
has been created by Nutch, it is plain Lucene stuff. You can merge the
indexes, run a MultiSearcher, or open separate
DistributedSearch$Clients and combine the results from the separate
indexes on the fly. However, there is an issue with summaries (the
result snippets). Do you intend to use them?
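
To make the "combine the results on the fly" option a bit more concrete,
here is a rough sketch in plain Java. It deliberately uses no Lucene or
Nutch classes: HitMerger, Hit and mergeByScore are hypothetical names
made up for this illustration, and it assumes that the scores coming out
of the two indexes are comparable, which is not guaranteed for indexes
that were built separately.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration only; not part of Lucene or Nutch.
final class HitMerger {

    /** One search hit: the index it came from, a document key, and its score. */
    static final class Hit {
        final String source;  // e.g. "lucene" (database index) or "nutch" (static pages)
        final String docKey;  // e.g. a database id or a URL
        final float score;    // assumed comparable across both indexes

        Hit(String source, String docKey, float score) {
            this.source = source;
            this.docKey = docKey;
            this.score = score;
        }
    }

    /** Returns up to topN hits from both lists, highest score first. */
    static List<Hit> mergeByScore(List<Hit> first, List<Hit> second, int topN) {
        List<Hit> all = new ArrayList<>(first.size() + second.size());
        all.addAll(first);
        all.addAll(second);
        // List.sort is stable, so hits with equal scores keep their input order.
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        int limit = Math.max(0, Math.min(topN, all.size()));
        return new ArrayList<>(all.subList(0, limit));
    }
}
```

You would run the same query against each index, wrap every result in a
Hit, and pass both lists to mergeByScore to get one ranked list. If the
scores turn out not to be comparable, they would need to be normalized
before merging.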




> My end goal is
> a list of results with the most relevant hits from both indexes at the top
> positions.
>
> How would you go about this?
> Thanks a lot for your input!
>
> Regards,
>
> Michael
>
>