Posted to user@nutch.apache.org by Patrick Kratzenstein <pk...@googlemail.com> on 2006/07/27 12:15:33 UTC

How to add database to an existing nutch index?

Hi there,
I've read a lot about how easy it is to add a database to a Lucene index. Well,
it works.
My goal is an engine that crawls some web pages but also several databases.
So my program first crawls all the pages, and then it goes through each
database and all of its records. Each database column becomes one field in
the new document I create with org.apache.lucene.document.Document. Finally,
each document is added to the newly created index.
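A minimal sketch of the row-to-document step described above, modelled in plain Java so it runs without Lucene: a Map stands in for org.apache.lucene.document.Document, and the column names and the COLUMN_TO_FIELD mapping are hypothetical. In real code each entry would become a Field added to a Document.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RowToDocument {
    // Hypothetical mapping from database column names to Nutch-style field names.
    static final Map<String, String> COLUMN_TO_FIELD = Map.of(
            "page_url", "url",
            "headline", "title",
            "body", "content");

    /** Turns one database row into a "document": one field per column. */
    static Map<String, String> toDocument(Map<String, String> row) {
        Map<String, String> doc = new LinkedHashMap<>();
        for (Map.Entry<String, String> column : row.entrySet()) {
            // Rename known columns; keep any other column under its own name.
            String field = COLUMN_TO_FIELD.getOrDefault(column.getKey(), column.getKey());
            doc.put(field, column.getValue());
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("page_url", "db://products/42");
        row.put("headline", "Blue Widget");
        row.put("body", "A small blue widget");
        row.put("price", "9.99");
        System.out.println(toDocument(row));
        // {url=db://products/42, title=Blue Widget, content=A small blue widget, price=9.99}
    }
}
```

Columns that are not renamed (here "price") end up in fields that Nutch's query side knows nothing about, which matters for the searchability question below.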

When I open the new index in Luke, all the documents are there and everything
looks fine... so far.

But if I now run a search for a term contained in the database, I don't get
any results! Why?!
I've tried it several times, and I'm really starting to believe that the
problem lies somewhere in Nutch's fetch tools?

Also, when I search inside Luke, it doesn't work directly: I have to specify
the field in which I expect the search term. Is that normal?

The field attributes of the documents I add are the same as the ones Nutch
uses by default, so the fields are exactly the same (equal attributes and
names). But they are not searchable... why?

Best regards,
Patrick

Re: How to add database to an existing nutch index?

Posted by Timo Scheuer <ti...@dfki.de>.
On Thursday, 27 July 2006 12:15, Patrick Kratzenstein wrote:
> But if I now run a search for a term contained in the database, I don't
> get any results! Why?!
> I've tried it several times, and I'm really starting to believe that the
> problem lies somewhere in Nutch's fetch tools?

Have you also extended the Nutch search to cover your additional fields?
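The point behind this question, roughly: Nutch only searches the fields that one of its query filter plugins translates, so a term stored in any other field is never looked at, even though Luke shows it in the index. The toy below models that in plain Java. The default field list is an assumption modelled on Nutch's basic query filter (url, anchor, content, title, host), and "dbdesc" is a hypothetical database field; this is not Nutch's actual QueryFilter API.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class FieldCoverage {
    /**
     * Returns true if the term occurs as a word in any of the searched fields.
     * Fields outside searchedFields are never consulted, which mimics a field
     * that no query filter covers.
     */
    static boolean matches(Map<String, String> doc, String term, Set<String> searchedFields) {
        String wanted = term.toLowerCase(Locale.ROOT);
        for (String field : searchedFields) {
            String value = doc.get(field);
            if (value == null) continue;
            for (String word : value.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (word.equals(wanted)) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A database-derived document whose text lives in a custom field.
        Map<String, String> doc = Map.of("url", "db://products/42", "dbdesc", "Blue widget");
        Set<String> defaults = Set.of("url", "anchor", "content", "title", "host");
        Set<String> extended = Set.of("url", "anchor", "content", "title", "host", "dbdesc");
        System.out.println(matches(doc, "widget", defaults)); // false: dbdesc is not searched
        System.out.println(matches(doc, "widget", extended)); // true
    }
}
```

In Nutch itself, "extending the search" would mean adding or configuring a query filter plugin for the extra field rather than editing a set like this one.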


> Also, when I search inside Luke, it doesn't work directly: I have to
> specify the field in which I expect the search term. Is that normal?

Yes. Lucene's query syntax differs from Nutch's: a Nutch query has to be
"translated" into a Lucene query first.
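A simplified sketch of the difference, in plain Java and string form only: a Lucene-style parser (as in Luke) sends a bare term to a single default field, while a Nutch-style translation expands the same bare term into an OR over several fields. The method names, the default field and the field list are illustrative assumptions; real Nutch translation also adds per-field boosts and phrase handling, which are left out here.

```java
import java.util.Locale;

public class QueryTranslation {
    /** Lucene-style (crude): "field:term" keeps its field, a bare term goes to the default field. */
    static String parseLikeLuke(String query, String defaultField) {
        return query.indexOf(':') >= 0 ? query : defaultField + ":" + query;
    }

    /** Nutch-style (simplified): a bare term becomes an OR over several fields. */
    static String translateLikeNutch(String term, String... fields) {
        String lower = term.toLowerCase(Locale.ROOT);
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(fields[i]).append(':').append(lower);
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        // "url" as the default field is a hypothetical choice.
        System.out.println(parseLikeLuke("widget", "url"));         // url:widget
        System.out.println(parseLikeLuke("content:widget", "url")); // content:widget
        System.out.println(translateLikeNutch("Widget", "url", "anchor", "content", "title", "host"));
        // (url:widget anchor:widget content:widget title:widget host:widget)
    }
}
```

This is why a bare term typed into Luke only finds documents whose default field contains it, while the same word typed into Nutch's search box is matched against several fields at once.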


> The field attributes of the documents I add are the same as the ones Nutch
> uses by default, so the fields are exactly the same (equal attributes and
> names). But they are not searchable... why?

Which parameters did you use when adding them? Some fields require
Field.Index.UN_TOKENIZED, others require Field.Index.TOKENIZED.
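Why the choice matters, sketched as a toy in plain Java: a TOKENIZED field is run through an analyzer, so "Blue Widget" is indexed as the terms blue and widget, while an UN_TOKENIZED field is indexed as one exact term. (In Lucene 2.0 this is the Field.Index argument of new Field(name, value, Field.Store..., Field.Index...).) The analyze method below is a rough stand-in for a real analyzer, and it assumes the query side lowercases terms as analyzers typically do; it is not Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TokenizedVsUntokenized {
    /** Rough analyzer stand-in: lowercase, then split on non-word characters. */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    /** Terms that end up in the index for one field value. */
    static List<String> indexedTerms(String value, boolean tokenized) {
        // Untokenized: the raw value, case and spaces included, is a single term.
        return tokenized ? analyze(value) : List.of(value);
    }

    /** A single-word query matches only if its (lowercased) term equals an indexed term exactly. */
    static boolean matches(String value, boolean tokenized, String queryWord) {
        return indexedTerms(value, tokenized).contains(queryWord.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(matches("Blue Widget", true, "widget"));  // true: indexed as [blue, widget]
        System.out.println(matches("Blue Widget", false, "widget")); // false: indexed as "Blue Widget"
        System.out.println(matches("Widget", false, "Widget"));      // false: query becomes "widget"
    }
}
```

Free-text columns such as descriptions would normally be tokenized; untokenized indexing suits values that are only ever matched whole, such as identifiers.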


Cheers,
Timo.