Posted to user@nutch.apache.org by John Reidy <jo...@reidy.com> on 2005/12/07 04:55:48 UTC

Returning all hits in a document

Hi,
I am in the process of looking through the source code, but if 
someone has already done this, I would appreciate any help.

I have a requirement to provide full text searching of a relatively small 
number (~500) of fairly large MS Word and PDF documents. A key feature would 
be the display of all of the search term matches, together with their 
context, so a user would see every occurrence of a search term in a document.

If this isn't available as an option to a Nutch search, I appreciate 
that this is probably more of a Lucene question.

Regards

John Reidy.


Re: Returning all hits in a document

Posted by Andrzej Bialecki <ab...@getopt.org>.
John Reidy wrote:

> Hi,
> I am in the process of looking through the source code, but if 
> someone has already done this, I would appreciate any help.
>
> I have a requirement to provide full text searching of a relatively 
> small number (~500) of fairly large MS Word and PDF documents. A key 
> feature would be the display of all of the search term matches, 
> together with their context, so a user would see every occurrence of a 
> search term in a document.
>
> If this isn't available as an option to a Nutch search, I appreciate 
> that this is probably more of a Lucene question.


Depending on your further requirements, it could be as simple as 
changing the configuration in nutch-default.xml / nutch-site.xml to 
allow effectively unlimited summaries (by raising 
searcher.summary.context and searcher.summary.length).
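
A minimal sketch of such an override in nutch-site.xml, assuming both 
properties take integer term counts. The values are arbitrary large 
placeholders chosen for illustration, not tested recommendations, and 
the entries belong inside whatever root element your nutch-site.xml 
already uses:

```xml
<!-- Illustrative override only: the values are placeholder assumptions. -->
<property>
  <name>searcher.summary.context</name>
  <!-- Assumed meaning: terms of context shown on each side of a match. -->
  <value>50</value>
</property>
<property>
  <name>searcher.summary.length</name>
  <!-- Assumed meaning: total terms allowed in one summary; set it high
       enough that the summary is not cut off before the last match. -->
  <value>100000</value>
</property>
```

With documents this large, a summary that long may be slow to build and 
unwieldy to display, so it is worth trying smaller values first.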

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com