Posted to user@nutch.apache.org by John Reidy <jo...@reidy.com> on 2005/12/07 04:55:48 UTC
Returning all hits in a document
Hi,
I am in the process of looking through the source code; however, if
someone has already done this, I would appreciate any help.
I have a requirement to provide full-text searching of a relatively small
number (~500) of fairly large MS Word and PDF documents. A key feature would
be the display of all of the search term matches, together with the
context, so a user would see all occurrences of a search term in a document.
If this isn't available as an option to a Nutch search, I appreciate
that this is probably more of a Lucene question.
Regards
John Reidy.
Re: Returning all hits in a document
Posted by Andrzej Bialecki <ab...@getopt.org>.
John Reidy wrote:
> Hi,
> I am in the process of looking through the source code; however, if
> someone has already done this, I would appreciate any help.
>
> I have a requirement to provide full-text searching of a relatively
> small number (~500) of fairly large MS Word and PDF documents. A key
> feature would be the display of all of the search term matches,
> together with the context, so a user would see all occurrences of a
> search term in a document.
>
> If this isn't available as an option to a Nutch search, I appreciate
> that this is probably more of a Lucene question.
Depending on your further requirements, it could be as simple as
changing the configuration in nutch-default.xml / nutch-site.xml
(the properties searcher.summary.context and searcher.summary.length)
so that summaries are effectively unlimited in length.
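
For illustration only, such an override might look like the fragment below. This is a sketch, not a tested setup: it assumes the two properties are overridden in conf/nutch-site.xml, that both are counted in terms, and that a very large length stands in for "unlimited". The numeric values are placeholders chosen for this example, not recommendations.

```xml
<!-- Place inside the root element of conf/nutch-site.xml. -->
<!-- Values below are illustrative assumptions; tune them for your documents. -->
<property>
  <name>searcher.summary.context</name>
  <!-- assumed: number of terms shown before and after each match -->
  <value>10</value>
</property>
<property>
  <name>searcher.summary.length</name>
  <!-- assumed: total number of terms allowed in one summary -->
  <value>100000</value>
</property>
```

Overriding in nutch-site.xml rather than editing nutch-default.xml keeps the shipped defaults intact, so the change is easy to revert.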
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com