
Posted to user@nutch.apache.org by "ned@bcit" <ne...@yahoo.com> on 2007/12/02 07:32:25 UTC

Re: Basic question about indexing

I am working on the first question myself, so I can't really help you with
that. As for your second question, there are existing plugins that you can
copy and modify to suit your purposes.

- index-more plugin: adds additional fields to the Lucene index.
- query-more plugin: lets you search those Lucene fields by
entering:
         [field-name]:[field-value] your-query
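
To make the query syntax above concrete, here is a minimal sketch of how
such a query string could be split into field clauses and plain terms. It
is not Nutch's own query parser and does not use the Nutch or Lucene APIs;
the class name FieldQuerySketch and the field name "section" are
hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, NOT part of Nutch: splits a query string of the form
// "[field-name]:[field-value] your-query" into field clauses and plain terms.
public class FieldQuerySketch {

    // Field clauses in the order they appeared (field name -> field value).
    public final Map<String, String> fields = new LinkedHashMap<>();
    // Remaining free-text terms, which would go to the default search field.
    public final List<String> terms = new ArrayList<>();

    public static FieldQuerySketch parse(String query) {
        FieldQuerySketch result = new FieldQuerySketch();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty input yields a single empty token
            }
            int colon = token.indexOf(':');
            // Treat a token as a field clause only if both sides of the
            // colon are non-empty; anything else is a plain term.
            if (colon > 0 && colon < token.length() - 1) {
                result.fields.put(token.substring(0, colon),
                                  token.substring(colon + 1));
            } else {
                result.terms.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "section" is an assumed field name, not one defined by Nutch.
        FieldQuerySketch q = parse("section:summary nutch indexing");
        System.out.println("fields = " + q.fields);
        System.out.println("terms  = " + q.terms);
    }
}
```

In a real setup the field named in the query has to exist in the index
first, which is the job of an indexing plugin such as index-more; a query
plugin then only maps the field clause onto that indexed field.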

Hope this helps.




taknev wrote:
> 
> Hi folks,
> 
> I would deeply appreciate it if someone could shed light on how to
> accomplish a specific kind of search with
> Nutch.
> 
> I am currently ABLE to do the following:
> 
> Use Nutch to crawl a directory in the local filesystem (Linux). The local
> directory contains HTML files.
> 
> When I run bin/nutch crawl urls  -dir crawllocalfs, it successfully crawls
> the directory and I can see the search results using the WAR file in
> Tomcat.
> 
> The HTML files are raw text with the usual HTML tags. The text has
> useful sections which I would like to capture so that I can run
> an advanced search in those fields only.
> 
> I don't understand how the following can be accomplished:
> 
> 1) How to extract specific parts of the HTML so that they can be grouped
> into certain fields in the Lucene index using Nutch.
> 
> 
> 2) How to perform an advanced search on the specific fields which are
> indexed in Nutch, given that it has only a very basic search interface.
> 
> I am a Nutch newbie, as you can tell, and would appreciate advice on how to
> approach this issue.
> 
> 
> Regards,
> Taknev
> 
> 

-- 
View this message in context: http://www.nabble.com/Basic-question-about-indexing-tf4899457.html#a14113051