Posted to dev@nutch.apache.org by Robert Sanford <rs...@trefs.com> on 2006/07/25 17:10:11 UTC

Scanning the database

Running Nutch 0.7.2 but I'm willing to move up to 0.8 if need be.

I have created an "Intranet" crawl using the file containing a list of
URIs and the list of regex to allow in conf/crawl-urlfilter.txt. Using
search.jsp I get lots and lots of good results so I'm quite happy so
far.

But I want to do a specialized search that returns a simple list of
the domains in which my search keys are found, rather than the more
complete list of pages where the keys are found.

Where in the code would I start looking for examples of querying the
database?

rjsjr

Re: Scanning the database

Posted by Stefan Neufeind <ap...@stefan-neufeind.de>.
Robert Sanford wrote:
> Running Nutch 0.7.2 but I'm willing to move up to 0.8 if need be.
> 
> I have created an "Intranet" crawl using the file containing a list of
> URIs and the list of regex to allow in conf/crawl-urlfilter.txt. Using
> search.jsp I get lots and lots of good results so I'm quite happy so
> far.
> 
> But I want to do a specialized search that returns a simple list of
> the domains in which my search keys are found, rather than the more
> complete list of pages where the keys are found.
> 
> Where in the code would I start looking for examples of querying the
> database?

Do you mean just the domains, such as "www.example.com", and no pages?
That value is part of the site value. So I guess you could set
hitsPerSite to 1 and display only the domain names, without the
individual pages. That should be enough.
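
If the hits are already available as page URLs, the "domains only" list
can also be built by collapsing them to their host names. The following
is only a minimal sketch under that assumption: it does not call any
Nutch API, and the class name DomainList, the method uniqueHosts and the
sample URLs are hypothetical names made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Nutch: reduces page URLs to host names.
public class DomainList {

    /** Returns each distinct host once, in order of first appearance. */
    public static List<String> uniqueHosts(List<String> hitUrls) {
        // LinkedHashSet drops duplicates but keeps insertion order.
        Set<String> hosts = new LinkedHashSet<String>();
        for (String url : hitUrls) {
            try {
                String host = new URI(url).getHost();
                if (host != null) {
                    hosts.add(host.toLowerCase(Locale.ROOT));
                }
            } catch (URISyntaxException e) {
                // Skip entries that are not parseable URIs.
            }
        }
        return new ArrayList<String>(hosts);
    }

    public static void main(String[] args) {
        // Sample data standing in for the URLs of search hits.
        List<String> hits = Arrays.asList(
            "http://www.example.com/a.html",
            "http://www.example.com/b.html",
            "http://intranet.example.org/index.html");
        for (String host : uniqueHosts(hits)) {
            System.out.println(host);
        }
    }
}
```

With hitsPerSite set to 1 there should be little left to collapse, so a
step like this would mainly matter if the hits come from a search that
was not already limited per site.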


Regards,
 Stefan