Posted to dev@nutch.apache.org by Robert Sanford <rs...@trefs.com> on 2006/07/25 17:10:11 UTC
Scanning the database
Running Nutch 0.7.2 but I'm willing to move up to 0.8 if need be.
I have created an "Intranet" crawl from a file containing a list of
URIs, with the regexes to allow listed in conf/crawl-urlfilter.txt. Using
search.jsp I get lots of good results, so I'm quite happy so
far.
But I want to do a specialized search that returns a simple
list of the domains in which my search keys are found, rather than the more
complete list of pages where the keys are found.
Where in the code would I start looking for examples of querying the
database?
rjsjr
Re: Scanning the database
Posted by Stefan Neufeind <ap...@stefan-neufeind.de>.
Robert Sanford wrote:
> Running Nutch 0.7.2 but I'm willing to move up to 0.8 if need be.
>
> I have created an "Intranet" crawl using the file containing a list of
> URIs and the list of regex to allow in conf/crawl-urlfilter.txt. Using
> search.jsp I get lots and lots of good results so I'm quite happy so
> far.
>
> But, I want to do a specialized search that will return to me a simple
> list of domains in which my search keys are found rather than the more
> complete list of pages where the keys are found.
>
> Where in the code would I start looking for examples of querying the
> database?
Do you mean domains such as "www.example.com", without the individual
pages? That value is part of the site value. So I guess you could set
hitsPerSite to 1 and display only the domain names, not the individual
pages. That should be enough.
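As a rough illustration of the "display only the domain names" step, here
is a minimal sketch in plain Java. It assumes you already have the URLs of
the hits as strings (for example, collected from whatever search.jsp
displays) and reduces them to a list of distinct host names. The class
DomainList and the method distinctHosts are hypothetical names made up for
this example; they are not part of Nutch, and the sketch uses java.net.URI
from the standard library rather than any Nutch API.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not part of Nutch): collapses hit URLs to host names.
final class DomainList {

    // Returns each distinct host once, in order of first appearance.
    static List<String> distinctHosts(List<String> hitUrls) {
        // LinkedHashSet drops duplicates but keeps insertion order.
        Set<String> hosts = new LinkedHashSet<>();
        for (String url : hitUrls) {
            try {
                String host = new URI(url).getHost();
                if (host != null) {
                    // Host names are case-insensitive, so normalize them.
                    hosts.add(host.toLowerCase(Locale.ROOT));
                }
            } catch (URISyntaxException e) {
                // Assumption: malformed URLs are skipped, not reported.
            }
        }
        return new ArrayList<>(hosts);
    }
}
```

Calling DomainList.distinctHosts with two pages from www.example.com and
one from another host would return two entries, one per host. With
hitsPerSite set to 1 the search already returns at most one hit per site,
so the de-duplication here is mostly a safety net.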
Regards,
Stefan