Posted to user@nutch.apache.org by Marcin Okraszewski <ok...@gmail.com> on 2007/04/25 22:41:05 UTC

Can I make a custom web searcher with Nutch?

Hi,
Nutch seems to be a very powerful tool, but I'm not sure whether I can
customize it enough to meet my requirements. I would like to
create a web searcher which:

1. Crawls an entire site, but keeps only selected pages for searching.
Whether a page should be indexed would be determined by its content
(an XPath expression).
2. Picks additional fields from pages with XPath expressions.
3. The fields from 2. would be used for sorting and filtering search
results. Some of them would be numerical. They should be displayed in
search results in separate columns.
4. Page rank does not need to be influenced by links. Just content
search would be enough.
5. XPaths would be configurable per web site.
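
To make points 1, 2 and 5 more concrete, here is a rough sketch of the
behaviour I have in mind. It uses only the plain JDK XPath API, not any
Nutch or Lucene API. The class XPathPageRules, the SiteRules holder, the
host shop.example.com and the XPath expressions are all made up for
illustration, and it assumes the page is already well-formed XML (real
HTML would need cleaning up first).

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathPageRules {

    /** Hypothetical per-site configuration (point 5). */
    static final class SiteRules {
        final String indexIf;              // XPath; the page is kept only if it matches (point 1)
        final Map<String, String> fields;  // field name -> XPath (point 2)

        SiteRules(String indexIf, Map<String, String> fields) {
            this.indexIf = indexIf;
            this.fields = fields;
        }
    }

    /** Returns the extracted fields, or null if the page should not be indexed. */
    static Map<String, String> extract(String xml, SiteRules rules) throws Exception {
        // Parse once, then run every XPath against the same DOM.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        // A node-set converts to true when it is non-empty.
        boolean keep = (Boolean) xpath.evaluate(rules.indexIf, doc, XPathConstants.BOOLEAN);
        if (!keep) {
            return null;
        }

        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : rules.fields.entrySet()) {
            // String evaluation yields the text of the first matching node.
            out.put(e.getKey(), xpath.evaluate(e.getValue(), doc).trim());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", "//div[@class='product']/h1");
        fields.put("price", "//span[@id='price']");

        // Rules are looked up per web site; the host name is an example only.
        Map<String, SiteRules> rulesBySite = new LinkedHashMap<>();
        rulesBySite.put("shop.example.com", new SiteRules("//div[@class='product']", fields));

        String page = "<html><body><div class='product'><h1>Blue Kettle</h1>"
                + "<span id='price'>19.99</span></div></body></html>";

        Map<String, String> doc = extract(page, rulesBySite.get("shop.example.com"));
        if (doc != null) {
            // Numerical fields (point 3) would be parsed before sorting or filtering.
            double price = Double.parseDouble(doc.get("price"));
            System.out.println(doc.get("title") + " costs " + price);
        }
    }
}
```

A page without a matching product block would return null here and be
left out of the index, while the extracted fields would be stored next
to the page text for sorting and filtering.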

Is it possible to customize Nutch to do this? Or should I rather
create a custom solution with Lucene?

Thanks for help.
Marcin Okraszewski
