Posted to user@nutch.apache.org by Mr Shore <sh...@gmail.com> on 2008/01/24 09:58:11 UTC

Tough question: how to customize the indexer like this?

Hi all,
I'm new to Nutch 0.9,
and I've run into a problem while trying to change the Nutch indexer as follows.
I have Nutch crawl the web sites to a depth of 2,
and I want the indexer to treat each page according to the depth at which it was crawled,
that is:
----------------------------------------------------------------------
if depth == 1
    just store the page at a path, unchanged (do not index it)
if depth == 2
    index the page
-----------------------------------------------------------------------
The relation between pages of depth 1 and depth 2 is as follows:
when we query, matching should be done against the index built from the
depth-2 pages,
but the result returned first should be the depth-1 page that links to the matching depth-2 page.
Have I made this clear?
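
To make the intended behaviour concrete, here is a rough sketch in plain Java. It does not use any Nutch or Hadoop API: the class DepthAwareIndex and its methods add, search and stored are hypothetical names made up for illustration, the "path" is stood in for by an in-memory map, and each depth-2 page is assumed to have exactly one depth-1 parent.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only: none of these names come from Nutch or Hadoop.
class DepthAwareIndex {
    // Depth-1 pages kept unchanged, keyed by URL (stands in for "store to a path").
    private final Map<String, String> storedPages = new HashMap<>();
    // Inverted index built only from depth-2 pages: term -> depth-2 URLs.
    private final Map<String, Set<String>> termToUrls = new HashMap<>();
    // Link relation: depth-2 URL -> the depth-1 URL that links to it
    // (assumption: a single parent per depth-2 page).
    private final Map<String, String> parentOf = new HashMap<>();

    // Route a crawled page according to its crawl depth.
    void add(String url, String content, int depth, String parentUrl) {
        if (depth == 1) {
            // Depth 1: store as-is, do not index.
            storedPages.put(url, content);
        } else if (depth == 2) {
            // Depth 2: index the terms and remember the parent page.
            parentOf.put(url, parentUrl);
            for (String term : content.toLowerCase().split("\\W+")) {
                if (!term.isEmpty()) {
                    termToUrls.computeIfAbsent(term, k -> new LinkedHashSet<>()).add(url);
                }
            }
        }
        // Other depths are ignored in this sketch.
    }

    // Match against the depth-2 index, but return the depth-1 parents.
    List<String> search(String term) {
        Set<String> parents = new LinkedHashSet<>();
        Set<String> hits = termToUrls.getOrDefault(term.toLowerCase(),
                                                   Collections.<String>emptySet());
        for (String url : hits) {
            String parent = parentOf.get(url);
            if (parent != null && storedPages.containsKey(parent)) {
                parents.add(parent);
            }
        }
        return new ArrayList<>(parents);
    }

    // Return the unchanged depth-1 page content, or null if not stored.
    String stored(String url) {
        return storedPages.get(url);
    }
}
```

So a query is matched only against depth-2 content, while what comes back is the depth-1 page linking to the hit. What I don't know is how to get this behaviour out of the real Nutch indexer.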
I've tried to rewrite the indexer, but in vain: I found that all the key
operations are implemented in Hadoop, whose source code I can't see.
Any advice will be greatly appreciated.