Posted to user@nutch.apache.org by Vijith <vi...@gmail.com> on 2012/05/15 20:07:42 UTC

Block irrelevant urls

Hi,

If my Nutch plugin finds some or all of the outlinks irrelevant, how can I
stop them from being added to the crawldb?



-- 
*Thanks & Regards*
*Vijith V*
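One common way to keep unwanted outlinks out of the crawldb in Nutch is a URL filter plugin. Nutch's `org.apache.nutch.net.URLFilter` extension point exposes `String filter(String urlString)`, and a filter rejects a URL by returning `null`. The sketch below illustrates only that contract and is self-contained. `UrlFilter` is a local stand-in for the real Nutch interface, and `RelevanceUrlFilter`, its keyword list, and `isRelevant` are hypothetical placeholders for whatever relevance test the plugin actually performs.

```java
import java.util.ArrayList;
import java.util.List;

public class RelevanceFilterSketch {

    // Local stand-in for the contract of org.apache.nutch.net.URLFilter:
    // return the URL to keep it, or null to reject it.
    interface UrlFilter {
        String filter(String urlString);
    }

    // Hypothetical filter (not part of Nutch): keeps only URLs that contain
    // one of the configured keywords. Replace isRelevant() with the real check.
    static class RelevanceUrlFilter implements UrlFilter {
        private final List<String> keywords;

        RelevanceUrlFilter(List<String> keywords) {
            this.keywords = keywords;
        }

        // Hypothetical relevance test, case-insensitive substring match.
        private boolean isRelevant(String url) {
            String lower = url.toLowerCase();
            for (String k : keywords) {
                if (lower.contains(k)) {
                    return true;
                }
            }
            return false;
        }

        @Override
        public String filter(String urlString) {
            return isRelevant(urlString) ? urlString : null;
        }
    }

    // Applies the filter to a list of outlinks and keeps only accepted URLs,
    // mimicking how rejected outlinks are dropped instead of being stored.
    static List<String> keepAccepted(UrlFilter f, List<String> outlinks) {
        List<String> kept = new ArrayList<>();
        for (String url : outlinks) {
            String result = f.filter(url);
            if (result != null) {
                kept.add(result);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        UrlFilter f = new RelevanceUrlFilter(List.of("nutch", "crawl"));
        List<String> outlinks = List.of(
            "http://example.com/nutch/tutorial",
            "http://example.com/ads/banner",
            "http://example.com/crawl-faq");
        // prints "[http://example.com/nutch/tutorial, http://example.com/crawl-faq]"
        System.out.println(keepAccepted(f, outlinks));
    }
}
```

Turning this into an actual Nutch plugin would additionally mean implementing the real `URLFilter` interface (which also carries Hadoop's `Configurable` methods `setConf`/`getConf`), writing a `plugin.xml` descriptor, and adding the plugin id to `plugin.includes`; the exact details depend on the Nutch version in use.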

Re: Block irrelevant urls

Posted by Vijith <vi...@gmail.com>.
What if I set the status of the outlinks to DB_GONE?
Will Nutch try to download those links again once the fetch_interval has
elapsed? I ask because I saw this post:
http://lucene.472066.n3.nabble.com/Problem-with-DB-GONE-status-td623105.html


