Posted to user@nutch.apache.org by Vijith <vi...@gmail.com> on 2012/05/15 20:07:42 UTC
Block irrelevant urls
Hi,
If my Nutch plugin finds some or all of the outlinks irrelevant, how can I
stop them from being added to the crawldb?
--
*Thanks & Regards*
*Vijith V*
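In Nutch 1.x, this kind of check usually goes in a URL filter plugin. A filter's `filter(String)` method returns the URL to keep it, or `null` to reject it. As far as I know, outlinks rejected by the active URL filters never reach the crawldb: they are dropped during parsing or when updatedb runs with `-filter`. Check this against your Nutch version. The class below is a minimal, dependency-free sketch of that contract only. It does not import Nutch. In a real plugin it would implement `org.apache.nutch.net.URLFilter`, be declared against that extension point in the plugin's `plugin.xml`, and be enabled through `plugin.includes`. The class name and the "irrelevant host / irrelevant path prefix" rules are hypothetical placeholders for whatever relevance test the plugin actually applies.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch shaped like Nutch's URL filter contract:
 * filter() returns the URL to keep it, or null to reject it.
 * The blocking rules below are placeholders, not Nutch behaviour.
 */
class RelevanceUrlFilter {
    private final List<String> blockedHosts;        // hosts treated as irrelevant (assumption)
    private final List<String> blockedPathPrefixes; // path prefixes treated as irrelevant (assumption)

    RelevanceUrlFilter(List<String> blockedHosts, List<String> blockedPathPrefixes) {
        this.blockedHosts = List.copyOf(blockedHosts);
        this.blockedPathPrefixes = List.copyOf(blockedPathPrefixes);
    }

    /** Returns urlString if it looks relevant, or null so it is dropped. */
    public String filter(String urlString) {
        URI uri;
        try {
            uri = new URI(urlString);
        } catch (URISyntaxException e) {
            return null; // unparseable URLs are rejected
        }
        String host = uri.getHost();
        if (host == null) {
            return null; // no host, e.g. mailto: links
        }
        host = host.toLowerCase(Locale.ROOT);
        for (String blocked : blockedHosts) {
            // reject the blocked host itself and any of its subdomains
            if (host.equals(blocked) || host.endsWith("." + blocked)) {
                return null;
            }
        }
        String path = uri.getPath() == null ? "" : uri.getPath();
        for (String prefix : blockedPathPrefixes) {
            if (path.startsWith(prefix)) {
                return null; // reject irrelevant sections of a site
            }
        }
        return urlString; // keep everything else
    }
}
```

With `new RelevanceUrlFilter(List.of("ads.example.com"), List.of("/login"))`, links to `ads.example.com` or any `/login...` path are dropped. Other links pass through unchanged. Filtering at this stage keeps the URLs out of the crawldb entirely. Marking them `DB_GONE` would still store them there.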
Re: Block irrelevant urls
Posted by Vijith <vi...@gmail.com>.
What if I set the status of the outlinks to DB_GONE?
Will Nutch try to download those links once the fetch_interval has
elapsed? I ask because I saw this post:
http://lucene.472066.n3.nabble.com/Problem-with-DB-GONE-status-td623105.html