Posted to user@nutch.apache.org by Gal Nitzan <gn...@usa.net> on 2005/09/26 21:00:37 UTC

is there any way to prune webdb?

Hi,

A few questions:

1. Is there any way to remove/prune unwanted URLs from the webdb without
deleting the whole webdb and then running updatedb?

2. After using prune, must I run updatedb to update the webdb?

3. Is there a way to remove unwanted records from the fetchlist?

4. Does generate use regex-urlfilter in the process?

5. I noticed the fetcher fetches pages in the fetchlist even though a rule
in regex-urlfilter should exclude them. How come?

Thanks,

Gal

Re: is there any way to prune webdb?

Posted by Tim Archambault <jo...@gmail.com>.
Did you get an answer to this? I'd like to know how to remove URLs I no
longer want to crawl as well.
