Posted to dev@nutch.apache.org by Jay Pound <we...@poundwebhosting.com> on 2005/08/01 20:18:00 UTC

nutch prune

How do I write my queries file to prune my database so that it keeps only
.com, .edu, .org, .us, etc. (US sites only)?
Thanks,
Jay Pound



Re: nutch prune

Posted by Matthias Jaekle <ja...@eventax.de>.
Hi Jay,
I think that with the current version you can only prune segments.
We once wrote a class to prune the db.
Maybe you could use it and add a function that deletes pages according to
the urlfilter. I have attached our class.
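
The following is not the attached class and does not use any Nutch API. It is only a minimal sketch of the kind of keep-or-delete decision such a function might make, assuming the rule is "keep a page only if its host ends in one of the listed top-level domains". The class name TldPruneSketch, the method shouldKeep, and the TLD list are hypothetical, chosen for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: not part of Nutch and not the attached class.
final class TldPruneSketch {

    // Top-level domains to keep, taken from the question; adjust as needed.
    private static final Set<String> KEPT_TLDS =
            new HashSet<>(Arrays.asList("com", "edu", "org", "us"));

    // Returns true if the URL's host ends in one of the kept TLDs.
    // URLs that cannot be parsed, or that have no host, are treated as prunable.
    static boolean shouldKeep(String url) {
        String host;
        try {
            host = new URI(url).getHost();
        } catch (URISyntaxException e) {
            return false;
        }
        if (host == null) {
            return false;
        }
        int lastDot = host.lastIndexOf('.');
        if (lastDot < 0 || lastDot == host.length() - 1) {
            return false; // no TLD present
        }
        String tld = host.substring(lastDot + 1).toLowerCase(Locale.ROOT);
        return KEPT_TLDS.contains(tld);
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://www.example.com/index.html",
            "http://www.example.de/",
            "http://example.edu/a?b=c"
        };
        for (String s : samples) {
            System.out.println((shouldKeep(s) ? "keep   " : "delete ") + s);
        }
    }
}
```

A db-pruning class could call a check like shouldKeep for each stored page URL and delete the page when it returns false. How pages are iterated and deleted depends on the Nutch version and is not shown here.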
Matthias
-- 
http://www.eventax.com - eventax GmbH
http://www.umkreisfinder.de - The search engine for local information and events


Jay Pound wrote:

> How do I write my queries file to prune my database so that it keeps only
> .com, .edu, .org, .us, etc. (US sites only)?
> Thanks,
> Jay Pound