Posted to user@nutch.apache.org by Patricio Galeas <pg...@yahoo.de> on 2011/10/01 15:52:26 UTC
Some problems with PruneIndexTool ...
Hello,
I'm running the PruneIndexTool to remove some unwanted URLs from my index, but it doesn't work.
I use:
bin/nutch org.apache.nutch.tools.PruneIndexTool /nutch/local/my_crawl/index -queries queries.txt -output pruned.txt
where:
queries.txt has the following entries:
site:topsy.com
site:osdir.com
site:www.cez.cz
site:biblecourses.com
site:bbftv.tv
site:autoavangarde.org
site:www.volkswagen.com
site:premiere21.com
After executing the command, pruned.txt contains a lot of URLs from the pruned sites, but when I run a new query, all the pruned sites are still in the results.
What am I doing wrong?
Thanks
Patricio