Posted to user@nutch.apache.org by Jixi <ji...@hotmail.co.uk> on 2007/12/03 20:02:25 UTC

Exclude pages from search results

Hi all,

I've had a good read through the archive and I can't find an answer to this
and would really appreciate any advice you can offer.

I need to exclude certain pages (ideally by pattern, e.g. buy*) from the search
results but NOT from the crawl. I need them in the crawl because those pages link
to the pages I really want.
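One way to read this requirement is to leave the pages in the crawl and the index, and drop them only when results are displayed. A minimal Java sketch of just that filtering step follows. It assumes the search front end has each hit's URL as a plain string before rendering. It uses no Nutch API, and the class name ResultExcluder, the example URLs, and the reading of buy* as "last path segment starts with buy" are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical display-time filter: pages stay in the crawl and the index,
// and only hits whose URL matches the exclusion pattern are dropped
// before the result list is shown. No Nutch API is used here.
public class ResultExcluder {

    // Returns the hit URLs that do NOT match the exclusion pattern,
    // preserving their original order.
    public static List<String> filter(List<String> hitUrls, Pattern exclude) {
        List<String> kept = new ArrayList<>();
        for (String url : hitUrls) {
            // find() looks for the pattern anywhere in the URL string.
            if (!exclude.matcher(url).find()) {
                kept.add(url);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Assumed reading of "buy*": the last path segment starts with "buy".
        Pattern buyPages = Pattern.compile("/buy[^/]*$");
        List<String> hits = Arrays.asList(
                "http://example.com/shop/buy-now.html",
                "http://example.com/products/widget.html",
                "http://example.com/buyers/guide.html");
        System.out.println(filter(hits, buyPages));
    }
}
```

With this pattern, buy-now.html is dropped, while widget.html and buyers/guide.html are kept, because "buyers" is not the last path segment. A broader pattern such as "/buy" would hide both buy pages.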

I've tried to keep it brief and simple, but please do ask if I haven't given
enough detail. My knowledge of Java is minimal, but I can learn if I need
to... I hope!

Any help would be very much appreciated.

Regards
Jix
-- 
View this message in context: http://www.nabble.com/Exlude-pages-from-search-results-tf4938661.html#a14136541
Sent from the Nutch - User mailing list archive at Nabble.com.


Nutch URL filter help

Posted by ajaxtrend <te...@yahoo.com>.
Hello Group,
I am trying to fetch all URLs from http://xyz.org whose structure follows the pattern http://xyz.org/2007/12/23.

My urls/my.txt file contains: http://xyz.org
My conf/crawl-urlfilter.txt file contains the filter: +^http://indianeconomy.org/[0-9]{4}/[0-9]{2}/[0-9]{2}/\\w*

But Nutch still fetches other URLs from http://xyz.org too, such as http://abc.com.
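To check a rule like the one above outside Nutch, here is a minimal Java sketch. It assumes the semantics stated in the comments of Nutch's URL filter files: each rule line starts with + or -, the first matching rule decides, and a URL that matches no rule is ignored. The class name UrlRuleCheck and the test URLs are hypothetical, and xyz.org is the poster's placeholder domain. The sketch only evaluates the rules themselves; it does not tell you which config file a given Nutch command actually loads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-alone checker for "+regex" / "-regex" rule lines:
// the first matching rule decides, and a URL matching no rule is rejected.
// This is not the Nutch plugin itself.
public class UrlRuleCheck {

    private static final class Rule {
        final boolean accept;   // true for '+', false for '-'
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    public UrlRuleCheck(List<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            // Skip blank lines and comments.
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            // This sketch simply ignores lines that lack a +/- prefix.
            if (sign != '+' && sign != '-') {
                continue;
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
    }

    // Returns the decision of the first rule whose pattern occurs in the URL;
    // false if no rule matches.
    public boolean accepts(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Placeholder rules modelled on the post: dated xyz.org URLs in, everything else out.
        UrlRuleCheck filter = new UrlRuleCheck(Arrays.asList(
                "+^http://xyz\\.org/[0-9]{4}/[0-9]{2}/[0-9]{2}/",
                "-."));
        for (String url : Arrays.asList(
                "http://xyz.org/2007/12/23/some-post",
                "http://xyz.org/about",
                "http://abc.com/")) {
            System.out.println(url + " -> " + (filter.accepts(url) ? "fetch" : "skip"));
        }
    }
}
```

With these two rules, only the dated URL is accepted; http://xyz.org/about and http://abc.com/ are skipped. The placeholder rule escapes the dot in the host name and leaves out the trailing \\w* from the original filter, so it matches any path that continues after the date.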

I am not sure whether I am doing anything wrong, and I would appreciate your help with this.

Regards,
Ranjan
