Posted to user@nutch.apache.org by Feng Ji <fe...@gmail.com> on 2006/09/06 03:45:47 UTC

filter urls from search result

Hi there,

I want to filter out particular URLs from the search results.

I tried to use the segment merger to do it.

First, I put the target URLs in regex-urlfilter.txt and automaton-urlfilter.txt,
as "-http://abc.com/".

Then I ran "nutch/mergesegs" and "nutch/index", but the search page still
shows the URLs I want to filter out.

Any ideas? Which step did I miss?

Thanks,

Michael

Re: filter urls from search result

Posted by steveb <sr...@gmail.com>.
Michael, 

Did you remember to specify the -filter option when you ran nutch/mergesegs?

-filter         filter out URL-s prohibited by current URLFilters
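
In case it helps to see why a "-" rule drops a URL, here is a rough Python
sketch of how a "+"/"-" rule file is usually read: rules are tried top to
bottom, the first pattern that matches decides, and a URL that matches no
rule is dropped. This is only an illustration of that behaviour as I
understand it, not Nutch's actual code; parse_rules, accepts and the sample
rules are made-up names for the example, and abc.com is just the host from
your message.

```python
import re

def parse_rules(text):
    """Parse '+regex' / '-regex' lines; skip blank lines and '#' comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError("rule must start with '+' or '-': " + line)
        # True means "accept", False means "reject"
        rules.append((sign == "+", re.compile(pattern)))
    return rules

def accepts(rules, url):
    """First matching rule decides; a URL matching no rule is rejected."""
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False

# Hypothetical rule file: reject one host, accept everything else.
RULES = r"""
# drop everything under abc.com
-^http://abc\.com/
# accept anything else
+.
"""

if __name__ == "__main__":
    rules = parse_rules(RULES)
    for url in ("http://abc.com/page.html", "http://example.org/"):
        print(url, "kept" if accepts(rules, url) else "filtered out")
```

Note that the order matters in this sketch: if the "+." line came first, it
would match every URL and the "-" rule would never be reached.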



