Posted to dev@nutch.apache.org by Tomas Ukkonen <to...@helsinki.fi> on 2009/03/24 18:04:35 UTC

Problems writing QueryFilter plugin

Hi,

I have been trying to write a URL exclusion filter plugin (QueryFilter)
that drops a list of URLs from the search results, but for some reason it
doesn't seem to work.


The plugin below is configured to process 'url' fields.
I have checked that it is loaded correctly by nutch(wax).
My filter() function is:


--->>--->>--->>--->>--->>--->>--->>--->>--->>--->>--->>--->>--->>--->>

public BooleanQuery filter(Query input, BooleanQuery output)
  throws QueryException
{
	ListIterator iter = exclusionList.listIterator();

	while (iter.hasNext()) {
		String url = (String) iter.next();

		// Build a term query for the whole URL in the 'url' field.
		Term term = new Term(URL_FIELD, url);
		org.apache.lucene.search.Query urlExclusion = new TermQuery(term);
		urlExclusion.setBoost(0.0f); // I have also tried to use 1.0

		debug("excluded term: '" + term.field() + "' : '" + term.text() + "'");

		// Add the term as a prohibited clause of the output query.
		output.add(urlExclusion, BooleanClause.Occur.MUST_NOT);
	}

	return output;
}

<<---<<---<<---<<---<<---<<---<<---<<---<<---<<---<<---<<---<<---<<---<<


Also, the debug() call tells me that the terms have the correct form
Term("url", "http://www.domain.com:1234/dir/file.txt"). But for some
reason the excluded URLs still appear in the search results.

* Can someone tell me the reason for this?

  It appears that the plugin currently has no
  effect on the results at all.


If I exclude URLs explicitly with a query like:
'<some> <search> <terms> -url:"http://www.domain.com:1234/dir/file.txt"'
then the listed URLs are correctly filtered out of the results.
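
One possible explanation for the difference, stated as an unverified assumption rather than a diagnosis: if the 'url' field is tokenized at index time, the index holds the URL's pieces as separate terms, so a single term containing the whole URL can never match, while a quoted url:"..." query is analyzed into a phrase of those pieces and does match. The sketch below illustrates that idea in plain Java without Lucene or Nutch. The class UrlTermSketch, its methods, and the split-on-non-alphanumerics tokenizer are all hypothetical stand-ins, not the analyzer Nutch actually uses.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only: no Lucene or Nutch classes are used here.
final class UrlTermSketch {

    // Assumed tokenizer: lowercase, then split on anything that is not a letter or digit.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Models a single-term lookup: the term must equal one indexed term exactly.
    public static boolean exactTermMatches(List<String> indexedTerms, String term) {
        return indexedTerms.contains(term);
    }

    // Models a phrase lookup: the phrase tokens must occur consecutively, in order.
    public static boolean phraseMatches(List<String> indexedTerms, List<String> phrase) {
        return Collections.indexOfSubList(indexedTerms, phrase) >= 0;
    }

    public static void main(String[] args) {
        String url = "http://www.domain.com:1234/dir/file.txt";
        List<String> indexed = tokenize(url); // what a tokenized field would hold

        // The whole URL as one term is not among the indexed tokens.
        System.out.println("exact term match: " + exactTermMatches(indexed, url));
        // The same URL, tokenized into a phrase, is found.
        System.out.println("phrase match:     " + phraseMatches(indexed, tokenize(url)));
    }
}
```

If this assumption holds for the real index, the MUST_NOT clauses built from whole-URL terms would exclude nothing, which would be consistent with the plugin having no visible effect.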

The Nutch version I'm using is nutch-1.0-dev. I'm quite sure that
all related settings in nutch-site.xml and plugin.xml are correct.



Thanks in advance, 

-- 
Tomas Ukkonen
Information Systems Specialist
Kansalliskirjasto / 
The National Library of Finland
phone +358-50-4150557
email tomas.ukkonen@helsinki.fi
www   http://www.kansalliskirjasto.fi 
      http://www.nationallibrary.fi