Posted to dev@nutch.apache.org by stack <st...@archive.org> on 2005/06/01 01:54:57 UTC

Hard-coding of dedupField in OpenSearchServlet

The OpenSearchServlet hardcodes 'site' as the field to use for 
deduping search results.  I'd like to be able to dedup search results on 
fields other than just 'site'. 

For example, we have collections that may have multiple instances of a 
URL in the index.  For such collections, sometimes we want queries to 
turn up all instances of the URL in the search results.  For other query 
types, we only want one instance of a particular URL showing in the 
search results.  I can prevent the duplicates from showing by running a 
query with search results deduped on the 'url' field.
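To make the idea concrete, here is a minimal sketch of deduping on an 
arbitrary field.  It is not the Nutch implementation: hits are modeled as 
plain maps of field name to value, and the class name FieldDedup, the 
method dedup and the cap maxPerValue are all hypothetical names chosen for 
illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Nutch: a hit is modeled as a map of
// field name to field value.
final class FieldDedup {

    private FieldDedup() {}

    /**
     * Returns the hits in their original order, keeping at most
     * maxPerValue hits for each distinct value of the given field.
     * Hits that lack the field are grouped together under a null value.
     */
    public static List<Map<String, String>> dedup(
            List<Map<String, String>> hits, String field, int maxPerValue) {
        Map<String, Integer> seen = new HashMap<>();
        List<Map<String, String>> kept = new ArrayList<>();
        for (Map<String, String> hit : hits) {
            String value = hit.get(field);
            int count = seen.getOrDefault(value, 0);
            if (count < maxPerValue) {
                kept.add(hit);
                seen.put(value, count + 1);
            }
        }
        return kept;
    }
}
```

With a cap of 1 on 'url', a URL that is indexed several times shows up 
once; the same call with 'site' as the field gives per-site deduping.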

Attached is a suggested patch.  New query parameters 'dedupField' and 
'hitsPerDup' are introduced.  'dedupField' allows specifying a field 
other than 'site' for deduping (it defaults to 'site').  'hitsPerDup' 
subsumes 'hitsPerSite': if 'hitsPerSite' is present and there is no 
'hitsPerDup' in the query, we'll take the 'hitsPerSite' value as the 
'hitsPerDup' value.
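The parameter handling described above could be sketched roughly as 
follows.  This is not the attached patch: the class DedupParams and its 
method fromQuery are hypothetical, the query string is modeled as a plain 
map rather than a servlet request, and the fallback of 2 hits when neither 
count parameter is given is an assumption, not something stated here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder for the two dedup settings; not the patch itself.
final class DedupParams {
    static final String DEFAULT_DEDUP_FIELD = "site"; // default named in this post
    static final int DEFAULT_HITS_PER_DUP = 2;        // assumed default, for illustration

    final String dedupField;
    final int hitsPerDup;

    private DedupParams(String dedupField, int hitsPerDup) {
        this.dedupField = dedupField;
        this.hitsPerDup = hitsPerDup;
    }

    /**
     * Resolves the settings from query parameters (name to value).
     * 'hitsPerDup' wins; 'hitsPerSite' is used only when 'hitsPerDup'
     * is absent. A non-numeric count throws NumberFormatException.
     */
    public static DedupParams fromQuery(Map<String, String> params) {
        String field = params.get("dedupField");
        if (field == null || field.isEmpty()) {
            field = DEFAULT_DEDUP_FIELD;
        }
        String hits = params.get("hitsPerDup");
        if (hits == null) {
            hits = params.get("hitsPerSite"); // backward-compatible fallback
        }
        int hitsPerDup = (hits == null)
                ? DEFAULT_HITS_PER_DUP
                : Integer.parseInt(hits.trim());
        return new DedupParams(field, hitsPerDup);
    }
}
```

Existing queries that only send 'hitsPerSite' keep working unchanged, 
since they still dedup on 'site' with the count they asked for.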

If the patch is acceptable, should I work up a matching patch for search.jsp?

Good stuff,
St.Ack

P.S. Query parameter names are taken from the names of the NutchBean params 
passed to the search method. Perhaps 'hitsPerDup' should be 'hitsPerDedup'?