Posted to dev@nutch.apache.org by stack <st...@archive.org> on 2005/06/01 01:54:57 UTC
Hard-coding of dedupField in OpenSearchServlet
The OpenSearchServlet hardcodes 'site' as the field to use for
deduping search results. I'd like to be able to dedup search results on
fields other than just 'site'.
For example, we have collections that may have multiple instances of a
URL in the index. For such collections, sometimes we want queries to
turn up all instances of the URL in the search results. For other query
types, we only want one instance of a particular URL showing in the
search results. I can prevent the duplicates from showing by running a
query with search results deduped on the 'url' field.
Attached is a suggested patch. New query parameters 'dedupField' and
'hitsPerDup' are introduced. 'dedupField' allows specifying a field
other than 'site' for deduping (it defaults to 'site'). 'hitsPerDup'
subsumes 'hitsPerSite': if 'hitsPerSite' is present and there is no
'hitsPerDup' in the query, we'll take the 'hitsPerSite' value as the
'hitsPerDup' value.
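To illustrate the fallback rules described above, here is a minimal sketch. It is not the attached patch, just one way the parameter resolution could look. The class name DedupParams, the resolve method, the use of a plain Map in place of the servlet request, and the default of 2 hits per duplicate are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: resolves the dedup settings from query parameters. */
public class DedupParams {
    static final String DEFAULT_DEDUP_FIELD = "site";
    // Assumed default for illustration; the servlet's real default may differ.
    static final int DEFAULT_HITS_PER_DUP = 2;

    final String dedupField;
    final int hitsPerDup;

    DedupParams(String dedupField, int hitsPerDup) {
        this.dedupField = dedupField;
        this.hitsPerDup = hitsPerDup;
    }

    /**
     * Applies the rules from the message: 'dedupField' defaults to 'site',
     * and 'hitsPerSite' is used only when 'hitsPerDup' is absent.
     * A non-numeric hits value throws NumberFormatException.
     */
    static DedupParams resolve(Map<String, String> query) {
        String field = query.get("dedupField");
        if (field == null || field.trim().isEmpty()) {
            field = DEFAULT_DEDUP_FIELD;
        }

        String hits = query.get("hitsPerDup");
        if (hits == null) {
            hits = query.get("hitsPerSite"); // backward-compatible fallback
        }
        int hitsPerDup = (hits == null)
                ? DEFAULT_HITS_PER_DUP
                : Integer.parseInt(hits.trim());

        return new DedupParams(field.trim(), hitsPerDup);
    }

    public static void main(String[] args) {
        Map<String, String> query = new HashMap<>();
        query.put("dedupField", "url");
        query.put("hitsPerSite", "1"); // old-style parameter, no hitsPerDup

        DedupParams p = resolve(query);
        System.out.println(p.dedupField + " " + p.hitsPerDup);
    }
}
```

With these rules, an existing client that only sends 'hitsPerSite' keeps its current behavior, while a client that sends both parameters gets the 'hitsPerDup' value.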
If the patch is acceptable, should I work up a matching patch for search.jsp?
Good stuff,
St.Ack
P.S. Query parameter names are taken from the names of the NutchBean params
passed to the search method. Perhaps 'hitsPerDup' should be 'hitsPerDedup'?