Posted to dev@nutch.apache.org by "Ron van der Vegt (JIRA)" <ji...@apache.org> on 2016/03/02 14:51:18 UTC
[jira] [Created] (NUTCH-2237) DeduplicationJob: Add extra order criteria based on slug
Ron van der Vegt created NUTCH-2237:
---------------------------------------
Summary: DeduplicationJob: Add extra order criteria based on slug
Key: NUTCH-2237
URL: https://issues.apache.org/jira/browse/NUTCH-2237
Project: Nutch
Issue Type: Improvement
Reporter: Ron van der Vegt
Currently, when several documents share the same signature, the main document is elected by score, URL length and fetch time. The quality of the slug, judged mainly by the number of meaningful characters, could serve as an extra criterion, giving users the flexibility to prefer slugified URLs over URLs based on a page id.
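To make the idea concrete, here is a minimal sketch of such an ordering. It is not the DeduplicationJob code: the class SlugOrderSketch, the Doc holder, slugQuality and elect are hypothetical names, "meaningful characters" is assumed to mean letters in the last path segment, and the position of the slug check (after score, before URL length) is also an assumption.

```java
import java.net.URI;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch; none of these names exist in Nutch. */
final class SlugOrderSketch {

    /** Minimal stand-in for the fields compared among duplicates. */
    static final class Doc {
        final String url;
        final float score;
        final long fetchTime;

        Doc(String url, float score, long fetchTime) {
            this.url = url;
            this.score = score;
            this.fetchTime = fetchTime;
        }
    }

    /** Assumed metric: number of letters in the last non-empty path segment. */
    static int slugQuality(String url) {
        String path = URI.create(url).getPath();
        if (path == null) {
            return 0;
        }
        String last = "";
        for (String segment : path.split("/")) {
            if (!segment.isEmpty()) {
                last = segment;
            }
        }
        int letters = 0;
        for (int i = 0; i < last.length(); i++) {
            if (Character.isLetter(last.charAt(i))) {
                letters++;
            }
        }
        return letters;
    }

    /** Orders duplicates so that the document to keep sorts first. */
    static final Comparator<Doc> KEEP_FIRST = (a, b) -> {
        int c = Float.compare(b.score, a.score);                      // higher score first
        if (c != 0) return c;
        c = Integer.compare(slugQuality(b.url), slugQuality(a.url));  // better slug first (assumed position)
        if (c != 0) return c;
        c = Integer.compare(a.url.length(), b.url.length());          // shorter URL first
        if (c != 0) return c;
        return Long.compare(b.fetchTime, a.fetchTime);                // newer fetch first
    };

    /** Picks the main document from a non-empty list of same-signature documents. */
    static Doc elect(List<Doc> duplicates) {
        return Collections.min(duplicates, KEEP_FIRST);
    }
}
```

With equal scores, a URL such as http://example.com/news/nutch-adds-slug-ordering would be kept over http://example.com/node/12345 in this sketch, even though the latter is shorter, because the slug check runs before the URL length check.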
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)