Posted to dev@nutch.apache.org by "Ron van der Vegt (JIRA)" <ji...@apache.org> on 2016/03/02 14:51:18 UTC

[jira] [Created] (NUTCH-2237) DeduplicationJob: Add extra order criteria based on slug

Ron van der Vegt created NUTCH-2237:
---------------------------------------

             Summary: DeduplicationJob: Add extra order criteria based on slug
                 Key: NUTCH-2237
                 URL: https://issues.apache.org/jira/browse/NUTCH-2237
             Project: Nutch
          Issue Type: Improvement
            Reporter: Ron van der Vegt


Currently, the user can elect the main document among documents with identical signatures based on score, URL length, and fetch time. Adding the quality of the slug, measured mainly by the number of meaningful characters, as a further ordering criterion would give users more flexibility to distinguish slugified URLs from URLs based on a page ID.
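
To make the proposal concrete, here is a minimal sketch of one possible slug-quality heuristic. It is not Nutch code and not a proposed patch: the function names `slug_quality` and `elect_main`, the choice to count alphabetic characters in the last path segment as "meaningful characters", and the tie-break on URL length are all assumptions made for illustration.

```python
from urllib.parse import urlsplit, unquote


def slug_quality(url: str) -> int:
    """Hypothetical heuristic: count alphabetic characters in the
    last non-empty path segment of the URL."""
    path = unquote(urlsplit(url).path)
    segments = [s for s in path.split("/") if s]
    if not segments:
        return 0
    slug = segments[-1]
    return sum(1 for ch in slug if ch.isalpha())


def elect_main(urls):
    """Pick the URL with the highest slug quality among duplicates.
    Ties are broken by preferring the shorter URL (an assumption)."""
    return max(urls, key=lambda u: (slug_quality(u), -len(u)))


duplicates = [
    "http://example.com/node/12345",
    "http://example.com/news/apache-nutch-released",
]
main_url = elect_main(duplicates)
```

With this heuristic the slugified URL is elected over the page-id URL, because its last segment contains letters while `12345` contains none. A real criterion would likely need more care, for example ignoring file extensions such as `.php` or `.html`, which this sketch counts as meaningful characters.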
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)