Posted to dev@nutch.apache.org by "Markus Jelsma (Closed) (JIRA)" <ji...@apache.org> on 2011/12/20 12:31:31 UTC

[jira] [Closed] (NUTCH-1011) Normalize duplicate slashes in URL's

     [ https://issues.apache.org/jira/browse/NUTCH-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma closed NUTCH-1011.
--------------------------------


Bulk close of resolved issues of 1.4. bulkclose-1.4-20111220
                
> Normalize duplicate slashes in URL's
> ------------------------------------
>
>                 Key: NUTCH-1011
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1011
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.4, nutchgora
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Minor
>             Fix For: 1.4, nutchgora
>
>         Attachments: NUTCH-1011-1.4-2.patch, NUTCH-1011-all-3.patch
>
>
> Many websites produce faulty URLs with multiple consecutive slashes, e.g. http://cocoon.apache.org///////////////////////1.x/dynamic.html
> This is especially harmful when the number of slashes varies: many distinct URLs end up pointing to the same page, and crawling them keeps generating new (unique) URLs that lead to the same or other duplicate pages.
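
A minimal sketch, in Java (the language Nutch is written in), of the normalization the issue asks for. It is not the committed patch. DuplicateSlashNormalizer and normalize() are hypothetical names, not part of Nutch's URLNormalizer API. The sketch assumes absolute URLs with a scheme. It collapses runs of slashes only in the path and leaves the "//" after the scheme, the query and the fragment untouched.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Nutch's API): collapses runs of
 * consecutive slashes in the path of an absolute URL.
 */
final class DuplicateSlashNormalizer {

    // Two or more consecutive slashes.
    private static final Pattern MULTI_SLASH = Pattern.compile("/{2,}");

    static String normalize(String url) {
        int schemeEnd = url.indexOf("://");
        if (schemeEnd < 0) {
            return url; // assumption: only absolute URLs with a scheme are handled
        }
        // The authority (host[:port]) ends at the first '/', '?' or '#'.
        int pathStart = schemeEnd + 3;
        while (pathStart < url.length() && "/?#".indexOf(url.charAt(pathStart)) < 0) {
            pathStart++;
        }
        // The path ends at the first '?' (query) or '#' (fragment).
        int pathEnd = pathStart;
        while (pathEnd < url.length() && url.charAt(pathEnd) != '?' && url.charAt(pathEnd) != '#') {
            pathEnd++;
        }
        String path = url.substring(pathStart, pathEnd);
        String collapsed = MULTI_SLASH.matcher(path).replaceAll("/");
        // Query and fragment are left untouched: "//" there may be meaningful.
        return url.substring(0, pathStart) + collapsed + url.substring(pathEnd);
    }

    public static void main(String[] args) {
        System.out.println(normalize(
            "http://cocoon.apache.org///////////////////////1.x/dynamic.html"));
        // prints http://cocoon.apache.org/1.x/dynamic.html
    }
}
```

RFC 3986 treats paths with different slash runs as distinct, so collapsing them is a crawler heuristic. In rare cases it can merge resources that a server actually serves differently.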

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira