Posted to dev@nutch.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 18:38:12 UTC

[jira] [Commented] (NUTCH-802) Problems managing outlinks with large url length

    [ https://issues.apache.org/jira/browse/NUTCH-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13551968#comment-13551968 ] 

Lewis John McGibbney commented on NUTCH-802:
--------------------------------------------

+1 for marking as Won't Fix. No one seems to have touched this in ages. If someone wishes to address it in the future, they can open a new issue with a more appropriate solution.
                
> Problems managing outlinks with large url length
> ------------------------------------------------
>
>                 Key: NUTCH-802
>                 URL: https://issues.apache.org/jira/browse/NUTCH-802
>             Project: Nutch
>          Issue Type: Bug
>          Components: parser
>            Reporter: Pablo Aragón
>            Assignee: Andrzej Bialecki 
>              Labels: nutch, outlink, parse, parseoutputformat
>         Attachments: ParseOutputFormat.patch
>
>
> Nutch can stall during the collection of outlinks if an outlink's URL is too long.
> The maximum URL sizes for the main web servers are:
>     * Apache: 4,000 bytes
>     * Microsoft Internet Information Server (IIS): 16,384 bytes
>     * Perl HTTP::Daemon: 8,000 bytes
> URLs longer than 4,000 bytes are problematic, so the limit should be set in the nutch-default.xml configuration file.
> I attached a patch.
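
For illustration only, the kind of length check described in the issue could be sketched as below. This is not the attached ParseOutputFormat.patch and does not use any Nutch API; the class name OutlinkLengthFilter, the method filterByLength, and the 4,000-byte default are all hypothetical, chosen to mirror the figure quoted above. In a real deployment the limit would presumably be read from configuration rather than hard-coded.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Nutch): drops outlink URLs whose
 * UTF-8 encoded size exceeds a configurable byte limit.
 */
final class OutlinkLengthFilter {

    /** Assumed default, taken from the 4,000-byte figure quoted in the issue. */
    static final int DEFAULT_MAX_URL_BYTES = 4000;

    private OutlinkLengthFilter() {
    }

    /**
     * Returns the URLs whose UTF-8 size is at most maxUrlBytes, in their
     * original order. Null entries are skipped. A negative limit is treated
     * as "no limit" (an assumption made for this sketch).
     */
    static List<String> filterByLength(List<String> urls, int maxUrlBytes) {
        List<String> kept = new ArrayList<>();
        for (String url : urls) {
            if (url == null) {
                continue;
            }
            int sizeInBytes = url.getBytes(StandardCharsets.UTF_8).length;
            if (maxUrlBytes < 0 || sizeInBytes <= maxUrlBytes) {
                kept.add(url);
            }
        }
        return kept;
    }
}
```

Calling OutlinkLengthFilter.filterByLength(urls, OutlinkLengthFilter.DEFAULT_MAX_URL_BYTES) before the outlinks are written would discard oversized URLs early, so later processing steps never see them. Measuring bytes rather than characters matches the way the server limits above are expressed.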

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira