Posted to dev@nutch.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:42:13 UTC
[jira] [Updated] (NUTCH-802) Problems managing outlinks with large url length
[ https://issues.apache.org/jira/browse/NUTCH-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney updated NUTCH-802:
---------------------------------------
Fix Version/s: 1.7
> Problems managing outlinks with large url length
> ------------------------------------------------
>
> Key: NUTCH-802
> URL: https://issues.apache.org/jira/browse/NUTCH-802
> Project: Nutch
> Issue Type: Bug
> Components: parser
> Reporter: Pablo Aragón
> Assignee: Andrzej Bialecki
> Labels: nutch, outlink, parse, parseoutputformat
> Fix For: 1.7
>
> Attachments: ParseOutputFormat.patch
>
>
> Nutch can stall while collecting outlinks if an outlink's URL is too long.
> The maximum URL sizes for the main web servers are:
> * Apache: 4,000 bytes
> * Microsoft Internet Information Server (IIS): 16,384 bytes
> * Perl HTTP::Daemon: 8,000 bytes
> URLs longer than 4,000 bytes are problematic, so the limit should be set in the nutch-default.xml configuration file.
> I have attached a patch.
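
The kind of check the description asks for could look roughly like the sketch below. This is not the attached ParseOutputFormat.patch and it does not use any Nutch classes; the class name OutlinkLengthFilter, the method filter, and the constant DEFAULT_MAX_URL_BYTES are hypothetical names chosen for illustration, and the 4,000-byte default is simply the figure quoted in the report. In Nutch the limit would presumably be read from a property in nutch-default.xml rather than hard-coded.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a standalone outlink length filter, not Nutch code.
final class OutlinkLengthFilter {

    // Hypothetical default, taken from the 4,000-byte figure in the report.
    static final int DEFAULT_MAX_URL_BYTES = 4000;

    /**
     * Returns the URLs whose UTF-8 encoding is at most maxBytes long.
     * A negative maxBytes disables the check; null entries are skipped.
     */
    static List<String> filter(List<String> urls, int maxBytes) {
        List<String> kept = new ArrayList<>();
        for (String url : urls) {
            if (url == null) {
                continue;
            }
            int sizeInBytes = url.getBytes(StandardCharsets.UTF_8).length;
            if (maxBytes < 0 || sizeInBytes <= maxBytes) {
                kept.add(url);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Build a URL that is longer than the limit.
        StringBuilder longUrl = new StringBuilder("http://example.org/?q=");
        while (longUrl.length() <= DEFAULT_MAX_URL_BYTES) {
            longUrl.append('x');
        }
        List<String> outlinks =
            Arrays.asList("http://example.org/page", longUrl.toString());
        // Only the short URL survives the filter.
        System.out.println(filter(outlinks, DEFAULT_MAX_URL_BYTES));
    }
}
```

Measuring the UTF-8 byte length rather than the character count matches the byte-based limits listed above; allowing a negative value to switch the check off is a design assumption, not something stated in the issue.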
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira