Posted to dev@nutch.apache.org by "Marco Ebbinghaus (JIRA)" <ji...@apache.org> on 2018/10/23 14:28:00 UTC

[jira] [Updated] (NUTCH-2666) increase default value for http.content.limit

     [ https://issues.apache.org/jira/browse/NUTCH-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marco Ebbinghaus updated NUTCH-2666:
------------------------------------
    Description: 
The default value for http.content.limit in nutch-default.xml is set to 64 kB. The property is described there as follows: "The length limit for downloaded content using the http://
 protocol, in bytes. If this value is nonnegative (>=0), content longer
 than it will be truncated; otherwise, no truncation at all. Do not
 confuse this setting with the file.content.limit setting." Maybe this default value should be increased, as many pages today are larger than 64 kB.

The description might also be updated, since the limit applies not only to the http protocol but also to https.

  was:
The default value for http.content.limit (The length limit for downloaded content using the http://
 protocol, in bytes. If this value is nonnegative (>=0), content longer
 than it will be truncated; otherwise, no truncation at all. Do not
 confuse this setting with the file.content.limit setting.) is set to 64kb. Maybe this default value should be increased as many pages today are greater than 64kb.

The description might also be updated as this is not only the case for the http protocol, but also for https.


> increase default value for http.content.limit
> ---------------------------------------------
>
>                 Key: NUTCH-2666
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2666
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.15
>            Reporter: Marco Ebbinghaus
>            Priority: Minor
>
> The default value for http.content.limit in nutch-default.xml is set to 64 kB. The property is described there as follows: "The length limit for downloaded content using the http://
>  protocol, in bytes. If this value is nonnegative (>=0), content longer
>  than it will be truncated; otherwise, no truncation at all. Do not
>  confuse this setting with the file.content.limit setting." Maybe this default value should be increased, as many pages today are larger than 64 kB.
> The description might also be updated, since the limit applies not only to the http protocol but also to https.
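
Until the default changes, the limit discussed above can be raised per installation. The following is a minimal sketch, assuming the usual Nutch convention that properties in conf/nutch-site.xml override those in nutch-default.xml. The value 1048576 (1 MiB) is an arbitrary illustration, not a value proposed in this issue, and the description text is a suggested wording only.

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml: local overrides of nutch-default.xml (assumed location) -->
<configuration>
  <property>
    <name>http.content.limit</name>
    <!-- Illustrative value only: 1048576 bytes = 1 MiB.
         Per the property description quoted in the issue, a negative
         value disables truncation entirely. -->
    <value>1048576</value>
    <!-- Hypothetical wording that also mentions https, as the issue suggests. -->
    <description>The length limit for downloaded content using the http://
    and https:// protocols, in bytes. If this value is nonnegative (>=0),
    content longer than it will be truncated; otherwise, no truncation at all.
    Do not confuse this setting with the file.content.limit setting.
    </description>
  </property>
</configuration>
```

A larger limit means more memory and storage per fetched page, so the value is best chosen to match the typical page sizes of the crawl.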



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)