
Posted to dev@nutch.apache.org by cihat güzel <c....@gmail.com> on 2013/08/26 10:00:43 UTC

http.content.limit

Hi all,

    if (http.getMaxContent() >= 0
        && contentLength > http.getMaxContent())   // limit download size
      contentLength = http.getMaxContent();
    .......
    for (int i = in.read(bytes); i != -1 && length + i <= contentLength;
         i = in.read(bytes)) {
      out.write(bytes, 0, i);
      length += i;
    }

So Nutch works like this: if "http.content.limit < contentLength", the
content is truncated. Then, if isTruncated() is true in ParserJob, the page
is not parsed.
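
To make the behaviour concrete, here is a minimal, self-contained sketch of
the loop quoted above. It is not the Nutch source: the class TruncatingRead,
the method readLimited and the bufferSize parameter are hypothetical names
used only for illustration, and a ByteArrayInputStream stands in for the
socket stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class TruncatingRead {
    // Hypothetical helper (not Nutch code) that mirrors the quoted loop.
    // maxContent < 0 means "no limit", as in the quoted condition.
    static byte[] readLimited(InputStream in, int contentLength,
                              int maxContent, int bufferSize) {
        if (maxContent >= 0 && contentLength > maxContent) {
            contentLength = maxContent;            // limit download size
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] bytes = new byte[bufferSize];
        int length = 0;
        try {
            // Stop at end of stream, or when the next chunk would exceed the cap.
            for (int i = in.read(bytes);
                 i != -1 && length + i <= contentLength;
                 i = in.read(bytes)) {
                out.write(bytes, 0, i);
                length += i;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] body = new byte[100];               // pretend 100-byte response
        byte[] kept = readLimited(new ByteArrayInputStream(body),
                                  body.length, 32, 8);
        System.out.println(kept.length);           // bytes kept after truncation
    }
}
```

Note that the body is still read from the stream up to the cap, which is the
cost the question below is about.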

Why is the content read at all in that case? I think we should skip the
fetch if "http.content.limit < contentLength".

If you agree, I can implement a patch.
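
As a rough sketch of the check being proposed (an assumption about how a
patch might look, not existing Nutch code): ContentLimitCheck and
shouldSkipFetch are hypothetical names, and declaredLength stands for the
value of the Content-Length header. The check can only decide up front when
the server actually sends that header, so the sketch falls back to fetching
when the length is unknown.

```java
import java.util.OptionalLong;

class ContentLimitCheck {
    // Hypothetical helper illustrating the proposal; not an existing Nutch method.
    // declaredLength: Content-Length header value, empty if the header is absent.
    // maxContent: the configured http.content.limit; negative means "no limit".
    static boolean shouldSkipFetch(OptionalLong declaredLength, int maxContent) {
        if (maxContent < 0) {
            return false;                          // no limit configured
        }
        if (!declaredLength.isPresent()) {
            return false;                          // size unknown, cannot decide early
        }
        return declaredLength.getAsLong() > maxContent;
    }

    public static void main(String[] args) {
        // Declared size above the limit: body would not be downloaded.
        System.out.println(shouldSkipFetch(OptionalLong.of(100_000), 65_536));
        // No Content-Length header: fetch proceeds as before.
        System.out.println(shouldSkipFetch(OptionalLong.empty(), 65_536));
    }
}
```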

Re: http.content.limit

Posted by cihat güzel <c....@gmail.com>.

The code block is in "protocol-http/HTTPResponse.java". Similar code exists
in "protocol-httpclient".

