Posted to dev@nutch.apache.org by cihat güzel <c....@gmail.com> on 2013/08/26 10:00:43 UTC
http.content.limit
Hi all,
    if (http.getMaxContent() >= 0
        && contentLength > http.getMaxContent()) // limit download size
      contentLength = http.getMaxContent();
    .......
    for (int i = in.read(bytes); i != -1 && length + i <= contentLength;
        i = in.read(bytes)) {
      out.write(bytes, 0, i);
      length += i;
    }
So Nutch works like this: if the content length exceeds
"http.content.limit", the content is truncated. Then, if isTruncated()
returns true in ParserJob, the page is not parsed.
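
To make the truncation behaviour concrete, here is a minimal standalone sketch of the same loop. The class name TruncatingReadSketch, the method readLimited, and the bufferSize parameter are hypothetical and exist only for illustration; this is not the Nutch source, just the quoted loop wrapped so it can be run on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical standalone sketch of the quoted loop; not the actual Nutch class.
final class TruncatingReadSketch {

    // Reads from 'in' until EOF or until the next chunk would exceed the cap.
    // maxContent < 0 means "no limit", mirroring the check in the quoted code.
    // bufferSize must be > 0, otherwise read() returns 0 and the loop never ends.
    static byte[] readLimited(InputStream in, int contentLength,
                              int maxContent, int bufferSize) throws IOException {
        if (maxContent >= 0 && contentLength > maxContent) { // limit download size
            contentLength = maxContent;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] bytes = new byte[bufferSize];
        int length = 0;
        for (int i = in.read(bytes); i != -1 && length + i <= contentLength;
                i = in.read(bytes)) {
            out.write(bytes, 0, i);
            length += i;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] body = new byte[25]; // pretend this is a 25-byte response body
        byte[] kept = readLimited(new ByteArrayInputStream(body), body.length, 10, 4);
        System.out.println("kept " + kept.length + " of " + body.length + " bytes");
    }
}
```

In this sketch the loop stops as soon as the next chunk would cross the cap, so the kept content can be somewhat shorter than the limit. Either way, bytes are still read from the stream up to that point, which is what the question below is about.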
Why is the content read at all? I think we should skip the fetch if the
content length exceeds "http.content.limit".
If you agree, I can implement a patch.
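
As a rough illustration of the proposal, the check could be made before any body bytes are read. This is only a sketch under assumptions: the class ContentLimitCheckSketch and both method names are hypothetical, and it assumes the server sends a usable Content-Length header, which is not always the case (for example with chunked responses), so a fallback to the existing truncating read would still be needed.

```java
// Hypothetical sketch of the proposed pre-check; not part of Nutch.
final class ContentLimitCheckSketch {

    // Parses a Content-Length header value; returns -1 if it is missing,
    // malformed, or negative, meaning "length unknown".
    static long parseDeclaredLength(String headerValue) {
        if (headerValue == null) {
            return -1;
        }
        try {
            long value = Long.parseLong(headerValue.trim());
            return value >= 0 ? value : -1;
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    // True when a limit is set (maxContent >= 0) and the declared length
    // exceeds it. An unknown length (-1) never exceeds the limit, so the
    // caller would fall back to the normal read in that case.
    static boolean exceedsLimit(long declaredLength, int maxContent) {
        return maxContent >= 0 && declaredLength > maxContent;
    }

    public static void main(String[] args) {
        int maxContent = 65536; // example value for http.content.limit
        long declared = parseDeclaredLength("70000");
        if (exceedsLimit(declared, maxContent)) {
            System.out.println("skip: declared length " + declared + " exceeds limit");
        } else {
            System.out.println("fetch body");
        }
    }
}
```

The idea would be to call exceedsLimit once the response headers are available and return early instead of entering the read loop. How a skipped page should be reported to the rest of the crawl is left open here.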
Re: http.content.limit
Posted by cihat güzel <c....@gmail.com>.
The code block above is from "protocol-http/HTTPResponse.java". Similar
code exists in "protocol-httpclient".