Posted to dev@nutch.apache.org by "Susam Pal (JIRA)" <ji...@apache.org> on 2007/09/26 20:54:50 UTC

[jira] Commented: (NUTCH-560) protocol-httpclient reading more bytes than http.content.limit

    [ https://issues.apache.org/jira/browse/NUTCH-560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12530519 ] 

Susam Pal commented on NUTCH-560:
---------------------------------

I analysed 'protocol-http' and it behaves in almost the same manner. While buffering, we cannot stop reading after exactly 'http.content.limit' bytes have been read; reading stops one iteration later, when the limit check reports that the limit has been exceeded. So this doesn't seem like a bug. However, it doesn't take care of reading up to 'Content-Length' bytes, which NUTCH-559 addresses.
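To make the "one iteration after the limit" behaviour concrete, here is a minimal sketch, assuming a fixed-size buffer and a limit check placed after each chunk has been written. The class and method names (`CheckAfterReadSketch`, `readWithLimit`) are hypothetical; this is not the actual protocol-http code. Because the check runs only after a whole chunk has been consumed, the total can exceed the limit by up to one buffer minus one byte.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class CheckAfterReadSketch {

  // Hypothetical helper: reads fixed-size chunks and checks the limit only
  // after each chunk has been written (a limit <= 0 means "no limit").
  static byte[] readWithLimit(InputStream in, int limit, int bufferSize) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[bufferSize];
    int totalRead = 0;
    int n;
    while ((n = in.read(buffer, 0, buffer.length)) != -1) {
      out.write(buffer, 0, n);
      totalRead += n;
      // Before this chunk totalRead was below the limit, and the chunk adds
      // at most bufferSize bytes, so the overshoot is at most bufferSize - 1.
      if (limit > 0 && totalRead >= limit) {
        break;
      }
    }
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] body = new byte[20000];
    byte[] fetched = readWithLimit(new ByteArrayInputStream(body), 5000, 4096);
    System.out.println(fetched.length); // 8192: two full 4096-byte chunks
  }
}
```

With a 20000-byte body, a 5000-byte limit and a 4096-byte buffer, the first chunk leaves the total at 4096 (still under the limit), so a second full chunk is read before the check fires, giving 8192 bytes.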

> protocol-httpclient reading more bytes than http.content.limit
> --------------------------------------------------------------
>
>                 Key: NUTCH-560
>                 URL: https://issues.apache.org/jira/browse/NUTCH-560
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 0.9.0, 1.0.0
>            Reporter: Joseph M.
>
> I modified protocol-httpclient's HttpResponse.java to download files to the file system. If I set http.content.limit to 5000, it fetches around 5500 to 6000 bytes instead and writes them to the file system. There is a calculation mistake in the calculateTryToRead() function.
> {code}
>         int tryAndRead = calculateTryToRead(totalRead);
>         while ((bufferFilled = in.read(buffer, 0, buffer.length)) != -1 && tryAndRead > 0) {
>           totalRead += bufferFilled;
>           out.write(buffer, 0, bufferFilled);
>           tryAndRead = calculateTryToRead(totalRead);
>         }{code}
> The while loop stops when calculateTryToRead() returns a negative value or 0.
>   {code}private int calculateTryToRead(int totalRead) {
>     int tryToRead = Http.BUFFER_SIZE;
>     if (http.getMaxContent() <= 0) {
>       return http.BUFFER_SIZE;
>     } else if (http.getMaxContent() - totalRead < http.BUFFER_SIZE) {
>       tryToRead = http.getMaxContent() - totalRead;
>     }
>     return tryToRead;
>   }{code}
> It returns a negative value only once totalRead > http.getMaxContent(), so more bytes than http.content.limit have already been read before the while loop exits.
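Looking closely at the quoted loop, the overshoot seems to come less from the negative return value than from the read call itself: `in.read(buffer, 0, buffer.length)` always requests a full buffer and ignores `tryAndRead`, and because `&&` evaluates left to right, the `tryAndRead > 0` test only runs after that read has happened. Below is a minimal sketch of one possible fix, assuming the budget is checked before reading and passed as the read length. `CappedReadSketch` and `readUpToLimit` are hypothetical names, and `calculateTryToRead` here takes the limit and buffer size as parameters instead of reading them from `http`; this is not the patch that was applied to Nutch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class CappedReadSketch {

  // Hypothetical stand-in for calculateTryToRead(): how many bytes the next
  // read may request. maxContent <= 0 means "no limit", as in the report.
  static int calculateTryToRead(int totalRead, int maxContent, int bufferSize) {
    if (maxContent <= 0) {
      return bufferSize;
    }
    return Math.min(bufferSize, maxContent - totalRead);
  }

  // Checks the remaining budget *before* reading and uses it as the read
  // length, so the number of bytes written never exceeds maxContent.
  static byte[] readUpToLimit(InputStream in, int maxContent, int bufferSize) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[bufferSize];
    int totalRead = 0;
    int tryToRead = calculateTryToRead(totalRead, maxContent, bufferSize);
    int bufferFilled;
    while (tryToRead > 0 && (bufferFilled = in.read(buffer, 0, tryToRead)) != -1) {
      out.write(buffer, 0, bufferFilled);
      totalRead += bufferFilled;
      tryToRead = calculateTryToRead(totalRead, maxContent, bufferSize);
    }
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] body = new byte[20000];
    System.out.println(readUpToLimit(new ByteArrayInputStream(body), 5000, 4096).length); // 5000
    System.out.println(readUpToLimit(new ByteArrayInputStream(body), 0, 4096).length);    // 20000
  }
}
```

With the values in `main`, the second read asks for only the remaining 904 bytes, so exactly 5000 bytes are kept; a non-positive limit reads the whole stream. Putting `tryToRead > 0` first also avoids the extra, discarded read that the original condition order performs.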

-- 
This message is automatically generated by JIRA.