You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@nutch.apache.org by "Joseph M. (JIRA)" <ji...@apache.org> on 2007/09/25 12:34:50 UTC

[jira] Created: (NUTCH-560) protocol-httpclient reading more bytes than http.content.limit

protocol-httpclient reading more bytes than http.content.limit
--------------------------------------------------------------

                 Key: NUTCH-560
                 URL: https://issues.apache.org/jira/browse/NUTCH-560
             Project: Nutch
          Issue Type: Bug
          Components: fetcher
    Affects Versions: 0.9.0, 1.0.0
            Reporter: Joseph M.


I modified protocol-httpclient HttpResponse.java to download files to file system. If I set http.content.limit to 5000... it fetches around 5500 to 6000 bytes instead and downloads it to file system. There is calculation mistake in calculateTryToRead() function.

{code}
        int tryAndRead = calculateTryToRead(totalRead);
        while ((bufferFilled = in.read(buffer, 0, buffer.length)) != -1 && tryAndRead > 0) {
          totalRead += bufferFilled;
          out.write(buffer, 0, bufferFilled);
          tryAndRead = calculateTryToRead(totalRead);
        }{code}

while loop stops when calculateTryToRead() returns -ve or 0.

  {code}private int calculateTryToRead(int totalRead) {
    int tryToRead = Http.BUFFER_SIZE;
    if (http.getMaxContent() <= 0) {
      return http.BUFFER_SIZE;
    } else if (http.getMaxContent() - totalRead < http.BUFFER_SIZE) {
      tryToRead = http.getMaxContent() - totalRead;
    }
    return tryToRead;
  }{code}

It is returning -ve when totalRead > http.getMaxContent(). So more bytes than http.content.limit is read before breaking while loop.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Closed: (NUTCH-560) protocol-httpclient reading more bytes than http.content.limit

Posted by "Doğacan Güney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doğacan Güney closed NUTCH-560.
-------------------------------

       Resolution: Fixed
    Fix Version/s: 1.0.0
         Assignee: Doğacan Güney

Fixed as part of NUTCH-559.

> protocol-httpclient reading more bytes than http.content.limit
> --------------------------------------------------------------
>
>                 Key: NUTCH-560
>                 URL: https://issues.apache.org/jira/browse/NUTCH-560
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 0.9.0, 1.0.0
>            Reporter: Joseph M.
>            Assignee: Doğacan Güney
>             Fix For: 1.0.0
>
>
> I modified protocol-httpclient HttpResponse.java to download files to file system. If I set http.content.limit to 5000... it fetches around 5500 to 6000 bytes instead and downloads it to file system. There is calculation mistake in calculateTryToRead() function.
> {code}
>         int tryAndRead = calculateTryToRead(totalRead);
>         while ((bufferFilled = in.read(buffer, 0, buffer.length)) != -1 && tryAndRead > 0) {
>           totalRead += bufferFilled;
>           out.write(buffer, 0, bufferFilled);
>           tryAndRead = calculateTryToRead(totalRead);
>         }{code}
> while loop stops when calculateTryToRead() returns -ve or 0.
>   {code}private int calculateTryToRead(int totalRead) {
>     int tryToRead = Http.BUFFER_SIZE;
>     if (http.getMaxContent() <= 0) {
>       return http.BUFFER_SIZE;
>     } else if (http.getMaxContent() - totalRead < http.BUFFER_SIZE) {
>       tryToRead = http.getMaxContent() - totalRead;
>     }
>     return tryToRead;
>   }{code}
> It is returning -ve when totalRead > http.getMaxContent(). So more bytes than http.content.limit is read before breaking while loop.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (NUTCH-560) protocol-httpclient reading more bytes than http.content.limit

Posted by "Susam Pal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12530519 ] 

Susam Pal commented on NUTCH-560:
---------------------------------

I analysed 'protocol-http' and it behaves almost in the same manner. While buffering, we can not stop reading after exactly 'http.content.limit' bytes have been read. It would be one iteration after the limit, when the limit check tells that we have exceeded the limit. So, this doesn't seem like a bug. However, it doesn't take care of reading till 'Content-Length' bytes, which NUTCH-559 is doing.

> protocol-httpclient reading more bytes than http.content.limit
> --------------------------------------------------------------
>
>                 Key: NUTCH-560
>                 URL: https://issues.apache.org/jira/browse/NUTCH-560
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 0.9.0, 1.0.0
>            Reporter: Joseph M.
>
> I modified protocol-httpclient HttpResponse.java to download files to file system. If I set http.content.limit to 5000... it fetches around 5500 to 6000 bytes instead and downloads it to file system. There is calculation mistake in calculateTryToRead() function.
> {code}
>         int tryAndRead = calculateTryToRead(totalRead);
>         while ((bufferFilled = in.read(buffer, 0, buffer.length)) != -1 && tryAndRead > 0) {
>           totalRead += bufferFilled;
>           out.write(buffer, 0, bufferFilled);
>           tryAndRead = calculateTryToRead(totalRead);
>         }{code}
> while loop stops when calculateTryToRead() returns -ve or 0.
>   {code}private int calculateTryToRead(int totalRead) {
>     int tryToRead = Http.BUFFER_SIZE;
>     if (http.getMaxContent() <= 0) {
>       return http.BUFFER_SIZE;
>     } else if (http.getMaxContent() - totalRead < http.BUFFER_SIZE) {
>       tryToRead = http.getMaxContent() - totalRead;
>     }
>     return tryToRead;
>   }{code}
> It is returning -ve when totalRead > http.getMaxContent(). So more bytes than http.content.limit is read before breaking while loop.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.