Posted to user@nutch.apache.org by Meghna Kukreja <om...@gmail.com> on 2006/09/13 21:30:35 UTC

Bug in Nutch?

Hi,

I set http.content.limit to -1 so that fetched data is never truncated. However, if the fetched data was compressed (HTTP response
header Content-Encoding: gzip), Nutch was unable to decompress
it. If I set http.content.limit to its default value of 65536,
Nutch had no problem. I debugged Nutch in Eclipse and I
think the problem is in the read loop in GZIPUtils.java:
  if ((written + size) > sizeLimit) {
      outStream.write(buf, 0, sizeLimit - written);
      break;
  }

 It should truncate the data only if sizeLimit >= 0, so the check
should read:

  if ((written + size) > sizeLimit && sizeLimit >= 0) {
      outStream.write(buf, 0, sizeLimit - written);
      break;
  }
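
 For reference, here is a self-contained sketch of the corrected loop
so the behavior can be tested outside Nutch. The method and buffer
names are illustrative, not the actual GZIPUtils API; the point is
that a negative sizeLimit means "no limit":

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipLimitDemo {
    private static final int BUF_SIZE = 4096;

    // Decompress gzipped input, truncating at sizeLimit bytes.
    // A negative sizeLimit disables truncation entirely.
    static byte[] unzipBestEffort(byte[] in, int sizeLimit) throws IOException {
        GZIPInputStream inStream =
            new GZIPInputStream(new ByteArrayInputStream(in));
        ByteArrayOutputStream outStream = new ByteArrayOutputStream(BUF_SIZE);
        byte[] buf = new byte[BUF_SIZE];
        int written = 0;
        int size;
        while ((size = inStream.read(buf)) != -1) {
            // Corrected check: only truncate when a limit is actually set.
            if ((written + size) > sizeLimit && sizeLimit >= 0) {
                outStream.write(buf, 0, sizeLimit - written);
                break;
            }
            outStream.write(buf, 0, size);
            written += size;
        }
        inStream.close();
        return outStream.toByteArray();
    }

    // Helper to produce gzipped test input.
    static byte[] zip(byte[] in) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(bos);
        gz.write(in);
        gz.close();
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = new byte[100000];
        // With sizeLimit = -1 the full payload survives.
        System.out.println(unzipBestEffort(zip(original), -1).length);    // 100000
        // With sizeLimit = 65536 the output is cut at the limit.
        System.out.println(unzipBestEffort(zip(original), 65536).length); // 65536
    }
}
```

 With the original (unpatched) check, the -1 case would truncate to
zero bytes on the first iteration, since every positive byte count
exceeds -1.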

 Has anyone seen this before and is this solution correct?

 Thanks,
 Meghna
