Posted to dev@nutch.apache.org by "simone frenzel (JIRA)" <ji...@apache.org> on 2011/08/23 16:51:29 UTC

[jira] [Created] (NUTCH-1089) short compressed pages caused Exception

short compressed pages caused Exception  
-----------------------------------------

                 Key: NUTCH-1089
                 URL: https://issues.apache.org/jira/browse/NUTCH-1089
             Project: Nutch
          Issue Type: Bug
            Reporter: simone frenzel


Hi,

I tested Nutch on compressed pages, and on pages with both Basic Auth and compression. On short compressed pages, the following exception is thrown:

2011-08-19 17:06:55,190 ERROR httpclient.Http - java.io.IOException: unzipBestEffort returned null
2011-08-19 17:06:55,190 ERROR httpclient.Http - at org.apache.nutch.protocol.http.api.HttpBase.processGzipEncoded(HttpBase.java:310)
2011-08-19 17:06:55,191 ERROR httpclient.Http - at org.apache.nutch.protocol.httpclient.HttpResponse.<init>(HttpResponse.java:163)
2011-08-19 17:06:55,191 ERROR httpclient.Http - at org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:154)
2011-08-19 17:06:55,191 ERROR httpclient.Http - at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:138)
2011-08-19 17:06:55,191 ERROR httpclient.Http - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:628)


In some cases, Basic Auth also fails.

It works fine with the patch.
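The log shows HttpBase.processGzipEncoded raising an IOException because unzipBestEffort returned null. The minimal Java sketch below illustrates that pattern only. GzipBestEffortSketch, unzipBestEffortSketch and processGzipEncodedSketch are hypothetical names, not Nutch's code. The sketch assumes a best-effort decoder returns null only when nothing at all can be decoded. A complete short gzip body decodes normally, while an empty body produces the same error message as in the log. The report does not describe the root cause, and the sketch does not try to reproduce it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration only; these names are not Nutch's API.
public class GzipBestEffortSketch {

    // Gzip-compresses a byte array (used here to build sample input).
    public static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    // Best-effort decoder: returns whatever decodes before an error,
    // or null if nothing at all can be decoded (empty input, bad header).
    public static byte[] unzipBestEffortSketch(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(in))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            if (out.size() == 0) {
                return null; // nothing decoded: report failure to the caller
            }
            // otherwise fall through and keep the partially decoded bytes
        }
        return out.toByteArray();
    }

    // Caller that turns a null decode result into the exception seen in the log.
    public static byte[] processGzipEncodedSketch(byte[] compressed) throws IOException {
        byte[] content = unzipBestEffortSketch(compressed);
        if (content == null) {
            throw new IOException("unzipBestEffort returned null");
        }
        return content;
    }

    public static void main(String[] args) throws IOException {
        byte[] page = "<html>short</html>".getBytes(StandardCharsets.UTF_8);

        // A complete gzip body decodes even when the page is very short.
        byte[] decoded = processGzipEncodedSketch(gzip(page));
        System.out.println(new String(decoded, StandardCharsets.UTF_8));

        // An empty body cannot be parsed as gzip, so the caller throws.
        try {
            processGzipEncodedSketch(new byte[0]);
        } catch (IOException e) {
            System.out.println("IOException: " + e.getMessage());
        }
    }
}
```

Running main prints the decoded page, then "IOException: unzipBestEffort returned null". The page length alone does not break gzip decoding in this sketch. The error appears only when the bytes handed to the decoder are empty or unreadable.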

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (NUTCH-1089) short compressed pages caused Exception

Posted by "simone frenzel (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

simone frenzel updated NUTCH-1089:
----------------------------------

    Attachment: HttpResponsePatch.patch

> short compressed pages caused Exception  
> -----------------------------------------
>
>                 Key: NUTCH-1089
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1089
>             Project: Nutch
>          Issue Type: Bug
>            Reporter: simone frenzel
>              Labels: patch
>         Attachments: HttpResponsePatch.patch

[jira] [Resolved] (NUTCH-1089) short compressed pages caused Exception

Posted by "Julien Nioche (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Nioche resolved NUTCH-1089.
----------------------------------

    Resolution: Fixed

Committed to 1.4 in revision 1160753.
Committed to trunk in revision 1160754.

Thanks Simone!


[jira] [Closed] (NUTCH-1089) short compressed pages caused Exception

Posted by "Julien Nioche (Closed) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Nioche closed NUTCH-1089.
--------------------------------


NUTCH-1089, NUTCH-990, and NUTCH-1112 were all related to the same issue, which has been fixed thanks to Simone's patch.