Posted to dev@nutch.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/06/11 14:01:00 UTC

[jira] [Commented] (NUTCH-2576) HTTP protocol plugin based on okhttp

    [ https://issues.apache.org/jira/browse/NUTCH-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508096#comment-16508096 ] 

ASF GitHub Bot commented on NUTCH-2576:
---------------------------------------

sebastian-nagel commented on issue #328: NUTCH-2576 HTTP protocol implementation based on okhttp
URL: https://github.com/apache/nutch/pull/328#issuecomment-396253528
 
 
   All TODOs are addressed in the latest commits. The unit tests for NUTCH-2549 are now ported. Some of them fail and are ignored for now. This seems acceptable because okhttp supports only HTTP/1.1 and HTTP/2 [but not HTTP/1.0](http://square.github.io/okhttp/3.x/okhttp/okhttp3/OkHttpClient.Builder.html#protocols-java.util.List-):
   - no support for responses without an HTTP status line (such responses conform only to HTTP/0.9)
   - no multi-line headers (deprecated in HTTP/1.1)
   - the HTTP status line must be syntactically correct
   - errors while reading the payload of non-200 responses are not ignored (the exception could eventually be caught)
   ```
   % grep -A1 -i testcase build/protocol-okhttp/test/TEST-org.apache.nutch.protocol.okhttp.TestBadServerResponses.txt
   Testcase: testBadHttpServer took 0.444 sec
   Testcase: testNoStatusLine took 0.083 sec
           Caused an ERROR
   --
   Testcase: testOverlongHeader took 0.116 sec
   Testcase: testContentLengthNotANumber took 0.13 sec
   Testcase: testHeaderSpellChecking took 0.101 sec
   Testcase: testMultiLineHeader took 0.074 sec
           FAILED
   --
   Testcase: testHeaderWithColon took 0.103 sec
           Caused an ERROR
   --
   Testcase: testChunkedContent took 0.09 sec
   Testcase: testRequestNotStartingWithSlash took 0.077 sec
   Testcase: testIgnoreErrorInRedirectPayload took 0.073 sec
           Caused an ERROR
   ```
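To make the listed constraints concrete, here is a minimal, hypothetical Java sketch of a strict HTTP/1.x response-head parser. It treats a missing status line (an HTTP/0.9-style response), a malformed status line, and folded (multi-line) headers as errors. This is not okhttp's code. The class `StrictResponseHead` and its `parse` method are illustrative names, and okhttp's own handling of these cases may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not okhttp's implementation): a strict parser for the
 * head of an HTTP/1.x response that rejects the cases listed above.
 */
public class StrictResponseHead {

    // "HTTP/1.0" or "HTTP/1.1", SP, three-digit status code, optional SP + reason phrase.
    // Accepting an HTTP/1.0 status line is a choice of this sketch.
    private static final Pattern STATUS_LINE =
        Pattern.compile("HTTP/1\\.[01] (\\d{3})(?: .*)?");

    final int code;
    final List<String[]> headers = new ArrayList<>();

    StrictResponseHead(int code) {
        this.code = code;
    }

    /** Parses the lines of a response head: the status line followed by header lines. */
    static StrictResponseHead parse(List<String> lines) {
        if (lines.isEmpty()) {
            throw new IllegalArgumentException("empty response");
        }
        Matcher m = STATUS_LINE.matcher(lines.get(0));
        if (!m.matches()) {
            // Covers both a missing status line (HTTP/0.9 style) and a malformed one.
            throw new IllegalArgumentException("unexpected status line: " + lines.get(0));
        }
        StrictResponseHead head = new StrictResponseHead(Integer.parseInt(m.group(1)));
        for (String line : lines.subList(1, lines.size())) {
            if (line.startsWith(" ") || line.startsWith("\t")) {
                // Continuation line of a folded header (obs-fold, RFC 7230 section 3.2.4).
                throw new IllegalArgumentException("multi-line header not supported: " + line);
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("malformed header: " + line);
            }
            head.headers.add(new String[] {
                line.substring(0, colon).trim(), line.substring(colon + 1).trim() });
        }
        return head;
    }

    public static void main(String[] args) {
        // Well-formed response head: prints the status code.
        System.out.println(parse(List.of("HTTP/1.1 200 OK", "Content-Type: text/html")).code);
        try {
            // HTTP/0.9-style response: the body starts immediately, with no status line.
            parse(List.of("<html>no status line</html>"));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Running `main` prints `200` for the well-formed response and a rejection message for the response that lacks a status line. The failing tests above, such as `testNoStatusLine` and `testMultiLineHeader`, presumably expect lenient handling of exactly these cases.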

----------------------------------------------------------------
This is an automated message from the Apache Git Service.


> HTTP protocol plugin based on okhttp
> ------------------------------------
>
>                 Key: NUTCH-2576
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2576
>             Project: Nutch
>          Issue Type: Improvement
>          Components: plugin, protocol
>            Reporter: Sebastian Nagel
>            Priority: Major
>             Fix For: 1.15
>
>
> [Okhttp|http://square.github.io/okhttp/] is an Apache 2.0-licensed HTTP library that supports HTTP/2. [~jnioche]'s implementation [storm-crawler#443|https://github.com/DigitalPebble/storm-crawler/issues/443] shows that it should be straightforward to implement a Nutch protocol plugin using okhttp. A modern HTTP protocol implementation should also fix (most of) the issues reported in NUTCH-2549.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)