Posted to dev@nutch.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/01/09 10:13:01 UTC

[jira] [Commented] (NUTCH-2760) protocol-okhttp: properly record HTTP version in request message header

    [ https://issues.apache.org/jira/browse/NUTCH-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011653#comment-17011653 ] 

ASF GitHub Bot commented on NUTCH-2760:
---------------------------------------

sebastian-nagel commented on pull request #489: NUTCH-2760 protocol-okhttp: properly record HTTP version in request message header
URL: https://github.com/apache/nutch/pull/489
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> protocol-okhttp: properly record HTTP version in request message header
> -----------------------------------------------------------------------
>
>                 Key: NUTCH-2760
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2760
>             Project: Nutch
>          Issue Type: Bug
>          Components: plugin, protocol
>    Affects Versions: 1.16
>            Reporter: Sebastian Nagel
>            Priority: Minor
>              Labels: patch-available
>             Fix For: 1.17
>
>
> The HTTP version in the request message recorded by the protocol-okhttp plugin ({{store.http.request=true}}) is not the version sent in the request but the one received in the response.
> Note that the HTTP version sent in the request may differ from that sent back in the response. One example (tracked using wget):
> {noformat}
> > wget -d https://www.kp.ru/daily/27061/4129507/
> ...
> ---request begin---
> GET /daily/27061/4129507/ HTTP/1.1
> User-Agent: Wget/1.20.3 (linux-gnu)
> Accept: */*
> Accept-Encoding: identity
> Host: www.kp.ru
> Connection: Keep-Alive
> ---request end---
> HTTP request sent, awaiting response... 
> ---response begin---
> HTTP/1.0 200 OK
> ...
> {noformat}
> The protocol-okhttp plugin uses the response version ("HTTP/1.0") for the request as well:
> {noformat}
> > bin/nutch parsechecker -Dstore.http.headers=true -Dstore.http.request=true \
>      -Dplugin.includes='protocol-okhttp|parse-html' https://www.kp.ru/daily/27061/4129507/
> ...
> _request_=GET /daily/27061/4129507/ HTTP/1.0
> ...
> _response.headers_=HTTP/1.0 200 OK
> ...
> {noformat}
> The protocol-http plugin tracks the versions correctly:
> {noformat}
> > bin/nutch parsechecker -Dstore.http.headers=true -Dstore.http.request=true \
>      -Dplugin.includes='protocol-http|parse-html' https://www.kp.ru/daily/27061/4129507/
> ...
> _request_=GET /daily/27061/4129507/ HTTP/1.1
> ...
> _response.headers_=HTTP/1.0 200 OK
> ...
> {noformat}
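
As a rough way to compare the versions recorded in the metadata shown above, the following sketch extracts the version token from a stored request line and from a stored status line. The class and method names (HttpVersionCheck, requestVersion, responseVersion, versionsDiffer) are hypothetical and are not part of Nutch or OkHttp; the sample strings are copied from the parsechecker output in the description. Equal versions do not prove the bug on their own, since a server may well answer with the version it was sent, so this is only a quick check, not a fix.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: compares the HTTP version
// recorded for the request with the one recorded for the response.
public class HttpVersionCheck {

  // Request line, e.g. "GET /path HTTP/1.1": the version is the last token.
  private static final Pattern REQUEST_LINE =
      Pattern.compile("^\\S+ \\S+ (HTTP/\\d(?:\\.\\d)?)$");

  // Status line, e.g. "HTTP/1.0 200 OK": the version is the first token.
  private static final Pattern STATUS_LINE =
      Pattern.compile("^(HTTP/\\d(?:\\.\\d)?) \\d{3}\\b.*$");

  // Returns the version token of a request line, or null if it does not parse.
  public static String requestVersion(String requestLine) {
    Matcher m = REQUEST_LINE.matcher(requestLine.trim());
    return m.matches() ? m.group(1) : null;
  }

  // Returns the version token of a status line, or null if it does not parse.
  public static String responseVersion(String statusLine) {
    Matcher m = STATUS_LINE.matcher(statusLine.trim());
    return m.matches() ? m.group(1) : null;
  }

  // True only if both lines parse and their version tokens are different.
  public static boolean versionsDiffer(String requestLine, String statusLine) {
    String req = requestVersion(requestLine);
    String resp = responseVersion(statusLine);
    return req != null && resp != null && !req.equals(resp);
  }

  public static void main(String[] args) {
    String statusLine = "HTTP/1.0 200 OK";
    // Recorded by protocol-okhttp in the report: request version equals response version.
    System.out.println(versionsDiffer("GET /daily/27061/4129507/ HTTP/1.0", statusLine)); // false
    // Recorded by protocol-http in the report: request keeps the version actually sent.
    System.out.println(versionsDiffer("GET /daily/27061/4129507/ HTTP/1.1", statusLine)); // true
  }
}
```

For the URL in this report, the wget trace shows that HTTP/1.1 was sent and HTTP/1.0 came back, so a correctly recorded request line is expected to make versionsDiffer return true.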



--
This message was sent by Atlassian Jira
(v8.3.4#803005)