Posted to dev@nutch.apache.org by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/01/09 10:16:00 UTC

[jira] [Resolved] (NUTCH-2760) protocol-okhttp: properly record HTTP version in request message header

     [ https://issues.apache.org/jira/browse/NUTCH-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Nagel resolved NUTCH-2760.
------------------------------------
      Assignee: Sebastian Nagel
    Resolution: Fixed

Merged. Verified that protocol-okhttp now records the correct protocol versions in both the request and the response headers.

> protocol-okhttp: properly record HTTP version in request message header
> -----------------------------------------------------------------------
>
>                 Key: NUTCH-2760
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2760
>             Project: Nutch
>          Issue Type: Bug
>          Components: plugin, protocol
>    Affects Versions: 1.16
>            Reporter: Sebastian Nagel
>            Assignee: Sebastian Nagel
>            Priority: Minor
>              Labels: patch-available
>             Fix For: 1.17
>
>
> The HTTP version in the request message recorded by the protocol-okhttp plugin ({{store.http.request=true}}) is not the version actually sent in the request but the one received in the response.
> Note that the HTTP version sent in the request may differ from that sent back in the response. One example (tracked using wget):
> {noformat}
> > wget -d https://www.kp.ru/daily/27061/4129507/
> ...
> ---request begin---
> GET /daily/27061/4129507/ HTTP/1.1
> User-Agent: Wget/1.20.3 (linux-gnu)
> Accept: */*
> Accept-Encoding: identity
> Host: www.kp.ru
> Connection: Keep-Alive
> ---request end---
> HTTP request sent, awaiting response... 
> ---response begin---
> HTTP/1.0 200 OK
> ...
> {noformat}
> protocol-okhttp also records the response version ("HTTP/1.0") for the request:
> {noformat}
> > bin/nutch parsechecker -Dstore.http.headers=true -Dstore.http.request=true \
>      -Dplugin.includes='protocol-okhttp|parse-html' https://www.kp.ru/daily/27061/4129507/
> ...
> _request_=GET /daily/27061/4129507/ HTTP/1.0
> ...
> _response.headers_=HTTP/1.0 200 OK
> ...
> {noformat}
> protocol-http tracks the versions correctly:
> {noformat}
> > bin/nutch parsechecker -Dstore.http.headers=true -Dstore.http.request=true \
>      -Dplugin.includes='protocol-http|parse-html' https://www.kp.ru/daily/27061/4129507/
> ...
> _request_=GET /daily/27061/4129507/ HTTP/1.1
> ...
> _response.headers_=HTTP/1.0 200 OK
> ...
> {noformat}
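
For illustration only, the difference between the two recorded request lines can be reproduced with a few lines of plain Java. This is a minimal sketch, not the actual protocol-okhttp code or the merged patch: the class name RequestLineSketch and the method requestLine are hypothetical, and the version strings are simply copied from the wget trace quoted above.

```java
public class RequestLineSketch {

    /** Formats an HTTP/1.x request line such as "GET /path HTTP/1.1". */
    static String requestLine(String method, String path, String httpVersion) {
        return method + " " + path + " " + httpVersion;
    }

    public static void main(String[] args) {
        // Values taken from the wget trace in the issue description.
        String sentVersion = "HTTP/1.1";     // version the client sent in its request
        String receivedVersion = "HTTP/1.0"; // version in the server's status line
        String path = "/daily/27061/4129507/";

        // Reported behaviour: the response version is reused for the request record.
        System.out.println("_request_=" + requestLine("GET", path, receivedVersion));

        // Expected behaviour: the request record carries the version that was sent.
        System.out.println("_request_=" + requestLine("GET", path, sentVersion));
    }
}
```

The point of the sketch is only that the request and response versions have to be kept as two separate values; which OkHttp object supplies each of them in the real plugin is not shown here.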


