Posted to dev@nutch.apache.org by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2020/01/09 10:16:00 UTC
[jira] [Resolved] (NUTCH-2760) protocol-okhttp: properly record HTTP version in request message header
[ https://issues.apache.org/jira/browse/NUTCH-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Nagel resolved NUTCH-2760.
------------------------------------
Assignee: Sebastian Nagel
Resolution: Fixed
Merged. Verified that the protocol versions in request and response headers are now recorded correctly when using protocol-okhttp.
> protocol-okhttp: properly record HTTP version in request message header
> -----------------------------------------------------------------------
>
> Key: NUTCH-2760
> URL: https://issues.apache.org/jira/browse/NUTCH-2760
> Project: Nutch
> Issue Type: Bug
> Components: plugin, protocol
> Affects Versions: 1.16
> Reporter: Sebastian Nagel
> Assignee: Sebastian Nagel
> Priority: Minor
> Labels: patch-available
> Fix For: 1.17
>
>
> The HTTP version in the request message recorded by the plugin protocol-okhttp ({{store.http.request=true}}) is not the version actually sent in the request but the one received in the response.
> Note that the HTTP version sent in the request may differ from the one sent back in the response. One example (traced using wget):
> {noformat}
> > wget -d https://www.kp.ru/daily/27061/4129507/
> ...
> ---request begin---
> GET /daily/27061/4129507/ HTTP/1.1
> User-Agent: Wget/1.20.3 (linux-gnu)
> Accept: */*
> Accept-Encoding: identity
> Host: www.kp.ru
> Connection: Keep-Alive
> ---request end---
> HTTP request sent, awaiting response...
> ---response begin---
> HTTP/1.0 200 OK
> ...
> {noformat}
> protocol-okhttp uses the response version ("HTTP/1.0") also for the request:
> {noformat}
> > bin/nutch parsechecker -Dstore.http.headers=true -Dstore.http.request=true \
> -Dplugin.includes='protocol-okhttp|parse-html' https://www.kp.ru/daily/27061/4129507/
> ...
> _request_=GET /daily/27061/4129507/ HTTP/1.0
> ...
> _response.headers_=HTTP/1.0 200 OK
> ...
> {noformat}
> In contrast, protocol-http tracks the versions correctly:
> {noformat}
> > bin/nutch parsechecker -Dstore.http.headers=true -Dstore.http.request=true \
> -Dplugin.includes='protocol-http|parse-html' https://www.kp.ru/daily/27061/4129507/
> ...
> _request_=GET /daily/27061/4129507/ HTTP/1.1
> ...
> _response.headers_=HTTP/1.0 200 OK
> ...
> {noformat}
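The mismatch described in the issue can be checked mechanically. The following is a minimal sketch, not part of Nutch or of the fix: it assumes a debug log in the style of the wget -d output quoted above, and the names http_versions and SAMPLE are hypothetical helpers introduced for illustration. It extracts the HTTP version from the request line and from the status line separately, so the two can be compared.

```python
import re

# Matches an HTTP/1.x request line, e.g. "GET /path HTTP/1.1".
REQUEST_LINE = re.compile(r"^[A-Z]+ \S+ (HTTP/\d(?:\.\d)?)$")
# Matches a status line, e.g. "HTTP/1.0 200 OK".
STATUS_LINE = re.compile(r"^(HTTP/\d(?:\.\d)?) \d{3}(?: .*)?$")


def http_versions(debug_log):
    """Return (request_version, response_version) from a wget -d style log.

    Hypothetical helper: each value is None if the corresponding
    line is not found in the log.
    """
    request_version = None
    response_version = None
    for raw in debug_log.splitlines():
        line = raw.strip()
        if request_version is None:
            match = REQUEST_LINE.match(line)
            if match:
                request_version = match.group(1)
                continue
        if response_version is None:
            match = STATUS_LINE.match(line)
            if match:
                response_version = match.group(1)
    return request_version, response_version


# Abbreviated copy of the wget trace quoted in the issue (assumed log shape).
SAMPLE = """\
---request begin---
GET /daily/27061/4129507/ HTTP/1.1
Host: www.kp.ru
---request end---
---response begin---
HTTP/1.0 200 OK
"""

if __name__ == "__main__":
    req, resp = http_versions(SAMPLE)
    print("request:", req, "response:", resp)
```

For the trace in the issue, the request version is HTTP/1.1 while the response version is HTTP/1.0, which is the difference that protocol-okhttp lost before the fix.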