Posted to dev@nutch.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/08/07 16:30:00 UTC

[jira] [Commented] (NUTCH-2814) HttpDateFormat's internal time zone may change after parsing a date

    [ https://issues.apache.org/jira/browse/NUTCH-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173264#comment-17173264 ] 

ASF GitHub Bot commented on NUTCH-2814:
---------------------------------------

sebastian-nagel opened a new pull request #546:
URL: https://github.com/apache/nutch/pull/546


   - reset time zone to GMT after parsing a date
   - add unit test
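The first fix listed above can be pictured with a minimal sketch. It assumes a shared, static SimpleDateFormat like the one described in the issue. The class GmtResettingDateFormat and its methods formatDate/parseDate are hypothetical names used for illustration; this is not the code in PR #546.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/**
 * Hypothetical sketch of the fix: a shared SimpleDateFormat whose time zone
 * is restored to GMT after every parse. Not the actual Nutch HttpDateFormat.
 */
public class GmtResettingDateFormat {

  private static final TimeZone GMT = TimeZone.getTimeZone("GMT");

  // SimpleDateFormat is not thread-safe, so every access below synchronizes on it.
  private static final SimpleDateFormat FORMAT =
      new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss zzz", Locale.US);

  static {
    FORMAT.setTimeZone(GMT);
  }

  /** Formats a date as an HTTP date string in GMT. */
  public static String formatDate(Date date) {
    synchronized (FORMAT) {
      return FORMAT.format(date);
    }
  }

  /** Parses an HTTP date string, then resets the shared formatter to GMT. */
  public static Date parseDate(String httpDate) throws ParseException {
    synchronized (FORMAT) {
      try {
        return FORMAT.parse(httpDate);
      } finally {
        // parse() may have switched FORMAT to the zone named in httpDate
        // (e.g. "PST"); restore GMT even if parsing failed.
        FORMAT.setTimeZone(GMT);
      }
    }
  }

  public static void main(String[] args) throws ParseException {
    Date lastModified = parseDate("Fri, 28 Feb 2020 03:09:16 PST");
    // Without the reset, this would be rendered in the parsed zone instead of GMT.
    System.out.println("If-Modified-Since: " + formatDate(lastModified));
  }
}
```

The try/finally restores GMT even when parse() throws, and holding the lock across both the parse and the reset keeps other threads from formatting with the temporarily changed zone.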


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> HttpDateFormat's internal time zone may change after parsing a date
> -------------------------------------------------------------------
>
>                 Key: NUTCH-2814
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2814
>             Project: Nutch
>          Issue Type: Bug
>          Components: protocol
>    Affects Versions: 1.17
>            Reporter: Sebastian Nagel
>            Priority: Major
>             Fix For: 1.18
>
>
> In the Common Crawl WARC files I've observed that the If-Modified-Since header is sent with varying time zones:
> {noformat}
> If-Modified-Since: Tue, 25 Feb 2020 03:33:21 MSK
> If-Modified-Since: Sun, 22 Sep 2019 04:41:48 GMT
> If-Modified-Since: Mon, 18 Nov 2019 12:06:19 KRAT
> If-Modified-Since: Tue, 21 Jan 2020 02:10:22 UTC
> If-Modified-Since: Fri, 18 Oct 2019 20:23:57 BST
> If-Modified-Since: Sun, 20 Oct 2019 08:39:26 CEST
> If-Modified-Since: Fri, 15 Nov 2019 12:56:38 EST
> If-Modified-Since: Mon, 30 Mar 2020 09:10:33 GMT
> If-Modified-Since: Mon, 30 Mar 2020 05:18:36 GMT
> If-Modified-Since: Fri, 28 Feb 2020 03:09:16 PST
> If-Modified-Since: Thu, 21 Nov 2019 10:16:19 YEKT
> If-Modified-Since: Thu, 14 Nov 2019 18:01:05 EET
> If-Modified-Since: Thu, 14 Nov 2019 16:46:43 UTC
> If-Modified-Since: Sun, 17 Nov 2019 13:14:28 UTC
> If-Modified-Since: Tue, 25 Feb 2020 21:46:10 GMT
> If-Modified-Since: Wed, 16 Oct 2019 19:03:31 UTC
> If-Modified-Since: Thu, 14 Nov 2019 09:07:13 EST
> If-Modified-Since: Thu, 09 Apr 2020 12:21:53 EEST
> If-Modified-Since: Sat, 28 Mar 2020 19:08:52 CET
> If-Modified-Since: Sun, 23 Feb 2020 12:22:46 CET
> If-Modified-Since: Mon, 21 Oct 2019 03:18:16 PDT
> If-Modified-Since: Fri, 15 Nov 2019 05:41:44 UTC
> If-Modified-Since: Thu, 09 Apr 2020 21:01:32 CEST
> If-Modified-Since: Wed, 11 Dec 2019 11:18:28 KRAT
> If-Modified-Since: Tue, 22 Oct 2019 18:55:54 GMT
> {noformat}
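> This happens because parsing a date may change the time zone of the SimpleDateFormat instance used internally by HttpDateFormat: the parser can replace the formatter's time zone with the zone named in the parsed string (e.g. "MSK" or "PST"). Every subsequent formatting call then uses the time zone of the most recently parsed date.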
> However, [sec. 7.1.1.1 of RFC 7231|https://tools.ietf.org/html/rfc7231#section-7.1.1.1] requires senders to generate HTTP dates in the IMF-fixdate format, which always uses "GMT" as the time zone.
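The mechanism described in the issue can be reproduced with plain java.text.SimpleDateFormat. This is a minimal illustration that assumes the pattern "EEE, dd MMM yyyy HH:mm:ss zzz" with Locale.US (an assumption about how HttpDateFormat is configured); TimeZoneDriftDemo, newGmtFormat and zoneAfterParsing are hypothetical names.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Illustrative reproduction of the time zone drift; not Nutch code. */
public class TimeZoneDriftDemo {

  /** Creates a formatter configured for GMT, as HttpDateFormat is assumed to be. */
  static SimpleDateFormat newGmtFormat() {
    SimpleDateFormat format =
        new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss zzz", Locale.US);
    format.setTimeZone(TimeZone.getTimeZone("GMT"));
    return format;
  }

  /** Parses httpDate with a fresh GMT formatter and returns the formatter's zone ID afterwards. */
  static String zoneAfterParsing(String httpDate) throws ParseException {
    SimpleDateFormat format = newGmtFormat();
    format.parse(httpDate);
    return format.getTimeZone().getID();
  }

  public static void main(String[] args) throws ParseException {
    SimpleDateFormat format = newGmtFormat();
    Date epoch = new Date(0L);
    System.out.println(format.format(epoch)); // Thu, 01 Jan 1970 00:00:00 GMT

    // Parsing a date that names another zone replaces the formatter's zone ...
    format.parse("Fri, 28 Feb 2020 03:09:16 PST");
    System.out.println(format.getTimeZone().getID()); // no longer "GMT"

    // ... so formatting the same instant now yields a non-GMT date string.
    System.out.println(format.format(epoch));
  }
}
```

In the JDK's SimpleDateFormat, a bare "GMT" suffix is parsed as a zero offset and leaves the formatter's zone unchanged, whereas a named zone such as "PST", "MSK" or "UTC" is looked up and installed as the formatter's time zone. This matches the mix of zones in the observed headers.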



--
This message was sent by Atlassian Jira
(v8.3.4#803005)