Posted to dev@nutch.apache.org by "Thamme Gowda N (JIRA)" <ji...@apache.org> on 2015/11/09 09:57:10 UTC
[jira] [Created] (NUTCH-2164) Inconsistent 'Modified Time' in crawl db
Thamme Gowda N created NUTCH-2164:
-------------------------------------
Summary: Inconsistent 'Modified Time' in crawl db
Key: NUTCH-2164
URL: https://issues.apache.org/jira/browse/NUTCH-2164
Project: Nutch
Issue Type: Improvement
Components: crawldb, fetcher
Affects Versions: 1.11
Reporter: Thamme Gowda N
Priority: Minor
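The 'Modified time' stored in the crawldb is invalid: it is set to (0 - timezone difference), i.e. zero shifted by the local timezone offset, instead of an actual modification time.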
*How to verify/reproduce:*
Run 'nutch readdb /path/to/crawldb -dump yy' and then inspect the contents of 'yy'.
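When going through the dumped entries, one way to spot affected records is to check whether the modified time lies within one timezone offset of zero. The following is only a minimal sketch, assuming the value is available as epoch milliseconds; the class ModifiedTimeCheck and the method looksUnset are hypothetical names and not part of Nutch.

```java
import java.time.ZoneOffset;

// Hypothetical helper, not part of Nutch: flags modified-time values that
// look like "0 shifted by a timezone offset" rather than a real timestamp.
final class ModifiedTimeCheck {

    // Largest possible timezone offset (+18:00) expressed in milliseconds.
    static final long MAX_OFFSET_MILLIS = ZoneOffset.MAX.getTotalSeconds() * 1000L;

    // Assumption: modifiedTimeMillis is milliseconds since the Unix epoch.
    // Returns true when the value is within +/- 18 hours of zero.
    static boolean looksUnset(long modifiedTimeMillis) {
        return modifiedTimeMillis >= -MAX_OFFSET_MILLIS
            && modifiedTimeMillis <= MAX_OFFSET_MILLIS;
    }

    public static void main(String[] args) {
        long shiftedZero = -8L * 60 * 60 * 1000; // zero shifted by an 8 hour offset
        System.out.println(looksUnset(shiftedZero));
        System.out.println(looksUnset(System.currentTimeMillis()));
    }
}
```

The range check avoids Math.abs so that Long.MIN_VALUE is not misclassified.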
The following improvements can be made:
1. Have DefaultFetchSchedule set the modified time
2. Set ProtocolStatus.lastModified when a modified time is available in the protocol response headers
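As an illustration of the second point, an HTTP 'Last-Modified' header value could be converted to epoch milliseconds roughly as follows. This is a sketch under stated assumptions, not the Nutch implementation: the class LastModifiedParser and the method parseLastModified are hypothetical, and it assumes the header uses the RFC 1123 date format. How the result would be handed to ProtocolStatus is left out on purpose.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.OptionalLong;

// Hypothetical helper, not part of Nutch.
final class LastModifiedParser {

    // Parses an RFC 1123 date such as "Mon, 09 Nov 2015 09:57:10 GMT".
    // Returns an empty OptionalLong when the header is missing or malformed,
    // so the caller can tell "unknown" apart from a real timestamp.
    static OptionalLong parseLastModified(String headerValue) {
        if (headerValue == null || headerValue.trim().isEmpty()) {
            return OptionalLong.empty();
        }
        try {
            ZonedDateTime parsed = ZonedDateTime.parse(
                headerValue.trim(), DateTimeFormatter.RFC_1123_DATE_TIME);
            // Convert to milliseconds since the Unix epoch (UTC based).
            return OptionalLong.of(parsed.toInstant().toEpochMilli());
        } catch (DateTimeParseException e) {
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseLastModified("Mon, 09 Nov 2015 09:57:10 GMT"));
        System.out.println(parseLastModified("not a date"));
    }
}
```

Converting through an Instant keeps the stored value independent of the JVM's default timezone, which is the kind of shift this issue describes.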
This issue is also discussed on the dev mailing list: http://www.mail-archive.com/dev@nutch.apache.org/msg19803.html#
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)