Posted to dev@nutch.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/11/04 17:12:00 UTC

[jira] [Commented] (NUTCH-2242) lastModified not always set

    [ https://issues.apache.org/jira/browse/NUTCH-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16239106#comment-16239106 ] 

ASF GitHub Bot commented on NUTCH-2242:
---------------------------------------

Omkar20895 opened a new pull request #238: NUTCH-2242 Injector to stop if job fails to avoid loss of CrawlDb
URL: https://github.com/apache/nutch/pull/238
 
 
   - Added job status checks to the following classes: Injector, ReadHostDb, CrawlCompletionStats, ProtocolStatusStatistics, SitemapProcessor, and DomainStatistics.
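
For readers unfamiliar with this kind of check, the following is a minimal sketch of the idea and not the code from the pull request. It assumes a job handle whose waitForCompletion(boolean) returns false on failure, as Hadoop's Job does. The RunnableJob interface, the runOrFail helper, and the class name are hypothetical stand-ins so that the example compiles without Hadoop on the classpath.

```java
import java.io.IOException;

class JobStatusCheck {

    /** Hypothetical stand-in for a MapReduce job handle (not a Nutch or Hadoop type). */
    @FunctionalInterface
    interface RunnableJob {
        boolean waitForCompletion(boolean verbose) throws IOException, InterruptedException;
    }

    /**
     * Runs the job and throws if it fails, so the caller never reaches the
     * step that would replace existing data with the output of a failed job.
     */
    static void runOrFail(RunnableJob job, String jobName) {
        boolean success;
        try {
            success = job.waitForCompletion(true);
        } catch (IOException e) {
            throw new IllegalStateException(jobName + " job failed: " + e.getMessage(), e);
        } catch (InterruptedException e) {
            // Restore the interrupt flag before reporting the failure.
            Thread.currentThread().interrupt();
            throw new IllegalStateException(jobName + " job was interrupted", e);
        }
        if (!success) {
            throw new IllegalStateException(jobName + " job did not succeed");
        }
    }

    public static void main(String[] args) {
        // A job that reports success: execution continues normally.
        runOrFail(verbose -> true, "inject");
        System.out.println("inject job succeeded");

        // A job that reports failure: the helper stops the caller.
        try {
            runOrFail(verbose -> false, "inject");
        } catch (IllegalStateException e) {
            System.out.println("stopped: " + e.getMessage());
        }
    }
}
```

The point of throwing rather than logging is that any later step, such as installing a new CrawlDb over the old one, is skipped when the job has failed.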

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> lastModified not always set
> ---------------------------
>
>                 Key: NUTCH-2242
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2242
>             Project: Nutch
>          Issue Type: Bug
>          Components: crawldb
>    Affects Versions: 1.11
>            Reporter: Jurian Broertjes
>            Priority: Minor
>             Fix For: 1.13
>
>         Attachments: NUTCH-2242.patch
>
>
> I observed two issues:
> - When using the DefaultFetchSchedule, CrawlDatum's modifiedTime field is not updated on the first successful fetch.
> - When a document modification is detected (via the protocol or via the signature), the modifiedTime isn't updated either.
> I can provide a patch later today.
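
The behavior the reporter expects can be illustrated with a small sketch. This is one possible reading of the two points above, not the attached patch and not Nutch's actual DefaultFetchSchedule. The Datum class, its fields, and recordSuccessfulFetch are hypothetical, and falling back to the fetch time when no modification time is reported is an assumption.

```java
class ModifiedTimeSketch {

    /** Hypothetical, simplified stand-in for a crawl record (not Nutch's CrawlDatum). */
    static final class Datum {
        long fetchTime;
        long modifiedTime;      // 0 means "never set"
        boolean fetchedBefore;
    }

    /**
     * Records a successful fetch. The modified time is set on the first
     * successful fetch and whenever a modification was detected; otherwise
     * the previous value is kept.
     *
     * @param reportedModifiedTime modification time reported for the document,
     *                             or 0 if unknown (assumption: fall back to fetchTime)
     * @param changed              true if a modification was detected
     */
    static void recordSuccessfulFetch(Datum datum, long fetchTime,
                                      long reportedModifiedTime, boolean changed) {
        boolean firstFetch = !datum.fetchedBefore;
        if (firstFetch || changed) {
            datum.modifiedTime = reportedModifiedTime > 0 ? reportedModifiedTime : fetchTime;
        }
        datum.fetchTime = fetchTime;
        datum.fetchedBefore = true;
    }

    public static void main(String[] args) {
        Datum d = new Datum();
        recordSuccessfulFetch(d, 1000L, 0L, false);   // first fetch
        System.out.println(d.modifiedTime);           // 1000
        recordSuccessfulFetch(d, 2000L, 0L, false);   // unchanged document
        System.out.println(d.modifiedTime);           // 1000
        recordSuccessfulFetch(d, 3000L, 2500L, true); // modification detected
        System.out.println(d.modifiedTime);           // 2500
    }
}
```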


