Posted to dev@nutch.apache.org by "lufeng (JIRA)" <ji...@apache.org> on 2013/03/16 16:52:12 UTC

[jira] [Commented] (NUTCH-1533) Implement getPrevModifiedTime(), setPrevModifiedTime(), getBatchId() and setBatchId() accessors in o.a.n.storage.WebPage

    [ https://issues.apache.org/jira/browse/NUTCH-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604309#comment-13604309 ] 

lufeng commented on NUTCH-1533:
-------------------------------

Hi Lewis

Thanks for your reviews.

Issues:

* I see that prevFetchTime is not passed into schedule#setPageRetrySchedule, so I did not pass prevModifiedTime into it either. What do you think about that?

* Currently the Host table is probably not affected by batchId. If we want to add a batchId to the Host table metadata, we may need to store multiple batchIds there, because two pages from the same host may have different batchIds.
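
To illustrate the second point, here is a minimal sketch of a per-host holder for several batchIds. The class HostBatchIds and its methods are hypothetical names for illustration only; they are not part of the existing Nutch Host class or of the attached patches, and how such a set would be serialized into the Host table metadata is left open.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not an existing Nutch class: collects the batchIds
// seen for the pages of one host.
class HostBatchIds {
    // A set, because many pages of a host usually share the same batchId
    // and each distinct batchId only needs to be stored once.
    private final Set<String> batchIds = new LinkedHashSet<String>();

    // Records the batchId of one page; returns true if it was not known yet.
    boolean addBatchId(String batchId) {
        if (batchId == null) {
            throw new IllegalArgumentException("batchId must not be null");
        }
        return batchIds.add(batchId);
    }

    // True if at least one page of this host was processed in the given batch.
    boolean hasBatchId(String batchId) {
        return batchIds.contains(batchId);
    }

    // Read-only view of all batchIds, in the order they were first added.
    Set<String> getBatchIds() {
        return Collections.unmodifiableSet(batchIds);
    }
}
```

With this shape, two pages from one host that were fetched in different batches simply contribute two entries, and a page from an already recorded batch changes nothing.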

Thanks Lewis.
                
> Implement getPrevModifiedTime(), setPrevModifiedTime(), getBatchId() and setBatchId() accessors in o.a.n.storage.WebPage
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-1533
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1533
>             Project: Nutch
>          Issue Type: Bug
>          Components: storage
>    Affects Versions: 2.1
>            Reporter: Lewis John McGibbney
>            Priority: Minor
>             Fix For: 2.2
>
>         Attachments: NUTCH-1533.patch, NUTCH-1533v2.patch
>
>
> NUTCH-1532 needs to obtain a batchId to add to NutchDocument prior to indexing. This is currently not available as we do not store the information in the WebPage. Additionally, we do not store the other ModifiedTime's but incorrectly set them in o.a.n.crawl.FetchSchedule#setFetchSchedule.
> All the above accessors should be implemented.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira