Posted to dev@nutch.apache.org by "Markus Jelsma (Issue Comment Edited) (JIRA)" <ji...@apache.org> on 2012/03/02 10:05:59 UTC

[jira] [Issue Comment Edited] (NUTCH-1024) Dynamically set fetchInterval by MIME-type

    [ https://issues.apache.org/jira/browse/NUTCH-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220787#comment-13220787 ] 

Markus Jelsma edited comment on NUTCH-1024 at 3/2/12 9:05 AM:
--------------------------------------------------------------

New patch for trunk! It also changes the injector so that an injected fetchInterval is stored in the CrawlDatum metadata. In AdaptiveFetchSchedule, this injected interval takes precedence over anything else. This is useful for sites where you want to use AdaptiveFetchSchedule but still want the generator to select an injected homepage every N hours.
                
      was (Author: markus17):
    New patch for trunk! This also includes a change to the injector where injected fetchInterval is added to CrawlDatum MD. In AdaptiveFetchSchedule this injected interval overrides anything else.
                  
> Dynamically set fetchInterval by MIME-type
> ------------------------------------------
>
>                 Key: NUTCH-1024
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1024
>             Project: Nutch
>          Issue Type: New Feature
>          Components: generator
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Minor
>             Fix For: 1.5
>
>         Attachments: AdaptiveFetchSchedule.patch, MimeAdaptiveFetchSchedule.java, NUTCH-1024-1.5-1.patch, Nutch.patch, adaptive-mimetypes.txt
>
>
> Add a facility to configure default or fixed fetchInterval values by MIME-type. This helps conserve resources, since some file types are known to change frequently, some never, and others anywhere in between.
> * simple key\tvalue\n configuration file
> * only set fetchInterval for new documents
> * keep max fetchInterval fixed by current config
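
The three points above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code in the attached patches: the class name MimeIntervalTable, its methods, the handling of blank and '#' lines, and the interval values are all assumptions made for the example. It parses key\tvalue\n lines into a MIME-type lookup, applies the configured interval only to new documents, and caps the result at a fixed maximum.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper (not part of Nutch): default fetch intervals per MIME type. */
final class MimeIntervalTable {
    private final Map<String, Integer> intervals = new HashMap<>();
    private final int defaultInterval; // seconds, used when the MIME type is not listed
    private final int maxInterval;     // seconds, upper bound taken from existing config

    MimeIntervalTable(int defaultInterval, int maxInterval) {
        this.defaultInterval = defaultInterval;
        this.maxInterval = maxInterval;
    }

    /**
     * Parses "mime-type<TAB>seconds" lines. Skipping blank lines, '#' comments
     * and malformed lines is an assumption of this sketch. A non-numeric value
     * raises NumberFormatException.
     */
    static MimeIntervalTable parse(String text, int defaultInterval, int maxInterval) {
        MimeIntervalTable table = new MimeIntervalTable(defaultInterval, maxInterval);
        for (String rawLine : text.split("\n")) {
            String line = rawLine.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] parts = line.split("\t");
            if (parts.length != 2) {
                continue; // ignore malformed lines
            }
            String mimeType = parts[0].trim().toLowerCase(Locale.ROOT);
            table.intervals.put(mimeType, Integer.parseInt(parts[1].trim()));
        }
        return table;
    }

    /**
     * Returns the interval to store for a document. Existing documents keep
     * their current interval; new documents get the per-MIME-type value
     * (or the default), capped at the configured maximum.
     */
    int intervalFor(String mimeType, boolean isNewDocument, int currentInterval) {
        if (!isNewDocument) {
            return currentInterval;
        }
        Integer configured = mimeType == null
                ? null
                : intervals.get(mimeType.toLowerCase(Locale.ROOT));
        int interval = configured != null ? configured : defaultInterval;
        return Math.min(interval, maxInterval);
    }

    public static void main(String[] args) {
        // Example values only: 30-day default, 90-day maximum.
        String config = "text/html\t3600\napplication/pdf\t99999999\n";
        MimeIntervalTable table = MimeIntervalTable.parse(config, 2592000, 7776000);
        System.out.println(table.intervalFor("text/html", true, 0));
        System.out.println(table.intervalFor("application/pdf", true, 0));
    }
}
```

In this sketch the cap is applied at lookup time, so a large value in the file cannot push a document past the maximum already set in the crawler configuration.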

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira