Posted to dev@nutch.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/08/15 18:34:56 UTC

[jira] [Commented] (NUTCH-1598) ElasticSearchIndexer to read ImmutableSettings from config

    [ https://issues.apache.org/jira/browse/NUTCH-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13741154#comment-13741154 ] 

Lewis John McGibbney commented on NUTCH-1598:
---------------------------------------------

Great work, Markus. This is dynamite :)
                
> ElasticSearchIndexer to read ImmutableSettings from config
> ----------------------------------------------------------
>
>                 Key: NUTCH-1598
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1598
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>    Affects Versions: 1.7
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.8
>
>         Attachments: NUTCH-1598-1.8.patch
>
>
> In some cases one must configure settings prior to indexing, such as discovery.zen.ping.multicast.group or discovery.zen.ping.multicast.port, if the node needs to find the cluster somewhere else. This patch allows for a key=value file in Nutch's config that is loaded into ImmutableSettings.
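
To illustrate the key=value idea described in the issue, here is a minimal sketch in plain Java. It is not the code from the attached patch: the class name, the method name, and the example values are all assumptions, and it uses only java.util.Properties rather than the Elasticsearch API. It only shows how such a file could be parsed into key/value pairs that an indexer could then hand to its settings builder.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper, not part of Nutch or Elasticsearch.
final class SettingsFileSketch {

    // Parse key=value text into a sorted map of settings.
    static Map<String, String> load(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> settings = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            settings.put(key, props.getProperty(key));
        }
        return settings;
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only, not recommendations.
        String example =
            "discovery.zen.ping.multicast.group=224.2.2.4\n"
          + "discovery.zen.ping.multicast.port=54328\n";
        Map<String, String> settings = load(new StringReader(example));
        for (Map.Entry<String, String> e : settings.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

In a real indexer the resulting pairs would be passed to the client's settings object before connecting; how the patch does that is not shown in this message.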


Re: crawl.gen.delay

Posted by feng lu <am...@gmail.com>.
Yes, it is used in Nutch 1.x, but it is never used in Nutch 2.x, because
Nutch 2.x never generates selected URLs.

The correct unit for crawl.gen.delay is milliseconds; you can check the
Nutch 1.x nutch-default.xml. The property description reads like this:

<property>
  <name>crawl.gen.delay</name>
  <value>604800000</value>
  <description>
   This value, expressed in milliseconds, defines how long we should keep
   the lock on records in CrawlDb that were just selected for fetching.
   If these records are not updated in the meantime, the lock is canceled,
   i.e. they become eligible for selecting.
   Default value of this is 7 days (604800000 ms).
  </description>
</property>

Maybe it is wrong.
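
As a quick check of the arithmetic in that description, here is a small Java sketch. The class and method names are made up for illustration and are not part of Nutch. It confirms that 7 days corresponds to 604800000 ms, and it shows how large the delay would become if some code were to read the same number as days instead. That second case is hypothetical and is not confirmed anywhere in this thread.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for checking units, not part of Nutch.
final class GenDelayUnits {

    // Value quoted from the nutch-default.xml snippet above.
    static final long CONFIGURED_VALUE = 604800000L;

    static long daysToMillis(long days) {
        return TimeUnit.DAYS.toMillis(days);
    }

    static long millisToDays(long millis) {
        return TimeUnit.MILLISECONDS.toDays(millis);
    }

    public static void main(String[] args) {
        // 7 days expressed in milliseconds.
        System.out.println("7 days in ms: " + daysToMillis(7));

        // The configured value read as milliseconds, converted back to days.
        System.out.println("configured value in days (read as ms): "
            + millisToDays(CONFIGURED_VALUE));

        // Hypothetical misreading: the same number treated as a count of days.
        System.out.println("approximate years if read as days: "
            + CONFIGURED_VALUE / 365);
    }
}
```

If the two interpretations disagree in the code and in the config file, the lock would effectively never expire, so it is worth checking which unit the reading code actually assumes.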

On Fri, Aug 16, 2013 at 3:17 AM, kaveh minooie <ka...@plutoz.com> wrote:

> crawl.gen.delay





-- 
Don't Grow Old, Grow Up... :-)

crawl.gen.delay

Posted by kaveh minooie <ka...@plutoz.com>.
Is 'crawl.gen.delay' still being used anywhere? I can't find
anything in the source code except for here:

package org.apache.nutch.crawl;

public class GeneratorJob extends NutchTool implements Tool {
   public static final String GENERATOR_TOP_N = "generate.topN";
   public static final String GENERATOR_CUR_TIME = "generate.curTime";
   public static final String GENERATOR_DELAY = "crawl.gen.delay";

Also, I think it has the wrong value in the nutch-default.xml file (the
value is in seconds; it should be in days).