Posted to user@nutch.apache.org by patrik <pa...@clipblast.com> on 2007/06/20 06:45:25 UTC

RE: Nutch 0.9 - Generator: 0 records selected for fetching, exiting

Since this thread started I've had the same issue occur. I can confirm
that a crawldb containing more hosts than there are nodes in the cluster
works fine. The problem crawldb contains 3 distinct hosts, which still
produces an empty fetchlist on a cluster of 4 nodes.
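As a rough illustration of why the host/node counts might matter (this is a hypothetical simplification, not the actual Generator code; the real Nutch class, PartitionUrlByHost, also supports partitioning by IP), host-based partitioning routes each URL to a reduce task by hashing its host name. With fewer distinct hosts than reduce partitions, some reducers are guaranteed to receive nothing:

```java
import java.util.Set;
import java.util.TreeSet;

public class HostPartitionDemo {

    // Hypothetical simplification of host-based partitioning: route each
    // URL to a reduce partition by hashing its host name, mod the number
    // of partitions (typically one reduce task per cluster node).
    static int partitionFor(String host, int numPartitions) {
        return (host.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        // 3 distinct hosts spread over 4 reduce partitions, as in the
        // problem crawldb described above (example host names).
        String[] hosts = {"a.example.com", "b.example.com", "c.example.com"};
        int numPartitions = 4;
        Set<Integer> used = new TreeSet<>();
        for (String h : hosts) {
            used.add(partitionFor(h, numPartitions));
        }
        // With fewer hosts than partitions, at least one reducer receives
        // no URLs at all, so its part of the fetchlist is necessarily empty.
        System.out.println("partitions used: " + used + " of " + numPartitions);
    }
}
```

That alone only explains empty *parts*, not a fully empty fetchlist, so a real fix would still require looking at the Generator code itself.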

pb

-----Original Message-----
From: Vishal Shah [mailto:vishals@rediff.co.in] 
Sent: Wednesday, May 23, 2007 2:44 AM
To: nutch-user@lucene.apache.org
Subject: RE: Nutch 0.9 - Generator: 0 records selected for fetching,
exiting


Hi Ian, Abidari,

  We were having a similar problem as well. In our case it happened when
all the urls were from the same host. If the urls were from different
hosts, the generator was able to generate the list; otherwise it
produced an empty fetchlist.

  We got around this problem by injecting some dummy urls in the list
that were from a different host. Could you try doing the same thing and
see if the generator works? If it does, then we can check the generator
code to see why this is happening.
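For anyone wanting to try the same workaround, the steps might look like this (paths and the dummy host name are illustrative; `urls` and `crawl/crawldb` are the directories from the crawl output below):

```shell
# Add a seed URL from a second (dummy) host so the generator has more
# than one host key to work with.
mkdir -p urls
echo "http://dummy.example.org/" >> urls/seed.txt

# Re-upload the seed directory to DFS and re-inject it into the crawldb
# (Nutch 0.9 / Hadoop command names).
bin/hadoop dfs -put urls urls
bin/nutch inject crawl/crawldb urls
```

After that, re-run the generate step and see whether the fetchlist is still empty.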

Regards,

-vishal.

-----Original Message-----
From: Ian Holsman [mailto:lists@holsman.net] 
Sent: Wednesday, May 23, 2007 11:11 AM
To: nutch-user@lucene.apache.org
Subject: Re: Nutch 0.9 - Generator: 0 records selected for fetching,
exiting



Abidari wrote:
> Ian
>
> Can you please help with this? I have upgraded to Nutch 0.9. I am able
> to run Nutch in a standalone mode, i.e. without hadoop. But with hadoop
> I get the error "Generator: 0 records selected for fetching, exiting ...".
> I have performed this step - bin/hadoop dfs -put urls urls. And upon
> running bin/hadoop dfs -ls, I see that urls is there in the dfs.
>
> Output of Crawl:
>
> crawl started in: crawl
> rootUrlDir = urls
> threads = 10
> depth = 3
> topN = 50
> Injector: starting
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
> Injector: Merging injected urls into crawl db.
> Injector: done
> Generator: Selecting best-scoring urls due for fetch.
> Generator: starting
> Generator: segment: crawl/segments/20070419134155
> Generator: filtering: false
> Generator: topN: 50
> Generator: 0 records selected for fetching, exiting ...
> Stopping at depth=0 - no more URLs to fetch.
> No URLs to fetch - check your seed list and URL filters.
> crawl finished: crawl

Hi Abidari,

I ran into this problem as well.

I'm not sure if it is related, but when I examine the stderr of the
mapper job I see:

log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /opt/nutch/search/logs (Is a directory)
        at java.io.FileOutputStream.openAppend(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:177)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:102)
        at org.apache.log4j.FileAppender.setFile(FileAppender.java:289)
        at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:163)
        at org.apache.log4j.DailyRollingFileAppender.activateOptions(DailyRollingFileAppender.java:215)
        at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:256)
        at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:132)
        at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:96)
        at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:654)
        at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:612)
        at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:509)
        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:415)
        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:441)
        at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:468)
        at org.apache.log4j.LogManager.<clinit>(LogManager.java:122)
        at org.apache.log4j.Logger.getLogger(Logger.java:104)
        at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:229)
        at org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:65)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:494)
        at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:529)
        at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:235)
        at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:370)
        at org.apache.hadoop.mapred.TaskTracker.<clinit>(TaskTracker.java:82)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1423)
log4j:ERROR Either File or DatePattern options are not set for appender [DRFA].


which points to log4j being misconfigured.
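Reading the trace, the DRFA (DailyRollingFileAppender) appender appears to be pointed at the directory /opt/nutch/search/logs rather than at a log file inside it. A possible fix in conf/log4j.properties might look like this (file name and pattern are just an example; adjust to your setup):

```properties
# DRFA must point at a file, not a directory such as /opt/nutch/search/logs
log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFA.File=/opt/nutch/search/logs/hadoop.log
log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
```

That would at least get the task logs written, which should make the generator problem easier to diagnose even if it turns out to be unrelated.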

Abidari, did you get any further with this? Andrei, any hints?
-- 
View this message in context:
http://www.nabble.com/Nutch-0.9---Generator%3A-0-records-selected-for-fetching%2C-exiting-tf3609078.html#a10757841
Sent from the Nutch - User mailing list archive at Nabble.com.