Posted to user@nutch.apache.org by Bradford Stephens <br...@gmail.com> on 2008/05/16 23:12:28 UTC

Injector / Generator fails with "can't find rules..."

Greetings,

I'm running the latest trunk of Nutch 0.9 with some patches applied
(like 467, which fixed the small-injection-list issue). I'm on Ubuntu
Server 8.04. For several weeks my installation has been running
correctly. Now, however, when I try to crawl a list of URLs with the
bin/nutch crawl command, I get this error:

"Stopping at depth = 0, no URLs to fetch"

I looked at the hadoop.log file on the datanode that was running the
task, and I stumbled across this:

"WARN regex.RegexURLNormalizer - can't find rules for scope 'inject',
using default"
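
From what I can tell, that warning only means RegexURLNormalizer has no
rules registered specifically for the 'inject' scope, so it falls back
to the default rules (regex-normalize.xml in a stock config, I believe),
which may be harmless by itself. If I remember the file format
correctly, it looks roughly like this -- the rule shown is purely
illustrative, not necessarily one that ships with my build:

<?xml version="1.0"?>
<regex-normalize>
  <!-- illustrative rule: strip a sessionid query parameter -->
  <regex>
    <pattern>&amp;?sessionid=[0-9a-fA-F]+</pattern>
    <substitution></substitution>
  </regex>
</regex-normalize>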

Could this be causing the problem? I'm stumped. The crawl-urlfilter.xml
and the automaton-filter.xml (or whatever they are named) have not
changed since my last successful crawl.
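
For what it's worth, the filter file on the master follows the usual
pattern of a domain whitelist followed by a catch-all reject; the first
matching rule wins, so if the whitelist lines ever go missing on a node,
the final "-." rule silently drops every URL. A minimal sketch
(example.com is a placeholder, not my real domain):

# skip file:, ftp:, and mailto: urls
-^(file|ftp|mailto):

# accept URLs within my domain only (placeholder domain)
+^http://([a-z0-9]*\.)*example.com/

# reject everything else
-.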

My hadoop-site.xml looks like this:
<property>
   <name>mapred.speculative.execution</name>
   <value>false</value>
</property>


<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/visibleuser/search/nutchtest/hadooptmp</value>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://dttest01:54310</value>
</property>

<property>
  <name>mapred.job.tracker</name>
  <value>dttest01:54311</value>
</property>

<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>

Re: Injector / Generator fails with "can't find rules..."

Posted by Bradford Stephens <br...@gmail.com>.
Greetings again:

On a hunch, I double-checked the filters.xml -- it turns out rsync had
not updated anything on the datanodes, so all the URLs were still
being filtered out :) Crisis averted!
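
For anyone who hits the same thing: the fix was simply to push the conf
directory back out to every node and re-run the crawl. Something along
these lines, where the slave hostnames and the conf path are
placeholders for my actual layout:

for node in dttest02 dttest03 dttest04; do
  rsync -av /home/visibleuser/search/nutchtest/conf/ $node:/home/visibleuser/search/nutchtest/conf/
done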

Cheers,
Bradford
