Posted to user@nutch.apache.org by Paul van Hoven <pa...@googlemail.com> on 2011/07/10 16:42:47 UTC
Problems with tutorial
I'm completely new to Nutch, so I downloaded version 1.3 and worked
through the beginners' tutorial at
http://wiki.apache.org/nutch/NutchTutorial. The first problem was that I
did not find the file "conf/crawl-urlfilter.txt", so I skipped that step and
continued with launching Nutch. I then created a plain text file
in "/Users/toom/Downloads/nutch-1.3/crawled" called "urls.txt", which
contains the following text:
tom:crawled toom$ cat urls.txt
http://nutch.apache.org/
So after that I invoked nutch by calling
tom:bin toom$ ./nutch crawl /Users/toom/Downloads/nutch-1.3/crawled -dir
/Users/toom/Downloads/nutch-1.3/sites -depth 3 -topN 50
solrUrl is not set, indexing will be skipped...
crawl started in: /Users/toom/Downloads/nutch-1.3/sites
rootUrlDir = /Users/toom/Downloads/nutch-1.3/crawled
threads = 10
depth = 3
solrUrl=null
topN = 50
Injector: starting at 2011-07-07 14:02:31
Injector: crawlDb: /Users/toom/Downloads/nutch-1.3/sites/crawldb
Injector: urlDir: /Users/toom/Downloads/nutch-1.3/crawled
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: finished at 2011-07-07 14:02:35, elapsed: 00:00:03
Generator: starting at 2011-07-07 14:02:35
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 50
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls for politeness.
Generator: segment:
/Users/toom/Downloads/nutch-1.3/sites/segments/20110707140238
Generator: finished at 2011-07-07 14:02:39, elapsed: 00:00:04
Fetcher: No agents listed in 'http.agent.name' property.
Exception in thread "main" java.lang.IllegalArgumentException: Fetcher:
No agents listed in 'http.agent.name' property.
at
org.apache.nutch.fetcher.Fetcher.checkConfiguration(Fetcher.java:1166)
at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1068)
at org.apache.nutch.crawl.Crawl.run(Crawl.java:135)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:54)
I do not understand what happened here; maybe one of you can help me?
Re: Problems with tutorial
Posted by Cupbearer <jc...@inforeverse.com>.
I had this problem too, and then saw this part of the tutorial, which
answered a ton of questions for me:
"or
runtime/local/bin/nutch (version >= 1.3)"
If you downloaded the tar.gz file like I did, then you need to look for
everything inside the runtime folder. After that, everything else the
tutorial says about "bin/nutch" will make sense.
-----
Cupbearer
Jerry E. Craig, Jr.
--
View this message in context: http://lucene.472066.n3.nabble.com/Problems-with-tutorial-tp3156809p3157625.html
Sent from the Nutch - User mailing list archive at Nabble.com.
Re: Problems with tutorial
Posted by lewis john mcgibbney <le...@gmail.com>.
Hi,
For a 1.3 tutorial please see here [1]. I am in the process of overhauling
the Nutch site to accommodate the new changes in the 1.3 release.
Thank you
On Sun, Jul 10, 2011 at 3:42 PM, Paul van Hoven <
paul.van.hoven@googlemail.com> wrote:
> I'm completely new to Nutch, so I downloaded version 1.3 and worked through
> the beginners' tutorial at http://wiki.apache.org/nutch/NutchTutorial.
> [rest of the original message snipped]
--
Lewis
Re: Problems with tutorial
Posted by Markus Jelsma <ma...@openindex.io>.
Hi,
There are a lot of questions on that error:
http://www.google.nl/#hl=nl&source=hp&q=No+agents+listed+in+%27http.agent.name%27+property.&oq=No+agents+listed+in+%27http.agent.name%27+property.&aq=f&aqi=&aql=undefined&gs_sm=e&gs_upl=972l972l0l1l1l0l0l0l0l38l38l1l1&bav=on.2,or.r_gc.r_pw.&fp=62113c346707e160&biw=790&bih=328
Add the agents property to your configuration as per the tutorial:
http://wiki.apache.org/nutch/NutchTutorial
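For reference, a minimal override usually goes in conf/nutch-site.xml (which takes precedence over nutch-default.xml); the property name is from the tutorial, but the value below is only an example placeholder:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Identifies your crawler in the HTTP User-Agent header.
       Must not be empty, or the fetcher aborts with the error above. -->
  <property>
    <name>http.agent.name</name>
    <value>MyTestCrawler</value> <!-- example value: use your own single word -->
  </property>
</configuration>
```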
Cheers,
> I'm completely new to Nutch, so I downloaded version 1.3 and worked through
> the beginners' tutorial at http://wiki.apache.org/nutch/NutchTutorial.
> [rest of the original message snipped]
Re: Problems with tutorial
Posted by Emre Çelikten <em...@celikten.name>.
Hello,
Check your urls file and your regex-urlfilter.txt. You probably have a
problem there, assuming you are using your own links.
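To see why a seed can silently vanish, here is a crude stand-in (plain shell, not Nutch code; the patterns are simplified and are not the shipped defaults) for the first-match +/- rules in regex-urlfilter.txt: the first rule that matches decides, and URLs with query-like characters are dropped by default.

```shell
# Rough sketch of first-match URL filtering, modeled loosely on
# regex-urlfilter.txt semantics. Returns 0 (accepted) or 1 (rejected).
passes_filter() {
  case "$1" in
    file:*|ftp:*|mailto:*) return 1 ;;  # like: -^(file|ftp|mailto):
    *\?*|*=*|*@*|*\!*)     return 1 ;;  # like: -[?*!@=] (dynamic URLs)
    *)                     return 0 ;;  # like: +. (accept the rest)
  esac
}

for url in "http://nutch.apache.org/" "http://example.com/page?id=1"; do
  if passes_filter "$url"; then
    echo "$url -> accepted"
  else
    echo "$url -> rejected"
  fi
done
```

If all your seeds come out "rejected" by the real filter file, the Generator reports "0 records selected for fetching", exactly as in the log below.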
On 06/17/2012 05:46 PM, soberchallen wrote:
> Hello, I have the same problem. Have you already solved it?
> [quoted details snipped]
Re: Problems with tutorial
Posted by soberchallen <90...@qq.com>.
Hello, I have the same problem. Have you already solved it? The details are
as follows:
bin/nutch crawl urls -dir crawl -depth 2 -topN 100 -threads 2
solrUrl is not set, indexing will be skipped...
crawl started in: crawl
rootUrlDir = urls
threads = 2
depth = 2
solrUrl=null
topN = 100
Injector: starting at 2012-06-17 22:27:39
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: finished at 2012-06-17 22:27:41, elapsed: 00:00:02
Generator: starting at 2012-06-17 22:27:41
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 100
Generator: jobtracker is 'local', generating exactly one partition.
Generator: 0 records selected for fetching, exiting ...
Stopping at depth=0 - no more URLs to fetch.
No URLs to fetch - check your seed list and URL filters.
crawl finished: crawl
Re: Problems with tutorial
Posted by Julien Nioche <li...@gmail.com>.
I have just updated the tutorial. As of 1.3, the files should be changed in
$NUTCH_HOME/runtime/local/conf/ unless you rebuild with Ant.
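In other words (a sketch using a throwaway directory to stand in for a real install; the layout is assumed from the 1.3 binary tar.gz, so substitute your actual NUTCH_HOME):

```shell
# Mirror the relevant bits of a Nutch 1.3 binary-distribution layout.
NUTCH_HOME=$(mktemp -d)
mkdir -p "$NUTCH_HOME/conf" "$NUTCH_HOME/runtime/local/conf"

# bin/nutch from the binary tar.gz reads this copy of the config...
echo '<configuration/>' > "$NUTCH_HOME/runtime/local/conf/nutch-site.xml"

# ...while this top-level copy only takes effect after you rebuild
# with 'ant runtime':
echo '<configuration/>' > "$NUTCH_HOME/conf/nutch-site.xml"

ls "$NUTCH_HOME/runtime/local/conf"
```

So if you edit only the top-level conf/ and then run runtime/local/bin/nutch, your changes are never seen.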
On 12 July 2011 10:43, Paul van Hoven <pa...@googlemail.com> wrote:
> Thanks for the answers. I'm not sure that 'http.agent.name' is the
> problem, since I set it:
> [rest of quoted message snipped]
--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
Re: Problems with tutorial
Posted by Paul van Hoven <pa...@googlemail.com>.
Thanks for the answers. I'm not sure that 'http.agent.name' is the
problem, since I set it:
This is the configuration I'm using from nutch-1.3/conf/nutch-default.xml:
<!-- HTTP properties -->
<property>
<name>http.agent.name</name>
<value>MyFirstNutchCrawler</value>
<description>HTTP 'User-Agent' request header. MUST NOT be empty -
please set this to a single word uniquely related to your organization.
NOTE: You should also check other related properties:
http.robots.agents
http.agent.description
http.agent.url
http.agent.email
http.agent.version
and set their values appropriately.
</description>
</property>
As I understand the tutorial, this should be correct.
Tutorial citation: "Search for http.agent.name , and give it value
'YOURNAME Spider'"
I already had it set this way in my first email.
2011/7/10 Ing. Yusniel Hidalgo Delgado <yh...@uci.cu>:
> Paul, I think that your problem is related to the 'http.agent.name'
> property. Please change this property in your configuration file, as
> the tutorial describes.
> [rest of quoted message snipped]
Re: Problems with tutorial
Posted by "Ing. Yusniel Hidalgo Delgado" <yh...@uci.cu>.
Paul, I think that your problem is related to the 'http.agent.name' property. Please change this property in your configuration file, as the tutorial describes:
Good! You are almost ready to crawl. You need to give your crawler a name. This is required.
1. Open the $NUTCH_HOME/conf/nutch-default.xml file
2. Search for http.agent.name and give it the value 'YOURNAME Spider'
3. Optionally you may also set the http.agent.url and http.agent.email properties
Then try again.
Greetings
----- Original message -----
From: "Paul van Hoven" <pa...@googlemail.com>
To: user@nutch.apache.org
Sent: Sunday, 10 July 2011 7:42:47 GMT -08:00 Tijuana / Baja California
Subject: Problems with tutorial
I'm completely new to Nutch, so I downloaded version 1.3 and worked
through the beginners' tutorial at http://wiki.apache.org/nutch/NutchTutorial.
[rest of the original message snipped]
--
--------------------------------------------------------------------------------------------
Ing. Yusniel Hidalgo Delgado
Participate in COMPUMAT 2011 http://www.mfc.uclv.edu.cu/scmc
Participate in INFO 2012 http://www.congreso-info.cu
Universidad de las Ciencias Informáticas
--------------------------------------------------------------------------------------------